Repository: aws/aws-neuron-sdk Branch: master Commit: 371eabc8a739 Files: 1636 Total size: 10.3 MB Directory structure: gitextract_u554eb9v/ ├── .github/ │ ├── ISSUE_TEMPLATE/ │ │ ├── bug-report.yml │ │ ├── config.yml │ │ ├── documentation.yml │ │ └── feature-request.yml │ ├── pull_request_template.md │ ├── stale_issue_mark_close_workflow.yml │ └── workflows/ │ ├── acknowledge-new-issue.yml │ └── auto-label-issues.yml ├── .gitignore ├── .readthedocs.yml ├── CODEOWNERS ├── CONTRIBUTING.md ├── Dockerfile ├── LICENSE-DOCUMENTATION ├── LICENSE-SAMPLECODE ├── LICENSE-SUMMARY-DOCS-SAMPLES ├── Makefile ├── README.md ├── _backup-setup/ │ └── neuron-setup/ │ ├── multiframework/ │ │ ├── multi-framework-ubuntu22-neuron-dlami.rst │ │ └── multi-framework-ubuntu24-neuron-dlami.rst │ └── pytorch/ │ ├── neuron/ │ │ ├── amazon-linux/ │ │ │ ├── torch-neuron-al2-base-dlami.rst │ │ │ ├── torch-neuron-al2-pytorch-dlami.rst │ │ │ ├── torch-neuron-al2.rst │ │ │ └── torch-neuron-al2023.rst │ │ └── ubuntu/ │ │ ├── torch-neuron-ubuntu20-base-dlami.rst │ │ ├── torch-neuron-ubuntu20-pytorch-dlami.rst │ │ ├── torch-neuron-ubuntu20.rst │ │ └── torch-neuron-ubuntu22.rst │ └── neuronx/ │ ├── amazon-linux/ │ │ ├── torch-neuronx-al2-base-dlami.rst │ │ ├── torch-neuronx-al2-pytorch-dlami.rst │ │ ├── torch-neuronx-al2.rst │ │ └── torch-neuronx-al2023.rst │ └── ubuntu/ │ ├── torch-neuronx-ubuntu20-base-dlami.rst │ ├── torch-neuronx-ubuntu20-pytorch-dlami.rst │ ├── torch-neuronx-ubuntu20.rst │ ├── torch-neuronx-ubuntu22.rst │ └── torch-neuronx-ubuntu24.rst ├── _content-types/ │ ├── conceptual-deep-dive.rst │ ├── model-card.rst │ ├── procedural-how-to.rst │ ├── procedural-tutorial.ipynb │ ├── reference-kernel-api.rst │ └── release-notes-templates/ │ ├── compiler.rst │ ├── containers.rst │ ├── dlami.rst │ ├── index.rst │ ├── nki.rst │ ├── nx-jax.rst │ ├── nx-pytorch.rst │ ├── nxd-core.rst │ ├── nxd-inference.rst │ ├── nxd-training.rst │ ├── runtime.rst │ └── tools.rst ├── _ext/ │ ├── archive.py │ ├── df_tables.py │ ├── local_documenter.py │ ├── neuron_tag.py │ ├── release-notes-automation-spec.md │ ├── release-notes-context.md │ ├── sphinx_plotly_directive.py │ └── symlink.py ├── _static/ │ └── css/ │ ├── custom.css │ └── custom.css.new ├── _templates/ │ ├── recentposts.html │ ├── search-field.html │ ├── search-google.html │ └── search.html ├── _utilities/ │ ├── JIRA_SETUP_QUICKSTART.md │ ├── add_meta.py │ ├── audit_frameworks.py │ ├── check_urls.sh │ ├── create_sitemap.py │ ├── format_build_logs.py │ ├── inject_archive_meta.py │ ├── metadata_schema.yaml │ ├── migrate_setup_content.py │ ├── old-nki-apis.txt │ └── setup_jira_token.sh ├── about-neuron/ │ ├── amazonq-getstarted.rst │ ├── announcements/ │ │ ├── index.rst │ │ ├── neuron1.x/ │ │ │ ├── announce-eol-mx-before-1-5.rst │ │ │ ├── announce-eol-pt-1-5.rst │ │ │ ├── announce-eol-pt-before-1-8.rst │ │ │ ├── announce-eol-tf-before-2-5.rst │ │ │ ├── announce-eol-tf-before-2-7.rst │ │ │ ├── announcements.rst │ │ │ ├── eol-ncgs-env_2.rst │ │ │ ├── eol-pt-15.rst │ │ │ └── eol-tf-21-24.rst │ │ └── neuron2.x/ │ │ ├── announce-component-change.rst │ │ ├── announce-correction-neuron-driver-support-inf1.rst │ │ ├── announce-deprecation-containers-rtd.rst │ │ ├── announce-deprecation-nxd-path-trace-api.rst │ │ ├── announce-deprecation-transformer-flag.rst │ │ ├── announce-eol-megatron-lm.rst │ │ ├── announce-eol-python-3-7.rst │ │ ├── announce-eol-ubuntu-18.rst │ │ ├── announce-eos-al2.rst │ │ ├── announce-eos-beta-pytorch-neuroncore-placement-apis.rst │ │ ├── announce-eos-bf16-vars.rst 
│ │ ├── announce-eos-block-dimension-nki.rst │ │ ├── announce-eos-dlami-ubuntu-22-04.rst │ │ ├── announce-eos-dlami.rst │ │ ├── announce-eos-inf1-virtual-environments.rst │ │ ├── announce-eos-jax-neuronx-nki-call.rst │ │ ├── announce-eos-megatronlm-2-13.rst │ │ ├── announce-eos-mllama-checkpoint.rst │ │ ├── announce-eos-multiframework-dlamis-inf1.rst │ │ ├── announce-eos-nemo.rst │ │ ├── announce-eos-neuron-det.rst │ │ ├── announce-eos-neuron-driver-support-inf1.rst │ │ ├── announce-eos-neuron-profiler-2.rst │ │ ├── announce-eos-neuron-profiler-v230.rst │ │ ├── announce-eos-neuron-profiler.rst │ │ ├── announce-eos-neurondevice-version.rst │ │ ├── announce-eos-neurondevice.rst │ │ ├── announce-eos-nxd-examples.rst │ │ ├── announce-eos-nxdt-nxd-core-training.rst │ │ ├── announce-eos-probuf.rst │ │ ├── announce-eos-pt-versions.rst │ │ ├── announce-eos-pt2.rst │ │ ├── announce-eos-python38.rst │ │ ├── announce-eos-pytorch-1-1-3.rst │ │ ├── announce-eos-pytorch-1-9.rst │ │ ├── announce-eos-pytorch-2-1.rst │ │ ├── announce-eos-pytorch-2-7-2-8-v229.rst │ │ ├── announce-eos-pytorch-2-7-2-8.rst │ │ ├── announce-eos-pytorch-profiling-api.rst │ │ ├── announce-eos-tensorboard-tools.rst │ │ ├── announce-eos-tensorflow-2-8-9.rst │ │ ├── announce-eos-tensorflow-inf2.rst │ │ ├── announce-eos-tensorflow1-x.rst │ │ ├── announce-eos-torch-neuron.rst │ │ ├── announce-eos-torch-neuronx-nki-jit.rst │ │ ├── announce-eos-u20-dlamis.rst │ │ ├── announce-eos-xla-bf16.rst │ │ ├── announce-intent-eol-nemo-arg.rst │ │ ├── announce-intent-eos-opt.rst │ │ ├── announce-intent-eos-pt-version.rst │ │ ├── announce-intent-eos-pt2-6.rst │ │ ├── announce-intent-eos-tensorflow-tutorial-inf.rst │ │ ├── announce-intent-eos-tnx.rst │ │ ├── announce-intent-maintenance-tnx.rst │ │ ├── announce-maintenance-mxnet.rst │ │ ├── announce-maintenance-nxdi-nxd-core-inference.rst │ │ ├── announce-maintenance-nxdt-nxd-core-training.rst │ │ ├── announce-maintenance-tf.rst │ │ ├── announce-moving-samples.rst │ │ ├── announce-nki-library-namespace-changes-2-28.rst │ │ ├── announce-nki-namespace-migration.rst │ │ ├── announce-no-longer-support-neuron-det.rst │ │ ├── announce-no-longer-support-nxd-examples.rst │ │ ├── announce-no-longer-support-pytorch-113.rst │ │ ├── announce-no-longer-support-pytorch-2-1.rst │ │ ├── announce-no-longer-support-pytorch-2-7-2-8.rst │ │ ├── announce-no-longer-support-tensorflow-inf2.rst │ │ ├── announce-no-longer-support-u20-dlc-dlami.rst │ │ ├── announce-no-support-al2.rst │ │ ├── announce-no-support-device-version.rst │ │ ├── announce-no-support-jax-neuronx-nki-call.rst │ │ ├── announce-no-support-llama3-2-checkpoint.rst │ │ ├── announce-no-support-nemo-megatron.rst │ │ ├── announce-no-support-neurondevice.rst │ │ ├── announce-no-support-nki-jit-torch.rst │ │ ├── announce-no-support-tensorboard-plugin.rst │ │ ├── announce-no-support-tensorflow1-x.rst │ │ ├── announce-no-support-tensorflow2-10.rst │ │ ├── announce-no-support-tf-versions.rst │ │ ├── announce-no-support-torch-neuron-versions.rst │ │ ├── announce-no-support-ubuntu-20-base.rst │ │ ├── announce-no-support-vllm-v0.rst │ │ ├── announce-nxdi-changes.rst │ │ ├── announce-package-change.rst │ │ ├── announce-python38-no-longer-support.rst │ │ ├── announce-transition-pytorch-trainium.rst │ │ ├── announcement-end-of-support-neuronxcc-nki.rst │ │ ├── announcement-end-of-support-nxdt-nxd-core.rst │ │ ├── announcement-end-of-support-parallel-model-trace.rst │ │ ├── announcement-end-of-support-pytorch-2-6.rst │ │ ├── announcement-end-of-support-vllm-v0.rst │ │ ├── 
announcement-nki-library-kernel-migration.rst │ │ ├── announcement-nki-library-namespace-changes.rst │ │ ├── announcement-python-3-9-eol.rst │ │ ├── dlami-neuron-2.10.rst │ │ ├── dlami-neuron-2.12.rst │ │ ├── dlami-pytorch-introduce.rst │ │ ├── end-of-support-pt2.rst │ │ ├── github-changes.rst │ │ ├── gpg-expiration.rst │ │ ├── neuron-rtd-eol.rst │ │ ├── neuron2-intro.rst │ │ ├── neuron230-packages-changes.rst │ │ ├── neuron250-packages-changes.rst │ │ ├── release-neuron2.4.rst │ │ ├── sm-training-dlc-2.9.1.rst │ │ └── sm-training-trn1-introduce.rst │ ├── appnotes/ │ │ ├── index.rst │ │ ├── mxnet-neuron/ │ │ │ └── flex-eg.rst │ │ ├── neuron-cc/ │ │ │ └── mixed-precision.rst │ │ ├── neuron1x/ │ │ │ ├── important-neuronx-dkms.txt │ │ │ └── introducing-libnrt.rst │ │ ├── neuronx-cc/ │ │ │ └── neuronx-cc-training-mixed-precision.rst │ │ ├── neuronx-distributed/ │ │ │ ├── introducing-nxd-inference.rst │ │ │ └── introducing-nxdt-training.rst │ │ ├── perf/ │ │ │ └── neuron-cc/ │ │ │ ├── parallel-ncgs.rst │ │ │ └── performance-tuning.rst │ │ ├── torch-neuron/ │ │ │ ├── bucketing-app-note.rst │ │ │ ├── index.rst │ │ │ ├── rcnn-app-note.rst │ │ │ └── torch-neuron-dataparallel-app-note.rst │ │ ├── torch-neuronx/ │ │ │ ├── index.rst │ │ │ ├── introducing-pytorch-2-6.rst │ │ │ ├── introducing-pytorch-2-7.rst │ │ │ ├── introducing-pytorch-2-8.rst │ │ │ ├── introducing-pytorch-2-9.rst │ │ │ ├── introducing-pytorch-2-x.rst │ │ │ ├── migration-from-xla-downcast-bf16.rst │ │ │ ├── torch-neuronx-dataparallel-app-note.rst │ │ │ └── torch-neuronx-graph-partitioner-app-note.rst │ │ └── transformers-neuronx/ │ │ └── generative-llm-inference-with-neuron.rst │ ├── arch/ │ │ ├── glossary.rst │ │ ├── index.rst │ │ ├── neuron-features/ │ │ │ ├── custom-c++-operators.rst │ │ │ ├── data-types.rst │ │ │ ├── index.rst │ │ │ ├── logical-neuroncore-config.rst │ │ │ ├── neuron-caching.rst │ │ │ ├── neuroncore-batching.rst │ │ │ ├── neuroncore-pipeline.rst │ │ │ └── rounding-modes.rst │ │ └── neuron-hardware/ │ │ ├── inf1-arch.rst │ │ ├── inf2-arch.rst │ │ ├── inferentia.rst │ │ ├── inferentia2.rst │ │ ├── neuron-core-v1.rst │ │ ├── neuron-core-v2.rst │ │ ├── neuron-core-v3.rst │ │ ├── neuron-core-v4.rst │ │ ├── trainium.rst │ │ ├── trainium2.rst │ │ ├── trainium3.rst │ │ ├── trn1-arch.rst │ │ ├── trn2-arch.rst │ │ └── trn3-arch.rst │ ├── benchmarks/ │ │ ├── index.rst │ │ ├── inf1/ │ │ │ ├── data.csv │ │ │ ├── index.rst │ │ │ ├── instance_prices.csv │ │ │ ├── latency_data_encoder.csv │ │ │ ├── throughput_data_cnn.csv │ │ │ └── throughput_data_encoder.csv │ │ ├── inf2/ │ │ │ ├── inf2-performance.rst │ │ │ ├── inf2_instance_prices.csv │ │ │ ├── latency_data_decoder.csv │ │ │ ├── latency_data_encoder.csv │ │ │ ├── latency_data_encoder_decoder.csv │ │ │ ├── latency_data_vision.csv │ │ │ ├── latency_data_vision_cnn.csv │ │ │ ├── latency_data_vision_dit.csv │ │ │ ├── latency_data_vision_sd.csv │ │ │ ├── latency_data_vision_transformers.csv │ │ │ ├── throughput_data_decoder.csv │ │ │ ├── throughput_data_encoder.csv │ │ │ ├── throughput_data_encoder_decoder.csv │ │ │ ├── throughput_data_vision.csv │ │ │ ├── throughput_data_vision_cnn.csv │ │ │ ├── throughput_data_vision_dit.csv │ │ │ ├── throughput_data_vision_sd.csv │ │ │ └── throughput_data_vision_transformers.csv │ │ └── trn1/ │ │ ├── latency_data_decoder.csv │ │ ├── latency_data_encoder.csv │ │ ├── latency_data_encoder_decoder.csv │ │ ├── throughput_data_decoder.csv │ │ ├── throughput_data_encoder.csv │ │ ├── throughput_data_encoder_decoder.csv │ │ ├── training_data_decoder.csv 
│ │ ├── training_data_encoder.csv │ │ ├── training_data_vision_transformers.csv │ │ ├── trn1-inference-performance.rst │ │ ├── trn1-training-performance.rst │ │ ├── trn1_instance_prices.csv │ │ └── trn1_trn1n_nlp_data.csv │ ├── beta-participation.rst │ ├── calculator/ │ │ └── neuron-calculator.rst │ ├── faq/ │ │ ├── contributing-faq.rst │ │ ├── index.rst │ │ ├── inference/ │ │ │ ├── neuron-faq.rst │ │ │ └── trouble-shooting-faq.rst │ │ ├── neuron2-intro-faq.rst │ │ ├── onnx-faq.rst │ │ ├── roadmap-faq.rst │ │ └── training/ │ │ └── neuron-training.rst │ ├── faq.rst │ ├── index.rst │ ├── models/ │ │ ├── index.rst │ │ ├── inference-inf1-samples.rst │ │ ├── inference-inf2-trn1-samples.rst │ │ └── training-trn1-samples.rst │ ├── monitoring-tools.rst │ ├── news-and-blogs/ │ │ ├── CONTRIBUTING.md │ │ ├── JIRA-INTEGRATION-DESIGN.md │ │ ├── README.md │ │ ├── article-template.yaml │ │ ├── index.rst │ │ ├── news-and-blogs.yaml │ │ └── validate_articles.py │ ├── oss/ │ │ └── index.rst │ ├── profiling-tools.rst │ ├── quick-start/ │ │ ├── _specs/ │ │ │ └── REFACTORING_NOTES.md │ │ ├── docs-quicklinks.rst │ │ ├── github-samples.rst │ │ ├── index.rst │ │ ├── inference-quickstart.rst │ │ ├── mxnet-neuron.rst │ │ ├── tab-inference-tensorflow-neuron.rst │ │ ├── tensorflow-neuron.rst │ │ ├── torch-neuron-tab-training.rst │ │ ├── torch-neuron.rst │ │ ├── training-quickstart.rst │ │ └── user-guide-quickstart.rst │ ├── sdk-policy.rst │ ├── security.rst │ ├── troubleshooting.rst │ ├── what-is-neuron.rst │ └── whats-new.rst ├── archive/ │ ├── helper-tools/ │ │ ├── index.rst │ │ ├── tutorial-neuron-check-model.rst │ │ └── tutorial-neuron-gatherinfo.rst │ ├── index.rst │ ├── mxnet-neuron/ │ │ ├── api-compilation-python-api.rst │ │ ├── api-reference-guide.rst │ │ ├── api-reference-guide.txt │ │ ├── developer-guide.rst │ │ ├── developer-guide.txt │ │ ├── ec2-then-ec2-devflow.rst │ │ ├── index.rst │ │ ├── inference-mxnet-neuron.rst │ │ ├── inference-mxnet-neuron.txt │ │ ├── misc-mxnet-neuron.rst │ │ ├── misc-mxnet-neuron.txt │ │ ├── mxnet-neuron-setup.rst │ │ ├── mxnet-neuron-setup.txt │ │ ├── neo-then-hosting-devflow.rst │ │ ├── setup/ │ │ │ ├── mxnet-install-prev-al2.rst │ │ │ ├── mxnet-install-prev-al2023.rst │ │ │ ├── mxnet-install-prev-u20.rst │ │ │ ├── mxnet-install-prev-u22.rst │ │ │ ├── mxnet-install.rst │ │ │ ├── mxnet-neuron-al2-base-dlami.rst │ │ │ ├── mxnet-neuron-al2.rst │ │ │ ├── mxnet-neuron-al2023.rst │ │ │ ├── mxnet-neuron-ubuntu20-base-dlami.rst │ │ │ ├── mxnet-neuron-ubuntu20.rst │ │ │ ├── mxnet-neuron-ubuntu22.rst │ │ │ ├── mxnet-update-u20.rst │ │ │ ├── mxnet-update.rst │ │ │ ├── prev-releases/ │ │ │ │ ├── neuron-1.14.2-mxnet-install.rst │ │ │ │ ├── neuron-1.15.0-mxnet-install.rst │ │ │ │ ├── neuron-1.15.1-mxnet-install.rst │ │ │ │ ├── neuron-1.15.2-mxnet-install.rst │ │ │ │ ├── neuron-1.16.3-mxnet-install.rst │ │ │ │ ├── neuron-1.17.2-mxnet-install.rst │ │ │ │ ├── neuron-1.18.0-mxnet-install.rst │ │ │ │ └── neuron-1.19.0-mxnet-install.rst │ │ │ └── setup-inference │ │ ├── troubleshooting-guide.rst │ │ └── tutorials/ │ │ ├── mxnet-tutorial-setup.rst │ │ ├── tutorial-model-serving.rst │ │ ├── tutorials-mxnet-computervision.rst │ │ ├── tutorials-mxnet-neuron.rst │ │ ├── tutorials-mxnet-neuron.txt │ │ ├── tutorials-mxnet-nlp.rst │ │ └── tutorials-mxnet-utilizing-neuron-capabilities.rst │ ├── neuronperf/ │ │ ├── index.rst │ │ ├── neuronperf_api.rst │ │ ├── neuronperf_benchmark_guide.rst │ │ ├── neuronperf_compile_guide.rst │ │ ├── neuronperf_evaluate_guide.rst │ │ ├── neuronperf_examples.rst │ │ ├── 
neuronperf_faq.rst │ │ ├── neuronperf_framework_notes.rst │ │ ├── neuronperf_install.rst │ │ ├── neuronperf_model_index_guide.rst │ │ ├── neuronperf_overview.rst │ │ ├── neuronperf_terminology.rst │ │ ├── neuronperf_troubleshooting.rst │ │ ├── rn.rst │ │ ├── setup.cfg │ │ ├── setup.py │ │ ├── test_resnet50_pt.py │ │ └── test_simple_pt.py │ ├── src/ │ │ └── benchmark/ │ │ └── pytorch/ │ │ ├── bert-base-cased_benchmark.py │ │ ├── bert-base-cased_compile.py │ │ ├── bert-base-uncased_benchmark.py │ │ ├── bert-base-uncased_compile.py │ │ ├── distilbert-base-uncased-finetuned-sst-2-english_benchmark.py │ │ ├── distilbert-base-uncased-finetuned-sst-2-english_compile.py │ │ ├── distilbert-base-uncased_benchmark.py │ │ ├── distilbert-base-uncased_compile.py │ │ ├── distilroberta-base_benchmark.py │ │ ├── distilroberta-base_compile.py │ │ ├── hf-google-vit_benchmark.py │ │ ├── hf-openai-clip_benchmark.py │ │ ├── hf_pretrained_wav2vec2_conformer_relpos_benchmark.py │ │ ├── hf_pretrained_wav2vec2_conformer_rope_benchmark.py │ │ ├── inf2_benchmark.py │ │ ├── opt_benchmark.py │ │ ├── perceiver-multimodal_benchmark.py │ │ ├── perceiver-multimodal_compile.py │ │ ├── perceiver-vision_benchmark.py │ │ ├── perceiver-vision_compile.py │ │ ├── pixart_alpha_benchmark.py │ │ ├── pixart_sigma_benchmark.py │ │ ├── resnet50_benchmark.py │ │ ├── resnet50_compile.py │ │ ├── resnet_benchmark.py │ │ ├── resnet_compile.py │ │ ├── sd2_512_benchmark.py │ │ ├── sd2_512_compile.py │ │ ├── sd2_768_benchmark.py │ │ ├── sd2_768_compile.py │ │ ├── sd2_inpainting_benchmark.py │ │ ├── sd2_inpainting_inference.py │ │ ├── sd_15_512_benchmark.py │ │ ├── sd_15_512_compile.py │ │ ├── sd_4x_upscaler_benchmark.py │ │ ├── sd_4x_upscaler_compile.py │ │ ├── sdxl_base_1024_benchmark.py │ │ ├── sdxl_base_1024_compile.py │ │ ├── sdxl_base_and_refiner_1024_benchmark.py │ │ ├── sdxl_base_and_refiner_1024_compile.py │ │ ├── unet_benchmark.py │ │ ├── unet_compile.py │ │ ├── vgg_benchmark.py │ │ └── vgg_compile.py │ ├── tensorboard/ │ │ └── getting-started-tensorboard-neuron-plugin.rst │ ├── tensorflow/ │ │ ├── index.rst │ │ ├── setup-legacy-inf1-tensorflow.rst │ │ ├── tensorflow-neuron/ │ │ │ ├── additional-examples.rst │ │ │ ├── additional-examples.txt │ │ │ ├── api-auto-replication-api.rst │ │ │ ├── api-compilation-python-api.rst │ │ │ ├── api-reference-guide.rst │ │ │ ├── api-reference-guide.txt │ │ │ ├── api-tfn-analyze-model-api.rst │ │ │ ├── api-tracing-python-api.rst │ │ │ ├── dlc-then-ec2-devflow.rst │ │ │ ├── dlc-then-ecs-devflow.rst │ │ │ ├── dlc-then-eks-devflow.rst │ │ │ ├── ec2-then-ec2-devflow.rst │ │ │ ├── misc-tensorflow-neuron.rst │ │ │ ├── misc-tensorflow-neuron.txt │ │ │ ├── neo-then-hosting-devflow.rst │ │ │ ├── setup/ │ │ │ │ ├── prev-releases/ │ │ │ │ │ ├── neuron-1.14.2-tensorflow-install.rst │ │ │ │ │ ├── neuron-1.15.0-tensorflow-install.rst │ │ │ │ │ ├── neuron-1.15.1-tensorflow-install.rst │ │ │ │ │ ├── neuron-1.15.2-tensorflow-install.rst │ │ │ │ │ ├── neuron-1.16.3-tensorflow-install.rst │ │ │ │ │ ├── neuron-1.17.0-tensorflow-install.rst │ │ │ │ │ ├── neuron-1.17.1-tensorflow-install.rst │ │ │ │ │ ├── neuron-1.17.2-tensorflow-install.rst │ │ │ │ │ ├── neuron-1.18.0-tensorflow-install.rst │ │ │ │ │ └── neuron-1.19.0-tensorflow-install.rst │ │ │ │ ├── tensorflow-install-prev-al2023.rst │ │ │ │ ├── tensorflow-install-prev-u20.rst │ │ │ │ ├── tensorflow-install-prev-u22.rst │ │ │ │ ├── tensorflow-install-prev.rst │ │ │ │ ├── tensorflow-install.rst │ │ │ │ ├── tensorflow-update-u20.rst │ │ │ │ ├── tensorflow-update-u22.rst 
│ │ │ │ └── tensorflow-update.rst │ │ │ ├── tensorflow2-accelerated-ops.rst │ │ │ ├── tf2_faq.rst │ │ │ └── tutorials/ │ │ │ ├── bert_demo/ │ │ │ │ ├── bert_demo.rst │ │ │ │ ├── glue_mrpc_dev.tsv │ │ │ │ └── mrpc.proto │ │ │ ├── index.rst │ │ │ ├── k8s_bert_demo/ │ │ │ │ └── Dockerfile.tfserving_example │ │ │ ├── tensorflow-tutorial-setup.rst │ │ │ ├── tutorials-tensorflow-neuron.rst │ │ │ ├── tutorials-tensorflow-neuron.txt │ │ │ ├── tutorials-tensorflow-nlp.rst │ │ │ └── tutorials-tensorflow-utilizing-neuron-capabilities.rst │ │ ├── tensorflow-neuron-inference.rst │ │ ├── tensorflow-neuron-inference.txt │ │ ├── tensorflow-neuronx/ │ │ │ ├── api-reference-guide.rst │ │ │ ├── api-reference-guide.txt │ │ │ ├── misc-tensorflow-neuronx.rst │ │ │ ├── misc-tensorflow-neuronx.txt │ │ │ ├── setup/ │ │ │ │ ├── index.rst │ │ │ │ ├── prev-releases/ │ │ │ │ │ ├── neuronx-2.8.0-tensorflow-install.rst │ │ │ │ │ └── neuronx-2.9.0-tensorflow-install.rst │ │ │ │ ├── tensorflow-install-prev-al2.rst │ │ │ │ ├── tensorflow-install-prev-al2023.rst │ │ │ │ ├── tensorflow-install-prev-u20.rst │ │ │ │ ├── tensorflow-install-prev-u22.rst │ │ │ │ ├── tensorflow-neuronx-install.rst │ │ │ │ ├── tensorflow-update-al2-dlami.rst │ │ │ │ ├── tensorflow-update-al2.rst │ │ │ │ ├── tensorflow-update-u20-dlami.rst │ │ │ │ ├── tensorflow-update-u20.rst │ │ │ │ └── tensorflow-update-u22.rst │ │ │ ├── tf-neuronx-auto-replication-api.rst │ │ │ ├── tfneuronx-python-tracing-api.rst │ │ │ ├── tfnx-analyze-model-api.rst │ │ │ └── tutorials/ │ │ │ ├── tutorial-tensorflowx-serving-NeuronRT-Visible-Cores.rst │ │ │ ├── tutorials-tensorflow-neuronx.rst │ │ │ └── tutorials-tensorflow-neuronx.txt │ │ ├── tensorflow-neuronx-inference.rst │ │ ├── tensorflow-neuronx-inference.txt │ │ ├── tensorflow-setup.rst │ │ └── tensorflow-setup.txt │ ├── torch-neuron/ │ │ ├── additional-examples-inference-torch-neuron.rst │ │ ├── additional-examples-inference-torch-neuron.txt │ │ ├── api-compilation-python-api.rst │ │ ├── api-core-placement.rst │ │ ├── api-reference-guide-torch-neuron.rst │ │ ├── api-reference-guide-torch-neuron.txt │ │ ├── api-torch-neuron-dataparallel-api.rst │ │ ├── developer-guide-torch-neuron.rst │ │ ├── developer-guide-torch-neuron.txt │ │ ├── guides/ │ │ │ ├── core-placement/ │ │ │ │ └── torch-core-placement.rst │ │ │ └── torch-lstm-support.rst │ │ ├── index.rst │ │ ├── inference-torch-neuron.rst │ │ ├── misc-inference-torch-neuron.rst │ │ ├── misc-inference-torch-neuron.txt │ │ ├── placement.py │ │ ├── setup/ │ │ │ ├── index.rst │ │ │ ├── prev-releases/ │ │ │ │ ├── neuron-1.14.2-pytorch-install.rst │ │ │ │ ├── neuron-1.15.0-pytorch-install.rst │ │ │ │ ├── neuron-1.15.1-pytorch-install.rst │ │ │ │ ├── neuron-1.15.2-pytorch-install.rst │ │ │ │ ├── neuron-1.16.1-pytorch-install.rst │ │ │ │ ├── neuron-1.16.2-pytorch-install.rst │ │ │ │ ├── neuron-1.16.3-pytorch-install.rst │ │ │ │ ├── neuron-1.17.2-pytorch-install.rst │ │ │ │ ├── neuron-1.18.0-pytorch-install.rst │ │ │ │ ├── neuron-1.19.0-pytorch-install.rst │ │ │ │ ├── neuron-2.3.0-pytorch-install.rst │ │ │ │ ├── neuron-2.4.0-pytorch-install.rst │ │ │ │ └── neuron-2.5.0-pytorch-install.rst │ │ │ ├── pytorch-install-cxx11.rst │ │ │ ├── pytorch-install-prev-al2.rst │ │ │ ├── pytorch-install-prev-al2023.rst │ │ │ ├── pytorch-install-prev-u20.rst │ │ │ ├── pytorch-install-prev-u22.rst │ │ │ ├── pytorch-install-prev.rst │ │ │ ├── pytorch-install.rst │ │ │ ├── pytorch-update-al2-dlami.rst │ │ │ ├── pytorch-update-al2023.rst │ │ │ ├── pytorch-update-u20-dlami.rst │ │ │ ├── 
pytorch-update-u20.rst │ │ │ ├── pytorch-update-u22.rst │ │ │ └── pytorch-update.rst │ │ ├── torch-neuron-dataparallel-example-default.rst │ │ ├── torch-neuron-dataparallel-example-dim-neq-zero.rst │ │ ├── torch-neuron-dataparallel-example-disable-dynamic-batching.rst │ │ ├── torch-neuron-dataparallel-example-dynamic-batching.rst │ │ ├── torch-neuron-dataparallel-example-specify-ncs.rst │ │ ├── troubleshooting-guide.rst │ │ └── tutorials/ │ │ ├── neuroncore_pipeline_pytorch.rst │ │ ├── pytorch-tutorial-setup.rst │ │ ├── transformers-marianmt.rst │ │ ├── tutorial-libtorch.rst │ │ ├── tutorial-torchserve.rst │ │ ├── tutorial_source_instructions/ │ │ │ ├── run_libtorch.sh │ │ │ └── run_torchserve_u20.sh │ │ ├── tutorials-inference-torch-neuron.rst │ │ ├── tutorials-inference-torch-neuron.txt │ │ ├── tutorials-torch-neuron-computervision.rst │ │ ├── tutorials-torch-neuron-nlp.rst │ │ └── tutorials-utilizing-neuron-capabilities.rst │ ├── transformers-neuronx/ │ │ ├── api-reference-guide.rst │ │ ├── api-reference-guide.txt │ │ ├── developer-guide.rst │ │ ├── developer-guide.txt │ │ ├── index.rst │ │ ├── setup/ │ │ │ └── index.rst │ │ ├── transformers-neuronx-api-reference.rst │ │ ├── transformers-neuronx-developer-guide-for-continuous-batching.rst │ │ ├── transformers-neuronx-developer-guide.rst │ │ ├── transformers-neuronx-misc.rst │ │ ├── transformers-neuronx-misc.txt │ │ ├── transformers-neuronx-tutorials.rst │ │ ├── transformers-neuronx-tutorials.txt │ │ └── transformers-neuronx.txt │ └── tutorials/ │ ├── finetune_t5.rst │ ├── finetuning_llama2_7b_ptl.rst │ ├── gpt3_neuronx_nemo_megatron_pretraining.rst │ ├── megatron_gpt_pretraining.rst │ ├── multinode-training-model-profiling.rst │ ├── nxd-source-code/ │ │ ├── gpt_neox_tp_zero1/ │ │ │ ├── gpt_neox_20b.sh │ │ │ └── gpt_neox_6_9b.sh │ │ └── llama_tp_pp_ptl/ │ │ ├── llama_2_13b.sh │ │ ├── llama_2_70b.sh │ │ ├── llama_2_7b.sh │ │ └── llama_tp_pp_ptl_setup.sh │ ├── ssd300_demo/ │ │ ├── requirements.txt │ │ ├── ssd300_demo.rst │ │ ├── ssd300_detection.py │ │ ├── ssd300_evaluation.py │ │ ├── ssd300_evaluation_client.py │ │ └── ssd300_model.py │ ├── training-gpt-neox-20b.rst │ ├── training-gpt-neox.rst │ ├── training_codegen25_7b.rst │ ├── training_llama2_tp_pp_ptl.rst │ └── tutorial_source_code/ │ └── t5_finetuning/ │ ├── t5_finetuning_32_worker_training_code.sh │ ├── t5_finetuning_multi_worker_training_code.sh │ ├── t5_finetuning_setup_code.sh │ ├── t5_finetuning_single_worker_training_code.sh │ └── t5_modify_run_summarization_code.sh ├── audit-report.md ├── build.sh ├── compiler/ │ ├── error-codes/ │ │ ├── EARG001.rst │ │ ├── EBIR023.rst │ │ ├── EBVF030.rst │ │ ├── EHCA005.rst │ │ ├── EOOM001.rst │ │ ├── EOOM002.rst │ │ ├── ESFH002.rst │ │ ├── ESPP004.rst │ │ ├── ESPP047.rst │ │ ├── EUOC002.rst │ │ ├── EVRF001.rst │ │ ├── EVRF004.rst │ │ ├── EVRF005.rst │ │ ├── EVRF006.rst │ │ ├── EVRF007.rst │ │ ├── EVRF009.rst │ │ ├── EVRF010.rst │ │ ├── EVRF011.rst │ │ ├── EVRF013.rst │ │ ├── EVRF015.rst │ │ ├── EVRF016.rst │ │ ├── EVRF017.rst │ │ ├── EVRF018.rst │ │ ├── EVRF019.rst │ │ ├── EVRF022.rst │ │ ├── EVRF031.rst │ │ ├── EXSP001.rst │ │ ├── EXTP004.rst │ │ └── index.rst │ ├── index.rst │ ├── neuron-cc/ │ │ ├── api-reference-guide.rst │ │ ├── command-line-reference.rst │ │ ├── developer-guide.rst │ │ └── faq.rst │ ├── neuron-cc.rst │ ├── neuronx-cc/ │ │ ├── api-reference-guide/ │ │ │ └── index.rst │ │ ├── developer-guide.rst │ │ ├── faq.rst │ │ └── how-to-convolution-in-unet.rst │ └── neuronx-cc.rst ├── conf.py ├── containers/ │ ├── 
container-deployment-flows.rst │ ├── container-sm-hosting-devflow.rst │ ├── developerflows.rst │ ├── developerflows.txt │ ├── dlc-then-customize-devflow.rst │ ├── dlc-then-ec2-devflow.rst │ ├── dlc-then-ecs-devflow.rst │ ├── dlc-then-eks-devflow.rst │ ├── dlc-then-k8s-devflow.rst │ ├── docker-example/ │ │ ├── Dockerfile.device-plugin │ │ ├── index.rst │ │ ├── inference/ │ │ │ ├── Dockerfile-inference │ │ │ ├── Dockerfile-inference-dlc │ │ │ ├── Dockerfile-inference-dlc.rst │ │ │ ├── Dockerfile-libmode │ │ │ ├── Dockerfile-libmode.rst │ │ │ ├── Dockerfile-tf-serving.rst │ │ │ ├── Dockerfile.mxnet-serving │ │ │ ├── Dockerfile.tf-serving │ │ │ ├── config-properties.rst │ │ │ ├── config.properties │ │ │ ├── dockerd-libmode-entrypoint.rst │ │ │ ├── dockerd-libmode-entrypoint.sh │ │ │ ├── torchserve-neuron.rst │ │ │ └── torchserve-neuron.sh │ │ ├── training/ │ │ │ ├── Dockerfile-training-dlc │ │ │ ├── Dockerfile-trainium-dlc.rst │ │ │ ├── mlp.rst │ │ │ ├── mlp_train.py │ │ │ └── model.py │ │ └── v1/ │ │ └── inference/ │ │ ├── Dockerfile-app-rt-diff.rst │ │ ├── Dockerfile-app-rt-same.rst │ │ ├── Dockerfile-neuron-rtd.rst │ │ ├── Dockerfile-torch-neuron.rst │ │ ├── Dockerfile.app-rt-diff │ │ ├── Dockerfile.neuron-rtd │ │ ├── Dockerfile.torch-neuron │ │ ├── dockerd-entrypoint-app-rt-same.rst │ │ └── dockerd-entrypoint.sh │ ├── ec2-then-ec2-devflow.rst │ ├── ec2.rst │ ├── faq-troubleshooting-releasenote.rst │ ├── faq.rst │ ├── files/ │ │ ├── index-dra.rst │ │ ├── manifests/ │ │ │ ├── clusterrole.yaml │ │ │ ├── clusterrolebinding.yaml │ │ │ ├── daemonset.yaml │ │ │ ├── deviceclass.yaml │ │ │ ├── namespace.yaml │ │ │ └── serviceaccount.yaml │ │ ├── scripts/ │ │ │ └── install-dra-driver.sh │ │ └── specs/ │ │ ├── 1x4-connected-devices.yaml │ │ ├── 2-node-inference-us.yaml │ │ ├── 4-node-inference-us.yaml │ │ ├── all-devices.yaml │ │ ├── lnc-setting-trn2.yaml │ │ ├── specific-driver-version.yaml │ │ └── us-and-lnc-config.yaml │ ├── get-started/ │ │ ├── quickstart-configure-deploy-dlc.rst │ │ └── quickstart-pytorch-inference-dlc.rst │ ├── getting-started.rst │ ├── how-to/ │ │ └── how-to-ultraserver.rst │ ├── index.rst │ ├── k8.rst │ ├── kubernetes-getting-started.rst │ ├── locate-neuron-dlc-image.rst │ ├── neo-then-hosting-devflow.rst │ ├── neuron-dra.rst │ ├── neuron-plugins.rst │ ├── neuron_dlc_images.csv │ ├── troubleshooting.rst │ ├── tutorial-docker-runtime1.0.rst │ ├── tutorials/ │ │ ├── build-run-neuron-container.rst │ │ ├── inference/ │ │ │ ├── index.rst │ │ │ ├── index.txt │ │ │ ├── k8s_rn50_demo.rst │ │ │ └── tutorial-infer.rst │ │ ├── k8s-default-scheduler.rst │ │ ├── k8s-multiple-scheduler.rst │ │ ├── k8s-neuron-device-plugin.rst │ │ ├── k8s-neuron-helm-chart.rst │ │ ├── k8s-neuron-monitor.rst │ │ ├── k8s-neuron-problem-detector-and-recovery-irsa.rst │ │ ├── k8s-neuron-problem-detector-and-recovery.rst │ │ ├── k8s-neuron-scheduler-flow.rst │ │ ├── k8s-neuron-scheduler.rst │ │ ├── k8s-prerequisite.rst │ │ ├── k8s-setup.rst │ │ ├── training/ │ │ │ ├── index.rst │ │ │ ├── index.txt │ │ │ ├── k8s_mlp_train_demo.rst │ │ │ └── tutorial-training.rst │ │ ├── tutorial-docker-env-setup.rst │ │ └── tutorial-oci-hook.rst │ └── tutorials.rst ├── devflows/ │ ├── aws-batch-flows.rst │ ├── aws-batch-flows.txt │ ├── dlc-then-customize-devflow.rst │ ├── ec2-flows.rst │ ├── ec2-flows.txt │ ├── ecs-flows.rst │ ├── eks-flows.rst │ ├── index.rst │ ├── inference/ │ │ ├── aws-batch-flows.rst │ │ ├── aws-batch-flows.txt │ │ ├── byoc-hosting-devflow-inf2.rst │ │ ├── byoc-hosting-devflow.rst │ │ ├── 
container-sm-hosting-devflow.rst │ │ ├── dev-flows.rst │ │ ├── dlc-then-ec2-devflow.rst │ │ ├── dlc-then-ecs-devflow.rst │ │ ├── dlc-then-eks-devflow.rst │ │ ├── dlc-then-k8s-devflow.rst │ │ ├── ec2-flows.rst │ │ ├── ec2-flows.txt │ │ ├── ec2-then-ec2-devflow-inf2.rst │ │ ├── ec2-then-ec2-devflow.rst │ │ ├── env-setup-text.rst │ │ ├── neo-then-hosting-devflow.rst │ │ ├── parallelcluster-flows.rst │ │ ├── parallelcluster-flows.txt │ │ ├── sagemaker-flows.rst │ │ └── sagemaker-flows.txt │ ├── parallelcluster-flows.rst │ ├── parallelcluster-flows.txt │ ├── plugins/ │ │ ├── npd-ecs-flows.rst │ │ └── npd-ecs-flows.txt │ ├── sagemaker-flows.rst │ ├── setup/ │ │ ├── ecs-flows.rst │ │ ├── ecs-flows.txt │ │ ├── eks-flows.rst │ │ └── eks-flows.txt │ ├── third-party-solutions.rst │ └── training/ │ ├── aws-batch-flows.rst │ ├── aws-batch-flows.txt │ ├── batch/ │ │ └── batch-training.rst │ ├── dlc-then-ecs-devflow.rst │ ├── ec2/ │ │ └── ec2-training.rst │ ├── ec2-flows.rst │ ├── ec2-flows.txt │ ├── parallelcluster/ │ │ └── parallelcluster-training.rst │ ├── parallelcluster-flows.rst │ ├── parallelcluster-flows.txt │ ├── sagemaker-flows.rst │ ├── sagemaker-flows.txt │ └── sm-devflow/ │ └── sm-training-devflow.rst ├── dlami/ │ └── index.rst ├── frameworks/ │ ├── index.rst │ ├── jax/ │ │ ├── api-reference-guide/ │ │ │ ├── index.rst │ │ │ └── neuron-envvars.rst │ │ ├── index.rst │ │ └── setup/ │ │ ├── jax-neuronx-known-issues.rst │ │ └── jax-setup.rst │ └── torch/ │ ├── about/ │ │ └── index.rst │ ├── guide-torch-neuron-vs-torch-neuronx-inference.rst │ ├── index.rst │ ├── inference-torch-neuronx.rst │ ├── pytorch-native-overview.rst │ ├── torch-neuronx/ │ │ ├── additional-examples-inference-torch-neuronx.rst │ │ ├── additional-examples-training.rst │ │ ├── api-reference-guide/ │ │ │ ├── inference/ │ │ │ │ ├── api-torch-neuronx-analyze.rst │ │ │ │ ├── api-torch-neuronx-async-lazy-load.rst │ │ │ │ ├── api-torch-neuronx-core-placement.rst │ │ │ │ ├── api-torch-neuronx-data-parallel.rst │ │ │ │ ├── api-torch-neuronx-replace-weights.rst │ │ │ │ ├── api-torch-neuronx-trace.rst │ │ │ │ └── inference-api-guide-torch-neuronx.rst │ │ │ ├── torch-neuronx-profiling-api.rst │ │ │ └── training/ │ │ │ ├── index.rst │ │ │ ├── pytorch-neuron-parallel-compile.rst │ │ │ └── torch-neuron-envvars.rst │ │ ├── misc-inference-torch-neuronx.rst │ │ ├── misc-training.rst │ │ ├── programming-guide/ │ │ │ ├── inference/ │ │ │ │ ├── autobucketing-dev-guide.rst │ │ │ │ ├── core-placement.rst │ │ │ │ ├── index.rst │ │ │ │ └── trace-vs-xla-lazytensor.rst │ │ │ ├── torch-neuronx-profiling-dev-guide.rst │ │ │ └── training/ │ │ │ ├── index.rst │ │ │ ├── pytorch-neuron-debug.rst │ │ │ └── pytorch-neuron-programming-guide.rst │ │ ├── pytorch-neuron-supported-operators.rst │ │ ├── setup/ │ │ │ ├── install-templates/ │ │ │ │ └── pytorch-dev-install.txt │ │ │ ├── note-setup-general.rst │ │ │ ├── prev-releases/ │ │ │ │ ├── neuronx-2.7.0-pytorch-install.rst │ │ │ │ ├── neuronx-2.8.0-pytorch-install.rst │ │ │ │ └── neuronx-2.9.0-pytorch-install.rst │ │ │ ├── pytorch-install-prev-al2.rst │ │ │ ├── pytorch-install-prev-al2023.rst │ │ │ ├── pytorch-install-prev-u20.rst │ │ │ ├── pytorch-install-prev-u22.rst │ │ │ ├── pytorch-install-prev-u24.rst │ │ │ ├── pytorch-install.rst │ │ │ ├── pytorch-neuronx-install-cxx11.rst │ │ │ ├── pytorch-update-al2-dlami.rst │ │ │ ├── pytorch-update-al2.rst │ │ │ ├── pytorch-update-al2023.rst │ │ │ ├── pytorch-update-u20-dlami.rst │ │ │ ├── pytorch-update-u20.rst │ │ │ ├── pytorch-update-u22.rst │ │ │ └── 
pytorch-update-u24.rst │ │ ├── setup-trn1-multi-node-execution.rst │ │ ├── torch-neuronx-dataparallel-example-default.rst │ │ ├── torch-neuronx-dataparallel-example-dim-neq-zero.rst │ │ ├── torch-neuronx-dataparallel-example-disable-dynamic-batching.rst │ │ ├── torch-neuronx-dataparallel-example-dynamic-batching.rst │ │ ├── torch-neuronx-dataparallel-example-specify-ncs.rst │ │ ├── training-troubleshooting.rst │ │ └── tutorials/ │ │ ├── inference/ │ │ │ ├── tutorial-torchserve-neuronx.rst │ │ │ └── tutorials-torch-neuronx.rst │ │ ├── note-performance.txt │ │ └── training/ │ │ ├── analyze_for_training.rst │ │ ├── bert.rst │ │ ├── finetune_hftrainer.rst │ │ ├── mlp.rst │ │ ├── tutorial_source_code/ │ │ │ ├── analyze_training/ │ │ │ │ └── analyze_training_code.sh │ │ │ ├── bert_mrpc_finetuning/ │ │ │ │ ├── bert_mrpc_finetuning_converted_checkpoint_training.sh │ │ │ │ ├── bert_mrpc_finetuning_multi_worker_training_code.sh │ │ │ │ ├── bert_mrpc_finetuning_setup_code.sh │ │ │ │ └── bert_mrpc_finetuning_single_worker_training.sh │ │ │ ├── bert_training/ │ │ │ │ ├── bert_amp_training_code.sh │ │ │ │ ├── bert_lamb_bf16_training_code.sh │ │ │ │ ├── bert_lamb_training_code.sh │ │ │ │ ├── bert_phase2_training_code.sh │ │ │ │ ├── bert_precompilation_code.sh │ │ │ │ ├── bert_setup_code.sh │ │ │ │ ├── bert_setup_code_ph2.sh │ │ │ │ └── bert_training_code.sh │ │ │ ├── multi_layer_perceptron_training/ │ │ │ │ └── multi_layer_perceptron_training_code.sh │ │ │ └── zero1_training/ │ │ │ └── zero1_single_node_training_code.sh │ │ ├── tutorials-training-torch-neuronx.rst │ │ └── zero1_gpt2.rst │ ├── torch-setup.rst │ └── training-torch-neuronx.rst ├── general/ │ └── faq.rst ├── includes/ │ └── setup/ │ ├── select-framework-note.txt │ ├── tab-inference-mxnet-neuron-al2.txt │ ├── tab-inference-mxnet-neuron-al2023.txt │ ├── tab-inference-mxnet-neuron-u20.txt │ ├── tab-inference-mxnet-neuron-u22.txt │ ├── tab-inference-mxnet-neuron.txt │ ├── tab-inference-tensorflow-neuron-al2.txt │ ├── tab-inference-tensorflow-neuron-al2023.txt │ ├── tab-inference-tensorflow-neuron-u20.txt │ ├── tab-inference-tensorflow-neuron-u22.txt │ ├── tab-inference-tensorflow-neuronx-al2.txt │ ├── tab-inference-tensorflow-neuronx-al2023.txt │ ├── tab-inference-tensorflow-neuronx-u20.txt │ ├── tab-inference-tensorflow-neuronx-u22.txt │ ├── tab-inference-torch-neuron-al2.txt │ ├── tab-inference-torch-neuron-al2023.txt │ ├── tab-inference-torch-neuron-u20.txt │ ├── tab-inference-torch-neuron-u22.txt │ ├── tab-inference-torch-neuron.txt │ ├── tab-inference-torch-neuronx-al2.txt │ ├── tab-inference-torch-neuronx-al2023.txt │ ├── tab-inference-torch-neuronx-u20.txt │ ├── tab-inference-torch-neuronx-u22.txt │ └── tab-inference-torch-neuronx-u24.txt ├── index.rst ├── info/ │ └── exclude ├── libraries/ │ ├── index.rst │ ├── nemo-megatron/ │ │ └── index.rst │ ├── neuronx-distributed/ │ │ ├── activation_memory_reduction.rst │ │ ├── activation_memory_reduction_developer_guide.rst │ │ ├── api-reference-guide-inference.rst │ │ ├── api-reference-guide-training.rst │ │ ├── api-reference-guide.rst │ │ ├── api-reference-guide.txt │ │ ├── api_guide.rst │ │ ├── app_notes.rst │ │ ├── app_notes.txt │ │ ├── context_parallelism_overview.rst │ │ ├── developer-guide-inference.rst │ │ ├── developer-guide-inference.txt │ │ ├── developer-guide-training.rst │ │ ├── developer-guide-training.txt │ │ ├── developer-guide.rst │ │ ├── developer-guide.txt │ │ ├── index-inference.rst │ │ ├── index-training.rst │ │ ├── lora_finetune_developer_guide.rst │ │ ├── 
model_builder_v2_api_reference.rst │ │ ├── model_optimizer_wrapper_developer_guide.rst │ │ ├── neuronx-distributed-misc.rst │ │ ├── neuronx-distributed-misc.txt │ │ ├── neuronx_distributed_inference_developer_guide.rst │ │ ├── pipeline_parallelism_overview.rst │ │ ├── pp_developer_guide.rst │ │ ├── ptl_developer_guide.rst │ │ ├── save_load_developer_guide.rst │ │ ├── setup/ │ │ │ ├── index.rst │ │ │ └── index.txt │ │ ├── standard_mixed_precision.rst │ │ ├── tensor_parallelism_overview.rst │ │ ├── tp_developer_guide.rst │ │ └── tutorials/ │ │ ├── finetune_llama3_8b_ptl_lora.rst │ │ ├── index.rst │ │ ├── index.txt │ │ ├── inference.rst │ │ ├── inference_tutorials.rst │ │ ├── neuronx_distributed_tutorials.txt │ │ ├── nxd-source-code/ │ │ │ ├── llama_tp_pp/ │ │ │ │ ├── llama_2_13b.sh │ │ │ │ ├── llama_2_70b.sh │ │ │ │ ├── llama_31_70b.sh │ │ │ │ ├── llama_3_70b.sh │ │ │ │ └── llama_tp_pp_setup.sh │ │ │ └── llama_tp_zero1/ │ │ │ ├── llama_2_7b.sh │ │ │ ├── llama_31_8b.sh │ │ │ ├── llama_3_8b.sh │ │ │ └── llama_tp_zero1_setup.sh │ │ ├── nxd_inference_tutorials.txt │ │ ├── nxd_training_tutorials.txt │ │ ├── training.rst │ │ ├── training_llama_tp_pp.rst │ │ ├── training_llama_tp_zero1.rst │ │ └── training_tutorials.rst │ ├── nxd-inference/ │ │ ├── _templates/ │ │ │ ├── model_card.jinja.rst │ │ │ └── model_card_qwen3.jinja.rst │ │ ├── api-guides/ │ │ │ ├── api-guide.rst │ │ │ ├── api-guide.txt │ │ │ └── index.rst │ │ ├── app-notes/ │ │ │ ├── app_notes.txt │ │ │ ├── index.rst │ │ │ └── parallelism.rst │ │ ├── developer_guides/ │ │ │ ├── accuracy-eval-with-datasets.rst │ │ │ ├── custom-quantization.rst │ │ │ ├── disaggregated-inference.rst │ │ │ ├── feature-guide.rst │ │ │ ├── how-to-use-fpem.rst │ │ │ ├── index.rst │ │ │ ├── llm-inference-benchmarking-guide.rst │ │ │ ├── migrate-from-tnx-to-nxdi.rst │ │ │ ├── model-reference.rst │ │ │ ├── moe-arch-deep-dive.rst │ │ │ ├── nxd-examples-migration-guide.rst │ │ │ ├── onboarding-models.rst │ │ │ ├── performance-cli-params.rst │ │ │ ├── vllm-user-guide-v1.rst │ │ │ ├── vllm-user-guide.rst │ │ │ ├── weights-sharding-guide.rst │ │ │ └── writing-tests.rst │ │ ├── examples/ │ │ │ └── vllm_client.py │ │ ├── index.rst │ │ ├── misc/ │ │ │ ├── index.rst │ │ │ ├── misc.txt │ │ │ └── nxdi-troubleshooting.rst │ │ ├── models/ │ │ │ ├── index.rst │ │ │ ├── llama3/ │ │ │ │ ├── data/ │ │ │ │ │ └── card_llama33_70b.yml │ │ │ │ └── llama_33_70b.rst │ │ │ ├── models.txt │ │ │ └── qwen3/ │ │ │ ├── data/ │ │ │ │ └── card_qwen3_moe_235b.yml │ │ │ └── qwen3_moe_235b.rst │ │ ├── neuron-inference-overview.rst │ │ ├── nxdi-setup.rst │ │ ├── overview-index.rst │ │ ├── setup.txt │ │ ├── tutorials/ │ │ │ ├── disaggregated-inference-tutorial-1p1d.rst │ │ │ ├── disaggregated-inference-tutorial.rst │ │ │ ├── flux-inference-tutorial.ipynb │ │ │ ├── flux-inpainting-inference-tutorial.ipynb │ │ │ ├── generating-results-with-performance-cli.ipynb │ │ │ ├── index.rst │ │ │ ├── llama4-tutorial-v0.ipynb │ │ │ ├── llama4-tutorial.ipynb │ │ │ ├── llama405b_perf_comparison.csv │ │ │ ├── llama70b_apc_perf_comparison.csv │ │ │ ├── llama70b_perf_comparison.csv │ │ │ ├── modules_to_not_convert.json │ │ │ ├── pixtral-tutorial.ipynb │ │ │ ├── qwen2-vl-tutorial.ipynb │ │ │ ├── qwen3-moe-tutorial.ipynb │ │ │ ├── qwen3-vl-tutorial.ipynb │ │ │ ├── sd-inference-tutorial.rst │ │ │ ├── trn1-llama3.1-70b-instruct-accuracy-eval-tutorial.ipynb │ │ │ ├── trn2-llama3.1-405b-speculative-tutorial.rst │ │ │ ├── trn2-llama3.1-405b-tutorial.rst │ │ │ ├── trn2-llama3.1-8b-multi-lora-tutorial.ipynb │ │ │ ├── 
trn2-llama3.3-70b-apc-tutorial.ipynb │ │ │ ├── trn2-llama3.3-70b-dp-tutorial.ipynb │ │ │ ├── trn2-llama3.3-70b-fp8.rst │ │ │ ├── trn2-llama3.3-70b-tutorial.rst │ │ │ └── trn3-gpt-oss-120b-tutorial.rst │ │ └── vllm/ │ │ ├── index.rst │ │ ├── quickstart-vllm-offline-serving.rst │ │ └── quickstart-vllm-online-serving.rst │ ├── nxd-training/ │ │ ├── api-guide.txt │ │ ├── api-reference-guide.rst │ │ ├── app_notes/ │ │ │ ├── nxd-training-amr-appnote.rst │ │ │ ├── nxd-training-cp-appnote.rst │ │ │ ├── nxd-training-pp-appnote.rst │ │ │ └── nxd-training-tp-appnote.rst │ │ ├── app_notes.rst │ │ ├── app_notes.txt │ │ ├── developer-guide.rst │ │ ├── developer_guides/ │ │ │ ├── cpu_mode_developer_guide.rst │ │ │ ├── dev-guide.txt │ │ │ ├── index.rst │ │ │ ├── migration_nemo_nxdt.rst │ │ │ ├── migration_nnm_nxdt.rst │ │ │ ├── nemo_nxdt_mapping.csv │ │ │ ├── new_dataloader_guide.rst │ │ │ ├── new_model_guide.rst │ │ │ ├── nnm_nxdt_mapping.csv │ │ │ └── optimizer_lr_scheduler_flow.rst │ │ ├── general/ │ │ │ ├── config_overview.rst │ │ │ ├── features.rst │ │ │ ├── installation_guide.rst │ │ │ ├── known-issues.txt │ │ │ └── known_issues.rst │ │ ├── index.rst │ │ ├── misc.rst │ │ ├── misc.txt │ │ ├── overview.rst │ │ ├── overview.txt │ │ ├── setup.txt │ │ └── tutorials/ │ │ ├── checkpoint_conversion.rst │ │ ├── hf_llama3_70B_pretraining.rst │ │ ├── hf_llama3_8B_DPO_ORPO.rst │ │ ├── hf_llama3_8B_SFT.rst │ │ ├── hf_llama3_8B_SFT_LORA.rst │ │ ├── hf_llama3_8B_pretraining.rst │ │ ├── index.rst │ │ ├── megatron_gpt_pretraining.rst │ │ └── tutorials.txt │ └── transformers-neuronx/ │ └── index.rst ├── llms.txt ├── neuron-customops/ │ ├── api-reference-guide/ │ │ ├── api-reference-guide.rst │ │ └── custom-ops-ref-guide.rst │ ├── customops-intro.txt │ ├── index.rst │ ├── misc-customops.rst │ ├── programming-guide/ │ │ ├── custom-c++-operators-devguide.rst │ │ └── programming-guide.rst │ └── tutorials/ │ ├── customop-mlp-perf-opt.rst │ ├── customop-mlp-training.rst │ ├── tutorial_source_code/ │ │ ├── custom_c_mlp_training/ │ │ │ └── custom_c_mlp_training_code.sh │ │ └── custom_c_perf_optimization/ │ │ └── custom_c_perf_optimization_code.sh │ └── tutorials.rst ├── neuron-runtime/ │ ├── about/ │ │ ├── collectives.rst │ │ ├── core-dump.rst │ │ └── index.rst │ ├── api/ │ │ ├── debug-stream-api.rst │ │ ├── index.rst │ │ ├── ndebug_stream.rst │ │ ├── ndl.rst │ │ ├── nec.rst │ │ ├── neuron_driver_shared.rst │ │ ├── neuron_driver_shared_tensor_batch_op.rst │ │ ├── neuron_ds.rst │ │ ├── nrt-async-api-best-practices.rst │ │ ├── nrt-async-api-examples.rst │ │ ├── nrt-async-api-overview.rst │ │ ├── nrt.rst │ │ ├── nrt_async.rst │ │ ├── nrt_async_sendrecv.rst │ │ ├── nrt_experimental.rst │ │ ├── nrt_profile.rst │ │ ├── nrt_status.rst │ │ ├── nrt_sys_trace.rst │ │ └── nrt_version.rst │ ├── configuration-guide.rst │ ├── explore/ │ │ ├── compute-comm-overlap.rst │ │ ├── core-dump-deep-dive.rst │ │ ├── device-memory.rst │ │ ├── direct-hbm-tensor-alloc.rst │ │ ├── index.rst │ │ ├── internode-collective-comm.rst │ │ ├── intranode-collective-comm.rst │ │ ├── runtime-performance-tips.rst │ │ └── work-with-neff-files.rst │ ├── faq.rst │ ├── index.rst │ ├── nrt-configurable-parameters.rst │ ├── nrt-developer-guide.rst │ ├── nrt-troubleshoot.rst │ └── rn.rst ├── nki/ │ ├── _ext/ │ │ └── nki_directives.py │ ├── _templates/ │ │ ├── nki-custom-class-attr-only-template.rst │ │ └── nki-custom-class-template.rst │ ├── api/ │ │ ├── index.rst │ │ ├── nki/ │ │ │ ├── __init__.py │ │ │ ├── collectives/ │ │ │ │ └── __init__.py │ │ │ ├── isa/ │ │ │ │ └── 
__init__.py │ │ │ └── language/ │ │ │ └── __init__.py │ │ ├── nki.api.shared.rst │ │ ├── nki.collectives.rst │ │ ├── nki.isa.rst │ │ ├── nki.isa.rst.bak │ │ ├── nki.language.rst │ │ ├── nki.language.tile_size.rst │ │ ├── nki.rst │ │ └── nki.simulate.rst │ ├── deep-dives/ │ │ ├── index.rst │ │ ├── mxfp-matmul.rst │ │ ├── nki-aps.rst │ │ ├── nki-compiler.rst │ │ ├── nki-dge.rst │ │ ├── nki-dma-bandwidth-guide.rst │ │ ├── nki-dynamic-loops.rst │ │ ├── nki_perf_guide.rst │ │ └── src/ │ │ └── mxfp-matmul/ │ │ ├── mx_cpu_utils.py │ │ ├── mx_kernel_utils.py │ │ ├── mx_kernels.py │ │ └── mx_toplevel.py │ ├── examples/ │ │ ├── average_pool2d/ │ │ │ ├── average_pool2d_jax.py │ │ │ ├── average_pool2d_nki_kernels.py │ │ │ └── average_pool2d_torch.py │ │ ├── fused_mamba/ │ │ │ ├── mamba_nki_kernels.py │ │ │ └── mamba_torch.py │ │ ├── getting_started_baremetal.py │ │ ├── getting_started_jax.py │ │ ├── getting_started_torch.py │ │ ├── index-case-1.py │ │ ├── index-case-3.py │ │ ├── layout-dynamic-loop.py │ │ ├── layout-loop.py │ │ ├── layout-pass.py │ │ ├── layout-violation.py │ │ ├── matrix_multiplication/ │ │ │ ├── matrix_multiplication_nki_kernels.py │ │ │ └── matrix_multiplication_torch.py │ │ ├── simulate/ │ │ │ └── nki_simulate_example.py │ │ ├── tensor_addition/ │ │ │ └── tensor_addition_nki_kernels.py │ │ └── transpose2d/ │ │ ├── transpose2d_jax.py │ │ ├── transpose2d_nki_kernels.py │ │ └── transpose2d_torch.py │ ├── get-started/ │ │ ├── about/ │ │ │ ├── data-representation-overview.rst │ │ │ ├── index.rst │ │ │ ├── indexing-overview.rst │ │ │ ├── lnc.rst │ │ │ ├── memory-hierarchy-overview.rst │ │ │ ├── nki-dma-overview.rst │ │ │ └── tiling-overview.rst │ │ ├── index.rst │ │ ├── nki-language-guide.rst │ │ ├── quickstart-implement-run-kernel.rst │ │ └── setup-env.rst │ ├── guides/ │ │ ├── architecture/ │ │ │ ├── index.rst │ │ │ ├── trainium2_arch.rst │ │ │ ├── trainium3_arch.rst │ │ │ └── trainium_inferentia2_arch.rst │ │ ├── framework_custom_op.rst │ │ ├── how-to-scheduling-apis.rst │ │ ├── index.rst │ │ ├── nki_simulator.rst │ │ ├── tutorials/ │ │ │ ├── average_pool2d.rst │ │ │ ├── fused_mamba.rst │ │ │ ├── index.rst │ │ │ ├── kernel-optimization.rst │ │ │ ├── matrix_multiplication.rst │ │ │ └── transpose2d.rst │ │ └── use-neuron-profile.rst │ ├── index.rst │ ├── library/ │ │ ├── about/ │ │ │ └── index.rst │ │ ├── api/ │ │ │ ├── attention-block-tkg.rst │ │ │ ├── attention-cte.rst │ │ │ ├── attention-tkg.rst │ │ │ ├── blockwise-mm-backward.rst │ │ │ ├── conv1d.rst │ │ │ ├── cross-entropy.rst │ │ │ ├── cumsum.rst │ │ │ ├── depthwise-conv1d.rst │ │ │ ├── dynamic-elementwise-add.rst │ │ │ ├── fg-allgather.rst │ │ │ ├── fgcc.rst │ │ │ ├── find-nonzero-indices.rst │ │ │ ├── index.rst │ │ │ ├── mlp.rst │ │ │ ├── moe-cte.rst │ │ │ ├── moe-tkg.rst │ │ │ ├── output-projection-cte.rst │ │ │ ├── output-projection-tkg.rst │ │ │ ├── qkv.rst │ │ │ ├── rmsnorm-quant.rst │ │ │ ├── rope.rst │ │ │ ├── router-topk.rst │ │ │ ├── sb2sb-allgather.rst │ │ │ ├── topk-reduce.rst │ │ │ └── transformer-tkg.rst │ │ ├── index.rst │ │ ├── kernel-utils/ │ │ │ ├── allocator.rst │ │ │ ├── index.rst │ │ │ └── tensor-view.rst │ │ └── specs/ │ │ ├── design-rmsnorm-quant.rst │ │ └── index.rst │ ├── migration/ │ │ ├── index.rst │ │ ├── nki-0-3-0-update-guide.rst │ │ ├── nki-beta2-migration-guide.rst │ │ └── nki_block_dimension_migration_guide.rst │ ├── nki_faq.rst │ ├── scripts/ │ │ ├── markdown2rst.py │ │ └── requirements.txt │ └── test/ │ ├── test_nki_isa_activation.py │ ├── test_nki_isa_affine_select.py │ ├── 
test_nki_isa_bn_stats.py │ ├── test_nki_isa_copypredicated.py │ ├── test_nki_isa_dma_copy.py │ ├── test_nki_isa_dma_transpose.py │ ├── test_nki_isa_dropout.py │ ├── test_nki_isa_iota.py │ ├── test_nki_isa_local_gather.py │ ├── test_nki_isa_max8.py │ ├── test_nki_isa_memset.py │ ├── test_nki_isa_nc_find_index8.py │ ├── test_nki_isa_nc_match_replace8.py │ ├── test_nki_isa_nc_matmul.py │ ├── test_nki_isa_nc_stream_shuffle.py │ ├── test_nki_isa_nc_transpose.py │ ├── test_nki_isa_partition_reduce.py │ ├── test_nki_isa_range_select.py │ ├── test_nki_isa_reciprocal.py │ ├── test_nki_isa_reduce.py │ ├── test_nki_isa_select_reduce.py │ ├── test_nki_isa_sequence_bounds.py │ ├── test_nki_isa_tensor_copy.py │ ├── test_nki_isa_tensor_scalar.py │ ├── test_nki_isa_tensor_scalar_cumulative.py │ ├── test_nki_isa_tensor_tensor.py │ ├── test_nki_isa_tensor_tensor_scan.py │ ├── test_nki_mask.py │ ├── test_nki_memory_semantics.py │ ├── test_nki_nl_add.py │ ├── test_nki_nl_atomic_rmw.py │ ├── test_nki_nl_broadcast.py │ ├── test_nki_nl_dslice.py │ ├── test_nki_nl_gather_flattened.py │ ├── test_nki_nl_load_store.py │ ├── test_nki_nl_load_store_indirect.py │ ├── test_nki_nl_load_transpose2d.py │ ├── test_nki_nl_mgrid.py │ ├── test_nki_simulate_kernel.py │ ├── test_nki_spmd_grid.py │ ├── test_psum_modulo_alloc.py │ └── test_sbuf_modulo_alloc.py ├── release-notes/ │ ├── 2.29.0.rst │ ├── archive/ │ │ ├── customcxxps/ │ │ │ ├── gpsimd-customop-lib.rst │ │ │ └── gpsimd-tools.rst │ │ ├── index.rst │ │ ├── libneuronxla.rst │ │ ├── mxnet-neuron.rst │ │ ├── nemo/ │ │ │ ├── index.rst │ │ │ └── neuronx-nemo.rst │ │ ├── neuron-cc/ │ │ │ ├── neuron-cc-ops/ │ │ │ │ ├── index.rst │ │ │ │ ├── neuron-cc-ops-mxnet.rst │ │ │ │ ├── neuron-cc-ops-pytorch.rst │ │ │ │ ├── neuron-cc-ops-tensorflow.rst │ │ │ │ └── neuron-cc-ops-xla.rst │ │ │ └── neuron-cc.rst │ │ ├── neuron1/ │ │ │ ├── _legacy-labels.rst │ │ │ ├── neuronrelease/ │ │ │ │ └── previous-content.rst │ │ │ └── prev/ │ │ │ ├── content.rst │ │ │ └── rn.rst │ │ ├── tensorboard-neuron.rst │ │ ├── tensorflow/ │ │ │ ├── tensorflow-modelserver-neuron/ │ │ │ │ ├── tensorflow-modelserver-neuron-v2.rst │ │ │ │ ├── tensorflow-modelserver-neuron.rst │ │ │ │ └── tensorflow-modelserver-neuronx.rst │ │ │ ├── tensorflow-neuron/ │ │ │ │ ├── tensorflow-neuron-v2.rst │ │ │ │ └── tensorflow-neuron.rst │ │ │ └── tensorflow-neuronx/ │ │ │ └── tensorflow-neuronx.rst │ │ └── torch-neuron.rst │ ├── components/ │ │ ├── compiler.rst │ │ ├── containers.rst │ │ ├── dev-tools.rst │ │ ├── dlamis.rst │ │ ├── index.rst │ │ ├── jax.rst │ │ ├── nki-lib.rst │ │ ├── nki.rst │ │ ├── nxd-core.rst │ │ ├── nxd-inference.rst │ │ ├── nxd-training.rst │ │ ├── pytorch.rst │ │ └── runtime.rst │ ├── documentation/ │ │ └── neuron-documentation.rst │ ├── index.rst │ ├── prev/ │ │ ├── 2.25.0/ │ │ │ ├── compiler.rst │ │ │ ├── containers.rst │ │ │ ├── dlami.rst │ │ │ ├── docs-and-samples.rst │ │ │ ├── index.rst │ │ │ ├── nx-jax.rst │ │ │ ├── nx-pytorch.rst │ │ │ ├── nxd-core.rst │ │ │ ├── nxd-inference.rst │ │ │ ├── nxd-training.rst │ │ │ ├── runtime.rst │ │ │ └── tools.rst │ │ ├── 2.26.0/ │ │ │ ├── containers.rst │ │ │ ├── dlami.rst │ │ │ ├── index.rst │ │ │ ├── nki.rst │ │ │ ├── nx-jax.rst │ │ │ ├── nx-pytorch.rst │ │ │ ├── nxd-core.rst │ │ │ ├── nxd-inference.rst │ │ │ ├── runtime.rst │ │ │ └── tools.rst │ │ ├── 2.26.1.rst │ │ ├── 2.27.0/ │ │ │ ├── compiler.rst │ │ │ ├── containers.rst │ │ │ ├── dlami.rst │ │ │ ├── index.rst │ │ │ ├── nki-lib.rst │ │ │ ├── nki.rst │ │ │ ├── nx-pytorch.rst │ │ │ ├── nxd-inference.rst │ │ │ ├── 
runtime.rst │ │ │ └── tools.rst │ │ ├── 2.27.1.rst │ │ ├── 2.28.0.rst │ │ ├── 2.28.1.rst │ │ ├── content.rst │ │ └── rn.rst │ └── releasecontent.rst ├── requirements-python310.txt ├── requirements-python38.txt ├── requirements.txt ├── setup/ │ ├── index.rst │ ├── index.txt-back │ ├── install-templates/ │ │ ├── al2-python.rst │ │ ├── inf1/ │ │ │ ├── compile_mode.rst │ │ │ ├── deploy_mode.rst │ │ │ ├── develop_mode.rst │ │ │ ├── dlami-enable-neuron-mxnet.rst │ │ │ ├── dlami-enable-neuron-pytorch.rst │ │ │ ├── launch-inf1-ami.rst │ │ │ ├── launch-inf1-dlami-aws-cli.rst │ │ │ ├── launch-inf1-dlami.rst │ │ │ ├── neuron-pip-install.rst │ │ │ ├── neuron-pip-setup.rst │ │ │ ├── note-setup-cntr.rst │ │ │ ├── note-setup-general.rst │ │ │ ├── note-setup-libnrt-warning.rst │ │ │ └── tensorboard-plugin-neuron-pip-install.rst │ │ ├── inf2/ │ │ │ ├── dlami-enable-neuron-pytorch.rst │ │ │ ├── launch-inf2-dlami.rst │ │ │ └── note-setup-libnrt-warning.rst │ │ ├── launch-instance.txt │ │ ├── launch-trn1-dlami.rst │ │ ├── trn1/ │ │ │ └── dlami-notes.rst │ │ └── trn1-ga-warning.txt │ ├── jax/ │ │ ├── dlami.rst │ │ ├── dlc.rst │ │ ├── index.rst │ │ └── manual.rst │ ├── jax-neuronx.rst │ ├── legacy-inf1/ │ │ ├── index.rst │ │ └── pytorch.rst │ ├── multiframework-dlami.rst │ ├── mxnet-neuron.rst │ ├── notebook/ │ │ ├── running-jupyter-notebook-as-script.rst │ │ └── setup-jupyter-notebook-steps-troubleshooting.rst │ ├── pytorch/ │ │ ├── dlami.rst │ │ ├── dlc.rst │ │ ├── index.rst │ │ ├── manual.rst │ │ ├── update-dlami.rst │ │ ├── update-dlc.rst │ │ └── update-manual.rst │ ├── setup-rocky-linux-9.rst │ ├── setup-troubleshooting.rst │ ├── torch-neuron-ubuntu20.rst │ ├── torch-neuron.rst │ ├── torch-neuronx.rst │ └── troubleshooting.rst ├── src/ │ ├── benchmark/ │ │ ├── helper_scripts/ │ │ │ ├── llmperf_dp.patch │ │ │ ├── llmperf_reasoning.patch │ │ │ └── neuron_perf.patch │ │ └── tensorflow/ │ │ ├── distilbert-base-uncased-finetuned-sst-2-english_benchmark.py │ │ └── distilbert-base-uncased-finetuned-sst-2-english_compile.py │ ├── examples/ │ │ ├── mxnet/ │ │ │ ├── README.md │ │ │ ├── data_parallel/ │ │ │ │ ├── benchmark_utils.py │ │ │ │ ├── data_parallel_tutorial.ipynb │ │ │ │ └── parallel.py │ │ │ ├── mxnet-gluon-tutorial.ipynb │ │ │ ├── resnet50/ │ │ │ │ └── resnet50.ipynb │ │ │ └── resnet50_neuroncore_groups.ipynb │ │ ├── neuron-monitor/ │ │ │ └── neuron-monitor-grafana.json │ │ ├── pytorch/ │ │ │ ├── bert_tutorial/ │ │ │ │ ├── README.md │ │ │ │ ├── THIRD │ │ │ │ ├── THIRD PARTY LICENSE.txt │ │ │ │ ├── bert_benchmark_utils.py │ │ │ │ ├── glue_mrpc_dev.tsv │ │ │ │ ├── parallel.py │ │ │ │ ├── tutorial_pretrained_bert.ipynb │ │ │ │ └── tutorial_pretrained_bert_shared_weights.ipynb │ │ │ ├── byoc_sm_bert_tutorial/ │ │ │ │ ├── code/ │ │ │ │ │ └── inference.py │ │ │ │ ├── container/ │ │ │ │ │ └── Dockerfile │ │ │ │ └── sagemaker_container_neuron.ipynb │ │ │ ├── libtorch_demo/ │ │ │ │ ├── bert_neuronx/ │ │ │ │ │ ├── compile.py │ │ │ │ │ └── detect_instance.py │ │ │ │ ├── clean.sh │ │ │ │ ├── example_app/ │ │ │ │ │ ├── README.txt │ │ │ │ │ ├── build.sh │ │ │ │ │ ├── core_count.hpp │ │ │ │ │ ├── example_app.cpp │ │ │ │ │ ├── utils.cpp │ │ │ │ │ └── utils.hpp │ │ │ │ ├── neuron.patch │ │ │ │ ├── run_tests.sh │ │ │ │ ├── setup.sh │ │ │ │ ├── tokenizers_binding/ │ │ │ │ │ ├── build.sh │ │ │ │ │ ├── remote_rust_tokenizer.h │ │ │ │ │ ├── run.sh │ │ │ │ │ ├── run_python.sh │ │ │ │ │ ├── tokenizer_test │ │ │ │ │ ├── tokenizer_test.cpp │ │ │ │ │ └── tokenizer_test.py │ │ │ │ └── trace_bert_neuron.py │ │ │ ├── mnist_mlp/ │ │ │ │ 
├── train_monitor.py │ │ │ │ └── train_tb.py │ │ │ ├── neuronx_distributed/ │ │ │ │ └── t5-inference/ │ │ │ │ ├── t5-inference-tutorial.ipynb │ │ │ │ ├── t5_model_layers.py │ │ │ │ ├── t5_models.py │ │ │ │ └── wrapper.py │ │ │ ├── pipeline_tutorial/ │ │ │ │ └── neuroncore_pipeline_pytorch.ipynb │ │ │ ├── resnet50.ipynb │ │ │ ├── resnet50_partition.ipynb │ │ │ ├── torch-neuronx/ │ │ │ │ ├── bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.ipynb │ │ │ │ ├── resnet50-inference-on-trn1-tutorial.ipynb │ │ │ │ └── t5-inference-tutorial.ipynb │ │ │ ├── torchserve/ │ │ │ │ ├── benchmark_bert.py │ │ │ │ ├── config.json │ │ │ │ ├── handler_bert.py │ │ │ │ ├── handler_bert_neuronx.py │ │ │ │ ├── infer_bert.py │ │ │ │ ├── torchserve.config │ │ │ │ ├── trace_bert_neuron.py │ │ │ │ └── trace_bert_neuronx.py │ │ │ ├── transformers-marianmt.ipynb │ │ │ └── yolo_v4.ipynb │ │ └── tensorflow/ │ │ ├── bert_demo/ │ │ │ ├── LICENSE │ │ │ ├── README.md │ │ │ ├── bert_client.py │ │ │ ├── bert_model.py │ │ │ ├── bert_model_server.py │ │ │ ├── bert_no_model.py │ │ │ ├── bert_server.py │ │ │ ├── download_mrpc_data.py │ │ │ ├── glue_mrpc_dev.tsv │ │ │ ├── latency_printer.py │ │ │ ├── mrpc.proto │ │ │ ├── mrpc_feature.py │ │ │ ├── mrpc_pb2.py │ │ │ ├── mrpc_pb2_grpc.py │ │ │ ├── protoc.sh │ │ │ ├── setup.py │ │ │ ├── tokenization.py │ │ │ ├── tune_save.sh │ │ │ └── uncased_L-24_H-1024_A-16.vocab.txt │ │ ├── huggingface_bert/ │ │ │ └── huggingface_bert.ipynb │ │ ├── k8s_bert_demo/ │ │ │ ├── Dockerfile.tfserving_example │ │ │ ├── README.md │ │ │ ├── bert_client.py │ │ │ └── bert_service.yml │ │ ├── keras_resnet50/ │ │ │ ├── LICENSE │ │ │ ├── README.md │ │ │ ├── fp32tofp16.py │ │ │ ├── full_sweep │ │ │ ├── gen_resnet50_keras.py │ │ │ ├── infer_resnet50_keras.py │ │ │ ├── infer_resnet50_keras_loadtest.py │ │ │ ├── keras_resnet50.ipynb │ │ │ ├── optimize_for_inference.py │ │ │ ├── pb2sm_compile.py │ │ │ └── run_all │ │ ├── openpose_demo/ │ │ │ └── openpose.ipynb │ │ ├── ssd300_demo/ │ │ │ ├── README.md │ │ │ ├── ssd300_detection.py │ │ │ ├── ssd300_evaluation.py │ │ │ ├── ssd300_evaluation_client.py │ │ │ └── ssd300_model.py │ │ ├── tensorflow-neuronx/ │ │ │ └── tfneuronx-roberta-base-tutorial.ipynb │ │ ├── tensorflow_resnet50/ │ │ │ └── resnet50.ipynb │ │ ├── tensorflow_serving_tutorial.rst │ │ ├── yolo_v3_demo/ │ │ │ ├── yolo_v3.ipynb │ │ │ └── yolo_v3_coco_saved_model.py │ │ └── yolo_v4_demo/ │ │ ├── README.md │ │ ├── evaluate.ipynb │ │ └── yolo_v4_coco_saved_model.py │ ├── helperscripts/ │ │ ├── installationScripts/ │ │ │ └── python_instructions.txt │ │ ├── n2-helper.py │ │ ├── n2-manifest.json │ │ ├── neuron-releases-manifest.json │ │ ├── neuron-setup-example.py │ │ ├── neuronsetuphelper.py │ │ └── release-manifest-def.py │ ├── k8/ │ │ ├── bert_service.yml │ │ ├── k8s-neuron-device-plugin-rbac.yml │ │ ├── k8s-neuron-device-plugin.yml │ │ ├── k8s-neuron-monitor-daemonset.yml │ │ ├── k8s-neuron-scheduler-configmap.yml │ │ ├── k8s-neuron-scheduler-eks.yml │ │ ├── k8s-neuron-scheduler.yml │ │ ├── k8s-ultraserver-init-script.sh │ │ ├── my-scheduler.yml │ │ └── neuron-problem-detector/ │ │ ├── k8s-neuron-problem-detector-and-recovery-config.yml │ │ ├── k8s-neuron-problem-detector-and-recovery-rbac.yml │ │ └── k8s-neuron-problem-detector-and-recovery.yml │ ├── libnrt/ │ │ ├── README.md │ │ └── include/ │ │ ├── ndl/ │ │ │ ├── ndl.h │ │ │ ├── neuron_driver_shared.h │ │ │ └── neuron_driver_shared_tensor_batch_op.h │ │ └── nrt/ │ │ ├── ndebug_stream.h │ │ ├── nds/ │ │ │ └── neuron_ds.h │ │ ├── nec.h │ │ ├── nrt.h │ │ 
├── nrt_async.h │ │ ├── nrt_async_sendrecv.h │ │ ├── nrt_experimental.h │ │ ├── nrt_profile.h │ │ ├── nrt_status.h │ │ ├── nrt_sys_trace.h │ │ └── nrt_version.h │ ├── neuron-gatherinfo/ │ │ ├── LICENSE │ │ ├── clear_params_tfpb.py │ │ ├── mx_neuron_check_model.py │ │ ├── neuron-gatherinfo.py │ │ └── tf_neuron_check_model.py │ └── neuronperf/ │ ├── LICENSE │ ├── README.md │ ├── build.sh │ ├── conf.py │ ├── model_neuron_b1.csv │ ├── pyproject.toml │ ├── src/ │ │ └── neuronperf/ │ │ ├── __init__.py │ │ ├── __version__.py │ │ ├── benchmarking.py │ │ ├── compile_constants.py │ │ ├── cpu/ │ │ │ ├── __init__.py │ │ │ └── cpu.py │ │ ├── logging.py │ │ ├── model_index.py │ │ ├── mxnet/ │ │ │ ├── __init__.py │ │ │ └── mxnet.py │ │ ├── py.typed │ │ ├── reporting.py │ │ ├── scripts/ │ │ │ ├── __init__.py │ │ │ └── run_benchmark_file.py │ │ ├── tensorflow/ │ │ │ ├── __init__.py │ │ │ └── tensorflow.py │ │ ├── timing.py │ │ └── torch/ │ │ ├── __init__.py │ │ └── torch.py │ └── test/ │ └── test_neuronperf.py ├── static/ │ ├── google673a8c4fbaa024d8.html │ ├── robots.txt │ └── sitemap1.xml └── tools/ ├── index.rst ├── neuron-explorer/ │ ├── get-started.rst │ ├── how-to-link-view-source-code.rst │ ├── how-to-profile-workload.rst │ ├── index.rst │ ├── migration-faq.rst │ ├── overview-ai-recommendations.rst │ ├── overview-database-viewer.rst │ ├── overview-device-profiles.rst │ ├── overview-hierarchy-view.rst │ ├── overview-memory-viewer.rst │ ├── overview-summary-page.rst │ ├── overview-system-profiles.rst │ ├── overview-tensor-viewer.rst │ └── view-perfetto.rst ├── neuron-sys-tools/ │ ├── index.rst │ ├── nccom-test.rst │ ├── neuron-ls.rst │ ├── neuron-monitor-user-guide.rst │ ├── neuron-sysfs-user-guide.rst │ └── neuron-top-user-guide.rst ├── profiler/ │ ├── neuron-profile-user-guide.rst │ └── neuron-profiler-2-0-beta-user-guide.rst ├── tensorboard/ │ ├── getting-started-tensorboard-neuronx-plugin.rst │ └── index.rst ├── third-party-solutions.rst └── tutorials/ ├── index.rst ├── performance-profiling-vllm.rst ├── torch-neuronx-profiling-with-tb.rst ├── tutorial-neuron-monitor-mnist.rst └── tutorial-tensorboard-scalars-mnist.rst ================================================ FILE CONTENTS ================================================ ================================================ FILE: .github/ISSUE_TEMPLATE/bug-report.yml ================================================ --- name: "🐛 Bug Report" description: Report a bug title: "(short issue description)" labels: [bug, needs-triage] assignees: [] body: - type: textarea id: description attributes: label: Describe the bug description: What is the problem? Provide a clear description of your issue and the steps you took that produced it. validations: required: true - type: textarea id: modelname attributes: label: Model Name description: Provide Model Name validations: required: true - type: textarea id: workloadtype attributes: label: Describe the workload type description: Note the type of workload (such as Inference or Training) and any specific details about the workload configuration. validations: required: true - type: textarea id: instancetype attributes: label: Instance Type description: | Provide the AWS EC2 instance type you used to run the workload (such as `inf2.xlarge`, `trn1.32xlarge`, `trn2.48xlarge` etc.) validations: required: true - type: textarea id: release attributes: label: Release version description: | Provide the Neuron SDK release version (such as `2.25.0`) you are using, and all relevant Neuron component versions. 
``` apt list --installed | grep -i -e neuron pip list | grep -i -e neuron -e torch -e transformers -e jax ``` - type: textarea id: reproduction attributes: label: Reproduction Steps description: | Provide the type of the model and links to any tutorials you may have used, as additional context. Provide a self-contained, concise snippet of code that can be used to reproduce the issue. For more complex issues, provide a repo with the smallest sample that reproduces the bug. Avoid including business logic or unrelated code as it makes diagnosis more difficult. The code sample should be an SSCCE. See http://sscce.org/ for details. In short, please provide a code sample that we can copy/paste, run, and reproduce. validations: required: true - type: checkboxes id: regression attributes: label: Regression Issue description: Is this a regression (did it work in a previous version but not now)? If this is a regression, provide the Neuron SDK release version where this configuration worked for you. options: - label: Select this option if this issue appears to be a regression. required: false - type: textarea id: solution attributes: label: Possible Solution description: | Suggest a fix or reason for the bug, if you know one. validations: required: false - type: textarea id: context attributes: label: Logs/Context/Additional Information description: | Anything else that might be relevant for troubleshooting this bug. Providing context helps us come up with a solution that is most useful in the real world. When applicable, please provide HLOs and compiler commands. validations: required: false ================================================ FILE: .github/ISSUE_TEMPLATE/config.yml ================================================ blank_issues_enabled: false ================================================ FILE: .github/ISSUE_TEMPLATE/documentation.yml ================================================ --- name: "📕 Documentation Issue" description: Report an issue in the documentation and Developer Guide title: "(short issue description)" labels: [documentation, needs-triage] assignees: [] body: - type: textarea id: description attributes: label: Describe the issue description: A clear and concise description of the issue. validations: required: true - type: textarea id: links attributes: label: Links description: | Include links to affected documentation page(s). validations: required: true ================================================ FILE: .github/ISSUE_TEMPLATE/feature-request.yml ================================================ --- name: 🚀 Feature Request description: Suggest an idea for this project title: "(short issue description)" labels: [feature-request, needs-triage] assignees: [] body: - type: textarea id: description attributes: label: Describe the feature description: A clear and concise description of the feature you are proposing. validations: required: true - type: textarea id: use-case attributes: label: Use Case description: | Why do you need this feature? validations: required: true - type: textarea id: solution attributes: label: Proposed Solution description: | Provide detailed suggestions or requirements for this proposed feature. If you have them, include any reference implementation details (or even links to prototypes). validations: required: false - type: textarea id: other attributes: label: Other Information description: | Any additional details or information you can provide, including links to related content or similar issues.
validations: required: false - type: checkboxes id: ack attributes: label: Acknowledgements options: - label: I may be able to implement this feature request required: false ================================================ FILE: .github/pull_request_template.md ================================================ **IMPORTANT!** _If this is a documentation PR for a specific release, this PR must go to the corresponding release branch_ (`release-X.XX.X`). _If it is an "out-of-band" doc update, the PR must go to the_ `master` _branch_. ## Required PR information To expedite approvals and merges for releases, provide the following information (select the `...` button to the right at the top of your PR message to edit it): > **AWS email alias**: {_your-name_}@amazon.com > **Description**: {_What this documentation change is and why you made it. If you have a corresponding Jira ticket or content plan, link it here. The more details you provide around any decisions you made when preparing the docs, the fewer annoying comments you'll get preparing to release it._} > **Date this must be published by**: {_If empty, we will assume the release date for the branch you're merging into._} > **Link to ReadTheDocs staging for this branch's doc changes**: https://awsdocs-neuron-staging.readthedocs-hosted.com/en/{YOUR_BRANCH_NAME_HERE}/ > **Set the `docs-review-needed` label on the PR for tracking.** ## Before you request approvals > Run a spelling and grammar check over your prose and make the changes it suggests. VSCode has a number of extensions (cSpell, LTeX) that you can use. You can also provide the rendered HTML for (or a cut-and-paste of) your pages to an AI and have it correct your spelling, grammar, and formatting issues. If you need an advanced prompt, contact @erickson-doug. ## Approvers We require 3-4 approvers to merge for non-trivial content changes (where a "trivial" change is a typo/grammar fix or a minor update to the format syntax): 1. A senior+ engineer who will review your documentation for technical accuracy and clarity in communicating the technical concepts in your work 2. A product manager for your Neuron component area who will review it for customer relevance and product/component/feature messaging 3. The lead tech writer (@erickson-doug) who will review your work for overall doc design and quality, and perform the merge when all approvals are met 4. (For PRs with code/notebook submissions) A QA/test engineer who can run your code and confirm the results. Make sure you get a commitment from these reviewers in advance! It's hard to line up good-quality doc reviews in the 11th hour of a release. **Note**: For trivial changes, you only need @erickson-doug's approval. He will merge your content once he's confirmed the fixes on staging. ## Doc review checklist ### Engineering reviewer checklist - [ ] I've confirmed that the contributions in this PR meet the current [AWS Neuron writing guidelines](https://quip-amazon.com/m97CAO0kQFEU/Writing-for-AWS-Neuron). - [ ] I've confirmed that the documentation submitted is technically correct to the best of my knowledge. - [ ] I've confirmed that the documentation submitted has no spelling or grammar errors or use of internal jargon/terminology. - [ ] I've verified the changes render correctly on RTD (link above). - [ ] (If code is included) I've run tests to verify the contents of the change.
--- ## For PRs that include code or notebook examples **MANDATORY: PR must include test run output** Provide this information for the QA reviewer in order to expedite their review. **Test run output:** Specify the release version, instance size and type, OS type and test output. **For Training tutorials:** {Convergence graph for training tutorials} {Performance metrics `average_throughput`, `latency_p50`, `latency_p99` and MFU% if available} Make sure this PR contains correct classification terms (Alpha, Beta, and Stable). If possible, provide your results or a link to them for the reviewer to check your work. ## Code example/notebook content PR checklist - [ ] (If applicable) I've automated a test to safeguard my changes from regression. - [ ] (If applicable) I've posted test collateral to prove my change was effective and not harmful. - [ ] (If applicable) I've added someone from QA to the list of reviewers. Do this if you didn't make an automated test or feel it's appropriate for another reason. - [ ] (If applicable) I've reviewed the licenses of updated and new binaries and their dependencies to make sure all licenses are on the pre-approved Amazon license list. See https://inside.amazon.com/en/services/legal/us/OpenSource/Pages/BlessedOpenSourceLicenses.aspx. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. ================================================ FILE: .github/stale_issue_mark_close_workflow.yml ================================================ name: Close inactive issues on: schedule: - cron: "30 1 * * *" jobs: close-issues: runs-on: ubuntu-latest permissions: issues: write pull-requests: write steps: - uses: actions/stale@v5 with: days-before-issue-stale: 30 days-before-issue-close: 14 stale-issue-label: "stale" stale-issue-message: "This issue is stale because it has been open for 30 days with no activity." close-issue-message: "This issue was closed because it has been inactive for 14 days since being marked as stale." days-before-pr-stale: -1 days-before-pr-close: -1 repo-token: ${{ secrets.GITHUB_TOKEN }} ================================================ FILE: .github/workflows/acknowledge-new-issue.yml ================================================ name: Acknowledge New Issue on: issues: types: [opened] permissions: issues: write jobs: acknowledge: runs-on: ubuntu-latest steps: - name: Comment on issue uses: actions/github-script@v7 with: script: | const creator = context.payload.issue.user.login; await github.rest.issues.createComment({ owner: context.repo.owner, repo: context.repo.repo, issue_number: context.payload.issue.number, body: `Hi @${creator}, Thank you for filing the issue! 
We will take a look and get back to you.` }); ================================================ FILE: .github/workflows/auto-label-issues.yml ================================================ # Auto-label issues based on content keywords name: auto-label-issues on: issues: types: [opened] jobs: auto-label-issues: runs-on: ubuntu-latest permissions: issues: write steps: - name: Analyze issue content id: analyze_content uses: actions/github-script@v7 env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} ISSUE_TITLE: ${{ github.event.issue.title }} ISSUE_BODY: ${{ github.event.issue.body }} with: script: | const title = process.env.ISSUE_TITLE || ''; const body = process.env.ISSUE_BODY || ''; const content = `${title} ${body}`; const labels = []; // ============================================================================= // LABEL CONFIGURATION - Easy to update dictionary // Add keywords, typos, or synonyms to the arrays below // ============================================================================= const labelConfig = { // ----- Issue Type Labels (mutually exclusive) ----- bug: { keywords: [ // Standard terms 'bug', 'error', 'crash', 'fail', 'failed', 'failure', 'failing', 'broken', 'exception', 'traceback', 'segfault', 'segmentation fault', // Synonyms 'issue', 'problem', 'defect', 'fault', 'glitch', 'malfunction', 'wrong', 'incorrect', 'unexpected', 'hang', 'hanging', 'hung', 'freeze', 'frozen', 'timeout', 'timed out', 'oom', 'out of memory', 'memory error', 'nan', 'diverge', 'diverged', // Common typos 'bugg', 'bgu', 'eror', 'errror', 'crahs', 'fial', 'brokn', 'broke' ], patterns: [/not\s*work/i, /doesn'?t\s*work/i, /won'?t\s*work/i, /can'?t\s*work/i] }, documentation: { keywords: [ // Standard terms 'doc', 'docs', 'documentation', 'readme', 'guide', 'tutorial', 'howto', 'how-to', 'how to', 'typo', 'typos', 'spelling', 'grammar', 'example', 'examples', 'sample', 'samples', 'instruction', 'instructions', 'clarify', 'clarification', 'unclear', 'confusing', 'outdated', 'out of date', 'stale', 'missing documentation', 'missing docs', 'broken link', 'dead link', '404', // Common typos 'documention', 'documenation', 'documentaion', 'tutoral', 'toturial' ], patterns: [/issue\s*on\s*page/i, /page\s*.*\.html/i] }, 'feature-request': { keywords: [ // Standard terms 'feature', 'feature request', 'feature-request', 'enhancement', 'improvement', 'implement', 'implementation', 'new feature', 'add feature', 'support for', 'add support', 'would be nice', 'would be great', 'would be helpful', 'suggestion', 'suggest', 'proposal', 'propose', 'wishlist', 'wish list', // Common typos 'feture', 'featrue', 'enchancement', 'improvment' ], patterns: [/add\s+support\s+for/i, /please\s+add/i, /would\s+be\s+(nice|great|helpful)/i] }, // ----- Hardware Labels (independent - multiple can be applied) ----- Trn1: { keywords: [ 'trn1', 'trn-1', 'trn 1', 'trn1n', 'trn1.2xlarge', 'trn1.32xlarge', 'trn1n.32xlarge', 'trainium', 'trainium1', 'trainium 1', 'trainium-1', // Common typos 'tranium', 'trainuim', 'trn-1n' ], patterns: [/trn1n?(?:\.[0-9]*xlarge)?/i, /trainium\s*1?(?!\s*2)/i] }, Trn2: { keywords: [ 'trn2', 'trn-2', 'trn 2', 'trn2.48xlarge', 'trainium2', 'trainium 2', 'trainium-2', // Common typos 'tranium2', 'trainuim2' ], patterns: [/trn2(?:\.[0-9]*xlarge)?/i, /trainium\s*2/i] }, Inf1: { keywords: [ 'inf1', 'inf-1', 'inf 1', 'inf1.xlarge', 'inf1.2xlarge', 'inf1.6xlarge', 'inf1.24xlarge', 'inferentia', 'inferentia1', 'inferentia 1', 'inferentia-1', // Common typos 'infertia', 'inferntia', 'infernita' ], patterns: 
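// Note: the Inf1 patterns below pair an instance-size match with a negative lookahead, (?!\s*2), so a bare "inferentia" is labeled Inf1 without also firing on "inferentia 2" (which the Inf2 config handles).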
[/inf1(?:\.[0-9]*xlarge)?/i, /inferentia\s*1?(?!\s*2)/i] }, Inf2: { keywords: [ 'inf2', 'inf-2', 'inf 2', 'inf2.xlarge', 'inf2.8xlarge', 'inf2.24xlarge', 'inf2.48xlarge', 'inferentia2', 'inferentia 2', 'inferentia-2', // Common typos 'infertia2', 'inferntia2', 'infernita2' ], patterns: [/inf2(?:\.[0-9]*xlarge)?/i, /inferentia\s*2/i] }, // ----- Use Case Labels (independent - both can be applied) ----- Inference: { keywords: [ // Standard terms 'inference', 'inferencing', 'predict', 'prediction', 'predictions', 'predicting', 'serving', 'serve', 'server', 'batch inference', 'real-time', 'realtime', 'endpoint', 'endpoints', // Common typos 'infernce', 'inferance', 'prediciton', 'deploymnet' ], patterns: [/infer(?:ence|ring)?/i, /predict(?:ion|ing)?/i, /deploy(?:ment|ing)?/i] }, Training: { keywords: [ // Standard terms 'training', 'train', 'trained', 'fine-tune', 'finetune', 'fine tune', 'finetuning', 'fine-tuning', 'pretrain', 'pre-train', 'pretraining', 'pre-training', 'learning', 'learn', 'gradient', 'gradients', 'backward', 'backprop', 'backpropagation', 'loss', 'convergence', 'converge', 'epoch', 'epochs', 'checkpoint', 'checkpointing', // Common typos 'trainig', 'traning', 'trainin', 'fintune', 'finetunning' ], patterns: [/train(?:ing|ed)?/i, /fine[\s-]?tun(?:e|ing)/i, /pre[\s-]?train(?:ing)?/i] } }; // ============================================================================= // MATCHING LOGIC // ============================================================================= function matchesLabel(config) { const contentLower = content.toLowerCase(); // Check keywords (case-insensitive substring match) for (const keyword of config.keywords) { if (contentLower.includes(keyword.toLowerCase())) { return true; } } // Check regex patterns for (const pattern of config.patterns) { if (pattern.test(content)) { return true; } } return false; } // Issue Type Labels - MUTUALLY EXCLUSIVE (priority: bug > documentation > feature-request) if (matchesLabel(labelConfig.bug)) { labels.push('bug'); } else if (matchesLabel(labelConfig.documentation)) { labels.push('documentation'); } else if (matchesLabel(labelConfig['feature-request'])) { labels.push('feature-request'); } // Hardware/Instance Type Labels - INDEPENDENT (multiple can be applied) if (matchesLabel(labelConfig.Trn1)) { labels.push('Trn1'); } if (matchesLabel(labelConfig.Trn2)) { labels.push('Trn2'); } if (matchesLabel(labelConfig.Inf1)) { labels.push('Inf1'); } if (matchesLabel(labelConfig.Inf2)) { labels.push('Inf2'); } // Use Case Labels - INDEPENDENT (both can be applied) if (matchesLabel(labelConfig.Inference)) { labels.push('Inference'); } if (matchesLabel(labelConfig.Training)) { labels.push('Training'); } core.setOutput('labels', labels.join(',')); core.setOutput('has_labels', labels.length > 0); - name: Apply labels to issue if: steps.analyze_content.outputs.has_labels == 'true' env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} run: | IFS=',' read -ra LABELS <<< "${{ steps.analyze_content.outputs.labels }}" for label in "${LABELS[@]}"; do gh issue edit ${{ github.event.issue.number }} --add-label "$label" -R ${{ github.repository }} done ================================================ FILE: .gitignore ================================================ _build/ __pycache__/ .venv/ .DS_Store src/examples/pytorch/libtorch_demo.tar.gz src/neuronperf.tar.gz *-checkpoint.ipynb .idea/ .vscode/ nki/*/generated/ uncommitted/ ================================================ FILE: .readthedocs.yml ================================================ # 
.readthedocs.yml # Read the Docs configuration file # See https://docs.readthedocs.io/en/stable/config-file/v2.html for details # Required version: 2 # Set the version of Python and other tools you might need build: os: "ubuntu-22.04" tools: python: "3.10" # jobs: # pre_build: # - python -m sphinx -b linkcheck . _build/linkcheck # Build documentation in the docs/ directory with Sphinx sphinx: configuration: conf.py #conda #conda: # file: readthedocs-environment.yml # Build documentation with MkDocs #mkdocs: # configuration: mkdocs.yml # Optionally build your docs in additional formats such as PDF #formats: # - pdf # Optionally set the version of Python and requirements required to build your docs python: install: - requirements: requirements.txt ================================================ FILE: CODEOWNERS ================================================ # This file creates codeowners for the documentation. It will allow setting code reviewers for all Pull requests to merge to the master branch # Each line is a file pattern followed by one or more owners. # Reference guide - https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/about-code-owners#example-[…]ners-file # Example - These owners will be the default owners for everything in # the repo. Unless a later match takes precedence, # @global-owner1 and @global-owner2 will be requested for # review when someone opens a pull request. # * @global-owner1 @global-owner2 * @aws-maens @micwade-aws @musunita @aws-sadaf @rgrandhiamzn @eshalakhotia @jluntamazon @jeffhataws @aws-rhsoln @hannanjgaws @PrashantSaraf @aws-donkrets @aws-singhada @gsnaws @awsjoshir @sidjoshiaws @pinak-p @vikas-paliwal-aws @aarondou @mrinalks @erickson-doug @lnixaws @micwade-aws src/examples/mxnet/ @aws-rhsoln @aws-sadaf @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia neuron-guide/neuron-frameworks/mxnet-neuron/ @aws-rhsoln @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia neuron-guide/neuron-frameworks/mxnet-neuron/tutorials/ @musunita @aws-rhsoln @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia src/examples/tensorflow/ @awshaichen @aws-sadaf @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia neuron-guide/neuron-frameworks/tensorflow-neuron/ @awshaichen @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia neuron-guide/neuron-frameworks/tensorflow-neuron/tutorials/ @musunita @awshaichen @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia src/examples/pytorch/ @jluntamazon @aws-sadaf @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia neuron-guide/neuron-frameworks/pytorch-neuron/ @jluntamazon @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia neuron-guide/neuron-frameworks/pytorch-neuron/tutorials/ @musunita @jluntamazon @aws-maens @vikas-paliwal-aws @rgrandhiamzn @eshalakhotia libraries/nxd-inference/ @huntingcarlisle @lccasagrande @lipovsek-aws @erickson-doug @eshalakhotia @pinak-p @hannanjgaws @akhil-aws @ahimsh-aws @rgrandhiamzn @yahavb @FThompsonAWS @gsnaws @sidjoshiaws @jluntamazon @musunita ================================================ FILE: CONTRIBUTING.md ================================================ # Contributing Guidelines Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional documentation, we greatly value feedback and contributions from our community. 
Please read through this document before submitting any issues or pull requests to ensure we have all the necessary information to effectively respond to your bug report or contribution. ## Reporting Bugs/Feature Requests We welcome you to use the GitHub issue tracker to report bugs or suggest features. When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: * A reproducible test case or series of steps * The version of our code being used * Any modifications you've made relevant to the bug * Anything unusual about your environment or deployment ## Contributing Workflow (via Pull Requests) Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 1. You are working against the latest source on the *master* branch. 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. **Important**: Currently, local doc builds require a Python 3.9 environment. If you are on macOS, you can install it from the terminal with `brew install python@3.9`. Add it to your working path with `brew link python@3.9` and confirm it works by running `python3.9 --version`. ### Docker Build If you don't have Python 3.9/3.10 or a compatible gcc toolchain, use the Docker workflow: ```bash ./build.sh build # Build Docker image (first time only) ./build.sh html # Build HTML docs to _build/html/ ./build.sh shell # Interactive shell for debugging ./build.sh clean # Remove _build/ directory ``` ### Manual Build To send us a pull request, please: 1. Clone the repository locally: ```bash git clone git@github.com:YOUR-USERNAME/private-aws-neuron-sdk-staging.git ``` 2. Install the build dependencies. This requires a Python 3.9 installation and venv: ```bash cd .. # The root folder where you have your cloned Git repos; don't run this in the repo folder but one level up or you'll have venv files in your repo folder python3.9 -m venv venv && . venv/bin/activate pip install -U pip cd private-aws-neuron-sdk-staging pip install -r requirements.txt ``` 3. Build the documentation into HTML. This command will allow you to view the rendered documentation by opening the generated `_build/html/index.html`. On first run, this will take about 15 minutes. Subsequent HTML generations are incremental and will take less time. Run: ```bash sphinx-build -b html . _build/html ``` Or leverage the Makefile and run: ```bash make html ``` If this doesn't work, try this command: ```bash sphinx-build -C -b html . _build/html ``` For speedier builds in multiprocessor environments, run: ```bash sphinx-build -b html . _build/html -j auto ``` **NOTE**: If you get an error for the spelling extension, like `Extension error: Could not import extension sphinxcontrib.spelling (exception: The 'enchant' C library was not found and maybe needs to be installed. See https://pyenchant.github.io/pyenchant/install.html)`, run `brew install enchant`. 4. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 5. Rebuild the documentation with `sphinx-build -b html . _build/html`.
Always ensure that the docs build without errors and that your changes look correct before pushing them to remote. * If you encounter errors that are unclear, run the build in verbose mode with `sphinx-build -vv -b html . _build/html`. 6. Commit your changes to your branch with a clear, scoped commit message. Bad: "fixed stuff". Good: "Updated ref IDs in all containers topics". 7. Push your changes to remote (`git push origin`) and create a PR from your branch into `master` or the standing release branch (example: `release-2.27.0`). Answer any default questions in the pull request interface. * See: [pull request guide](https://help.github.com/articles/creating-a-pull-request/). 8. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. Updated process documentation can be found here: [Runbook: Authoring a topic for the Neuron documentation](https://quip-amazon.com/e9B9AM7Npb17/Runbook-Authoring-a-topic-for-the-Neuron-documentation). ## Updating the sitemap If you add or remove a topic, you must recreate the sitemap. To do so: 1. From a shell, `cd` to the root of this repo (`private-aws-neuron-sdk-staging`) on your local machine. 2. Run the following command: `python3 ./_utilities/create_sitemap.py`. This will generate the sitemap as `sitemap.xml` in the root folder of the repo. 3. Rename the `sitemap.xml` file to `sitemap1.xml`. 4. Move the `sitemap1.xml` file to the `/static` folder, copying over the previous version. 5. Delete the generated `sitemap.xml` file from the root (**not** from `/static`) if you did a copy instead of a move. 6. Push a PR with the updated sitemap to remote and request that DougEric review/approve it. ## Finding contributions to work on Looking at the existing issues is a great way to find something to contribute to. As our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. * Or, if you're so inclined, get on DougEric's Christmas card list by fixing broken links, formatting errors, removing stale topics, and fixing spelling/grammar errors. ## Code of Conduct This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). For more information, see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact opensource-codeofconduct@amazon.com with any additional questions or comments. ## Security issue notifications If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. ## Licensing See the [LICENSE-DOCUMENTATION](./LICENSE-DOCUMENTATION), [LICENSE-SAMPLECODE](./LICENSE-SAMPLECODE) and [LICENSE-SUMMARY-DOCS-SAMPLES](./LICENSE-SUMMARY-DOCS-SAMPLES) files for our project's licensing. We will ask you to confirm the licensing of your contribution.
We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. ================================================ FILE: Dockerfile ================================================ FROM python:3.10-slim RUN apt-get update && apt-get install -y --no-install-recommends \ make enchant-2 git pandoc \ && rm -rf /var/lib/apt/lists/* \ && pandoc --version COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv WORKDIR /docs COPY requirements.txt . RUN uv pip install --system -r requirements.txt --extra-index-url=https://pypi.org/simple ENTRYPOINT ["/bin/bash"] ================================================ FILE: LICENSE-DOCUMENTATION ================================================ *** Documentation: Creative Commons Attribution-ShareAlike 4.0 International Public License By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-ShareAlike 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. Section 1 – Definitions. a. Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. b. Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. c. BY-SA Compatible License means a license listed at creativecommons.org/compatiblelicenses, approved by Creative Commons as essentially the equivalent of this Public License. d. Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights. e. Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. f. Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. g. License Elements means the license attributes listed in the name of a Creative Commons Public License. The License Elements of this Public License are Attribution and ShareAlike. h. Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License. i.
Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. j. Licensor means the individual(s) or entity(ies) granting rights under this Public License. k. Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. l. Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. m. You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning. Section 2 – Scope. a. License grant. 1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: A. reproduce and Share the Licensed Material, in whole or in part; and B. produce, reproduce, and Share Adapted Material. 2. Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. 3. Term. The term of this Public License is specified in Section 6(a). 4. Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material. 5. Downstream recipients. A. Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. B. Additional offer from the Licensor – Adapted Material. Every recipient of Adapted Material from You automatically receives an offer from the Licensor to exercise the Licensed Rights in the Adapted Material under the conditions of the Adapter’s License You apply. C. No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. 6. No endorsement. 
Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i). b. Other rights. 1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. 2. Patent and trademark rights are not licensed under this Public License. 3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties. Section 3 – License Conditions. Your exercise of the Licensed Rights is expressly made subject to the following conditions. a. Attribution. 1. If You Share the Licensed Material (including in modified form), You must: A. retain the following if it is supplied by the Licensor with the Licensed Material: i. identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); ii. a copyright notice; iii. a notice that refers to this Public License; iv. a notice that refers to the disclaimer of warranties; v. a URI or hyperlink to the Licensed Material to the extent reasonably practicable; B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. 2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. 3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable. b. ShareAlike. In addition to the conditions in Section 3(a), if You Share Adapted Material You produce, the following conditions also apply. 1. The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-SA Compatible License. 2. You must include the text of, or the URI or hyperlink to, the Adapter's License You apply. You may satisfy this condition in any reasonable manner based on the medium, means, and context in which You Share Adapted Material. 3. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply. Section 4 – Sui Generis Database Rights. Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: a.
for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database; b. if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material, including for purposes of Section 3(b); and c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database. For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. Section 5 – Disclaimer of Warranties and Limitation of Liability. a. Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You. b. To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You. c. The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. Section 6 – Term and Termination. a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates: 1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or 2. upon express reinstatement by the Licensor. c. For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. d. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. e. Sections 1, 5, 6, 7, and 8 survive termination of this Public License. Section 7 – Other Terms and Conditions. a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. b. 
Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. Section 8 – Interpretation. a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. ================================================ FILE: LICENSE-SAMPLECODE ================================================ Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: LICENSE-SUMMARY-DOCS-SAMPLES ================================================ *** Documentation and Sample Code: Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file. The sample code within this documentation is made available under the MIT-0 license. See the LICENSE-SAMPLECODE file. ================================================ FILE: Makefile ================================================ # Minimal makefile for Sphinx documentation # # You can set these variables from the command line, and also # from the environment for the first two. SPHINXOPTS ?= SPHINXBUILD ?= sphinx-build SOURCEDIR = $(CURDIR) BUILDDIR = _build # Put it first so that "make" without argument is like "make help". help: @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) .PHONY: help Makefile clean # Catch-all target: route all unknown targets to Sphinx using the new # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). 
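# For example, `make html` runs `sphinx-build -M html` against this source tree, and any other Sphinx builder name (such as `make linkcheck`) routes through the same rule.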
%: Makefile @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) clean: -rm -rf $(BUILDDIR)/* ================================================ FILE: README.md ================================================ ![neuron](./images/Site-Merch_Neuron-ML-SDK_Editorial.png) # AWS Neuron ## Neuron SDK Overview AWS Neuron is a software development kit (SDK) enabling high-performance deep learning acceleration using AWS Inferentia and Trainium, AWS's custom-designed machine learning accelerators. With Neuron, you can develop, profile, and deploy high-performance machine learning workloads on top of accelerated EC2 instances, such as Inf1 and Trn1. Neuron includes a compiler, a runtime driver, and debug and profiling utilities with a TensorBoard plugin for visualization, and is pre-integrated into popular machine learning frameworks like PyTorch, TensorFlow, and MXNet to provide a seamless machine learning acceleration workflow. ## Neuron SDK’s documentation For full documentation, including user guides, how-tos, and tutorials, see [Neuron SDK’s documentation](https://awsdocs-neuron.readthedocs-hosted.com/) ## Support If none of the GitHub and online resources have an answer to your question, check out the AWS Neuron [support forum](https://forums.aws.amazon.com/forum.jspa?forumID=355). ================================================ FILE: _backup-setup/neuron-setup/multiframework/multi-framework-ubuntu22-neuron-dlami.rst ================================================ .. _setup-ubuntu22-multi-framework-dlami: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small Get Started with Neuron on Ubuntu 22 with Neuron Multi-Framework DLAMI ====================================================================== You can quickly get started on Ubuntu 22 using the Neuron Deep Learning AMI (DLAMI). Then, start using one of the multiple frameworks or libraries that the Neuron SDK supports by activating the corresponding virtual environment. Each virtual environment comes pre-installed with the Neuron libraries you need to get started. The Neuron DLAMI supports all Neuron instances (Inf1/Inf2/Trn1/Trn1n/Trn2/Trn3) and is updated with each Neuron SDK release. To start using the latest version of the Neuron DLAMI, use the following steps: Step 1: Launch the instance using Neuron DLAMI ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Once you open the `EC2 Console `_, select your desired AWS region and choose "Launch Instance". Under AMI selection, select "Quick Start" and "Ubuntu", then choose the "Deep Learning AMI Neuron (Ubuntu 22.04)" (see the screenshot below). Once you have selected the AMI, select the desired Neuron instance type (Inf1/Inf2/Trn1/Trn1n/Trn2/Trn3), configure the disk size and other criteria, and launch the instance. .. image:: /images/neuron-multi-framework-dlami-quick-start.png :scale: 20% :align: center .. note:: If you are looking to use the Neuron DLAMI in your cloud automation flows, Neuron also supports :ref:`SSM parameters ` to easily retrieve the latest DLAMI ID. Step 2: Activate the desired virtual environment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can activate one of the virtual environments depending on the library or framework you are interested in: 1. Get the desired virtual environment name for the framework/library by referring to :ref:`the Neuron DLAMI overview `. 2.
Activate the virtual environment by using: :: source /opt/<virtual-environment-name>/bin/activate After you have activated the desired virtual environment, you can try out one of the tutorials listed in the corresponding framework or library training and inference section. ================================================ FILE: _backup-setup/neuron-setup/multiframework/multi-framework-ubuntu24-neuron-dlami.rst ================================================ .. _setup-ubuntu24-multi-framework-dlami: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small Get Started with Neuron on Ubuntu 24 with Neuron Multi-Framework DLAMI ====================================================================== You can quickly get started on Ubuntu 24 using the Neuron Deep Learning AMI (DLAMI). Then, start using one of the multiple frameworks or libraries that the Neuron SDK supports by activating the corresponding virtual environment. Each virtual environment comes pre-installed with the Neuron libraries you need to get started. The Neuron DLAMI supports all Neuron instances (Inf2/Trn1/Trn1n/Trn2/Trn3) and is updated with each Neuron SDK release. To start using the latest version of the Neuron DLAMI, use the following steps: Step 1: Launch the instance using Neuron DLAMI ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Once you open the `EC2 Console `_, select your desired AWS region and choose "Launch Instance". Under AMI selection, select "Quick Start" and "Ubuntu", then choose the "Deep Learning AMI Neuron (Ubuntu 24.04)" (see the screenshot below). Once you have selected the AMI, select the desired Neuron instance type (Inf2/Trn1/Trn1n/Trn2/Trn3), configure the disk size and other criteria, and launch the instance. .. image:: /images/neuron-multi-framework-dlami-U24-quick-start.png :scale: 20% :align: center .. note:: If you are looking to use the Neuron DLAMI in your cloud automation flows, Neuron also supports :ref:`SSM parameters ` to easily retrieve the latest DLAMI ID. Step 2: Activate the desired virtual environment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can activate one of the virtual environments depending on the library or framework you are interested in: 1. Get the desired virtual environment name for the framework/library by referring to :ref:`the Neuron DLAMI overview `. 2. Activate the virtual environment by using: :: source /opt/<virtual-environment-name>/bin/activate After you have activated the desired virtual environment, you can try out one of the tutorials listed in the corresponding framework or library training and inference section. ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuron/amazon-linux/torch-neuron-al2-base-dlami.rst ================================================ .. _setup-torch-neuron-al2-base-dlami: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuron") Setup on Amazon Linux 2 with DLAMI Base ======================================================================= .. note:: As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide ` .. contents:: Table of contents :local: :depth: 2 ..
include:: /setup/install-templates/al2-python.rst Get Started with Latest Release of PyTorch Neuron (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly start with a fresh installation of :ref:`setup-torch-neuron` for Inference. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Inf1 web page `_ * Check for the latest version of the `DLAMI Base AMI `_ and copy the AMI name that starts with "Deep Learning Base Neuron AMI (Amazon Linux 2) " from the "AMI Name:" section * Search for the copied AMI name in the AMI search; you should see a matching AMI with that name in Community AMIs. Select the AMI and use it to launch the instance. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools .. include:: /includes/setup/tab-inference-torch-neuron-al2.txt .. include :: /archive/torch-neuron/setup/pytorch-update-al2.rst .. include :: /archive/torch-neuron/setup/pytorch-install-prev-al2.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuron/amazon-linux/torch-neuron-al2-pytorch-dlami.rst ================================================ .. _setup-torch-neuron-al2-pytorch-dlami: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuron") Setup on Amazon Linux 2 with PyTorch DLAMI ========================================================================= .. note:: As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide ` .. contents:: Table of contents :local: :depth: 2 .. include:: /setup/install-templates/al2-python.rst Get Started with Latest Release of PyTorch Neuron (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly start with a fresh installation of :ref:`setup-torch-neuron`. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.
* To get more information about instance sizes and pricing, see: `Inf1 web page `_ * Check for the latest version of the `DLAMI Neuron PyTorch 1.13 AMI `_ and copy the AMI name that starts with "Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) " from the "AMI Name:" section * Search for the copied AMI name in the AMI search; you should see an exactly matching AMI with that name in Community AMIs. Select the AMI and use it to launch the instance. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Update Neuron Drivers :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=driver_runtime_tools --framework=pytorch --framework-version=1.13.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 .. dropdown:: Get Started With PyTorch DLAMI :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 98 :end-line: 99 .. card:: Visit PyTorch Neuron (``torch-neuron``) for Inference section :link: inference-torch-neuron :link-type: ref :class-body: sphinx-design-class-title-small .. card:: Visit PyTorch Neuron section for more :class-body: sphinx-design-class-body-small :link: neuron-pytorch :link-type: ref .. include:: /archive/torch-neuron/setup/pytorch-update-al2-dlami.rst .. include:: /archive/torch-neuron/setup/pytorch-install-prev-al2.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuron/amazon-linux/torch-neuron-al2.rst ================================================ .. _setup-torch-neuron-al2: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuron") Setup on Amazon Linux 2 ========================================================= .. note:: As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide ` .. contents:: Table of contents :local: :depth: 2 .. include:: /setup/install-templates/al2-python.rst Get Started with Latest Release of PyTorch Neuron (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly start with a fresh installation of :ref:`setup-torch-neuron` for Inference. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Inf1 web page `_ * Select Amazon Linux 2 AMI (HVM) - Kernel 5.10 * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in ..
.. include:: /includes/setup/tab-inference-torch-neuron-al2.txt .. include:: /archive/torch-neuron/setup/pytorch-update-al2.rst .. include:: /archive/torch-neuron/setup/pytorch-install-prev-al2.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuron/amazon-linux/torch-neuron-al2023.rst ================================================ .. _setup-torch-neuron-al2023: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuron") Setup on Amazon Linux 2023 =========================================================== .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`setup-torch-neuron` for Inference. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Inf1 web page `_ * Select Amazon Linux 2023 AMI * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools .. include:: /includes/setup/tab-inference-torch-neuron-al2023.txt .. include:: /archive/torch-neuron/setup/pytorch-update-al2023.rst .. include:: /archive/torch-neuron/setup/pytorch-install-prev-al2023.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuron/ubuntu/torch-neuron-ubuntu20-base-dlami.rst ================================================ .. _setup-torch-neuron-u20-base-dlami: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuron") Setup on Ubuntu 20 with DLAMI Base ================================================================== .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`setup-torch-neuron` for Inference. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console,
please make sure to select the correct instance type. * To get more information about instance sizes and pricing see: `Inf1 web page `_ * Check for the latest version of the `DLAMI Base AMI `_ and copy the AMI name that starts with "Deep Learning Base Neuron AMI (Ubuntu 20.04) " from "AMI Name:" section * Search for the copied AMI name in the AMI Search , you should see a matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools .. include:: /includes/setup/tab-inference-torch-neuron-u20.txt .. include:: /archive/torch-neuron/setup/pytorch-update-u20.rst .. include:: /archive/torch-neuron/setup/pytorch-install-prev-u20.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuron/ubuntu/torch-neuron-ubuntu20-pytorch-dlami.rst ================================================ .. _setup-torch-neuron-u20-pytorch-dlami: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuron") Setup on Ubuntu 20 with Pytorch DLAMI ===================================================================== .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provide links that will assist you to quickly start with a fresh installation of :ref:`setup-torch-neuron`. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console. please make sure to select the correct instance type. * To get more information about instances sizes and pricing see: `Inf1 web page `_ * Check for the latest version of the `DLAMI Neuron Pytorch 1.13 AMI `_ and copy the AMI name that starts with "Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) " from "AMI Name:" section * Search for the copied AMI name in the AMI Search , you should see an exact matching AMI with the AMI name in Community AMIs. Select the AMI and use it to launch the instance. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Update Neuron Drivers :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=driver_runtime_tools --framework=pytorch --framework-version=1.13.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 .. dropdown:: Get Started With Pytorch DLAMI :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 101 :end-line: 102 .. 
.. card:: PyTorch Neuron(``torch-neuron``) for Inference :link: inference-torch-neuron :link-type: ref :class-body: sphinx-design-class-title-small .. card:: Visit PyTorch Neuron section for more :class-body: sphinx-design-class-body-small :link: neuron-pytorch :link-type: ref .. include:: /archive/torch-neuron/setup/pytorch-update-u20-dlami.rst .. include:: /archive/torch-neuron/setup/pytorch-install-prev-u20.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuron/ubuntu/torch-neuron-ubuntu20.rst ================================================ .. _setup-torch-neuron-u20: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuron") Setup on Ubuntu 20 ==================================================== .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`setup-torch-neuron` for Inference. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Inf1 web page `_ * Select Ubuntu Server 20 AMI * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools .. include:: /includes/setup/tab-inference-torch-neuron-u20.txt .. include:: /archive/torch-neuron/setup/pytorch-update-u20.rst .. include:: /archive/torch-neuron/setup/pytorch-install-prev-u20.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuron/ubuntu/torch-neuron-ubuntu22.rst ================================================ .. _setup-torch-neuron-u22: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuron") Setup on Ubuntu 22 ===================================================== .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`setup-torch-neuron` for Inference. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.
* To get more information about instance sizes and pricing, see: `Inf1 web page `_ * Select Ubuntu Server 22 AMI * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools .. include:: /includes/setup/tab-inference-torch-neuron-u22.txt .. include:: /archive/torch-neuron/setup/pytorch-update-u22.rst .. include:: /archive/torch-neuron/setup/pytorch-install-prev-u22.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2-base-dlami.rst ================================================ .. _setup-torch-neuronx-al2-base-dlami: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuronx") Setup on Amazon Linux 2 with DLAMI Base ========================================================================= .. note:: As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide `. .. contents:: Table of contents :local: :depth: 2 .. include:: /setup/install-templates/al2-python.rst Get Started with Latest Release of PyTorch Neuron (``torch-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Trn1 web page `_, `Inf2 web page `_ * Check for the latest version of the `DLAMI Base AMI `_ and copy the AMI name that starts with "Deep Learning Base Neuron AMI (Amazon Linux 2)" from the "AMI Name:" section * Search for the copied AMI name in the AMI search; you should see a matching AMI with that name in Community AMIs. Select the AMI and use it to launch the instance. * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512 GB (see the launch-time sketch at the end of this page). * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 2 :end-line: 3 .. include:: /includes/setup/tab-inference-torch-neuronx-al2.txt .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-al2.rst .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2.rst
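As referenced in the launch steps above, you can size the root volume correctly at launch time instead of resizing it afterwards. The following is a minimal AWS CLI sketch; the AMI ID, subnet ID, and key name are hypothetical placeholders you must replace with your own values:

.. code-block:: bash

   # Launch a Trn1 instance with a 512 GB gp3 root volume.
   # The root device name depends on the AMI (/dev/xvda for Amazon Linux).
   aws ec2 run-instances \
       --image-id ami-0123456789abcdef0 \
       --instance-type trn1.32xlarge \
       --key-name my-key \
       --subnet-id subnet-0123456789abcdef0 \
       --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":512,"VolumeType":"gp3"}}]'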
================================================ FILE: _backup-setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2-pytorch-dlami.rst ================================================ .. _setup-torch-neuronx-al2-dlami-pytorch: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuronx") Setup on Amazon Linux 2 with DLAMI PyTorch =========================================================================== .. note:: As of 2.20.0, Neuron Runtime no longer supports AL2. Upgrade to AL2023 following the :ref:`AL2 Migration guide `. .. contents:: Table of contents :local: :depth: 2 .. include:: /setup/install-templates/al2-python.rst Get Started with Latest Release of PyTorch Neuron (``torch-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Trn1 web page `_, `Inf2 web page `_ * Check for the latest version of the `DLAMI Neuron PyTorch 1.13 AMI `_ and copy the AMI name that starts with "Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2)" from the "AMI Name:" section * Search for the copied AMI name in the AMI search; you should see an exact match for the AMI name in Community AMIs. Select the AMI and use it to launch the instance. * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512 GB. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Update Neuron Drivers :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=driver_runtime_tools --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 .. dropdown:: Get Started With PyTorch DLAMI :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 50 :end-line: 51 .. card:: Visit PyTorch Neuron(``torch-neuronx``) for Inference section :link: inference-torch-neuronx :link-type: ref :class-body: sphinx-design-class-title-small .. card:: Visit PyTorch Neuron(``torch-neuronx``) for Training section :link: training-torch-neuronx :link-type: ref :class-body: sphinx-design-class-title-small .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-al2-dlami.rst .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2.rst ================================================ .. _setup-torch-neuronx-al2: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuronx") Setup on Amazon Linux 2 ========================================================= .. note:: As of 2.20.0, Neuron Runtime no longer supports AL2.
Upgrade to AL2023 following the :ref:`AL2 Migration guide `. .. contents:: Table of contents :local: :depth: 2 .. include:: /setup/install-templates/al2-python.rst Get Started with Latest Release of PyTorch Neuron (``torch-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Trn1 web page `_, `Inf2 web page `_ * Select Amazon Linux 2 AMI (HVM) - Kernel 5.10 * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512 GB. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 2 :end-line: 3 .. include:: /includes/setup/tab-inference-torch-neuronx-al2.txt .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-al2.rst .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuronx/amazon-linux/torch-neuronx-al2023.rst ================================================ .. _setup-torch-neuronx-al2023: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuronx") Setup on Amazon Linux 2023 ============================================================ .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Trn1 web page `_, `Inf2 web page `_ * Select Amazon Linux 2023 AMI * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512 GB. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 239 :end-line: 240 .. include:: /includes/setup/tab-inference-torch-neuronx-al2023.txt .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-al2023.rst .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2023.rst
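For orientation, the generated AL2023 instructions above boil down to registering the Neuron yum repository and installing the driver and tools with ``dnf``. A condensed sketch follows; treat the rendered instructions above as authoritative, since repository details can change between releases:

.. code-block:: bash

   # Register the Neuron repository (AL2023 uses dnf)
   sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
   [neuron]
   name=Neuron YUM Repository
   baseurl=https://yum.repos.neuron.amazonaws.com
   enabled=1
   EOF
   sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

   # Install the Neuron driver and tools
   sudo dnf install -y aws-neuronx-dkms aws-neuronx-tools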
================================================ FILE: _backup-setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu20-base-dlami.rst ================================================ .. _setup-torch-neuronx-ubuntu20-base-dlami: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuronx") Setup on Ubuntu 20 with DLAMI Base ==================================================================== .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Trn1 web page `_, `Inf2 web page `_ * Check for the latest version of the `DLAMI Base AMI `_ and copy the AMI name that starts with "Deep Learning Base Neuron AMI (Ubuntu 20.04)" from the "AMI Name:" section * Search for the copied AMI name in the AMI search; you should see a matching AMI with that name in Community AMIs. Select the AMI and use it to launch the instance. * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512 GB. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 5 :end-line: 6 .. include:: /includes/setup/tab-inference-torch-neuronx-u20.txt .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u20.rst .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u20.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu20-pytorch-dlami.rst ================================================ .. _setup-torch-neuronx-ubuntu20-dlami-pytorch: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuronx") Setup on Ubuntu 20 with DLAMI PyTorch ====================================================================== .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training. .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance.
When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Trn1 web page `_, `Inf2 web page `_ * Check for the latest version of the `DLAMI Neuron PyTorch 1.13 AMI `_ and copy the AMI name that starts with "Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04)" from the "AMI Name:" section * Search for the copied AMI name in the AMI search; you should see an exact match for the AMI name in Community AMIs. Select the AMI and use it to launch the instance. * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512 GB. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Update Neuron Drivers :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=driver_runtime_tools --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 .. dropdown:: Get Started With PyTorch DLAMI :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 53 :end-line: 54 .. card:: Visit PyTorch Neuron(``torch-neuronx``) for Inference section :link: inference-torch-neuronx :link-type: ref :class-body: sphinx-design-class-title-small .. card:: Visit PyTorch Neuron(``torch-neuronx``) for Training section :link: training-torch-neuronx :link-type: ref :class-body: sphinx-design-class-title-small .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u20-dlami.rst .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u20.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu20.rst ================================================ .. _setup-torch-neuronx-ubuntu20: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :width: 100% :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuronx") Setup on Ubuntu 20 =================================================== .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training. .. include:: /setup/install-templates/trn1-ga-warning.txt .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Trn1 web page `_, `Inf2 web page `_ * Select Ubuntu Server 20 AMI * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512 GB. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 5 :end-line: 6
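On Ubuntu, the instructions rendered by the include above typically amount to registering the Neuron apt repository and installing the driver and tools. A condensed sketch follows; the rendered instructions remain the source of truth:

.. code-block:: bash

   # Register the Neuron apt repository for this Ubuntu release
   . /etc/os-release
   echo "deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main" | \
       sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null
   wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -

   # Install the Neuron driver and tools
   sudo apt-get update -y
   sudo apt-get install -y aws-neuronx-dkms aws-neuronx-tools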
.. include:: /includes/setup/tab-inference-torch-neuronx-u20.txt .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u20.rst .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u20.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu22.rst ================================================ .. _setup-torch-neuronx-ubuntu22: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuronx") Setup on Ubuntu 22 ===================================================== .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training. .. include:: /setup/install-templates/trn1-ga-warning.txt .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Trn1 web page `_, `Inf2 web page `_ * Select Ubuntu Server 22 AMI * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512 GB. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 242 :end-line: 243 .. include:: /includes/setup/tab-inference-torch-neuronx-u22.txt .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u22.rst .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u22.rst ================================================ FILE: _backup-setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu24.rst ================================================ .. _setup-torch-neuronx-ubuntu24: .. card:: Select a Different Framework or Platform for Setup :link: setup-guide-index :link-type: ref :class-body: sphinx-design-class-title-small PyTorch Neuron ("torch-neuronx") Setup on Ubuntu 24 ===================================================== .. contents:: Table of contents :local: :depth: 2 Get Started with Latest Release of PyTorch Neuron (``torch-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides links to help you quickly get started with a fresh installation of :ref:`pytorch-neuronx-main` for both Inference and Training. .. include:: /setup/install-templates/trn1-ga-warning.txt
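The launch steps below stress selecting the correct instance type. Once you have connected to the instance you launch, a quick sanity check is to ask the EC2 instance metadata service (IMDSv2) what you are actually running on:

.. code-block:: bash

   # Query EC2 instance metadata (IMDSv2) for the instance type
   TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
       -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
   curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
       http://169.254.169.254/latest/meta-data/instance-type
   # Expect a trn1/trn1n or inf2 instance type in the output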
.. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type. * To get more information about instance sizes and pricing, see: `Trn1 web page `_, `Inf2 web page `_ * Select Ubuntu Server 24 AMI * When launching a Trn1, please adjust your primary EBS volume size to a minimum of 512 GB. * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 299 :end-line: 300 .. include:: /includes/setup/tab-inference-torch-neuronx-u24.txt .. include:: /frameworks/torch/torch-neuronx/setup/pytorch-update-u24.rst ================================================ FILE: _content-types/conceptual-deep-dive.rst ================================================ .. meta:: :description: {short description here} :date_updated: {planned date of publication here} .. _{RST page ref string here}: ================================================================================ Deep dive: {concept/practice/technique name; use sentence-case, not title case!} ================================================================================ .. {SEO-friendly intro paragraph, no more than 3 sentences total.} This topic explores {subjects} in depth, discussing its technical details from the perspective of an AWS Neuron expert. Some experience in {related subjects here} is required to understand it in full. What you should know before reading ----------------------------------- .. {If there is anything the reader should know before diving into this material, note it here and provide any supporting links. This also helps LLMs training on this content have greater technical context for this subject.} Before you start, you must be familiar with the following: - **Concept 1:** {Brief description. Link to a related topic if necessary.} - **Concept 2:** {Brief description. Link to a related topic if necessary.} Overview --------- .. {Your first section, which should cover the subject from the title at a high level. If appropriate, note when this concept is applicable in Neuron components and developer workflows. Starting off with a diagram can help illustrate the concept.} PARAGRAPH 1 PARAGRAPH 2 .. image:: images/diagram-name.png :alt: {Alt text for diagram} :align: center {Section 1 Title} ----------------- .. {Each section should build on top of what was discussed in the previous sections. If a new concept is introduced that wasn't discussed previously, link to a topic that covers it. You can add subsections within this section if it helps to break it up more and clarify the content, but do not go more than 1-2 levels deep.} PARAGRAPH 1 PARAGRAPH 2 .. code-block:: python # Code example if applicable def example_function(): pass {Section 2 Title} ----------------- .. {Each section should build on top of what was discussed in the previous sections. If a new concept is introduced that wasn't discussed previously, link to a topic that covers it. You can add subsections within this section if it helps to break it up more and clarify the content, but do not go more than 1-2 levels deep.} PARAGRAPH 1 PARAGRAPH 2 .. code-block:: python # Code example if applicable def example_function(): pass .. {Add more sections as appropriate to logically break up the content. Each section should be focused on a specific aspect of the concept.} {optional}Related Concepts -------------------------- * :ref:`link-reference-name` - {description} * :ref:`link-reference-name` - {description} {optional}Further Reading ------------------------- .. toctree:: :maxdepth: 1 * `External Link `_ - {description} * :doc:`/path/to/internal/doc` - {description} .. (Note to both the writer and any AI incorporating this template: The content below is provided as a resource and should not be included as-is in any final document created using this template as a basis.) .. note:: .. Additional implementation details or important considerations can be added as admonitions. .. warning:: .. Critical information or potential pitfalls can be highlighted using warning admonitions. ================================================ FILE: _content-types/model-card.rst ================================================ .. _unique-ref-id-here: .. meta:: :description: AWS Neuron SDK model card for {Model Name}, version {version}. Overview, intended use, training data, performance, limitations, ethical considerations, and citations. :date-modified: 2026-10-03 Model Card: {Model Name} ======================== .. contents:: Table of Contents :depth: 1 :local: Model overview -------------- :Model name: {name} :Version: {version} :Organization: {organization} :License: {license} :Last updated: {date} .. warning:: {Important warnings or critical limitations} Quickstart ---------- .. code-block:: python # Example usage code from model import Model model = Model.from_pretrained("model_name") output = model.generate("Your input text") Model details ------------- Architecture ^^^^^^^^^^^^ - Base architecture: {architecture} - Number of parameters: {parameter_count} - Model dimensions: {model_dimensions} - Training objective: {training_objective} Hardware requirements ^^^^^^^^^^^^^^^^^^^^^ - Minimum RAM: {min_ram} - Recommended GPU: {gpu_specs} - Disk space: {disk_space} Intended Use ------------ Primary uses ^^^^^^^^^^^^ * {use_case_1} * {use_case_2} * {use_case_3} Out-of-Scope uses ^^^^^^^^^^^^^^^^^ * {prohibited_use_1} * {prohibited_use_2} Training data ------------- Datasets ^^^^^^^^ .. list-table:: :header-rows: 1 * - Dataset Name - Size - Description * - {dataset1} - {size1} - {description1} * - {dataset2} - {size2} - {description2} Training procedure ^^^^^^^^^^^^^^^^^^ * Training hardware: {hardware_details} * Training time: {duration} * Training cost: {cost_estimate} * Carbon footprint: {carbon_impact} Performance and limitations --------------------------- Benchmarks ^^^^^^^^^^ .. list-table:: :header-rows: 1 * - Benchmark - Score - Details * - {benchmark1} - {score1} - {details1} * - {benchmark2} - {score2} - {details2} Known limitations ^^^^^^^^^^^^^^^^^ * {limitation_1} * {limitation_2} Bias and fairness ^^^^^^^^^^^^^^^^^ * {bias_consideration_1} * {bias_consideration_2} Ethical considerations ---------------------- Potential risks ^^^^^^^^^^^^^^^ * {risk_1} * {risk_2} Mitigation strategies ^^^^^^^^^^^^^^^^^^^^^ * {strategy_1} * {strategy_2} Model details and notes ----------------------- {Provide detailed information about the model, its training, evaluation, and any other relevant aspects.
Create the sections as needed.} {Section 1 title} ^^^^^^^^^^^^^^^^^ {Details for section 1.} {Section 2 title} ^^^^^^^^^^^^^^^^^ {Details for section 2.} {. . .} Citations --------- .. code-block:: bibtex @article{model_paper, title={}, author={}, journal={}, year={} } Version history --------------- .. list-table:: :header-rows: 1 * - Version - Date - Changes * - {version1} - {date1} - {changes1} * - {version2} - {date2} - {changes2} Contact ------- :Documentation Issues: {link_to_issues} :Support: {support_contact} :Website: {website_url} ================================================ FILE: _content-types/procedural-how-to.rst ================================================ .. meta:: :description: {short description here} :date_updated: {planned date of publication here} .. _{RST page ref string here}: ======================================================================== How to {verb phrase with specific features or models that will be used} ======================================================================== Task overview ------------- .. {SEO-friendly intro paragraph, no more than 3 sentences total.} This topic discusses how to {description of task or process here} using the AWS Neuron SDK. {Short description of what the task will accomplish.} Prerequisites ------------- - **Prerequisite 1:** Description. Link to a related topic if necessary. - **Prerequisite 2:** Description. Link to a related topic if necessary. Instructions ------------ **1:** {First step; start with verb/action} .. {Describe what the user will do in this step, starting with a verb. If applicable, include any commands or code examples that illustrate the step.} .. code-block:: bash # Command or code example command --flag value .. {Additional detail if needed.} .. note:: .. {Optional; important information or caveats about this step} **2:** {Second step; start with verb/action} .. .. {Describe what the user will do in this step, starting with a verb. If applicable, include any commands or code examples that illustrate the step.} .. code-block:: python # Code example if applicable def example(): pass .. {Additional detail if needed.} .. note:: .. {Optional; important information or caveats about this step} .. **{More discrete steps as needed, following the same pattern as above.}** **N:** {Last step; start with verb/action} .. {Final step instructions} Confirm your work ----------------- To confirm you have successfully completed this task, {how to verify the task was done correctly}: .. {Provide them with a way to know they’ve done everything correctly. This could be a screenshot, command-line output, a tool to launch, or specific settings to check.} .. code-block:: bash # Verification command if applicable verify-command --check Common issues ------------- Uh oh! Did you encounter an error or other issue while working through this task? Here are some commonly encountered issues and how to address them. .. rubric:: {Problem 1} - **Possible solution**: {detailed solution} .. rubric:: {Problem 2} - **Possible solution**: {detailed solution} Related information ------------------- .. toctree:: :maxdepth: 1 * `External Link `_ - {description} * :doc:`/path/to/internal/doc` - {description} ================================================ FILE: _content-types/procedural-tutorial.ipynb ================================================ { "cells": [ { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext", "vscode": { "languageId": "raw" } }, "source": [ ".. 
meta::\n", " :description: {SEO-friendly short description of the tutorial. Include 'Neuron' and any keywords such as the language mode and framework.}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: {title starting with verb}\n", "\n", "This tutorial guides you through using the AWS Neuron SDK to {description of what the reader will accomplish in this tutorial, using a specific component or framework}.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "{Briefly summarize the purpose and outcome of this end-to-end tutorial}.\n", "{State what users will learn or achieve by completing the tutorial}." ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext", "vscode": { "languageId": "raw" } }, "source": [ ".. contents:: Table of contents\n", " :local:\n", " :depth: 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Before you start\n", "\n", "To successfully complete this tutorial, you must have completed the following steps in advance:\n\n", "- Downloaded and installed the [AWS Neuron SDK](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/index.html) for {component}.\n", "- {prerequisite 2 description here. If the user must read a topic in advance or perform any complex preparations, provide a link to a topic or download}\n", "- {prerequisite 3 description here}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "{Describe any initial local setup required before starting the tutorial.}\n", "{Include any code-specific installation, configuration, or environment setup steps.}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Example setup command (Remove these comments and add the CLI commands, env variable declarations, or other operations for the user to prepare their environment.)\n", "# pip install package_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tutorial steps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: {Title, starting with an infinitive verb like 'Load...', 'Create...', etc.}\n", "\n", "{Describe the first main step. Provide code, commands, or configuration as needed.}\n", "\n", "{Optional} {Add any important notes, caveats, or warnings for this step.}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Code goes here!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: {Title, starting with an infinitive verb like 'Load...', 'Create...', etc.}\n", "\n", "{Describe the second main step.}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Code goes here!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: {Title, starting with an infinitive verb like 'Load...', 'Create...', etc.}\n", "\n", "Describe the third main step." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Code goes here!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step N: {Title, starting with an infinitive verb like 'Load...', 'Create...', etc.}\n", "\n", "Describe the last main step." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Code goes here!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\nCode completed. Now, let's run it..." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run the code\n", "\n", "To run this code, {action to take to run the code}:\n", "Include commands, expected outputs, or checks to perform." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Example verification command\n", "# python foo.py\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "If your code works, you will see output like this:\n\n", "```\n", "Loading glorp inhume logic...Done!\n", "Configuring extubation channel instances...Done!\n\n", "1111 | 2222 | 3333\n", "4444 | 5555 | 6666\n\n", "Average glorps inhumed and extubated: 420\n", "Time to max glorp: 8 seconds\n", "```\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\nCongratulations! You now know how to {goal of tutorial}. If your code did not run or did not produce similar results, see the [Common issues](#Common-issues) section below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Common issues\n", "\n", "Here are some common errors and mistakes you can make when developing code using the approach in this tutorial, and how you may be able to address them:\n\n", "- {describe error, symptoms, and possible solution}\n", "- {describe error, symptoms, and possible solution}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (Optional) Next steps\n", "\n", "{Suggest what users might want to do next after completing the tutorial.\n", "Link to related topics or advanced guides.}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Related topics\n", "\n", "- [Related topic 1](link_here)\n", "- [Related topic 2](link_here)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: _content-types/reference-kernel-api.rst ================================================ .. meta:: :description: API reference for the {kernel-name} kernel included in the NKI Library. :date-modified: MM/DD/YYYY .. currentmodule:: {kernel namespace}.{kernel module path} RMSNorm-Quant Kernel API Reference ================================== This topic provides the API reference for the ``{kernel name}`` kernel. The kernel performs optional RMS normalization followed by quantization to ``fp8``. The kernel supports: * {feature 1} * {feature 2} * {feature 3} * ... {more features as needed} Background ----------- The ``{kernel}`` kernel ... {description of kernel functionality based on sources} For detailed information about the mathematical operations and implementation details, refer to the :doc:`{kernel name} Kernel Design Specification `.
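For writers filling in this template: the Background section usually benefits from stating the governing math. As a generic reference formulation of RMS normalization followed by ``fp8`` quantization (an illustration, not the contract of any specific kernel):

.. math::

   \mathrm{RMSNorm}(x)_i = \frac{x_i}{\sqrt{\frac{1}{d}\sum_{j=1}^{d} x_j^2 + \epsilon}}\, w_i,
   \qquad
   q_i = \mathrm{cast}_{\mathrm{fp8}}\left(\frac{\mathrm{RMSNorm}(x)_i}{s}\right)

where :math:`d` is the hidden dimension, :math:`w` is the learned scale vector (``ln_w`` in the signature below), :math:`\epsilon` is a small constant for numerical stability, and :math:`s` is a quantization scale chosen so that results fit the ``fp8`` representable range.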
API Reference -------------- {kernel argument class name} ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. py:class:: {kernel argument class name} {kernel name} Kernel arguments. .. py:attribute:: {attribute-1} :type: {attribute-1-type} {description from docstring} .. py:attribute:: {attribute-2} :type: {attribute-2-type} {description from docstring} {more attributes as needed} .. py:method:: {method syntax} -> {return type} {description from docstring} .. py:method:: {method syntax} -> {return type} {description from docstring} **Raises**: * **{exception-1}** – {when exception is raised} * **{exception-2}** – {when exception is raised} {kernel API function name in code} ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. py:function:: rmsnorm_quant_kernel(hidden: nt.tensor, ln_w: nt.tensor, kargs: RmsNormQuantKernelArgs) {definition of method used to instantiate or invoke kernel here, from source docstrings} {params and types with descriptions from source docstrings} Implementation Details ----------------------- The kernel implementation includes several key optimizations: 1. **{optimization-or-feature}**: {description} 2. **{optimization-or-feature}**: {description} 3. **{optimization-or-feature}**: {description} Example -------- The following is a simple example of how to use the ``{kernel}`` kernel: .. code-block:: python # Code here -- need usage example in pedagogical style. See Also -------- * :doc:`{kernel} ` ================================================ FILE: _content-types/release-notes-templates/compiler.rst ================================================ .. _neuron-2-XX-0-compiler: .. meta:: :description: The official release notes for the AWS Neuron SDK compiler component, version X.XX.0. Release date: XX/XX/2026. AWS Neuron SDK 2.XX.X: Neuron Compiler release notes ==================================================== **Date of release**: Month Day, 2026 .. contents:: In this release :local: :depth: 1 * Go back to the :ref:`AWS Neuron 2.XX.0 release notes home ` Improvements ------------ *Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!* Feature 1 ^^^^^^^^^ USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE Feature 2 ^^^^^^^^^ USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE Feature 3 ^^^^^^^^^ USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE Behavioral changes ------------------ *Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.* * CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE. * CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE. * CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE. * . . . Breaking changes ---------------- *Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.* * CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE. * CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE. * CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE. * . . . Bug fixes --------- Here's what we fixed in 2.XX.X: * SHORT SENTENCE DESCRIBING BUG FIX. * SHORT SENTENCE DESCRIBING BUG FIX. * SHORT SENTENCE DESCRIBING BUG FIX. * . . . Known issues ------------ *Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!* * SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT. * SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT. * SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT. * . . . ================================================ FILE: _content-types/release-notes-templates/containers.rst ================================================ .. _neuron-2-XX-0-dlc: .. meta:: :description: The official release notes for the AWS Neuron SDK Deep Learning Containers (DLC) component, version X.XX.0. Release date: XX/XX/2026. AWS Neuron SDK 2.XX.0: Neuron Deep Learning Containers release notes ==================================================================== **Date of release**: Month Day, 2026 .. contents:: In this release :local: :depth: 1 * Go back to the :ref:`AWS Neuron 2.XX.0 release notes home ` Improvements ------------ *Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!* Feature 1 ^^^^^^^^^ USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE Feature 2 ^^^^^^^^^ USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE Feature 3 ^^^^^^^^^ USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE Behavioral changes ------------------ *Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.* * CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE. * CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE. * CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE. * . . . Breaking changes ---------------- *Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.* * CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE. * CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE. * CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE. * . . . Bug fixes --------- Here's what we fixed in 2.XX.X: * SHORT SENTENCE DESCRIBING BUG FIX. * SHORT SENTENCE DESCRIBING BUG FIX. * SHORT SENTENCE DESCRIBING BUG FIX. * . . . Known issues ------------ *Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!* * SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT. * SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT. * SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT. * . . . ================================================ FILE: _content-types/release-notes-templates/dlami.rst ================================================ .. _neuron-2-XX-0-dlami: .. meta:: :description: The official release notes for the AWS Neuron SDK Deep Learning AWS Machine Images (DLAMIs) component, version X.XX.0. Release date: XX/XX/2026. AWS Neuron SDK 2.XX.X: Neuron Deep Learning AWS Machine Images release notes ============================================================================ **Date of release**: Month Day, 2026 .. contents:: In this release :local: :depth: 1 * Go back to the :ref:`AWS Neuron 2.XX.X release notes home ` Improvements ------------ *Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!* Feature 1 ^^^^^^^^^ USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE Feature 2 ^^^^^^^^^ USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE Feature 3 ^^^^^^^^^ USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE Behavioral changes ------------------ *Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.* * CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE. * CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE. * CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE. * . . . Breaking changes ---------------- *Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.* * CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE. * CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE. * CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE. * . . . Bug fixes --------- Here's what we fixed in 2.XX.X: * SHORT SENTENCE DESCRIBING BUG FIX. * SHORT SENTENCE DESCRIBING BUG FIX. * SHORT SENTENCE DESCRIBING BUG FIX. * . . . Known issues ------------ *Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!* * SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT. * SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT. * SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT. * . . . ================================================ FILE: _content-types/release-notes-templates/index.rst ================================================ .. _neuron-2-XX-0-whatsnew: .. _latest-neuron-release: .. meta:: :description: The official release notes for the AWS Neuron SDK, version X.XX.0. Release date: XX/XX/2026. AWS Neuron SDK 2.XX.X release notes =================================== **Date of release**: Month Day, 2026 .. toctree:: :hidden: :maxdepth: 1 PyTorch support JAX support NxD Inference NxD Training NxD Core Neuron Compiler NKI Neuron Runtime Developer tools Deep Learning AMIs Deep Learning Containers Release artifacts <../releasecontent> What's new? ----------- AWS and Annapurna Labs are excited to bring you release version 2.XX.X of the Neuron SDK! In this release you'll find improvements to... * . . . * . . . * . . . .. contents:: In this release :local: :depth: 1 Release highlights ------------------ Version 2.XX.X brings some exciting new features! HYPE TEXT HERE HIGHLIGHT 1 ^^^^^^^^^^^ HYPE TEXT HERE * TALKING POINT 1 * TALKING POINT 2 * . . .
USE CASE DESCRIPTION HERE

For more details, see :doc:`DOC LINK `

HIGHLIGHT 3
^^^^^^^^^^^

HYPE TEXT HERE

* TALKING POINT 1
* TALKING POINT 2
* . . .

USE CASE DESCRIPTION HERE

For more details, see :doc:`DOC LINK `

Other important changes
^^^^^^^^^^^^^^^^^^^^^^^

This release also includes the following improvements:

* . . . LINK TO COMPONENT RELEASE NOTE PAGE
* . . . LINK TO COMPONENT RELEASE NOTE PAGE
* . . . LINK TO COMPONENT RELEASE NOTE PAGE
* . . . LINK TO COMPONENT RELEASE NOTE PAGE

Component release notes
-----------------------

Select a card below to review detailed release notes for each component of the Neuron SDK version 2.XX.0. These component release notes contain details on specific new and improved features, as well as breaking changes, bug fixes, and known issues for that component area of the Neuron SDK.

.. grid:: 1 1 2 2
   :gutter: 2

   .. grid-item-card::
      :link: neuron-2-XX-0-pytorch
      :link-type: ref

      **PyTorch support** 2.XX.0 release notes
      ^^^
      Neuron features and solutions that support the PyTorch ML framework.
      +++
      Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``

   .. grid-item-card::
      :link: neuron-2-XX-0-jax
      :link-type: ref

      **JAX support** 2.XX.0 release notes
      ^^^
      Neuron features and solutions that support the JAX ML framework.
      +++
      Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``

   .. grid-item-card::
      :link: neuron-2-XX-0-nxd-training
      :link-type: ref

      **NxD Training** 2.XX.0 release notes
      ^^^
      Neuron features and tools for LLM and agent ML model training.
      +++
      Supports: ``Trn1`` / ``Trn1n``, ``Trn2``

   .. grid-item-card::
      :link: neuron-2-XX-0-nxd-inference
      :link-type: ref

      **NxD Inference** 2.XX.0 release notes
      ^^^
      Neuron features and tools for LLM and agent ML model inference.
      +++
      Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``

   .. grid-item-card::
      :link: neuron-2-XX-0-nxd-core
      :link-type: ref

      **NxD Core** 2.XX.0 release notes
      ^^^
      Common features and tools for Neuron-based training and inference.
      +++
      Supports: ``Trn1`` / ``Trn1n``, ``Trn2``

   .. grid-item-card::
      :link: neuron-2-XX-0-compiler
      :link-type: ref

      **Neuron Compiler** 2.XX.0 release notes
      ^^^
      The Neuron compiler for AWS Trainium and Inferentia, and its libraries and tools.
      +++
      Supports: ``Inf2``, ``Trn1`` / ``Trn1n``, ``Trn2``

   .. grid-item-card::
      :link: neuron-2-XX-0-nki
      :link-type: ref

      **Neuron Kernel Interface (NKI)** 2.XX.0 release notes
      ^^^
      Neuron's Python-based programming interface for developing and optimizing Neuron kernels.
      +++
      Supports: ``Inf2``, ``Trn1``, ``Trn1n``

   .. grid-item-card::
      :link: neuron-2-XX-0-runtime
      :link-type: ref

      **Neuron Runtime** 2.XX.0 release notes
      ^^^
      The Neuron kernel driver and C++ libraries for AWS Inferentia and Trainium instances.
      +++
      Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``

   .. grid-item-card::
      :link: neuron-2-XX-0-tools
      :link-type: ref

      **Neuron Developer Tools** 2.XX.0 release notes
      ^^^
      Tools that support end-to-end development for AWS Neuron.
      +++
      Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``

   .. grid-item-card::
      :link: neuron-2-XX-0-dlami
      :link-type: ref

      **Neuron Deep Learning AWS Machine Images (DLAMIs)** 2.XX.0 release notes
      ^^^
      AWS-specific machine images for building and deploying Neuron-based ML solutions.
      +++
      Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``

   .. grid-item-card::
      :link: neuron-2-XX-0-dlc
      :link-type: ref

      **Neuron Deep Learning Containers (DLCs)** 2.XX.0 release notes
      ^^^
      AWS-specific container definitions for building and deploying Neuron-based ML solutions.
      +++
      Supports: ``Inf1``, ``Inf2``, ``Trn1`` / ``Trn1n``

   .. grid-item-card::
      :link: latest-neuron-release-artifacts
      :link-type: ref

      **Neuron 2.XX.0 release artifacts**
      ^^^
      The libraries and packages updated in this release.

Support announcements
---------------------

This section covers official end-of-support announcements, as well as the features, tools, and APIs that reach end of support in this release.

End-of-support announcements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*An "end-of-support (EoS)" announcement is a notification that a feature, tool, or API will not be supported in the future. Plan accordingly!*

* END-OF-SUPPORT ANNOUNCEMENT 1 (link to announcement here)
* . . .

Ending support in 2.XX.0
^^^^^^^^^^^^^^^^^^^^^^^^

"End of support" means that AWS Neuron no longer supports the feature, tool, or API indicated in the note as of this release.

* ENDING SUPPORT ANNOUNCEMENT 1 (link to announcement here)
* . . .

Previous releases
-----------------

* :doc:`Neuron 2.27.0 `
* :doc:`Neuron 2.26.0 `
* :doc:`Neuron 2.25.0 `
* :doc:`Earlier releases `
* :ref:`prev-rn`
* :ref:`pre-release-content`
* :ref:`prev-n1-rn`

================================================
FILE: _content-types/release-notes-templates/nki.rst
================================================
.. _neuron-2-XX-0-nki:

.. meta::
   :description: The official release notes for the AWS Neuron Kernel Interface (NKI) component, version 2.XX.0. Release date: XX/XX/2026.

AWS Neuron SDK 2.XX.0: Neuron Kernel Interface (NKI) release notes
==================================================================

**Date of release**: Month Day, 2026

.. contents:: In this release
   :local:
   :depth: 1

* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`

Improvements
------------

*Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!*

Feature 1
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 2
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 3
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Behavioral changes
------------------

*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*

* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* . . .

Breaking changes
----------------

*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*

* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* . . .

Bug fixes
---------

Here's what we fixed in 2.XX.0:

* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* . . .

Known issues
------------

*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* . . .

================================================
FILE: _content-types/release-notes-templates/nx-jax.rst
================================================
.. _neuron-2-XX-0-jax:

.. meta::
   :description: The official release notes for the AWS Neuron SDK JAX support component, version 2.XX.0. Release date: XX/XX/2026.

AWS Neuron SDK 2.XX.0: JAX support release notes
================================================

**Date of release**: Month Day, 2026

.. contents:: In this release
   :local:
   :depth: 1

* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`

Released versions
-----------------

* ``0.6.1.1.0.*``

Improvements
------------

*Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!*

Feature 1
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 2
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 3
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Behavioral changes
------------------

*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*

* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* . . .

Breaking changes
----------------

*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*

* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* . . .

Bug fixes
---------

Here's what we fixed in 2.XX.0:

* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* . . .

Known issues
------------

*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*

* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* . . .

================================================
FILE: _content-types/release-notes-templates/nx-pytorch.rst
================================================
.. _neuron-2-XX-0-pytorch:

.. meta::
   :description: The official release notes for AWS Neuron SDK PyTorch support, version 2.XX.0. Release date: XX/XX/2026.

AWS Neuron SDK 2.XX.0: PyTorch support release notes
====================================================

**Date of release**: Month Day, 2026

.. contents:: In this release
   :local:
   :depth: 1

* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`

Released versions
-----------------

* ...
Improvements
------------

*Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!*

Feature 1
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 2
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 3
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Behavioral changes
------------------

*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*

* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* . . .

Breaking changes
----------------

*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*

* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* . . .

Bug fixes
---------

Here's what we fixed in 2.XX.0:

* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* . . .

Known issues
------------

*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*

* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* . . .

================================================
FILE: _content-types/release-notes-templates/nxd-core.rst
================================================
.. _neuron-2-XX-0-nxd-core:

.. meta::
   :description: The official release notes for the AWS Neuron SDK NxD Core component, version 2.XX.0. Release date: XX/XX/2026.

AWS Neuron SDK 2.XX.0: NxD Core release notes
=============================================

**Date of release**: Month Day, 2026

.. contents:: In this release
   :local:
   :depth: 1

* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`

Improvements
------------

*Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!*

Feature 1
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 2
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 3
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Behavioral changes
------------------

*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*

* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* . . .

Breaking changes
----------------

*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*

* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* . . .

Bug fixes
---------

Here's what we fixed in 2.XX.0:

* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* . . .

Known issues
------------

*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*

* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* . . .

================================================
FILE: _content-types/release-notes-templates/nxd-inference.rst
================================================
.. _neuron-2-XX-0-nxd-inference:

.. meta::
   :description: The official release notes for the AWS Neuron SDK NxD Inference component, version 2.XX.0. Release date: XX/XX/2026.

AWS Neuron SDK 2.XX.0: NxD Inference release notes
==================================================

**Date of release**: Month Day, 2026

.. contents:: In this release
   :local:
   :depth: 1

* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`

Improvements
------------

*Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!*

Feature 1
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 2
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 3
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Behavioral changes
------------------

*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*

* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* . . .

Breaking changes
----------------

*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*

* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* . . .

Bug fixes
---------

Here's what we fixed in 2.XX.0:

* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* . . .

Known issues
------------

*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*

* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* . . .
================================================
FILE: _content-types/release-notes-templates/nxd-training.rst
================================================
.. _neuron-2-XX-0-nxd-training:

.. meta::
   :description: The official release notes for the AWS Neuron SDK NxD Training component, version 2.XX.0. Release date: XX/XX/2026.

AWS Neuron SDK 2.XX.0: NxD Training release notes
=================================================

**Date of release**: Month Day, 2026

.. contents:: In this release
   :local:
   :depth: 1

* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`

Improvements
------------

*Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!*

Feature 1
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 2
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 3
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Behavioral changes
------------------

*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*

* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* . . .

Breaking changes
----------------

*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*

* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* . . .

Bug fixes
---------

Here's what we fixed in 2.XX.0:

* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* . . .

Known issues
------------

*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*

* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* . . .

================================================
FILE: _content-types/release-notes-templates/runtime.rst
================================================
.. _neuron-2-XX-0-runtime:

.. meta::
   :description: The official release notes for the AWS Neuron SDK Runtime component, version 2.XX.0. Release date: XX/XX/2026.

AWS Neuron SDK 2.XX.0: Neuron Runtime release notes
===================================================

**Date of release**: Month Day, 2026

.. contents:: In this release
   :local:
   :depth: 1

* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`

Improvements
------------

*Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!*

Feature 1
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 2
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 3
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Behavioral changes
------------------

*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*

* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* . . .

Breaking changes
----------------

*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*

* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* . . .

Bug fixes
---------

Here's what we fixed in 2.XX.0:

* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* . . .

Known issues
------------

*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*

* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* . . .

================================================
FILE: _content-types/release-notes-templates/tools.rst
================================================
.. _neuron-2-XX-0-tools:

.. meta::
   :description: The official release notes for the AWS Neuron SDK Developer Tools component, version 2.XX.0. Release date: XX/XX/2026.

AWS Neuron SDK 2.XX.0: Developer Tools release notes
====================================================

**Date of release**: Month Day, 2026

.. contents:: In this release
   :local:
   :depth: 1

* Go back to the :ref:`AWS Neuron 2.XX.0 release notes home <neuron-2-XX-0-whatsnew>`

Improvements
------------

*Improvements are significant new or improved features and solutions introduced in this release of the AWS Neuron SDK. Read on to learn about them!*

Feature 1
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 2
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Feature 3
^^^^^^^^^

USER-FACING DESCRIPTION OF IMPROVEMENT (WHAT WILL IT DO FOR DEV CUSTOMERS), WHY WE MADE THE IMPROVEMENT, LINK TO SUPPORTING DOC PAGE

Behavioral changes
------------------

*Behavioral changes are small, user-facing changes that you may notice after upgrading to this version.*

* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* CHANGE DESCRIPTION SENTENCE. NOTE HOW THE USER MAY EXPERIENCE IT, IF APPLICABLE.
* . . .
Breaking changes
----------------

*Sometimes we have to break something now to make the experience better in the longer term. Breaking changes are changes that may require you to update your own code, tools, and configurations.*

* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* CHANGE DESCRIPTION SENTENCE. NOTE WHEN THE USER MAY ENCOUNTER IT. PROVIDE A WORKAROUND, IF POSSIBLE.
* . . .

Bug fixes
---------

Here's what we fixed in 2.XX.0:

* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* SHORT SENTENCE DESCRIBING BUG FIX.
* . . .

Known issues
------------

*Something doesn't work. Check here to find out if we already knew about it. We hope to fix these soon!*

* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* SENTENCE DESCRIBING ISSUE AND WHEN THE USER WILL ENCOUNTER IT.
* . . .

================================================
FILE: _ext/archive.py
================================================
# This file creates a downloadable archive from each directory listed in src_dirs.
# You can modify or add additional archive_handler functions here to create additional archives.
import os, tarfile


def archive_handler(app):
    old_cwd = os.getcwd()
    src_dirs = ['src/examples/pytorch', 'src']
    target_dirs = ['libtorch_demo', 'neuronperf']
    archive_names = [name + '.tar.gz' for name in target_dirs]
    for src_dir, target_dir, archive_name in zip(src_dirs, target_dirs, archive_names):
        os.chdir(src_dir)
        # Remove any stale archive before rebuilding it.
        try:
            os.remove(archive_name)
        except OSError:
            pass
        with tarfile.open(archive_name, 'w:gz') as tar:
            tar.add(target_dir)
        os.chdir(old_cwd)


def setup(app):
    app.connect('builder-inited', archive_handler)
    return {
        'version': '1.0',
        'parallel_read_safe': True,
        'parallel_write_safe': True,
    }

================================================
FILE: _ext/df_tables.py
================================================
import os
from docutils.parsers.rst import Directive, directives
from docutils.parsers.rst.directives.tables import CSVTable


class DFTable(CSVTable):
    # The :df-var: option names the variable that holds the DataFrame to
    # render; it defaults to "df". Copy the base option_spec rather than
    # mutating CSVTable's shared dict.
    option_spec = dict(CSVTable.option_spec)
    option_spec['df-var'] = directives.unchanged

    df = None

    def __init__(self, name, arguments, options, content, lineno,
                 content_offset, block_text, state, state_machine):
        super().__init__(name, arguments, options, content, lineno,
                         content_offset, block_text, state, state_machine)

    def get_csv_data(self):
        # Feed the DataFrame to CSVTable as CSV lines.
        return self.df.to_csv(index=False).splitlines(), None

    def run(self):
        source_file_name = self.state_machine.document.attributes["source"]
        dirname = os.path.abspath(os.path.dirname(source_file_name))
        os.chdir(dirname)
        code = "\n".join(map(str, self.content))
        ns = {}
        try:
            # Pre-import numpy and pandas into the namespace the directive
            # body executes in.
            exec("\n".join(["import numpy as np",
                            "import pandas as pd",
                            ]), ns)
            variable_name = "df"
            if self.options.get("df-var"):
                variable_name = self.options.get("df-var")
            exec(code, ns)
            self.df = ns[variable_name]
        except Exception as e:
            raise self.error(str(e))
        return super().run()


def setup(app):
    setup.app = app
    setup.config = app.config
    setup.confdir = app.confdir
    app.add_directive("df-table", DFTable)
    metadata = {
        "parallel_read_safe": True,
        "parallel_write_safe": True,
        "version": 0.1,
    }
    return metadata
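The directive's contract is simply that, after its body executes, a pandas DataFrame is bound to ``df`` (or to the name given via the ``:df-var:`` option), and the resulting CSV is handed to docutils' ``CSVTable``. A minimal stand-alone sketch of that execution model follows; the directive body string and its column names are hypothetical example data, not taken from the repo:

```python
# Stand-alone sketch of what df-table does with its body (assumes pandas is
# installed). The content string below is a hypothetical directive body.
import pandas as pd

content = "df = pd.DataFrame({'Instance': ['Inf2', 'Trn1'], 'Cores': [2, 2]})"
ns = {"pd": pd}
exec(content, ns)                    # the directive exec()s its body the same way
print(ns["df"].to_csv(index=False))  # this CSV is what CSVTable renders
```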
================================================
FILE: _ext/local_documenter.py
================================================
import os
import sys
from sphinx.ext.autodoc import ModuleDocumenter, FunctionDocumenter


class LocalModuleDocumenter(ModuleDocumenter):
    """
    Provides identical functionality to "automodule", but allows the module
    function names to be overridden with the "module-name" option. This also
    allows local python files to be documented as if they were imported from
    an actual package by temporarily adding the directory of the RST file to
    the python path.
    """
    option_spec = dict(ModuleDocumenter.option_spec)
    option_spec['module-name'] = lambda x=None: x

    def import_object(self, *args):
        """Find modules local to the RST document directory"""
        local = os.path.join(self.env.app.srcdir, os.path.dirname(self.env.docname))
        sys.path.append(local)
        result = super().import_object(*args)
        sys.path.remove(local)
        return result

    def get_module_members(self):
        """Add module name override to local files"""
        members = super().get_module_members()
        name = self.options.module_name
        if name is not None:
            for member in members.values():
                if callable(member.object):
                    setattr(member.object, 'module_name_override', name)
        return members


class LocalFunctionDocumenter(FunctionDocumenter):

    def format_name(self) -> str:
        """Apply module name override to local functions"""
        # Use overridden module path if it is provided
        if hasattr(self.object, 'module_name_override'):
            self.objpath = self.object.module_name_override.split('.') + [self.objpath[-1]]
        return super().format_name()


def setup(app):
    app.add_autodocumenter(LocalFunctionDocumenter)
    app.add_autodocumenter(LocalModuleDocumenter)
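The override itself is plain attribute plumbing: the module documenter stamps each callable member with a ``module_name_override`` attribute, and the function documenter rebuilds the dotted path from it. A minimal sketch of that hand-off outside of Sphinx; the function name and dotted path below are illustrative, not from the repo:

```python
# Minimal sketch of the module_name_override hand-off, independent of Sphinx.
def my_kernel(x):
    return x

# What LocalModuleDocumenter.get_module_members() does per callable member:
setattr(my_kernel, 'module_name_override', 'mypackage.kernels')

# What LocalFunctionDocumenter.format_name() then reconstructs:
objpath = ['my_kernel']
objpath = my_kernel.module_name_override.split('.') + [objpath[-1]]
print('.'.join(objpath))  # -> mypackage.kernels.my_kernel
```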
================================================
FILE: _ext/neuron_tag.py
================================================
import os
from docutils import nodes
from docutils.statemachine import ViewList
from sphinx.util.docutils import SphinxDirective
from sphinx.util.nodes import nested_parse_with_titles

# =============================================================================
# Legacy add/clear lists (used only for files NOT handled by explicit overrides)
# =============================================================================
# These lists use substring matching via in_list(). They apply ONLY when no
# explicit_override was set. As more paths get explicit overrides, entries
# here become dead code. Kept for backward compatibility with paths not yet
# explicitly overridden.

add_inf1_tag = [
    'about-neuron/arch', 'archive/mxnet-neuron',
    'about-neuron/announcements/index', 'archive/tensorflow/tensorflow-neuron/',
]

add_trn1_tag = [
    'frameworks/neuron-customops/', 'neuron-customops/',
    'frameworks/torch/inference-torch-neuronx',
    'libraries/nemo-megatron/', 'libraries/nxd-training/',
]

add_trn2_tag = [
    'libraries/nxd-training/', 'about-neuron/models/',
]

add_trn3_tag = [
    'about-neuron/arch/neuron-hardware/neuron-core-v4',
    'about-neuron/arch/neuron-hardware/trn3-arch',
]

add_neuronx_tag = [
    'frameworks/torch/torch-neuronx/', 'archive/tensorflow/tensorflow-neuronx/',
    'frameworks/torch/inference-torch-neuronx/', 'libraries/neuronx-distributed/',
    'libraries/nxd-training', 'setup/tensorflow-neuronx',
]

clear_inf1_tag = [
    'about-neuron/arch/neuron-features/neuron-caching',
    'about-neuron/arch/neuron-features/eager-debug-mode',
    'about-neuron/arch/neuron-features/collective-communication-operations',
    'about-neuron/arch/neuron-features/dynamic-shapes',
    'about-neuron/arch/neuron-features/control-flow',
    'about-neuron/arch/neuron-features/custom-c++-operators',
    'about-neuron/arch/neuron-features/collective-communication',
    'about-neuron/arch/neuron-features/rounding-modes',
    'about-neuron/arch/neuron-hardware/trn1-arch',
    'about-neuron/arch/neuron-hardware/inf2-arch',
    'about-neuron/arch/neuron-hardware/inferentia2',
    'about-neuron/arch/neuron-hardware/trainium',
    'about-neuron/arch/neuron-hardware/neuron-core-v2',
    'about-neuron/arch/neuron-hardware/trn2-arch',
    'about-neuron/arch/neuron-hardware/trn3-arch',
    'about-neuron/arch/neuron-hardware/neuron-core-v3',
    'about-neuron/arch/neuron-hardware/neuron-core-v4',
    'about-neuron/benchmarks/trn1-performance',
    'about-neuron/benchmarks/trn1/',
    'about-neuron/benchmarks/inf2/inf2-performance',
    'about-neuron/faq/training/',
    'about-neuron/models/inference-inf2-trn1-samples',
    'about-neuron/models/training-trn1-samples',
    'about-neuron/models/training-inference-trn2-samples',
    'about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision',
    'about-neuron/appnotes/transformers-neuronx/generative-llm-inference-with-neuron',
    'about-neuron/appnotes/torch-neuronx/torch-neuronx-dataparallel-app-note',
    'about-neuron/calculator/neuron-calculator',
    'about-neuron/announcements/neuron2.x/dlami-pytorch-introduce',
    'about-neuron/announcements/neuron2.x/sm-training-trn1-introduce',
    'about-neuron/announcements/neuron2.x/sm-training-dlc-2.9.1',
    'devflows/training',
    'devflows/inference/byoc-hosting-devflow-inf2',
    'compiler/neuronx-cc/',
    'about-neuron/appnotes/perf/neuronx-cc/',
    'frameworks/torch/torch-neuronx/',
    'frameworks/torch/training',
    'frameworks/torch/inference-torch-neuronx',
    'archive/tensorflow/tensorflow-neuronx/',
    'archive/tensorflow/tensorflow-neuronx-inference',
    'frameworks/torch/torch-neuronx/transformers-neuronx/readme',
    'release-notes/neuron-cc/index',
    'release-notes/runtime/aws-neuronx-collectives/',
    'release-notes/torch/torch-neuronx/',
    'release-notes/torch/transformers-neuronx/index',
    'release-notes/tensorflow/tensorflow-neuronx/',
    'release-notes/compiler/neuronx-cc/',
    'tools/tutorials/tutorial-tensorboard-scalars-mnist',
    'tools/tutorials/tutorial-neuron-monitor-mnist',
    'tools/tensorboard/getting-started-tensorboard-neuronx-plugin',
    'tools/neuron-sys-tools/nccom-test',
    'setup/torch-neuronx',
    'setup/tensorflow-neuronx',
    'setup/neuron-setup/tensorflow/neuronx/',
    'setup/neuron-setup/pytorch/neuronx/',
    'nki/',
    'frameworks/jax/',
    'libraries/nxd-training/',
    '/release-notes/components/nki',
    '/release-notes/components/nki-lib',
    '/release-notes/components/compiler',
]

clear_inf2_tag = [
    'frameworks/torch/torch-neuronx/training',
    'frameworks/torch/training',
    'archive/torch-neuron/inference-torch-neuron',
    'archive/tensorflow/tensorflow-neuron-inference',
    'frameworks/jax/',
    'about-neuron/arch/neuron-hardware/trn1-arch',
    'about-neuron/arch/neuron-hardware/trainium',
    'about-neuron/arch/neuron-hardware/trn2-arch',
    'about-neuron/arch/neuron-hardware/trn3-arch',
    'about-neuron/arch/neuron-hardware/neuron-core-v3',
    'about-neuron/arch/neuron-hardware/neuron-core-v4',
    'about-neuron/arch/neuron-features/logical-neuroncore-config',
    'about-neuron/benchmarks/trn1/trn1-inference-performance',
    'about-neuron/benchmarks/trn1/trn1-training-performance',
    'about-neuron/models/training-trn1-samples',
    'about-neuron/models/training-inference-trn2-samples',
    'about-neuron/announcements/neuron2.x/announce-neuron-trn2',
    'neuronx-distributed/nxd-training',
    'libraries/nxd-training/',
    'tools/neuron-sys-tools/nccom-test',
    'release-notes/runtime/aws-neuronx-collectives/',
]

clear_trn1_tag = [
    'about-neuron/arch/neuron-hardware/inf2-arch',
    'about-neuron/arch/neuron-hardware/inferentia2',
    'about-neuron/arch/neuron-hardware/trn2-arch',
    'about-neuron/arch/neuron-hardware/trn3-arch',
    'about-neuron/arch/neuron-hardware/trainium2',
    'about-neuron/arch/neuron-hardware/neuron-core-v3',
    'about-neuron/arch/neuron-hardware/neuron-core-v4',
    'about-neuron/benchmarks/inf2/inf2-performance',
    'about-neuron/models/training-inference-trn2-samples',
]

clear_trn2_tag = [
    'archive/tensorflow/',
    'libraries/transformers-neuronx/',
    'about-neuron/arch/neuron-hardware/trn1-arch',
    'about-neuron/arch/neuron-hardware/trainium',
    'about-neuron/arch/neuron-hardware/neuron-core-v2',
    'about-neuron/arch/neuron-hardware/neuron-core-v4',
    'about-neuron/arch/neuron-hardware/trn3-arch',
    'about-neuron/benchmarks/',
    'about-neuron/benchmarks/trn1/',
    'about-neuron/benchmarks/inf2/inf2-performance',
    'about-neuron/models/inference-inf2-trn1-samples',
    'about-neuron/models/training-trn1-samples',
    'neuron-customops/programming-guide/custom-c++-operators-devguide',
]

clear_trn3_tag = [
    'archive/tensorflow/',
    'libraries/transformers-neuronx/',
    'about-neuron/arch/neuron-hardware/trn1-arch',
    'about-neuron/arch/neuron-hardware/trainium',
    'about-neuron/arch/neuron-hardware/neuron-core-v2',
    'about-neuron/arch/neuron-hardware/neuron-core-v3',
    'about-neuron/benchmarks/',
    'about-neuron/benchmarks/trn1/',
    'about-neuron/benchmarks/inf2/inf2-performance',
    'about-neuron/models/inference-inf2-trn1-samples',
    'about-neuron/models/training-trn1-samples',
    'libraries/neuronx-distributed/context_parallelism_overview',
    'about-neuron/appnotes/',
    'neuron-customops/programming-guide/custom-c++-operators-devguide',
]

# Neuron 1.x / NeuronCore v1 era content: clear all non-Inf1 tags
clear_nc_v2_tag = [
    'tools/tutorials/tutorial-neuron-check-model',
    'tools/tutorials/tutorial-neuron-gatherinfo',
    'tools/tutorials/getting-started-tensorboard-neuron-plugin',
    'tools/tensorboard/getting-started-tensorboard-neuron-plugin',
    'tools/helper-tools/tutorial-neuron-check-model',
    'tools/helper-tools/tutorial-neuron-gatherinfo',
    'about-neuron/appnotes/neuron-cc/mixed-precision',
    'about-neuron/appnotes/perf/neuron-cc/',
    'about-neuron/appnotes/neuron1x/',
    'about-neuron/appnotes/torch-neuron/',
    'about-neuron/arch/neuron-hardware/inf1-arch',
    'about-neuron/arch/neuron-hardware/inferentia',
    'about-neuron/arch/neuron-hardware/neuron-core-v1',
    'about-neuron/arch/neuron-features/neuroncore-pipeline',
    'about-neuron/announcements/neuron1.x/',
    'about-neuron/quick-start/mxnet-neuron',
    'about-neuron/benchmarks/inf1/',
    'about-neuron/faq/inference/',
    'about-neuron/models/inference-inf1-samples',
    'containers/dlc-then-ec2-devflow',
    'containers/dlc-then-ecs-devflow',
    'containers/dlc-then-eks-devflow',
    'containers/container-sm-hosting-devflow',
    'containers/rn',
    'containers/tutorials/k8s-neuron-scheduler',
    'compiler/neuron-cc/',
    'release-notes/mxnet-neuron/',
    'release-notes/torch/torch-neuron/',
    'release-notes/tensorflow/tensorflow-neuron/',
    'release-notes/compiler/neuron-cc/',
    'release-notes/neuron1/',
    'archive/torch-neuron/',
    'archive/torch-neuron/inference-torch-neuron',
    'archive/tensorflow/tensorflow-neuron/',
    'archive/tensorflow/tensorflow-neuron-inference',
    'archive/mxnet-neuron/',
    'setup/tensorflow-neuron',
    'setup/torch-neuron',
    'setup/mxnet-neuron',
    'setup/neuron-setup/pytorch/neuron/',
    'setup/neuron-setup/mxnet/neuron/ubuntu/',
    'setup/neuron-setup/mxnet/neuron/amazon-linux/',
    'setup/neuron-setup/tensorflow/neuron/ubuntu/',
    'setup/neuron-setup/tensorflow/neuron/amazon-linux/',
]

# Top-level directories used for initial tag assignment
NEURON1_DIRS = ['n1']
COMMON_DIRS = [
    'tools', 'neuron-runtime', 'release-notes', 'containers', 'compiler',
    'frameworks', 'src', 'about-neuron', 'setup', 'devflows', 'dlami',
    'libraries',
]

TEXT_TEMPLATE = '**This document is relevant for**: '

# =============================================================================
# Hardware architecture page map (exact docname -> instance list)
# =============================================================================
HW_ARCH_MAP = {
    'about-neuron/arch/neuron-hardware/inf1-arch': ['Inf1'],
    'about-neuron/arch/neuron-hardware/inf2-arch': ['Inf2'],
    'about-neuron/arch/neuron-hardware/inferentia': ['Inf1'],
    'about-neuron/arch/neuron-hardware/inferentia2': ['Inf2'],
    'about-neuron/arch/neuron-hardware/neuron-core-v1': ['Inf1'],
    'about-neuron/arch/neuron-hardware/neuron-core-v2': ['Inf2', 'Trn1'],
    'about-neuron/arch/neuron-hardware/neuron-core-v3': ['Trn2'],
    'about-neuron/arch/neuron-hardware/neuron-core-v4': ['Trn3'],
    'about-neuron/arch/neuron-hardware/trainium': ['Trn1'],
    'about-neuron/arch/neuron-hardware/trainium2': ['Trn2'],
    'about-neuron/arch/neuron-hardware/trainium3': ['Trn3'],
    'about-neuron/arch/neuron-hardware/trn1-arch': ['Trn1'],
    'about-neuron/arch/neuron-hardware/trn2-arch': ['Trn2'],
    'about-neuron/arch/neuron-hardware/trn3-arch': ['Trn3'],
}

# NxD Core training-specific pages (no Inf2)
NXD_CORE_TRAINING_PAGES = [
    'libraries/neuronx-distributed/index-training',
    'libraries/neuronx-distributed/developer-guide-training',
    'libraries/neuronx-distributed/api-reference-guide-training',
    'libraries/neuronx-distributed/tp_developer_guide',
    'libraries/neuronx-distributed/pp_developer_guide',
    'libraries/neuronx-distributed/ptl_developer_guide',
    'libraries/neuronx-distributed/save_load_developer_guide',
    'libraries/neuronx-distributed/activation_memory_reduction',
    'libraries/neuronx-distributed/activation_memory_reduction_developer_guide',
    'libraries/neuronx-distributed/standard_mixed_precision',
    'libraries/neuronx-distributed/tensor_parallelism_overview',
    'libraries/neuronx-distributed/pipeline_parallelism_overview',
    'libraries/neuronx-distributed/lora_finetune_developer_guide',
    'libraries/neuronx-distributed/model_optimizer_wrapper_developer_guide',
    'libraries/neuronx-distributed/context_parallelism_overview',
]


def _in_list(cur_file, file_list):
    """Return True if any entry in file_list is a substring of cur_file."""
    return any(entry in cur_file for entry in file_list)


def _splitall(path):
    """Split a path into all its components."""
    parts = []
    while True:
        head, tail = os.path.split(path)
        if head == path:
            parts.insert(0, head)
            break
        elif tail == path:
            parts.insert(0, tail)
            break
        else:
            path = head
            parts.insert(0, tail)
    return parts, len(parts)


def _get_explicit_override(cur_file):
    """Return (instances, True) if cur_file has an explicit CSV-based
    override, or (None, False) otherwise.

    Rules are evaluated top-to-bottom and return on first match, so more
    specific paths must be checked BEFORE broader ones. Within the
    neuronx-distributed block below, later assignments override earlier ones
    (last match wins).
    """
    # --- Libraries -----------------------------------------------------------
    # NxD Core = Inf2, Trn1, Trn2 (default for all neuronx-distributed pages)
    if cur_file.startswith('libraries/neuronx-distributed/'):
        result = ['Inf2', 'Trn1', 'Trn2']
        # Training-specific pages drop Inf2
        if cur_file in NXD_CORE_TRAINING_PAGES:
            result = ['Trn1', 'Trn2']
        if cur_file.startswith('libraries/neuronx-distributed/tutorials/training') or \
           cur_file.startswith('libraries/neuronx-distributed/tutorials/finetune'):
            result = ['Trn1', 'Trn2']
        return result, True
    if cur_file.startswith('libraries/transformers-neuronx/'):
        return ['Inf2', 'Trn1'], True
    if cur_file.startswith('libraries/nxd-training/'):
        return ['Trn1', 'Trn2'], True
    # vLLM must come before general nxd-inference
    if cur_file.startswith('libraries/nxd-inference/vllm/'):
        return ['Trn2', 'Trn3'], True
    if cur_file.startswith('libraries/nxd-inference/'):
        return ['Inf2', 'Trn1', 'Trn2'], True
    if cur_file.startswith('libraries/nemo-megatron/'):
        return ['Trn1', 'Trn2'], True
    # --- NKI -----------------------------------------------------------------
    if cur_file.startswith('nki/'):
        return ['Trn2', 'Trn3'], True
    # --- CustomOps -----------------------------------------------------------
    if cur_file.startswith('neuron-customops/'):
        return ['Inf2', 'Trn1'], True
    # --- Frameworks ----------------------------------------------------------
    if cur_file.startswith('frameworks/jax/'):
        return ['Trn2', 'Trn3'], True
    # TensorFlow NeuronX (must come before TensorFlow Neuron check)
    if 'tensorflow/tensorflow-neuronx' in cur_file:
        return ['Inf2', 'Trn1'], True
    # TensorFlow Neuron (Inf1)
    if 'tensorflow/tensorflow-neuron' in cur_file and 'neuronx' not in cur_file:
        return ['Inf1'], True
    # TorchNeuron native PyTorch (must come before torch-neuronx check)
    if 'torch/pytorch-native' in cur_file:
        return ['Trn2', 'Trn3'], True
    # PyTorch NeuronX (Torch/XLA)
    if 'torch/torch-neuronx' in cur_file:
        return ['Inf2', 'Trn1', 'Trn2'], True
    # PyTorch NeuronX top-level pages (not in torch-neuronx/ subdir)
    if cur_file in ['frameworks/torch/inference-torch-neuronx',
                    'frameworks/torch/training-torch-neuronx',
                    'frameworks/torch/training',
                    'frameworks/torch/inference']:
        return ['Inf2', 'Trn1', 'Trn2'], True
    # PyTorch Neuron (Inf1)
    if 'torch/torch-neuron' in cur_file and 'neuronx' not in cur_file:
        return ['Inf1'], True
    if cur_file == 'archive/torch-neuron/inference-torch-neuron':
        return ['Inf1'], True
    # MXNet
    if 'mxnet-neuron' in cur_file:
        return ['Inf1'], True
    # --- Neuron Runtime ------------------------------------------------------
    # Collectives (more specific, must come before the general runtime rule)
    if cur_file.startswith('neuron-runtime/about/collectives') or \
       cur_file in ['neuron-runtime/explore/internode-collective-comm',
                    'neuron-runtime/explore/intranode-collective-comm',
                    'neuron-runtime/explore/compute-comm-overlap']:
        return ['Trn1', 'Trn2', 'Trn3'], True
    if cur_file.startswith('neuron-runtime/'):
        return ['Inf2', 'Trn1', 'Trn2', 'Trn3'], True
    # --- Compiler ------------------------------------------------------------
    if cur_file.startswith('compiler/error-codes/'):
        return ['Inf2', 'Trn1', 'Trn2', 'Trn3'], True
    if cur_file == 'compiler/neuron-cc' or cur_file.startswith('compiler/neuron-cc/'):
        return ['Inf1'], True
    if cur_file == 'compiler/neuronx-cc' or cur_file.startswith('compiler/neuronx-cc/'):
        return ['Inf2', 'Trn1', 'Trn2', 'Trn3'], True
    if cur_file == 'neuron-customops/programming-guide' or cur_file.startswith('neuron-customops/programming-guide'):
        return ['Inf2', 'Trn1'], True
    # --- Setup ---------------------------------------------------------------
    if cur_file.startswith('setup/install-templates/inf1/'):
        return ['Inf1'], True
    if cur_file.startswith('setup/install-templates/inf2/'):
        return ['Inf2'], True
    if cur_file.startswith('setup/install-templates/trn1/') or \
       cur_file == 'setup/install-templates/launch-trn1-dlami':
        return ['Trn1'], True
    if cur_file in ['setup/setup-neuron', 'setup/torch-neuron', 'setup/torch-neuron-ubuntu20']:
        return ['Inf1'], True
    if cur_file.startswith('setup/neuron-setup/pytorch/neuronx/'):
        return ['Inf2', 'Trn1', 'Trn2'], True
    if cur_file.startswith('setup/neuron-setup/tensorflow/neuronx/'):
        return ['Inf2', 'Trn1'], True
    if cur_file.startswith('setup/neuron-setup/pytorch/neuron/'):
        return ['Inf1'], True
    if cur_file.startswith('setup/neuron-setup/tensorflow/neuron/'):
        return ['Inf1'], True
    if cur_file == 'setup/jax-neuronx':
        return ['Trn2', 'Trn3'], True
    if cur_file == 'setup/torch-neuronx':
        return ['Inf2', 'Trn1', 'Trn2'], True
    if cur_file == 'setup/tensorflow-neuronx':
        return ['Inf2', 'Trn1'], True
    if cur_file == 'setup/tensorflow-neuron':
        return ['Inf1'], True
    return None, False


def _get_page_override(cur_file):
    """Return (instances, True) for page-specific overrides that don't fit
    neatly into _get_explicit_override (devflows, containers, tools,
    about-neuron, etc.).
    """
    # --- Devflows ------------------------------------------------------------
    if cur_file == 'devflows/inference/byoc-hosting-devflow-inf2':
        return ['Inf2'], True
    if cur_file == 'devflows/inference/ec2-then-ec2-devflow-inf2':
        return ['Inf2'], True
    if cur_file == 'devflows/parallelcluster-flows':
        return ['Trn1', 'Trn2'], True
    if cur_file.startswith('devflows/training/batch/') or \
       cur_file.startswith('devflows/training/ec2/') or \
       cur_file.startswith('devflows/training/parallelcluster/') or \
       cur_file.startswith('devflows/training/sm-devflow/'):
        return ['Trn1', 'Trn2', 'Trn3'], True
    if cur_file.startswith('devflows/plugins/npd'):
        return ['Inf2', 'Trn1', 'Trn2'], True
    # --- Containers ----------------------------------------------------------
    # OCI Hooks
    if 'tutorial-oci-hook' in cur_file:
        return ['Inf1', 'Inf2', 'Trn1', 'Trn2'], True
    # DRA
    if cur_file == 'containers/neuron-dra' or cur_file.startswith('containers/files/'):
        return ['Trn2', 'Trn3'], True
    if cur_file == 'containers/how-to/how-to-ultraserver':
        return ['Trn2', 'Trn3'], True
    # DLC quickstarts
    if cur_file == 'containers/get-started/quickstart-configure-deploy-dlc':
        return ['Trn2', 'Trn3'], True
    if cur_file == 'containers/get-started/quickstart-pytorch-inference-dlc':
        return ['Inf2', 'Trn1', 'Trn2', 'Trn3'], True
    # Inf1-era container content
    if cur_file == 'containers/tutorial-docker-runtime1.0':
        return ['Inf1'], True
    if cur_file == 'containers/container-deployment-flows' or \
       cur_file.startswith('containers/docker-example/inference/') or \
       cur_file.startswith('containers/docker-example/v1/') or \
       cur_file == 'containers/ec2-then-ec2-devflow' or \
       cur_file == 'containers/neo-then-hosting-devflow':
        return ['Inf1'], True
    # Container training/inference tutorials and docker examples
    if cur_file.startswith('containers/docker-example/training/'):
        return ['Trn1', 'Trn2', 'Trn3'], True
    if cur_file.startswith('containers/tutorials/inference/'):
        return ['Inf1'], True
    if cur_file.startswith('containers/tutorials/training/'):
        return ['Trn1', 'Trn2', 'Trn3'], True
    # Neuron Monitor Container
    if cur_file == 'containers/tutorials/k8s-neuron-monitor':
        return ['Inf2', 'Trn1', 'Trn2'], True
    # Node Problem Detector
    if cur_file.startswith('containers/tutorials/k8s-neuron-problem-detector'):
        return ['Inf2', 'Trn1', 'Trn2'], True
    # --- Tools ---------------------------------------------------------------
    # TensorBoard plugin (End Of Support)
    if cur_file.startswith('tools/tensorboard/getting-started-tensorboard-neuronx') or \
       cur_file == 'tools/tutorials/tutorial-tensorboard-scalars-mnist' or \
       cur_file == 'tools/tutorials/torch-neuronx-profiling-with-tb':
        return ['Inf2', 'Trn1'], True
    # --- Announcements -------------------------------------------------------
    if cur_file.startswith('about-neuron/announcements/'):
        return [], True
    # --- Hardware architecture -----------------------------------------------
    if cur_file in HW_ARCH_MAP:
        return HW_ARCH_MAP[cur_file], True
    # --- Arch features -------------------------------------------------------
    if cur_file == 'about-neuron/arch/neuron-features/custom-c++-operators':
        return ['Inf2', 'Trn1'], True
    if cur_file == 'about-neuron/arch/neuron-features/logical-neuroncore-config':
        return ['Trn2', 'Trn3'], True
    # --- Appnotes ------------------------------------------------------------
    if cur_file == 'about-neuron/appnotes/neuronx-distributed/introducing-nxd-inference':
        return ['Inf2', 'Trn1', 'Trn2'], True
    if cur_file == 'about-neuron/appnotes/neuronx-distributed/introducing-nxdt-training':
        return ['Trn1', 'Trn2'], True
    if cur_file.startswith('about-neuron/appnotes/torch-neuronx/'):
        return ['Inf2', 'Trn1', 'Trn2'], True
    if cur_file.startswith('about-neuron/appnotes/transformers-neuronx/'):
        return ['Inf2', 'Trn1'], True
    if cur_file == 'about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision':
        return ['Trn1', 'Trn2', 'Trn3'], True
    if cur_file.startswith('about-neuron/appnotes/neuron1x/'):
        return ['Inf1'], True
    # --- Benchmarks ----------------------------------------------------------
    if cur_file == 'about-neuron/benchmarks/index':
        return ['Inf1', 'Inf2', 'Trn1', 'Trn2', 'Trn3'], True
    # --- Quick-start ---------------------------------------------------------
    if cur_file == 'about-neuron/quick-start/tensorflow-neuron':
        return ['Inf1'], True
    if cur_file in ['about-neuron/quick-start/torch-neuron',
                    'about-neuron/quick-start/torch-neuron-tab-training']:
        return ['Inf1'], True
    if cur_file.startswith('about-neuron/quick-start/tab-inference-torch-neuronx'):
        return ['Inf2', 'Trn1', 'Trn2'], True
    if cur_file.startswith('about-neuron/quick-start/tab-inference-torch-neuron') and 'neuronx' not in cur_file:
        return ['Inf1'], True
    if cur_file.startswith('about-neuron/quick-start/tab-inference-tensorflow-neuronx'):
        return ['Inf2', 'Trn1'], True
    if cur_file.startswith('about-neuron/quick-start/tab-inference-tensorflow-neuron') and 'neuronx' not in cur_file:
        return ['Inf1'], True
    return None, False


class NeuronTag(SphinxDirective):

    def run(self):
        cur_file = self.env.docname
        path_split, path_len = _splitall(cur_file)

        # Landing page gets no tag
        if path_split[0] == 'index':
            return self._render('')

        # Step 1: Assign default instances based on top-level directory
        return_instances = []
        if path_split[0] in NEURON1_DIRS:
            return_instances = ['Inf1']
        elif path_split[0] in COMMON_DIRS:
            return_instances = ['Inf1', 'Inf2', 'Trn1', 'Trn2', 'Trn3']

        # Step 2: Check explicit overrides (CSV-based, highest priority)
        explicit_override = False
        result, matched = _get_explicit_override(cur_file)
        if matched:
            return_instances = result
            explicit_override = True
        if not explicit_override:
            result, matched = _get_page_override(cur_file)
            if matched:
                return_instances = result
                explicit_override = True

        # Step 3: Directory-based inference/training heuristic
        if not explicit_override:
            if path_len >= 2:
                parent_dir = path_split[path_len - 2]
                if parent_dir == 'inference':
                    return_instances = ['Inf1']
                elif parent_dir == 'training':
                    return_instances = ['Trn1', 'Trn2', 'Trn3']

        # Step 4: Legacy add/clear tag lists (only for non-overridden files)
        if not explicit_override:
            if _in_list(cur_file, add_trn1_tag):
                if 'Trn1' not in return_instances:
                    return_instances.extend(['Trn1', 'Trn2', 'Trn3', 'Inf2'])
            if _in_list(cur_file, add_trn2_tag):
                if 'Trn2' not in return_instances:
                    return_instances.extend(['Trn2', 'Trn3'])
            if _in_list(cur_file, add_trn3_tag):
                if 'Trn3' not in return_instances:
                    return_instances.append('Trn3')
            if _in_list(cur_file, add_neuronx_tag):
                if 'Trn1' not in return_instances:
                    return_instances.extend(['Trn1', 'Trn2', 'Trn3', 'Inf2'])
            if _in_list(cur_file, add_inf1_tag):
                if 'Inf1' not in return_instances:
                    return_instances.append('Inf1')
            if _in_list(cur_file, clear_nc_v2_tag):
                for tag in ['Trn1', 'Trn2', 'Trn3', 'Inf2']:
                    if tag in return_instances:
                        return_instances.remove(tag)
            if _in_list(cur_file, clear_trn1_tag):
                if 'Trn1' in return_instances:
                    return_instances.remove('Trn1')
            if _in_list(cur_file, clear_trn2_tag):
                if 'Trn2' in return_instances:
                    return_instances.remove('Trn2')
            if _in_list(cur_file, clear_trn3_tag):
                if 'Trn3' in return_instances:
                    return_instances.remove('Trn3')
            if _in_list(cur_file, clear_inf1_tag):
                if 'Inf1' in return_instances:
                    return_instances.remove('Inf1')
            if _in_list(cur_file, clear_inf2_tag):
                if 'Inf2' in return_instances:
                    return_instances.remove('Inf2')

        # Step 5: Generate output
        return_instances = sorted(set(return_instances))
        if return_instances:
            text = TEXT_TEMPLATE + ', '.join('``' + i + '``' for i in return_instances)
        else:
            text = ''
        return self._render(text)

    def _render(self, text):
        """Parse RST text and return docutils nodes."""
        rst = ViewList()
        rst.append(text, "neuron-tag", 1)
        node = nodes.section()
        node.document = self.state.document
        nested_parse_with_titles(self.state, rst, node)
        return node.children


def setup(app):
    app.add_directive("neuron-tag", NeuronTag)
    return {
        'version': '0.2',
        'parallel_read_safe': True,
        'parallel_write_safe': True,
    }
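Because most rules return on their first match, the relative order of the prefix checks is load-bearing. A quick, hypothetical spot-check of the precedence, assuming the `_ext` directory is on `sys.path` so `neuron_tag` is importable:

```python
# Hypothetical spot-check of override precedence for a few docnames.
from neuron_tag import _get_explicit_override, _get_page_override

# The narrower vLLM rule fires before the general nxd-inference rule:
print(_get_explicit_override('libraries/nxd-inference/vllm/quickstart'))
# -> (['Trn2', 'Trn3'], True)
print(_get_explicit_override('libraries/nxd-inference/overview'))
# -> (['Inf2', 'Trn1', 'Trn2'], True)

# Announcements are not explicit overrides; the page override clears all tags:
print(_get_explicit_override('about-neuron/announcements/index'))
# -> (None, False)
print(_get_page_override('about-neuron/announcements/index'))
# -> ([], True)
```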
NFR1: Performance - Action completes review within 5 minutes for typical PRs (1-5 files) - Action processes files in parallel when possible #### NFR2: Security - Action uses GitHub secrets for Q CLI credentials - Action has read-only access to repository - Action has write access only to PR comments #### NFR3: Maintainability - Action configuration is version controlled in `.github/workflows/` - Action uses official Q CLI container/action when available - Action logic is simple and well-documented ## User Stories ### US1: Automatic Review Trigger **As a** documentation contributor **I want** the review action to run automatically when I label my PR **So that** I get immediate feedback without manual intervention **Acceptance Criteria:** - Action triggers when "release-notes" label is added - Action runs on subsequent commits to labeled PR - Action does not run on PRs without the label ### US2: Targeted File Review **As a** documentation contributor **I want** only my changed release notes files to be reviewed **So that** I get relevant feedback without noise from unchanged files **Acceptance Criteria:** - Only files in `/release-notes/components/*.rst` are reviewed - Only files modified in the PR are analyzed - Files in other directories are ignored ### US3: Clear Feedback **As a** documentation contributor **I want** clear, actionable feedback on my release notes **So that** I know exactly what to improve **Acceptance Criteria:** - Feedback follows the format specified in guidelines - Each issue includes: original text, problem, example rewrite, action items - Feedback is posted as a PR comment - Comment includes link to full guidelines ### US4: No False Failures **As a** documentation contributor **I want** the action to provide feedback without blocking my PR **So that** I can address issues without being blocked by automation **Acceptance Criteria:** - Action never fails the PR check - Action always succeeds even if issues are found - Issues are reported as comments, not check failures ## Technical Design ### GitHub Action Workflow **File Location:** `.github/workflows/release-notes-review.yml` **Trigger Events:** ```yaml on: pull_request: types: [opened, synchronize, labeled] paths: - 'release-notes/components/**/*.rst' ``` **Workflow Steps:** 1. **Check Label** - Verify PR has "release-notes" label - Exit gracefully if label not present 2. **Get Changed Files** - Use GitHub API to get list of changed files - Filter for `release-notes/components/**/*.rst` - Exit if no matching files found 3. **Setup Q CLI** - Install/configure Amazon Q CLI - Authenticate using GitHub secrets 4. **Load Guidelines** - Read `_ext/release-notes-context.md` - Prepare as context for Q CLI 5. **Review Each File** - For each changed RST file: - Read file content - Invoke Q CLI with prompt: ``` Review the following release notes file against the guidelines provided. Guidelines: [content from release-notes-context.md] File: [filename] Content: [file content] Provide feedback using the review format specified in the guidelines. Focus on: customer visibility, documentation links, impact clarity, specific conditions, and actionable information. ``` - Capture Q CLI response 6. **Format Feedback** - Combine all file reviews into single comment - Format as markdown with sections per file - Include summary at top 7. **Post Comment** - Post formatted feedback as PR comment - Include link to guidelines - Tag PR author ### Q CLI Prompt Template ```markdown You are reviewing release notes for the AWS Neuron SDK. 
Review the following file against the release notes writing guidelines. GUIDELINES: [Full content of _ext/release-notes-context.md] FILE TO REVIEW: {filename} CONTENT: {file_content} INSTRUCTIONS: 1. Review the content against all guidelines 2. Identify issues using the review format from the guidelines 3. For each issue, provide: - Issue number and title - Original text - Problem description - Phrasing problem (if applicable) - Example rewrite - Specific action items 4. If no issues found, state "No issues found - release notes meet guidelines" Focus especially on: - Customer-visible language (no internal code names) - Documentation URLs for all new features - Specific conditions (not vague language) - Clear impact statements - Proper categorization (breaking changes vs bug fixes) - Migration guidance for breaking changes ``` ### Comment Format Template ```markdown ## 🤖 Release Notes Review This PR modifies {count} release notes file(s). Here's the automated review: ### Files Reviewed - ✅ `release-notes/components/file1.rst` - {issue_count} issue(s) - ✅ `release-notes/components/file2.rst` - No issues found --- ### 📝 Review Feedback #### File: `release-notes/components/file1.rst` [Q CLI feedback for file1] --- #### File: `release-notes/components/file2.rst` [Q CLI feedback for file2] --- ### 📚 Resources - [Release Notes Writing Guidelines](_ext/release-notes-context.md) - Need help? Tag @documentation-team --- *This is an automated review. Please address the feedback and request human review when ready.* ``` ## Implementation Notes ### GitHub Action Configuration **Required Secrets:** - `Q_CLI_TOKEN` or equivalent for Q CLI authentication **Required Permissions:** ```yaml permissions: contents: read pull-requests: write ``` **Environment:** - Ubuntu latest runner - Node.js 18+ (if using JavaScript action) - Python 3.9+ (if using Python script) ### Q CLI Integration Options **Option 1: Direct CLI Invocation** ```bash q chat --prompt-file prompt.txt --context-file guidelines.md ``` **Option 2: Q CLI GitHub Action** (if available) ```yaml - uses: aws/q-cli-action@v1 with: prompt: ${{ steps.prepare.outputs.prompt }} context: ${{ steps.prepare.outputs.context }} ``` **Option 3: API Integration** (if Q provides API) ```python import q_cli response = q_cli.chat(prompt=prompt, context=guidelines) ``` ## Testing Strategy ### Unit Tests - Test file filtering logic - Test prompt generation - Test comment formatting ### Integration Tests - Test with sample PR containing valid release notes - Test with sample PR containing issues - Test with PR without "release-notes" label - Test with PR modifying non-component files ### Manual Testing - Create test PR with intentional issues - Verify action triggers correctly - Verify feedback is accurate and helpful - Verify comment formatting is readable ## Success Criteria 1. **Automation Works**: Action runs on 100% of labeled PRs 2. **Accurate Detection**: Action correctly identifies changed RST files 3. **Useful Feedback**: 80%+ of PR authors find feedback helpful 4. **No False Blocks**: Action never blocks valid PRs 5. **Performance**: Action completes within 5 minutes 6. 
**Reliability**: Action succeeds 95%+ of the time ## Future Enhancements ### Phase 2 (Optional) - Support for reviewing other release notes files (not just components) - Severity levels for issues (critical, warning, suggestion) - Auto-fix suggestions as code suggestions - Integration with PR review status - Metrics dashboard for common issues ### Phase 3 (Optional) - Pre-commit hook for local review - VS Code extension for real-time feedback - Training mode to help new contributors learn guidelines - Historical analysis of release notes quality trends ## Dependencies - GitHub Actions infrastructure - Amazon Q CLI availability and access - Repository write access for bot account - `_ext/release-notes-context.md` guidelines file ## Risks and Mitigations | Risk | Impact | Mitigation | |------|--------|------------| | Q CLI unavailable | High | Graceful failure with manual review fallback | | Q CLI rate limits | Medium | Implement retry logic and rate limiting | | False positives | Medium | Continuous refinement of guidelines and prompts | | Action performance | Low | Parallel processing and caching | | Cost of Q CLI usage | Low | Monitor usage and set budget alerts | ## Rollout Plan 1. **Phase 1**: Implement basic action with manual trigger 2. **Phase 2**: Enable automatic trigger on label 3. **Phase 3**: Gather feedback and refine prompts 4. **Phase 4**: Expand to other release notes files if successful ## Maintenance - **Owner**: Documentation team - **Review Frequency**: Quarterly - **Update Triggers**: - Changes to release notes guidelines - Q CLI updates - User feedback on accuracy - GitHub Actions platform changes ================================================ FILE: _ext/release-notes-context.md ================================================ # Release Notes Writing Guidelines ## Core Principles ### Answer Three Questions for Every Item - **What?** — What feature/API is affected? - **When?** — Under what conditions does this occur? - **So what?** — What is the impact on the user? ### All Content Must Be: - **Customer-visible** - Written from the customer's perspective about capabilities they can use - **Documented** - If documentation doesn't exist, exclude the feature. All new features must include documentation URLs. 
- **Actionable** - Include workarounds, timelines, or how to check if affected ## DO: - **Write in customer-visible terms** - Describe what customers can now do, not how it was implemented - **State the impact clearly** - Use concrete language about what happens to users - **Be specific about conditions** - Replace vague phrases with precise conditions - **Quantify performance improvements** - Provide specific before/after metrics (e.g., "improved from 2.164x to 3.654x speedup") and state the conditions that trigger these improvements (e.g., "for batch I/O operations with 1024 ops at 10KB") - **Explain the impact of wrong defaults** - When fixing incorrect default values, state what the wrong default was and what impact it had on users - **Specify what was missing** - When fixing "missing" items, list what was missing and confirm they are now documented - **Describe previous behavior for bugs** - Always explain what the incorrect behavior was before the fix - **Categorize breaking changes correctly** - If a bug fix changes API behavior (e.g., renaming a parameter), list it under Breaking Changes, not Bug Fixes - **Provide actionable information** - Include workarounds if available, fix timelines if known, or how users can check if they're affected - **Provide migration guidance for breaking changes** - Tell users what they should do when behavior changes, with before/after examples - **Link to documentation** - Every feature must have corresponding documentation with URL - **Include documentation URLs for all new features** - If no URL exists, either create documentation first or remove the feature from release notes - **Use standard terminology** - Use terms your audience already knows - **Use clear, descriptive sentences** - Transform technical phrases into customer-understandable language - **Focus on customer-visible results** - Describe what customers will see, not internal mechanics - **Drop unnecessary words** - Remove "when specified," "may," "is in progress" when they add no value - **Remove empty sections** - Don't include placeholder text like "None in this release" - **Verify accuracy** - Check version numbers, dates, and technical details - **Run IP scanner** - Catch any internal code name leaks before publishing - **Use active voice** - Write "The system ignores the parameter" instead of "The parameter is ignored" - **Define abbreviations on first use** - Write "time to first token (TTFT)" before using "TTFT" - **Remove temporal qualifiers** - Replace "for now" with specific timelines or remove entirely - **Provide concrete examples** - Include calculation examples for complex parameters ## DO NOT: - **Include internal code names** - Remove references like "TRN3PDS", "Mariana", "Penguin" - **Document undocumented features** - If documentation doesn't exist, exclude the feature - **Include features without documentation URLs** - Every new feature must have a documentation link - **List unreleased features** - Only include features available to customers - **Include internal-only metrics** - Remove metrics useful only internally - **Document bugs never released** - Only include fixes for publicly released issues - **Use internal API names** - Unless they're part of the public API - **Include debug variables** - Remove environment variables meant only for internal use - **Use vague language** - Avoid "in certain cases," "some patterns," "may sometimes" - **Use ambiguous phrasing** - Avoid phrases like "Fixed dynamic for loop" that could mean multiple things - **Leave impacts 
unexplained** - Don't just say "fixed wrong default" without explaining what the impact was - **Mix breaking changes with bug fixes** - Parameter renames or behavior changes belong in Breaking Changes, not Bug Fixes - **Create heavy noun chains** - Break up complex phrases (e.g., "dtype override was ignored during reshape" not "reshape dtype override not being applied") - **Write without context** - Every change needs metrics, conditions, or migration guidance - **Use hedging language** - Replace "may result in" with "results in" when deterministic - **Focus on internal implementation** - Avoid phrases like "internally uses" or internal platform identifiers - **Use passive voice without clear subject** - Avoid constructions where the actor is unclear - **Reference undefined versions** - Don't use "V0" or "V1" without defining them ## Impact Statements | Avoid | Prefer | |-------|--------| | "incorrectly interpret" | "produces incorrect results" | | "not being applied" | "is ignored" | | "failing check" | "crashes with validation error" | | "may incorrectly interpret tensor shapes" | "can produce incorrect results when transposing tensors" | ## Conditions - Be Specific | Avoid | Prefer | |-------|--------| | "in certain cases" | "when reduction axis is not the last dimension" | | "some patterns" | "multi-dimensional transposes with more than 2 axes" | | "may sometimes" | "consistently occurs when..." | | "for now" | "Support is planned for version X.X.X" or remove entirely | | "small inputs" | "inputs under 512 tokens" | | "low batch sizes" | "batch sizes of 4 or less" | ## Phrasing Examples ### Bug Fixes: | Avoid | Prefer | |-------|--------| | "Fixed bug in nrt_vnc_usage_find_internal" | "Improved error handling to return a clear error instead of asserting during nrt_init" | | "Fixed dynamic for loop incorrectly incrementing the loop induction variable" | "Fixed: dynamic for loops now correctly increment the loop counter. Previously, the counter incremented incorrectly, causing [specific impact]" | | "Fixed reshape dtype override not being applied when specified" | "Fixed a bug where specifying a data type override during a reshape operation was ignored" | | "Fixed reshape of shared/private HBM tensors failing partition size check" | "Fixed a bug where reshaping tensors stored in shared or private HBM incorrectly failed the partition size check" | | "Fixed incorrect default value for on_false_value" | "Fixed incorrect default value for on_false_value in nki.isa.range_select. Previously defaulted to [X], now correctly defaults to [Y], which [impact]" | ### Performance Improvements: | Avoid | Prefer | |-------|--------| | "Optimized zero-copy operations by enabling descriptor merging" | "Enhanced zero-copy operation performance: Write performance improved from 2.164x to 3.654x speedup for batch I/O operations(1_Batch_1024_Ops_10_KBs)" | | "Optimized mesh AllGather on TP8 configurations using destination routing" | "Optimized mesh AllGather: [X]% performance improvement on TP8 configurations when [specific conditions]" | ### New Features: | Avoid | Prefer | |-------|--------| | "Added support for TRN3PDS platform" | "Added support for [public instance type name] with optimized topology configurations for distributed training. See [documentation URL]" | | "Added IOCTL to lookup Neuron device/HBM for a given virtual address" | "Added capability to lookup Neuron device for a given virtual address, enabling frameworks to identify which device holds a tensor. 
See [documentation link] for API details" | ### Known Issues: | Avoid | Prefer | |-------|--------| | "may incorrectly interpret tensor shapes in certain multi-dimensional transpose patterns" | "can produce incorrect results when transposing tensors with certain multi-dimensional shapes" | | "Training, Inference, and Penguin kernels compilation and execution validation is in progress" | Remove entirely (internal project name and not customer-actionable) | | "Chunked prefill is not supported on Neuron for now" | "Chunked prefill is not supported. If you attempt to enable it with DISABLE_NEURON_CUSTOM_SCHEDULER='1', the system will fail to start with an error. Use standard prefill mode instead." | ## Breaking Changes Checklist When documenting breaking changes, always include: 1. **What changed** - The specific API, parameter, or behavior 2. **Why it's breaking** - What will stop working 3. **Migration path** - What users should do instead 4. **Example (if helpful)** - Show old vs. new usage ### Example: **Breaking:** NumPy synonyms (e.g., `np.add` for `nl.add`) are no longer accepted in NKI API calls. **Migration:** Replace all NumPy function calls with their NKI equivalents: - Replace `np.add(x, y)` with `nl.add(x, y)` - Replace `np.multiply(x, y)` with `nl.multiply(x, y)` Always explain: - Why is this breaking? - What was the previous behavior? - What is the workaround or migration effort? ## Quick Template ``` [Fixed/Known Issue]: [API/Feature] [impact] when [specific conditions]. [Optional: Workaround or timeline.] ``` ### Example: ``` Fixed: nki.isa.dma_copy causes a runtime timeout when copying FP32 from SBUF to BF16 in HBM with indirect addressing. Workaround: cast to BF16 in SBUF before copying. ``` ## Quality Checks Before Publishing 1. **No internal names** - Run IP scanner to catch code name leaks 2. **Customer value** - Each item explains why customers should care 3. **Documentation links** - New features link to relevant docs with URLs 4. **Documentation exists** - Verify all features are documented before including; if no documentation URL exists, remove the feature from release notes 5. **Accuracy** - Technical details are correct and verifiable 6. **Clarity** - Phrasing is clear and professional 7. **Completeness** - Previous behavior and migration paths explained 8. **Impact explained** - Bug fixes describe what was broken and what the impact was 9. **Active voice** - Sentences use active voice with clear subjects 10. **Abbreviations defined** - All abbreviations spelled out on first use 11. **No vague language** - All conditions and impacts are specific and quantified 12. **Examples provided** - Complex parameters include calculation examples ## Key Principles ### All content must be: - **Customer-visible** (not internal implementation details) - **Documented with URLs** (if docs don't exist, exclude it) - **Impactful** (explain value, not just what changed) ### Every bug fix must answer: - What was broken? - What was the impact? - What works now? 
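A skeleton that makes those three answers explicit (bracketed fields are placeholders to fill in):

```
Fixed: [API/feature] now [what works]. Previously, [what was broken], which caused [impact on users] when [specific conditions].
```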
### Every new feature must include: - Documentation URL - Customer benefit - Usage guidance or examples ## How to Review Release Notes When reviewing release notes against these guidelines, provide feedback in the following format: ### Issue [Number]: [Brief Issue Title] **Original Text:** ``` [Exact text from the release notes] ``` **Problem:** [Description of the content/completeness issue] **Phrasing Problem:** [Description of the language/clarity issue, if applicable] **Example Rewrite:** ``` [Suggested improved version showing correct phrasing and content] ``` **Action:** - [Specific action item 1] - [Specific action item 2] ## Review Process: 1. **Extract original text** - Include the exact text being reviewed 2. **Identify problems** - Separate content issues from phrasing issues 3. **Provide examples** - Show how to rewrite the text correctly 4. **List actions** - Give specific, actionable steps to fix each issue 5. **Check documentation** - Verify URLs exist for all new features; if not, recommend removal 6. **Verify completeness** - Ensure all three questions (What? When? So what?) are answered 7. **Check phrasing** - Identify vague language, passive voice, undefined terms, internal references 8. **Validate breaking changes** - Ensure migration guidance and before/after examples are included ================================================ FILE: _ext/sphinx_plotly_directive.py ================================================ """ CODE FROM: https://github.com/harupy/sphinx-plotly-directive LICENSE: MIT Based on: https://matplotlib.org/3.1.3/devel/plot_directive.html A directive for including a Plotly figure in a Sphinx document ================================================================ By default, in HTML output, `plot` will include a .png file with a link to a high-res .png and .pdf. In LaTeX output, it will include a .pdf. The source code for the plot may be included in one of three ways: 1. **A path to a source file** as the argument to the directive:: .. plot:: path/to/plot.py When a path to a source file is given, the content of the directive may optionally contain a caption for the plot:: .. plot:: path/to/plot.py The plot's caption. Additionally, one may specify the name of a function to call (with no arguments) immediately after importing the module:: .. plot:: path/to/plot.py plot_function1 2. Included as **inline content** to the directive:: .. plotly:: import plotly.express as px px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16]) 3. Using **doctest** syntax:: .. plotly:: A plotting example: >>> import plotly.express as px >>> px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16]) 4. Using the `fig-vars` option. In the example below, `fig1` and `fig2` will be rendered:: .. plotly:: :fig-vars: fig1, fig2 import plotly.express as px fig1 = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16]) fig2 = px.scatter(x=[4, 3, 2, 1, 0], y=[0, 1, 4, 9, 16]) Options ------- The ``plotly`` directive supports the following options: format : {'python', 'doctest'} The format of the input. include-source : bool Whether to display the source code. The default can be changed using the `plot_include_source` variable in :file:`conf.py`. encoding : str If this source file is in a non-UTF8 or non-ASCII encoding, the encoding must be specified using the ``:encoding:`` option. The encoding will not be inferred using the ``-*- coding -*-`` metacomment. 
context : bool or str If provided, the code will be run in the context of all previous plot directives for which the ``:context:`` option was specified. This only applies to inline code plot directives, not those run from files. If the ``:context: reset`` option is specified, the context is reset for this and future plots, and previous figures are closed prior to running the code. ``:context: close-figs`` keeps the context but closes previous figures before running the code. nofigs : bool If specified, the code block will be run, but no figures will be inserted. This is usually useful with the ``:context:`` option. caption : str If specified, the option's argument will be used as a caption for the figure. This overwrites the caption given in the content, when the plot is generated from a file. iframe-width The width of the iframe in which a plotly figure is rendered. The default can be changed using the `plotly_iframe_width` variable in :file:`conf.py`. iframe-height The height of the iframe in which a plotly figure is rendered. The default can be changed using the `plotly_iframe_height` variable in :file:`conf.py`. Additionally, this directive supports all of the options of the `image` directive, except for *target* (since plot will add its own target). These include *alt*, *height*, *width*, *scale*, *align* and *class*. Configuration options --------------------- The plot directive has the following configuration options: plotly_include_source Default value for the include-source option plotly_html_show_source_link Whether to show a link to the source in HTML. plotly_pre_code Code that should be executed before each plot. If not specified or None it will default to a string containing:: import numpy as np import plotly import plotly.graph_objects as go import plotly.express as px plotly_basedir Base directory to which ``plot::`` file names are relative. (If None or empty, file names are relative to the directory where the file containing the directive is.) plotly_formats File formats to generate. List of tuples or strings:: [(suffix, dpi), suffix, ...] that determine the file format and the DPI. For entries whose DPI was omitted, sensible defaults are chosen. When passing from the command line through sphinx-build the list should be passed as suffix:dpi,suffix:dpi, ... plotly_html_show_formats Whether to show links to the files in HTML. plotly_working_directory By default, the working directory will be changed to the directory of the example, so the code can get at its data files, if any. Also its path will be added to `sys.path` so it can import any helper modules sitting beside it. This configuration option can be used to specify a central directory (also added to `sys.path`) where data files and helper modules for all code are located. plotly_iframe_width The width of the iframe in which a plotly figure is rendered. The default is "100%". plotly_iframe_height The height of the iframe in which a plotly figure is rendered. The default is "500px". plotly_template Provide a customized template for preparing restructured text. """ import copy import itertools import os import re import shutil import textwrap import traceback from os.path import relpath from pathlib import Path import jinja2 # Sphinx dependency. from docutils.parsers.rst import Directive, directives from docutils.parsers.rst.directives.images import Image import plotly INDENT_SPACES = " " * 3 def save_plotly_figure(fig, path): r""" Save a Plotly figure. 
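The figure is serialized with ``plotly.offline.plot`` as an HTML ``div`` that loads plotly.js from the CDN, so the saved file renders standalone in a browser with network access.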
Parameters ---------- fig : plotly figure A plotly figure to save. path : str A file path. Returns ------- None Examples -------- >>> import plotly.express as px >>> import tempfile >>> fig = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16]) >>> path = tempfile.NamedTemporaryFile(suffix=".html").name >>> save_plotly_figure(fig, path) """ fig_html = plotly.offline.plot(fig, output_type="div", include_plotlyjs="cdn", auto_open=False) with open(path, "w") as f: f.write(fig_html) def assign_last_line_into_variable(code, variable_name): r""" Assign the last non-empty line of the given code to a variable. Parameters ---------- code : str A string representing code. variable_name : str A variable name. Returns ------- str New code. Examples -------- >>> code = "a = 1\nfunc(a)" >>> new_code = assign_last_line_into_variable(code, "b") >>> print(new_code) a = 1 b = func(a) """ lines = code.split("\n") for idx in range(len(lines) - 1, -1, -1): if lines[idx].strip() != "": lines[idx] = "{} = ".format(variable_name) + lines[idx] break return "\n".join(lines) def create_directive_block(name, arguments, options, content): r""" Create a directive block. Parameters ---------- name : str A directive name. arguments : list of str Arguments of the directive. options : dict Options of the directive. content : list of str Content of the directive. Returns ------- str A directive block. Examples -------- >>> block = create_directive_block( ... "plotly", ... ["f1", "f2"], ... {"a": 0, "b": 1}, ... ["l1", "l2"], ... ) >>> print(block) .. plotly:: f1 f2 :a: 0 :b: 1 l1 l2 """ header = ".. {}:: ".format(name) + " ".join(arguments) code = "\n".join(map(str, content)) lines = [header] if len(options.items()) > 0: def process_value(v): if isinstance(v, list): return ", ".join(v) return v options_block = "\n".join(":{}: {}".format(k, process_value(v)) for k, v in options.items()) lines.append(textwrap.indent(options_block, INDENT_SPACES)) lines.append("") lines.append(textwrap.indent(code, INDENT_SPACES)) return "\n".join(lines) def create_code_block(code, language=None): return "\n".join( [ ".. 
code-block::{}".format(" " + language if language else ""), "", textwrap.indent(code.strip(), INDENT_SPACES), "", ] ) def strip_last_line(code): r""" Strips the last line of the give code block Parameters ---------- code : str Code to strip Returns ------- str: Stripped code Examples -------- >>> strip_last_line("a") '' >>> strip_last_line("a\nb") 'a' >>> strip_last_line("a\nb\nc") 'a\nb' """ return "\n".join(code.strip().split("\n")[:-1]) def ends_with_show(code): r""" Returns True if the last line of the given code block ends with `show()` Parameters ---------- code : str Code that may contain a line that looks like `fig.show()` Returns ------- str: Variable name of the object that calls `show()` Examples -------- >>> ends_with_show("fig.show()") # simple True >>> ends_with_show("fig.show(1, a=2)") # show with arguments True >>> ends_with_show("fig = dummy\nfig.show()\n") # multiline True >>> ends_with_show("foo") # doesn't contains `show` False """ # TODO: Use a more strict regular expression pattern = r"^(.+)\.show\(.*\)$" match = re.search(pattern, code.strip().split("\n")[-1], flags=re.DOTALL) return bool(match) # ----------------------------------------------------------------------------- # Registration hook # ----------------------------------------------------------------------------- def _option_boolean(arg): if not arg or not arg.strip(): # no argument given, assume used as a flag return True elif arg.strip().lower() in ("no", "0", "false"): return False elif arg.strip().lower() in ("yes", "1", "true"): return True else: raise ValueError('"%s" unknown boolean' % arg) def _option_context(arg): if arg in [None, "reset", "close-figs"]: return arg raise ValueError("Argument should be None or 'reset' or 'close-figs'") def _option_format(arg): return directives.choice(arg, ("python", "doctest")) def _option_fig_vars(arg): return [x.strip() for x in arg.split(",")] def mark_plot_labels(app, document): """ To make plots referenceable, we need to move the reference from the "htmlonly" (or "latexonly") node to the actual figure node itself. """ for name, explicit in document.nametypes.items(): if not explicit: continue labelid = document.nameids[name] if labelid is None: continue node = document.ids[labelid] if node.tagname in ("html_only", "latex_only"): for n in node: if n.tagname == "figure": sectname = name for c in n: if c.tagname == "caption": sectname = c.astext() break node["ids"].remove(labelid) node["names"].remove(name) n["ids"].append(labelid) n["names"].append(name) document.settings.env.labels[name] = ( document.settings.env.docname, labelid, sectname, ) break class PlotlyDirective(Directive): """The ``.. 
plotly::`` directive, as documented in the module's docstring.""" has_content = True required_arguments = 0 optional_arguments = 2 final_argument_whitespace = False option_spec = { "alt": directives.unchanged, "height": directives.length_or_unitless, "width": directives.length_or_percentage_or_unitless, "scale": directives.nonnegative_int, "align": Image.align, "class": directives.class_option, "include-source": _option_boolean, "format": _option_format, "context": _option_context, "nofigs": directives.flag, "encoding": directives.encoding, "caption": directives.unchanged, "fig-vars": _option_fig_vars, "iframe-width": directives.unchanged, "iframe-height": directives.unchanged, } def run(self): """Run the plot directive.""" try: return run( self.arguments, self.content, self.options, self.state_machine, self.state, self.lineno, ) except Exception as e: raise self.error(str(e)) def setup(app): setup.app = app setup.config = app.config setup.confdir = app.confdir app.add_directive("plotly", PlotlyDirective) app.add_config_value("plotly_pre_code", None, True) app.add_config_value("plotly_include_source", False, True) app.add_config_value("plotly_html_show_source_link", True, True) app.add_config_value("plotly_formats", ["html"], True) app.add_config_value("plotly_basedir", None, True) app.add_config_value("plotly_html_show_formats", True, True) app.add_config_value("plotly_working_directory", None, True) app.add_config_value("plotly_iframe_width", "100%", True) app.add_config_value("plotly_iframe_height", "500px", True) app.add_config_value("plotly_template", None, True) app.add_config_value("plotly_include_directive_source", None, False) app.connect("doctree-read", mark_plot_labels) metadata = { "parallel_read_safe": True, "parallel_write_safe": True, "version": 0.1, } return metadata # ----------------------------------------------------------------------------- # Doctest handling # ----------------------------------------------------------------------------- def contains_doctest(text): try: # check if it's valid Python as-is compile(text, "", "exec") return False except SyntaxError: pass r = re.compile(r"^\s*>>>", re.M) m = r.search(text) return bool(m) def unescape_doctest(text): """ Extract code from a piece of text, which contains either Python code or doctests. """ if not contains_doctest(text): return text code = "" for line in text.split("\n"): m = re.match(r"^\s*(>>>|\.\.\.) (.*)$", line) if m: code += m.group(2) + "\n" elif line.strip(): code += "# " + line.strip() + "\n" else: code += "\n" return code def split_code_at_show(text): """Split code at plt.show().""" parts = [] is_doctest = contains_doctest(text) part = [] for line in text.split("\n"): if (not is_doctest and line.strip() == "plt.show()") or ( is_doctest and line.strip() == ">>> plt.show()" ): part.append(line) parts.append("\n".join(part)) part = [] else: part.append(line) if "\n".join(part).strip(): parts.append("\n".join(part)) return parts # ----------------------------------------------------------------------------- # Template # ----------------------------------------------------------------------------- TEMPLATE = """ {% if directive_source %} Source: {{ directive_source }} Output: {% endif %} {{ source_code }} .. 
only:: html {% if source_link or (html_show_formats and not multi_image) %} ( {%- if source_link -%} `Source code <{{ source_link }}>`__ {%- endif -%} {%- if html_show_formats and not multi_image -%} {%- for fig in figures -%} {%- for fmt in fig.formats -%} {%- if source_link or not loop.first -%}, {% endif -%} `{{ fmt }} <{{ dest_dir }}/{{ fig.basename }}.{{ fmt }}>`__ {%- endfor -%} {%- endfor -%} {%- endif -%} ) {% endif %} {% for fig in figures %} .. raw:: html {% for option in options -%} {{ option }} {% endfor %} {% if html_show_formats and multi_figure -%} ( {%- for fmt in fig.formats -%} {%- if not loop.first -%}, {% endif -%} `{{ fmt }} <{{ dest_dir }}/{{ fig.basename }}.{{ fmt }}>`__ {%- endfor -%} ) {%- endif -%} {{ caption }} {% endfor %} .. only:: not html {% for fig in figures %} .. raw:: html {% for option in options -%} {{ option }} {% endfor %} {{ caption }} {% endfor %} """ exception_template = """ .. only:: html [`source code <%(linkdir)s/%(basename)s.py>`__] Exception occurred rendering plot. """ # the context of the plot for all directives specified with the # :context: option plot_context = dict() class FigureFile: def __init__(self, basename, dirname): self.basename = basename self.dirname = dirname self.formats = [] def filename(self, format): return os.path.join(self.dirname, "%s.%s" % (self.basename, format)) def filenames(self): return [self.filename(fmt) for fmt in self.formats] def out_of_date(original, derived): """ Return whether *derived* is out-of-date relative to *original*, both of which are full file paths. """ return not os.path.exists(derived) or ( os.path.exists(original) and os.stat(derived).st_mtime < os.stat(original).st_mtime ) class PlotError(RuntimeError): pass def run_code(code, code_path, ns=None, function_name=None, fig_vars=None): """ Import a Python module from a path, and run the function given by name, if function_name is not None. """ # Change the working directory to the directory of the example, so # it can get at its data files, if any. Add its path to sys.path # so it can import any helper modules sitting beside it. 
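# Remember the caller's working directory; the finally clause below restores it even if the executed code raises.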
pwd = os.getcwd() if setup.config.plotly_working_directory is not None: try: os.chdir(setup.config.plotly_working_directory) except OSError as err: raise OSError( str(err) + "\n`plot_working_directory` option in" "Sphinx configuration file must be a valid " "directory path" ) from err except TypeError as err: raise TypeError( str(err) + "\n`plot_working_directory` option in " "Sphinx configuration file must be a string or " "None" ) from err elif code_path is not None: dirname = os.path.abspath(os.path.dirname(code_path)) os.chdir(dirname) try: code = unescape_doctest(code) if ns is None: ns = {} if not ns: if setup.config.plotly_pre_code is None: exec( "\n".join( [ "import numpy as np", "import plotly", "import plotly.graph_objects as go", "import plotly.express as px", ] ), ns, ) else: exec(str(setup.config.plotly_pre_code), ns) if "__main__" in code: ns["__name__"] = "__main__" variable_name = "fig" if ends_with_show(code): exec(strip_last_line(code), ns) figs = [ns[fig_var] for fig_var in fig_vars] if fig_vars else [ns[variable_name]] elif function_name is not None: exec(code, ns) exec(assign_last_line_into_variable(function_name + "()", variable_name), ns) figs = [ns[variable_name]] elif fig_vars: exec(code, ns) figs = [ns[fig_var] for fig_var in fig_vars] else: exec(assign_last_line_into_variable(code, variable_name), ns) figs = [ns[variable_name]] except (Exception, SystemExit) as err: raise PlotError(traceback.format_exc()) from err finally: os.chdir(pwd) return figs def get_plot_formats(config): default_dpi = {"html": 0} formats = [] plot_formats = config.plotly_formats for fmt in plot_formats: if isinstance(fmt, str): if ":" in fmt: suffix, dpi = fmt.split(":") formats.append((str(suffix), int(dpi))) else: formats.append((fmt, default_dpi.get(fmt, 80))) elif isinstance(fmt, (tuple, list)) and len(fmt) == 2: formats.append((str(fmt[0]), int(fmt[1]))) else: raise PlotError('invalid image format "%r" in plot_formats' % fmt) return formats def render_figures( code, code_path, output_dir, output_base, context, function_name, config, context_reset=False, close_figs=False, fig_vars=None, ): """ Run a pyplot script and save the images in *output_dir*. 
Save the images under *output_dir* with file names derived from *output_base* """ formats = get_plot_formats(config) # -- Try to determine if all images already exist code_pieces = split_code_at_show(code) # Look for single-figure output files first all_exists = True fig = FigureFile(output_base, output_dir) for format, dpi in formats: if out_of_date(code_path, fig.filename(format)): all_exists = False break fig.formats.append(format) if all_exists: return [(code, [fig])] # Then look for multi-figure output files results = [] all_exists = True for i, code_piece in enumerate(code_pieces): figures = [] for j in itertools.count(): if len(code_pieces) > 1: fig = FigureFile("%s_%02d_%02d" % (output_base, i, j), output_dir) else: fig = FigureFile("%s_%02d" % (output_base, j), output_dir) for fmt, dpi in formats: if out_of_date(code_path, fig.filename(fmt)): all_exists = False break fig.formats.append(fmt) # assume that if we have one, we have them all if not all_exists: all_exists = j > 0 break figures.append(fig) if not all_exists: break results.append((code_piece, figures)) if all_exists: return results # We didn't find the files, so build them results = [] if context: ns = plot_context else: ns = {} if context_reset: plot_context.clear() close_figs = not context or close_figs for i, code_piece in enumerate(code_pieces): if not context: pass elif close_figs: pass fig_objects = run_code(code_piece, code_path, ns, function_name, fig_vars) figures = [] for j, fig_obj in enumerate(fig_objects): if len(fig_objects) == 1 and len(code_pieces) == 1: fig = FigureFile(output_base, output_dir) elif len(code_pieces) == 1: fig = FigureFile("%s_%02d" % (output_base, j), output_dir) else: fig = FigureFile("%s_%02d_%02d" % (output_base, i, j), output_dir) figures.append(fig) for fmt, dpi in formats: try: save_plotly_figure(fig_obj, fig.filename(fmt)) except Exception as err: raise PlotError(traceback.format_exc()) from err fig.formats.append(fmt) results.append((code_piece, figures)) if not context: pass return results def run(arguments, content, options, state_machine, state, lineno): document = state_machine.document config = document.settings.env.config nofigs = "nofigs" in options formats = get_plot_formats(config) default_fmt = formats[0][0] options_copy = copy.deepcopy(options) options.setdefault("include-source", config.plotly_include_source) options.setdefault("iframe-width", config.plotly_iframe_width) options.setdefault("iframe-height", config.plotly_iframe_height) keep_context = "context" in options context_opt = None if not keep_context else options["context"] rst_file = document.attributes["source"] rst_dir = os.path.dirname(rst_file) if len(arguments): if not config.plotly_basedir: source_file_name = os.path.join(setup.app.builder.srcdir, directives.uri(arguments[0])) else: source_file_name = os.path.join( setup.confdir, config.plotly_basedir, directives.uri(arguments[0]) ) # If there is content, it will be passed as a caption. caption = "\n".join(content) # Enforce unambiguous use of captions. if "caption" in options: if caption: raise ValueError( "Caption specified in both content and options." " Please remove ambiguity." 
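# The directive content doubles as the caption for file-based plots, so the two caption sources conflict here.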
) # Use caption option caption = options["caption"] # If the optional function name is provided, use it if len(arguments) == 2: function_name = arguments[1] else: function_name = None code = Path(source_file_name).read_text(encoding="utf-8") output_base = os.path.basename(source_file_name) else: source_file_name = rst_file code = textwrap.dedent("\n".join(map(str, content))) counter = document.attributes.get("_plot_counter", 0) + 1 document.attributes["_plot_counter"] = counter base, ext = os.path.splitext(os.path.basename(source_file_name)) output_base = "%s-%d.py" % (base, counter) function_name = None caption = options.get("caption", "") base, source_ext = os.path.splitext(output_base) if source_ext in (".py", ".rst", ".txt"): output_base = base else: source_ext = "" # ensure that LaTeX includegraphics doesn't choke in foo.bar.pdf filenames output_base = output_base.replace(".", "-") # is it in doctest format? is_doctest = contains_doctest(code) if "format" in options: if options["format"] == "python": is_doctest = False else: is_doctest = True # determine output directory name fragment source_rel_name = relpath(source_file_name, setup.confdir) source_rel_dir = os.path.dirname(source_rel_name) while source_rel_dir.startswith(os.path.sep): source_rel_dir = source_rel_dir[1:] # build_dir: where to place output files (temporarily) build_dir = os.path.join( os.path.dirname(setup.app.doctreedir), "plot_directive", source_rel_dir ) # get rid of .. in paths, also changes pathsep # see note in Python docs for warning about symbolic links on Windows. # need to compare source and dest paths at end build_dir = os.path.normpath(build_dir) if not os.path.exists(build_dir): os.makedirs(build_dir) # output_dir: final location in the builder's directory dest_dir = os.path.abspath(os.path.join(setup.app.builder.outdir, source_rel_dir)) if not os.path.exists(dest_dir): os.makedirs(dest_dir) # no problem here for me, but just use built-ins # how to link to files from the RST file dest_dir_link = os.path.join(relpath(setup.confdir, rst_dir), source_rel_dir).replace( os.path.sep, "/" ) try: build_dir_link = relpath(build_dir, rst_dir).replace(os.path.sep, "/") except ValueError: # on Windows, relpath raises ValueError when path and start are on # different mounts/drives build_dir_link = build_dir source_link = dest_dir_link + "/" + output_base + source_ext # make figures try: results = render_figures( code, source_file_name, build_dir, output_base, keep_context, function_name, config, context_reset=context_opt == "reset", close_figs=context_opt == "close-figs", fig_vars=options.get("fig-vars"), ) errors = [] except PlotError as err: reporter = state.memo.reporter sm = reporter.system_message( 2, "Exception occurred in plotting {}\n from {}:\n{}".format( output_base, source_file_name, err ), line=lineno, ) results = [(code, [])] errors = [sm] # Properly indent the caption caption = "\n".join(" " + line.strip() for line in caption.split("\n")) # generate output restructuredtext total_lines = [] for j, (code_piece, figures) in enumerate(results): if options["include-source"]: if is_doctest: lines = ["", *code_piece.splitlines()] else: lines = [ ".. 
code-block:: python", "", *textwrap.indent(code_piece, " ").splitlines(), ] source_code = "\n".join(lines) else: source_code = "" if nofigs: figures = [] opts = [ ":%s: %s" % (key, val) for key, val in options.items() if key in ("alt", "height", "width", "scale", "align", "class") ] # Not-None src_link signals the need for a source link in the generated # html if j == 0 and config.plotly_html_show_source_link: src_link = source_link else: src_link = None if config.plotly_include_directive_source: directive_source = create_directive_block("plotly", arguments, options_copy, content) directive_source = create_code_block(directive_source, "text") else: directive_source = "" result = jinja2.Template(config.plotly_template or TEMPLATE).render( directive_source=directive_source, default_fmt=default_fmt, dest_dir=dest_dir_link, build_dir=build_dir_link, source_link=src_link, multi_figure=len(figures) > 1, options=opts, figures=figures, iframe_width=options["iframe-width"], iframe_height=options["iframe-height"], source_code=source_code, html_show_formats=config.plotly_html_show_formats and len(figures), caption=caption, ) total_lines.extend(result.split("\n")) total_lines.extend("\n") if total_lines: state_machine.insert_input(total_lines, source=source_file_name) # copy image files to builder's output directory, if necessary Path(dest_dir).mkdir(parents=True, exist_ok=True) for code_piece, figures in results: for fig in figures: for fn in fig.filenames(): destfig = os.path.join(dest_dir, os.path.basename(fn)) if fn != destfig: shutil.copyfile(fn, destfig) # copy script (if necessary) Path(dest_dir, output_base + source_ext).write_text( unescape_doctest(code) if source_file_name == rst_file else code, encoding="utf-8", ) return errors ================================================ FILE: _ext/symlink.py ================================================ from docutils import nodes from docutils.parsers.rst import Directive, directives import os, sys def remove_symlink_handler(app, exception): dst = './src' if os.path.exists(dst): if os.path.isdir(dst): if os.path.islink(dst): os.unlink(dst) else: shutil.rmtree(dst) else: if os.path.islink(dst): os.unlink(dst) else: os.remove(dst) def setup(app): app.connect('build-finished', remove_symlink_handler) src = '../src' dst = './src' # This creates a symbolic link on python in tmp directory if os.path.exists(dst): if os.path.isdir(dst): if os.path.islink(dst): os.unlink(dst) else: shutil.rmtree(dst) else: if os.path.islink(dst): os.unlink(dst) else: os.remove(dst) os.symlink(src, dst) return { 'version': '1.0', 'parallel_read_safe': True, 'parallel_write_safe': True, } ================================================ FILE: _static/css/custom.css ================================================ .xxtable-smaller-font-size p, strong { font-size:0.9em; } .ablog-post-title p { font-size:0.9em; } .ablog-post p { font-size:0.9em; } .sphinx-design-class-title-small { font-size:0.9em; } .sphinx-design-class-title-med { font-size:1em; } .sphinx-design-class-body-small { font-size:0.9em; } h1{font-size:2em;} h2{font-size:1.5em;} h3{font-size:1.3em;} h4{font-size:1.2em;} div.topic { font-size:0.85em; } li.toctree-l1 { font-size:0.95em; } th , tr, td { white-space: normal !important; } th { font-size:0.90em; } .ff th , tr, td{ font-size:0.90em; white-space: normal !important; } .ff div.section.p { font-size:0.8em; } hr { border-color: #0000DD; height: 2px; } ================================================ FILE: _static/css/custom.css.new 
================================================ .table-smaller-font-size p, strong { font-size: 90%; } td, th, tr { white-space: normal !important; } /* Fixes the size of the RTD flyout */ /* .rst-versions { width: 320px !important; } */ /* Content area color */ .wy-nav-content { background: #ffffff; } /* Scroll bar */ .wy-side-scroll { width: auto; overflow-y: auto; margin-top: 0px; } /* width of the side panel */ .wy-nav-side { width: 320px; } /* content section full screen */ .wy-nav-content { max-width: none; } /* set color of left side bar */ .wy-nav-side,.wy-side-nav-search,.wy-nav-top { /* background: #0079c1; alternative: #005eb8 */ background: #ffffff; } /* Change caption color to be more legible */ .wy-menu > .caption > span.caption-text { color: #000000; font-size: 20px; } /* Change the version color to match caption color */ .wy-side-nav-search>div.version { color: #000000; } /* Replace the default yellow highlight color with a neutral background */ .highlight .hll { background-color: #ffffff; } /* @media screen and (max-width: 768px) { .wy-nav-content-wrap { margin-left: 0px; } .wy-nav-side { width: 500px; } } */ ================================================ FILE: _templates/recentposts.html ================================================ {% if ablog %}
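{# Sidebar fragment: list the ten most recent posts via ablog's recent() helper #}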
{{ gettext('Recent Posts') }}
{% set pcount = 1 %}
{% for recent in ablog.recent(10, pagename) %}
{{ recent.title }}
{% endfor %}
{% endif %} ================================================ FILE: _templates/search-field.html ================================================
Search Engine: Default Google
================================================ FILE: _templates/search-google.html ================================================ {%- extends "page.html" %} {# Override the body with the custom search structure we want #} {% block docs_body %}
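{# Body override for the Google-powered search page #}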

{{ _("Search") }}

{{ _('Search Results') }}

{% endblock docs_body %} {# Below sections just re-create the behavior of Sphinx default search #} {# Page metadata #} {%- block htmltitle -%} {{ _("Search") }} - {{ title or docstitle }} {%- endblock htmltitle -%} {# Manually include the search JS that Sphinx includes #} {% block scripts -%} {{ super() }} {%- endblock scripts %} ================================================ FILE: _templates/search.html ================================================ {%- extends "page.html" %} {# Override the body with the custom search structure we want #} {% block docs_body %}

{{ _("Search") }}

{% endblock docs_body %} {# Below sections just re-create the behavior of Sphinx default search #} {# Page metadata #} {%- block htmltitle -%} {{ _("Search") }} - {{ title or docstitle }} {%- endblock htmltitle -%} {# Manually include the search JS that Sphinx includes #} {% block scripts -%} {{ super() }} {%- endblock scripts %} ================================================ FILE: _utilities/JIRA_SETUP_QUICKSTART.md ================================================ # Jira Integration Quick Start ## Prerequisites Check Run these commands to verify you have everything installed: ```bash # Check AWS CLI aws --version # Check ada credentials tool ada --version # Check Python 3 python3 --version # Check if uvx is available (for MCP server) uvx --version ``` If any are missing, install them: ```bash # AWS CLI brew install awscli # ada credentials tool toolbox install ada # uv (includes uvx) brew install uv ``` ## One-Time Setup ### 1. Configure Ada Credentials ```bash ada credentials setup ``` When prompted: - **Account**: 621547421844 - **Role**: Admin - **Profile name**: kaena ### 2. Add Kaena Profile to AWS Config ```bash echo '[profile kaena] credential_process='$HOME'/.toolbox/bin/ada credentials print --profile=kaena' >> ~/.aws/config ``` ### 3. Run the Setup Script ```bash cd /path/to/aws-neuron-sdk-staging chmod +x _utilities/setup_jira_token.sh ./_utilities/setup_jira_token.sh ``` This script will: - Fetch the Jira API token from AWS Secrets Manager - Update your MCP configuration with the token - Verify everything is set up correctly ### 4. Restart Kiro After running the setup script, restart Kiro CLI to load the new MCP server. ## Using Jira in Kiro Once set up, you can use Kiro Powers to interact with Jira: ```bash # In Kiro CLI, check available powers kiro powers list # Look for Atlassian/Jira related tools ``` ## Manual Verification To manually verify the setup worked: ```bash # Check MCP config has Jira server cat ~/.kiro/settings/mcp.json | grep -A 10 atlassian-jira # Test AWS Secrets Manager access export AWS_PROFILE=kaena aws secretsmanager get-secret-value \ --secret-id NKI_JIRA_API_TOKEN \ --region us-west-2 \ --query SecretString \ --output text ``` ## Troubleshooting ### "Error: Failed to fetch Jira API token" 1. Verify ada credentials are set up: ```bash ada credentials list ``` 2. Check AWS profile is configured: ```bash cat ~/.aws/config | grep -A 2 kaena ``` 3. Test AWS access: ```bash export AWS_PROFILE=kaena aws sts get-caller-identity ``` ### "MCP server not loading" 1. Check uvx is installed: ```bash uvx --version ``` 2. Manually test the MCP server: ```bash uvx mcp-server-atlassian ``` 3. Check Kiro MCP logs (location varies by installation) ## What's Next After setup, you can: - Query NKI Jira tickets - Create new tickets - Update ticket status - Search and filter tickets - Generate reports See the full guide at `.kiro/steering/jira.md` for detailed usage examples. ================================================ FILE: _utilities/add_meta.py ================================================ #!/usr/bin/env python3 """Add missing .. 
meta:: blocks with :description:, :keywords:, and :date-modified: to .rst files.""" import os import re import sys from pathlib import Path TODAY = "2026-03-13" # Map file paths to sensible descriptions/keywords based on content def infer_meta(filepath: str, content: str) -> dict: """Infer description and keywords from file path and content.""" rel = filepath.replace("frameworks/", "") # Extract title from RST title = "" lines = content.split("\n") title_chars = set("=-~^\"'`#*+_.") for i, line in enumerate(lines): stripped = line.rstrip() if (len(stripped) >= 3 and len(set(stripped)) == 1 and stripped[0] in title_chars and i > 0): candidate = lines[i-1].strip() if candidate and not candidate.startswith(".."): title = candidate break # Build description from title or path if title: desc = f"{title} - AWS Neuron SDK documentation" else: desc = f"AWS Neuron SDK documentation for {os.path.basename(filepath).replace('.rst', '').replace('-', ' ')}" # Build keywords from path components kw_parts = set() if "torch" in rel: kw_parts.update(["PyTorch", "AWS Neuron"]) if "neuronx" in rel: kw_parts.update(["torch-neuronx", "Trainium", "Inferentia"]) if "jax" in rel: kw_parts.update(["JAX", "AWS Neuron", "JAX NeuronX"]) if "training" in rel.lower(): kw_parts.add("training") if "inference" in rel.lower(): kw_parts.add("inference") if "setup" in rel.lower() or "install" in rel.lower() or "update" in rel.lower(): kw_parts.add("setup") if "tutorial" in rel.lower(): kw_parts.add("tutorials") if "api" in rel.lower(): kw_parts.add("API reference") if "profil" in rel.lower(): kw_parts.add("profiling") if "troubleshoot" in rel.lower(): kw_parts.add("troubleshooting") if "debug" in rel.lower(): kw_parts.add("debugging") if not kw_parts: kw_parts.update(["AWS Neuron", "machine learning"]) keywords = ", ".join(sorted(kw_parts)) return {"description": desc, "keywords": keywords} def has_meta_field(content: str, field: str) -> bool: """Check if a .. meta:: block contains a specific field.""" return bool(re.search(rf"^\s+:{field}:", content, re.MULTILINE)) def process_file(filepath: str, dry_run: bool = False): """Process a single .rst file to ensure it has complete meta block.""" with open(filepath, "r", encoding="utf-8", errors="replace") as f: content = f.read() # Skip include-only fragments (no title, very short) if len(content.strip()) < 50: print(f" SKIP (fragment): {filepath}") return False has_meta = ".. meta::" in content has_desc = has_meta_field(content, "description") has_kw = has_meta_field(content, "keywords") has_date = has_meta_field(content, "date-modified") if has_desc and has_kw and has_date: print(f" OK (complete): {filepath}") return False meta = infer_meta(filepath, content) if has_meta: # Meta block exists but missing fields — add them missing = [] if not has_desc: missing.append(f" :description: {meta['description']}") if not has_kw: missing.append(f" :keywords: {meta['keywords']}") if not has_date: missing.append(f" :date-modified: {TODAY}") insert_text = "\n".join(missing) # Find the end of the existing meta block (last line starting with :field:) lines = content.split("\n") meta_start = -1 meta_last_field = -1 for i, line in enumerate(lines): if line.strip() == ".. 
meta::": meta_start = i elif meta_start >= 0 and re.match(r"\s+:\w", line): meta_last_field = i elif meta_start >= 0 and meta_last_field >= 0 and not line.strip().startswith(":") and not (line.strip() and not line[0].isspace()): break if meta_last_field >= 0: lines.insert(meta_last_field + 1, insert_text) new_content = "\n".join(lines) else: # Fallback: insert after .. meta:: line new_content = content.replace(".. meta::", f".. meta::\n{insert_text}", 1) else: # No meta block at all — add one at the top (after any labels) lines = content.split("\n") insert_idx = 0 # Skip leading labels (.. _label:) and blank lines for i, line in enumerate(lines): stripped = line.strip() if stripped.startswith(".. _") and stripped.endswith(":"): insert_idx = i + 1 elif stripped == "" and i <= insert_idx + 1: insert_idx = i + 1 else: break meta_block = ( f"\n.. meta::\n" f" :description: {meta['description']}\n" f" :keywords: {meta['keywords']}\n" f" :date-modified: {TODAY}\n\n" ) lines.insert(insert_idx, meta_block) new_content = "\n".join(lines) action = "UPDATE" if has_meta else "ADD" fields = [] if not has_desc: fields.append("description") if not has_kw: fields.append("keywords") if not has_date: fields.append("date-modified") print(f" {action} ({', '.join(fields)}): {filepath}") if not dry_run: with open(filepath, "w", encoding="utf-8") as f: f.write(new_content) return True def main(): import argparse parser = argparse.ArgumentParser(description="Add meta blocks to .rst files") parser.add_argument("directory", default="frameworks", nargs="?") parser.add_argument("--dry-run", action="store_true", help="Show what would change without writing") args = parser.parse_args() root = Path(args.directory) rst_files = sorted(root.rglob("*.rst")) print(f"Scanning {len(rst_files)} .rst files in {root}/:") changed = 0 for f in rst_files: if process_file(str(f), dry_run=args.dry_run): changed += 1 print(f"\n{'Would change' if args.dry_run else 'Changed'} {changed} file(s).") if __name__ == "__main__": main() ================================================ FILE: _utilities/audit_frameworks.py ================================================ #!/usr/bin/env python3 """ Audit script for the /frameworks directory of the AWS Neuron SDK documentation. Detects orphaned pages (not referenced by any toctree, :doc:, :ref:, or .. include:: directive) and stale pages (containing outdated references). Usage: python3 _utilities/audit_frameworks.py --root . 
--output audit-report.md
"""

import argparse
import os
import re
from pathlib import Path

# ---------------------------------------------------------------------------
# Reference extraction helpers
# ---------------------------------------------------------------------------

# Regex patterns for RST directives and roles
TOCTREE_BLOCK_RE = re.compile(r"^\.\.\s+toctree::", re.MULTILINE)
DOC_ROLE_RE = re.compile(r":doc:`(?:[^<`]*<)?(/[^>`]+|[^>`/][^>`]*)`")
REF_ROLE_RE = re.compile(r":ref:`(?:[^<`]*<)?([^>`]+)`")
INCLUDE_RE = re.compile(r"^\.\.\s+include::\s+(.+)$", re.MULTILINE)
LABEL_RE = re.compile(r"^\.\.\s+_([a-zA-Z0-9_-]+)\s*:", re.MULTILINE)


def _resolve_path(ref: str, referencing_file: Path, root: Path) -> str | None:
    """Resolve a toctree/doc/include reference to a repo-relative path."""
    ref = ref.strip()
    if not ref:
        return None
    # Absolute path (starts with /)
    if ref.startswith("/"):
        resolved = ref.lstrip("/")
    else:
        # Relative to the directory of the referencing file
        ref_dir = referencing_file.parent.relative_to(root)
        resolved = str(ref_dir / ref)
    # Normalise (collapse ..)
    resolved = os.path.normpath(resolved)
    return resolved


def _resolve_to_files(base: str, root: Path) -> list[str]:
    """Given a resolved base path, return candidate file paths that exist."""
    candidates = []
    # Direct file match (already has extension)
    if (root / base).is_file():
        candidates.append(base)
        return candidates
    # Try common extensions
    for ext in (".rst", ".ipynb", ".txt"):
        p = base + ext
        if (root / p).is_file():
            candidates.append(p)
    # Could be a directory with index.rst
    idx = os.path.join(base, "index.rst")
    if (root / idx).is_file():
        candidates.append(idx)
    return candidates


def extract_toctree_entries(content: str, filepath: Path, root: Path) -> set[str]:
    """Extract all file paths referenced in toctree directives."""
    referenced: set[str] = set()
    lines = content.split("\n")
    i = 0
    while i < len(lines):
        if TOCTREE_BLOCK_RE.match(lines[i]):
            # Skip toctree options (lines starting with : or blank within indent)
            i += 1
            # Skip blank lines and option lines
            while i < len(lines):
                stripped = lines[i].strip()
                if stripped == "" or stripped.startswith(":"):
                    i += 1
                    continue
                break
            # Now read toctree entries (indented non-empty lines)
            while i < len(lines):
                line = lines[i]
                stripped = line.strip()
                if stripped == "":
                    i += 1
                    continue
                # Check if still indented (part of toctree body)
                if line[0] in (" ", "\t"):
                    # Entry may have a title: "Title <path>" or just "path"
                    entry = stripped
                    m = re.match(r".*<(.+)>", entry)
                    if m:
                        entry = m.group(1).strip()
                    # Resolve the path
                    resolved = _resolve_path(entry, filepath, root)
                    if resolved:
                        for f in _resolve_to_files(resolved, root):
                            referenced.add(f)
                    i += 1
                else:
                    break
        else:
            i += 1
    return referenced


def extract_doc_refs(content: str, filepath: Path, root: Path) -> set[str]:
    """Extract all file paths referenced via :doc: roles."""
    referenced: set[str] = set()
    for m in DOC_ROLE_RE.finditer(content):
        ref = m.group(1).strip()
        resolved = _resolve_path(ref, filepath, root)
        if resolved:
            for f in _resolve_to_files(resolved, root):
                referenced.add(f)
    return referenced
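
# Example (illustrative): resolving a relative :doc: reference. Given the file
# frameworks/torch/index.rst containing :doc:`setup/install`, _resolve_path()
# returns "frameworks/torch/setup/install", and _resolve_to_files() then probes
# the repo for setup/install.rst, .ipynb, .txt, or setup/install/index.rst.
#
#   _resolve_path("setup/install", Path("frameworks/torch/index.rst"), Path("."))
#   -> "frameworks/torch/setup/install"


def extract_include_refs(content: str, filepath: Path, root: Path) -> set[str]:
    """Extract all file paths referenced via .. 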
include:: directives.""" referenced: set[str] = set() for m in INCLUDE_RE.finditer(content): ref = m.group(1).strip() resolved = _resolve_path(ref, filepath, root) if resolved: for f in _resolve_to_files(resolved, root): referenced.add(f) return referenced def extract_ref_labels(content: str) -> set[str]: """Extract all :ref: label targets from content.""" return set(m.group(1) for m in REF_ROLE_RE.finditer(content)) def extract_label_definitions(content: str) -> set[str]: """Extract all label definitions (.. _label:) from content.""" return set(m.group(1) for m in LABEL_RE.finditer(content)) # --------------------------------------------------------------------------- # Orphan detection # --------------------------------------------------------------------------- def find_all_framework_files(root: Path) -> tuple[set[str], set[str], set[str]]: """Find all .rst, .ipynb, and .txt files under frameworks/. Returns (rst_files, ipynb_files, txt_files) as repo-relative paths. """ rst_files: set[str] = set() ipynb_files: set[str] = set() txt_files: set[str] = set() fw_dir = root / "frameworks" if not fw_dir.is_dir(): return rst_files, ipynb_files, txt_files for p in fw_dir.rglob("*"): if not p.is_file(): continue rel = str(p.relative_to(root)) if "__pycache__" in rel: continue if p.suffix == ".rst": rst_files.add(rel) elif p.suffix == ".ipynb": ipynb_files.add(rel) elif p.suffix == ".txt": txt_files.add(rel) return rst_files, ipynb_files, txt_files def collect_all_references(root: Path) -> tuple[set[str], set[str], set[str]]: """Scan ALL .rst and .txt files in the repo to collect references. Returns (toctree_and_doc_refs, include_refs, ref_labels_used). We scan the entire repo (not just /frameworks) so that references from root index.rst, setup/, about-neuron/, etc. are captured. """ toctree_doc_refs: set[str] = set() include_refs: set[str] = set() ref_labels_used: set[str] = set() # Directories to skip entirely skip_dirs = {"_build", ".git", "venv", ".venv", "__pycache__", ".kiro", ".vscode", ".github", "node_modules", "_backup-rn"} for ext in ("*.rst", "*.txt"): for p in root.rglob(ext): # Skip files in excluded directories rel = str(p.relative_to(root)) parts = Path(rel).parts if any(part in skip_dirs for part in parts): continue try: content = p.read_text(encoding="utf-8", errors="replace") except Exception: continue toctree_doc_refs |= extract_toctree_entries(content, p, root) toctree_doc_refs |= extract_doc_refs(content, p, root) include_refs |= extract_include_refs(content, p, root) ref_labels_used |= extract_ref_labels(content) return toctree_doc_refs, include_refs, ref_labels_used def build_label_to_file_map(root: Path) -> dict[str, str]: """Build a mapping from :ref: label -> repo-relative file path. Only scans files under frameworks/ since we only need to know which framework files are referenced via :ref:. """ label_map: dict[str, str] = {} fw_dir = root / "frameworks" if not fw_dir.is_dir(): return label_map for p in fw_dir.rglob("*.rst"): rel = str(p.relative_to(root)) try: content = p.read_text(encoding="utf-8", errors="replace") except Exception: continue for label in extract_label_definitions(content): label_map[label] = rel return label_map def detect_orphans(root: Path) -> list[dict]: """Detect orphaned pages under /frameworks. Returns a list of dicts with keys: path, type, reason, action. 
""" rst_files, ipynb_files, txt_files = find_all_framework_files(root) toctree_doc_refs, include_refs, ref_labels_used = collect_all_references(root) label_map = build_label_to_file_map(root) # Files referenced via :ref: labels ref_referenced_files: set[str] = set() for label in ref_labels_used: if label in label_map: ref_referenced_files.add(label_map[label]) # All referenced content files (rst + ipynb) all_content_refs = toctree_doc_refs | ref_referenced_files # All referenced include files (txt) all_include_refs = include_refs orphans: list[dict] = [] # Check .rst and .ipynb files against toctree/doc/ref references for f in sorted(rst_files | ipynb_files): if f not in all_content_refs and f not in all_include_refs: ext = Path(f).suffix orphans.append({ "path": f, "type": ext, "reason": "Not in any toctree or cross-reference", "action": "Delete", }) # Check .txt files against include references only for f in sorted(txt_files): if f not in all_include_refs: orphans.append({ "path": f, "type": ".txt (include fragment)", "reason": "Not referenced by any .. include:: directive", "action": "Delete", }) return orphans # --------------------------------------------------------------------------- # Stale page detection # --------------------------------------------------------------------------- # Staleness indicator patterns STALE_OS_RE = re.compile( r"Ubuntu\s+18\.04|Ubuntu\s+20\.04|Amazon\s+Linux\s+2(?!\s*023)(?!\s*\d{3})\b", re.IGNORECASE, ) STALE_PYTHON_RE = re.compile( r"Python\s+3\.[0-9](?!\d)\b", # matches Python 3.0 through 3.9 ) STALE_SDK_RE = re.compile(r"Neuron\s+SDK\s+2\.(\d+)") TORCH_NEURON_SETUP_RE = re.compile( r"torch-neuron.*(?:setup|install|update)", re.IGNORECASE, ) NEURON_CC_RE = re.compile(r"\bneuron-cc\b") def _check_stale_python(content: str) -> list[str]: """Find references to Python versions below 3.10.""" indicators = [] for m in STALE_PYTHON_RE.finditer(content): ver_str = m.group(0) # Extract minor version minor = int(ver_str.split(".")[-1]) if minor < 10: indicators.append(ver_str) return list(set(indicators)) def _check_stale_sdk(content: str) -> list[str]: """Find references to Neuron SDK versions older than 2.20.""" indicators = [] for m in STALE_SDK_RE.finditer(content): ver = int(m.group(1)) if ver < 20: indicators.append(m.group(0)) return list(set(indicators)) def _check_stale_os(content: str) -> list[str]: """Find references to unsupported OS versions.""" return list(set(m.group(0) for m in STALE_OS_RE.finditer(content))) def _check_torch_neuron_unsupported_os(content: str) -> list[str]: """Flag torch-neuron setup/update instructions for unsupported OS.""" indicators = [] if TORCH_NEURON_SETUP_RE.search(content): os_refs = _check_stale_os(content) if os_refs: indicators.append( f"torch-neuron setup/update with unsupported OS: {', '.join(os_refs)}" ) return indicators def _check_neuron_cc(content: str) -> list[str]: """Flag deprecated neuron-cc references.""" if NEURON_CC_RE.search(content): return ["References deprecated neuron-cc compiler"] return [] def detect_stale_pages(root: Path) -> list[dict]: """Detect stale pages under /frameworks. Returns a list of dicts with keys: path, indicators, recommendation. 
""" stale: list[dict] = [] fw_dir = root / "frameworks" if not fw_dir.is_dir(): return stale for p in fw_dir.rglob("*"): if not p.is_file(): continue if p.suffix not in (".rst", ".txt"): continue rel = str(p.relative_to(root)) try: content = p.read_text(encoding="utf-8", errors="replace") except Exception: continue indicators: list[str] = [] indicators.extend(_check_stale_os(content)) indicators.extend(_check_stale_python(content)) indicators.extend(_check_stale_sdk(content)) indicators.extend(_check_torch_neuron_unsupported_os(content)) indicators.extend(_check_neuron_cc(content)) if indicators: # Determine recommendation is_archival = ( "mxnet-neuron/" in rel or "tensorflow/" in rel or ("torch-neuron/" in rel and "torch-neuronx/" not in rel) ) if is_archival: rec = "Will be archived" else: rec = "Update or archive" stale.append({ "path": rel, "indicators": "; ".join(sorted(set(indicators))), "recommendation": rec, }) return sorted(stale, key=lambda x: x["path"]) # --------------------------------------------------------------------------- # Report generation # --------------------------------------------------------------------------- def generate_report(orphans: list[dict], stale: list[dict]) -> str: """Generate the audit report as Markdown.""" lines: list[str] = [] lines.append("# Frameworks Audit Report\n") # Orphaned pages lines.append("## Orphaned Pages\n") if orphans: lines.append("| File Path | Type | Reason | Action |") lines.append("|---|---|---|---|") for o in orphans: lines.append( f"| {o['path']} | {o['type']} | {o['reason']} | {o['action']} |" ) else: lines.append("No orphaned pages detected.\n") lines.append("") # Stale pages lines.append("## Stale Pages\n") if stale: lines.append("| File Path | Staleness Indicators | Recommendation |") lines.append("|---|---|---|") for s in stale: lines.append( f"| {s['path']} | {s['indicators']} | {s['recommendation']} |" ) else: lines.append("No stale pages detected.\n") lines.append("") return "\n".join(lines) # --------------------------------------------------------------------------- # CLI # --------------------------------------------------------------------------- def main(): parser = argparse.ArgumentParser( description="Audit /frameworks for orphaned and stale pages." 
    )
    parser.add_argument(
        "--root",
        default=".",
        help="Repository root directory (default: current directory)",
    )
    parser.add_argument(
        "--output",
        default="audit-report.md",
        help="Output file path for the audit report (default: audit-report.md)",
    )
    args = parser.parse_args()

    root = Path(args.root).resolve()
    print(f"Auditing frameworks under: {root}")

    orphans = detect_orphans(root)
    print(f"Found {len(orphans)} orphaned page(s).")

    stale = detect_stale_pages(root)
    print(f"Found {len(stale)} stale page(s).")

    report = generate_report(orphans, stale)
    output_path = Path(args.output)
    if not output_path.is_absolute():
        output_path = root / output_path
    output_path.write_text(report, encoding="utf-8")
    print(f"Audit report written to: {output_path}")


if __name__ == "__main__":
    main()

================================================
FILE: _utilities/check_urls.sh
================================================
#!/bin/bash

# Output file
output_file="url_check_results.txt"

# Initialize counters
total=0
working=0
not_found=0
other=0

# Create output file with header
echo "URL Status Check Results" > "$output_file"
echo "=========================" >> "$output_file"
echo "" >> "$output_file"

# Read each URL from the file (-r keeps any backslashes in URLs intact)
while read -r url; do
    # Skip empty lines
    if [ -z "$url" ]; then
        continue
    fi

    # Increment total counter
    ((total++))

    # Print progress
    echo "Checking $total: $url"

    # Use curl to check the URL status
    status_code=$(curl -s -o /dev/null -w "%{http_code}" "$url")

    # Check status code
    if [ "$status_code" -eq 200 ]; then
        echo "✓ WORKING: $url" >> "$output_file"
        ((working++))
    elif [ "$status_code" -eq 404 ]; then
        echo "✗ NOT FOUND (404): $url" >> "$output_file"
        ((not_found++))
    else
        echo "? OTHER STATUS ($status_code): $url" >> "$output_file"
        ((other++))
    fi

    # Small delay to avoid overwhelming the server
    sleep 0.1
done < old-nki-apis.txt

# Write summary
echo "" >> "$output_file"
echo "" >> "$output_file"
echo "Summary" >> "$output_file"
echo "=======" >> "$output_file"
echo "Total URLs checked: $total" >> "$output_file"
echo "Working URLs: $working" >> "$output_file"
echo "Not found (404) URLs: $not_found" >> "$output_file"
echo "Other status URLs: $other" >> "$output_file"

echo "URL check completed. Results saved to $output_file"
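
# Example (illustrative): run the check against the checked-in list of old
# NKI API URLs, then inspect the summary block at the end of the results.
# Both this script and old-nki-apis.txt live in _utilities/, so run from there:
#
#   cd _utilities
#   bash check_urls.sh
#   tail -n 8 url_check_results.txt

================================================
FILE: _utilities/create_sitemap.py
================================================
# v1.0 by dougeric 2025-09-30
# Script to create sitemap.xml for Sphinx-generated docs; must be run at the
# root of the docs repo with the virtual environment activated

import os
from pathlib import Path
from datetime import datetime


def create_sitemap(root_dir, base_url):
    """
    This function generates a sitemap.xml file for the given root directory and base URL.
    It recursively scans all .rst files in the root directory, excluding those in
    directories starting with "_". For each .rst file, it calculates the last
    modification time, converts the .rst path to the corresponding HTML path, and
    adds an entry to the sitemap in the format required by Google Search Console.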
""" sitemap = ['', ''] for path in Path(root_dir).rglob('*.rst'): # Skip directories starting with "_" if any(part.startswith('_') for part in path.parts): continue # Convert .rst path to expected html path rel_path = path.relative_to(root_dir) html_path = str(rel_path).replace('.rst', '.html') # Get file modification time mod_time = datetime.fromtimestamp(os.path.getmtime(path)) sitemap.append(f' ') sitemap.append(f' {base_url}/{html_path}') sitemap.append(f' {mod_time.strftime("%Y-%m-%d")}') sitemap.append(f' ') sitemap.append('') return '\n'.join(sitemap) # Call the function and write the result to sitemap.xml sitemap_content = create_sitemap('./', 'https://awsdocs-neuron.readthedocs-hosted.com/en/latest') with open('sitemap.xml', 'w') as f: f.write(sitemap_content) print("\nsitemap.xml has been created.\n") ================================================ FILE: _utilities/format_build_logs.py ================================================ #!/usr/bin/env python3 """ Format Sphinx Build Logs This script checks for Python 3.9 and pip, creates a virtual environment, runs sphinx-build, and formats the build log as Markdown with separate sections for errors and warnings. """ import os import sys import subprocess import re import datetime import platform import shutil from collections import Counter from pathlib import Path def check_python_version(): """Check if Python 3.9 is installed.""" python_version = sys.version_info if python_version.major != 3 or python_version.minor != 9: print("Error: Python 3.9 is required.") if platform.system() == "Darwin": # macOS print("To install Python 3.9 on macOS, visit: https://www.python.org/downloads/release/python-3913/") print("Or use Homebrew: brew install python@3.9") elif platform.system() == "Windows": print("To install Python 3.9 on Windows, visit: https://www.python.org/downloads/release/python-3913/") else: print("Please install Python 3.9 from: https://www.python.org/downloads/release/python-3913/") sys.exit(1) return True def check_pip_installed(): """Check if pip is installed.""" try: subprocess.run([sys.executable, "-m", "pip", "--version"], check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE) return True except subprocess.CalledProcessError: print("Error: pip is not installed.") print("Please install pip: https://pip.pypa.io/en/stable/installation/") sys.exit(1) def find_repo_root(): """Find the root of the private-aws-neuron-sdk-staging repo.""" # Start with the current directory current_dir = Path.cwd() # Check if we're already in the repo root if current_dir.name == "private-aws-neuron-sdk-staging": return current_dir # Check parent directory parent_dir = current_dir.parent if parent_dir.name == "private-aws-neuron-sdk-staging": return parent_dir # Look for the repo in the current directory for item in current_dir.iterdir(): if item.is_dir() and item.name == "private-aws-neuron-sdk-staging": return item # Look for the repo in the parent directory for item in parent_dir.iterdir(): if item.is_dir() and item.name == "private-aws-neuron-sdk-staging": return item print("Error: Repository 'private-aws-neuron-sdk-staging' not found on local machine.") sys.exit(1) def setup_venv(repo_parent): """Create and activate a Python 3.9 virtual environment.""" venv_path = repo_parent / "venv" # Create venv if it doesn't exist if not venv_path.exists(): print(f"Creating virtual environment at {venv_path}...") try: subprocess.run([sys.executable, "-m", "venv", str(venv_path)], check=True) except subprocess.CalledProcessError as e: print(f"Error 
creating virtual environment: {e}") sys.exit(1) # Determine the path to the activate script if platform.system() == "Windows": activate_script = venv_path / "Scripts" / "activate.bat" activate_cmd = str(activate_script) else: activate_script = venv_path / "bin" / "activate" activate_cmd = f"source {activate_script}" print(f"Virtual environment created at {venv_path}") print(f"To activate manually, run: {activate_cmd}") return venv_path def get_venv_python(venv_path): """Get the path to the Python executable in the virtual environment.""" if platform.system() == "Windows": return venv_path / "Scripts" / "python.exe" else: return venv_path / "bin" / "python" def get_venv_pip(venv_path): """Get the path to the pip executable in the virtual environment.""" if platform.system() == "Windows": return venv_path / "Scripts" / "pip.exe" else: return venv_path / "bin" / "pip" def install_requirements(repo_root, venv_pip): """Install requirements from requirements.txt.""" requirements_file = repo_root / "requirements.txt" if not requirements_file.exists(): print(f"Error: requirements.txt not found at {requirements_file}") sys.exit(1) print("Installing requirements...") try: subprocess.run([ str(venv_pip), "install", "-r", str(requirements_file), "--extra-index-url=https://pypi.org/simple" ], check=True) except subprocess.CalledProcessError as e: print(f"Error installing requirements: {e}") sys.exit(1) print("Requirements installed successfully.") def run_sphinx_build(repo_root, venv_path): """Run sphinx-build and capture the output.""" sphinx_build_path = venv_path / "bin" / "sphinx-build" if platform.system() == "Windows": sphinx_build_path = venv_path / "Scripts" / "sphinx-build.exe" if not sphinx_build_path.exists(): print(f"Error: sphinx-build not found at {sphinx_build_path}") sys.exit(1) print("Running sphinx-build...") # Create a log file to capture output log_file_path = repo_root / "sphinx_build_output.log" try: # Run sphinx-build with output redirected to both terminal and log file with open(log_file_path, 'w') as log_file: process = subprocess.Popen( [str(sphinx_build_path), "-b", "html", ".", "_build/html", "-w", "warnings.txt"], cwd=str(repo_root), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1 ) # Capture output in real-time output = [] for line in process.stdout: print(line, end='') # Print to terminal log_file.write(line) # Write to log file output.append(line) process.wait() if process.returncode != 0: print(f"sphinx-build exited with code {process.returncode}") # Also read the warnings.txt file if it exists warnings_file = repo_root / "warnings.txt" if warnings_file.exists(): with open(warnings_file, 'r') as f: warnings_content = f.read() output.append("\n--- WARNINGS FILE CONTENT ---\n") output.append(warnings_content) return ''.join(output) except Exception as e: print(f"Error running sphinx-build: {e}") sys.exit(1) def parse_build_log(log_text): """Parse the build log to extract errors and warnings.""" # Save raw log for debugging with open("raw_build_log.txt", "w") as f: f.write(log_text) # Check if warnings.txt exists and use it directly warnings_file = Path("warnings.txt") if warnings_file.exists(): print(f"Found warnings.txt file with direct warnings from Sphinx") with open(warnings_file, 'r') as f: warnings_content = f.read() # Parse warnings.txt which has format: path:line: WARNING: message warnings = [] for line in warnings_content.split('\n'): if not line.strip(): continue # Try to match the standard format first match = re.match(r'(.*?):(\d+): 
WARNING: (.*)', line) if match: file_path, line_num, message = match.groups() warnings.append({ 'file': file_path, 'line': line_num, 'message': message.strip() }) print(f"Standard format match: file={file_path}, line={line_num}, message={message[:50]}...") else: # Check for the "document isn't included in any toctree" pattern # Format: /path/to/file.rst: WARNING: document isn't included in any toctree toctree_match = re.match(r'(.*?): WARNING: (document isn\'t included in any toctree.*)', line) if toctree_match: file_path, message = toctree_match.groups() warnings.append({ 'file': file_path, 'line': '0', # No line number in this format 'message': message.strip() }) print(f"Toctree match: file={file_path}, message={message[:50]}...") else: # If no match, just add as unknown warnings.append({ 'file': 'unknown', 'line': '0', 'message': line.strip() }) print(f"No match: message={line[:50]}...") else: print("No warnings.txt file found, parsing log output directly") warnings = [] lines = log_text.split('\n') i = 0 while i < len(lines): line = lines[i].strip() # Skip empty lines if not line: i += 1 continue # Check for the "document isn't included in any toctree" pattern # Format: /path/to/file.rst: WARNING: document isn't included in any toctree toctree_match = re.match(r'(.*?): WARNING: (document isn\'t included in any toctree.*)', line) if toctree_match: file_path, message = toctree_match.groups() warnings.append({ 'file': file_path, 'line': '0', # No line number in this format 'message': message.strip() }) i += 1 continue # Check for warnings in the raw message # This is for warnings that are already in the log as complete messages raw_warning_match = re.match(r'(.*?): WARNING: (.*)', line) if raw_warning_match: file_path, message = raw_warning_match.groups() warnings.append({ 'file': file_path, 'line': '0', # No line number in this format 'message': message.strip() }) i += 1 continue # Check for standard format: path:line: WARNING: message std_match = re.match(r'(.*?):(\d+): WARNING: (.*)', line) if std_match: file_path, line_num, message = std_match.groups() warnings.append({ 'file': file_path, 'line': line_num, 'message': message.strip() }) i += 1 continue # Check for alternative format: WARNING: message (path:line) alt_match = re.match(r'WARNING: (.*?) 
\((.*?):(\d+)\)', line) if alt_match: message, file_path, line_num = alt_match.groups() warnings.append({ 'file': file_path, 'line': line_num, 'message': message.strip() }) i += 1 continue # Check for simple warnings that start with "WARNING:" if line.startswith("WARNING:"): message = line[8:].strip() # Remove "WARNING: " prefix # Collect continuation lines i += 1 while i < len(lines) and lines[i].strip() and not lines[i].strip().startswith(("WARNING:", "ERROR:")): message += " " + lines[i].strip() i += 1 warnings.append({ 'file': 'unknown', 'line': '0', 'message': message }) continue i += 1 # Debug: Print the first few warnings to see what's being parsed print(f"Parsed {len(warnings)} warnings") for i, warning in enumerate(warnings[:5]): print(f"Warning {i+1}: file={warning['file']}, line={warning['line']}, message={warning['message'][:50]}...") # Debug: Print the warning categories categories = categorize_issues(warnings) print(f"Warning categories: {categories}") # Regular expressions for errors error_pattern = re.compile(r'(.*?):(\d+): (?:ERROR|SEVERE): (.*?)(?:\n|$)') errors = [] lines = log_text.split('\n') for line in lines: error_match = error_pattern.search(line) if error_match: file_path, line_num, message = error_match.groups() errors.append({ 'file': file_path, 'line': line_num, 'message': message.strip() }) return errors, warnings def categorize_issues(issues): """Categorize issues by type.""" categories = Counter() for issue in issues: # Extract the main category from the message message = issue['message'].lower() if "undefined label" in message: categories["Undefined Label"] += 1 elif "unknown document" in message: categories["Unknown Document"] += 1 elif "duplicate label" in message: categories["Duplicate Label"] += 1 elif "image file not found" in message: categories["Missing Image"] += 1 elif "toctree contains reference to nonexisting document" in message: categories["Missing Document"] += 1 elif "document isn't included in any toctree" in message: categories["Document Not in TOC"] += 1 else: categories["Other"] += 1 return categories def format_markdown(errors, warnings, build_time): """Format the build log as Markdown.""" timestamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S") error_categories = categorize_issues(errors) warning_categories = categorize_issues(warnings) markdown = f"# Sphinx Build Log - {timestamp}\n\n" # Build summary markdown += "## Build Summary\n\n" markdown += f"- **Build Time**: {build_time:.2f} seconds\n" markdown += f"- **Total Errors**: {len(errors)}\n" markdown += f"- **Total Warnings**: {len(warnings)}\n\n" # Error categories if error_categories: markdown += "### Error Categories\n\n" for category, count in error_categories.most_common(): markdown += f"- **{category}**: {count}\n" markdown += "\n" # Warning categories if warning_categories: markdown += "### Warning Categories\n\n" for category, count in warning_categories.most_common(): markdown += f"- **{category}**: {count}\n" markdown += "\n" # Errors section markdown += "## Errors\n\n" if errors: for i, error in enumerate(errors, 1): # Format the file path to be more readable file_path = error['file'] if file_path.startswith('/Users/dougeric/git/private-aws-neuron-sdk-staging/'): file_path = file_path[len('/Users/dougeric/git/private-aws-neuron-sdk-staging/'):] # Create a more readable header with file and line info if error['file'] != 'unknown': markdown += f"### Error {i}: {file_path} (line {error['line']})\n\n" else: markdown += f"### Error {i}\n\n" markdown += 
f"```\n{error['message']}\n```\n\n" else: markdown += "No errors found.\n\n" # Warnings section markdown += "## Warnings\n\n" if warnings: for i, warning in enumerate(warnings, 1): # Format the file path to be more readable file_path = warning['file'] if file_path.startswith('/Users/dougeric/git/private-aws-neuron-sdk-staging/'): file_path = file_path[len('/Users/dougeric/git/private-aws-neuron-sdk-staging/'):] # Create a more readable header with file and line info if warning['file'] != 'unknown': if warning['line'] != '0': markdown += f"### Warning {i}: {file_path} (line {warning['line']})\n\n" else: markdown += f"### Warning {i}: {file_path}\n\n" else: markdown += f"### Warning {i}\n\n" # Don't include the file path in the message if it's already in the header message = warning['message'] if warning['file'] != 'unknown' and message.startswith(warning['file']): # Remove the file path from the message message = message[len(warning['file'])+2:] # +2 for ": " markdown += f"```\n{message}\n```\n\n" else: markdown += "No warnings found.\n\n" return markdown def main(): """Main function.""" print("Checking Python version...") check_python_version() print("Checking pip installation...") check_pip_installed() print("Finding repository root...") repo_root = find_repo_root() repo_parent = repo_root.parent print(f"Repository found at: {repo_root}") print("Setting up virtual environment...") venv_path = setup_venv(repo_parent) venv_python = get_venv_python(venv_path) venv_pip = get_venv_pip(venv_path) print(f"Changing directory to {repo_root}...") os.chdir(str(repo_root)) print("Installing requirements...") install_requirements(repo_root, venv_pip) print("Running sphinx-build...") start_time = datetime.datetime.now() build_log = run_sphinx_build(repo_root, venv_path) end_time = datetime.datetime.now() build_time = (end_time - start_time).total_seconds() print("Parsing build log...") errors, warnings = parse_build_log(build_log) print("Formatting build log as Markdown...") markdown = format_markdown(errors, warnings, build_time) # Write the formatted log to a file timestamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S") output_file = repo_root / f"build-log-{timestamp}.md" with open(output_file, "w") as f: f.write(markdown) print(f"Build log written to {output_file}") print(f"Found {len(errors)} errors and {len(warnings)} warnings.") if __name__ == "__main__": main() ================================================ FILE: _utilities/inject_archive_meta.py ================================================ #!/usr/bin/env python3 """Inject noindex/nofollow meta directives and deprecation banners into archived .rst files.""" import os import re import sys META_BLOCK = """.. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 """ WARNING_TEMPLATE = """ .. warning:: This document is archived. {framework} is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. """ # Default for backward compatibility WARNING_BLOCK = WARNING_TEMPLATE.format(framework="MXNet") def find_title_end(lines): """Find the line index after the RST title underline. RST titles look like: Title Text ========== or with overline: ========== Title Text ========== Returns the index of the line AFTER the title underline, or -1 if not found. 
""" title_chars = set('=-~^"\'`#*+_.') i = 0 while i < len(lines): line = lines[i].rstrip() # Check if this line is an underline (all same char, at least 3 chars) if len(line) >= 3 and len(set(line)) == 1 and line[0] in title_chars: # Check if previous line is text (title) - this is an underline if i > 0 and lines[i-1].strip() and not (len(set(lines[i-1].rstrip())) == 1 and lines[i-1].rstrip()[0] in title_chars): return i + 1 # Check if next line is text and line after that is underline (overline pattern) if i + 2 < len(lines) and lines[i+1].strip(): next_next = lines[i+2].rstrip() if len(next_next) >= 3 and len(set(next_next)) == 1 and next_next[0] in title_chars: return i + 3 i += 1 return -1 def inject_meta_and_warning(filepath, framework="MXNet"): """Inject meta block at top and warning after title in an RST file.""" with open(filepath, 'r') as f: content = f.read() # Skip if already has noindex meta if ':noindex:' in content: print(f" SKIP (already has meta): {filepath}") return warning_block = WARNING_TEMPLATE.format(framework=framework) lines = content.split('\n') # Separate any leading labels (.. _label:) and blank lines # These need to stay before the meta block label_lines = [] content_start = 0 for i, line in enumerate(lines): stripped = line.strip() if stripped.startswith('.. _') and stripped.endswith(':'): label_lines.append(line) content_start = i + 1 elif stripped == '' and all(l.strip().startswith('.. _') for l in lines[:i] if l.strip()): label_lines.append(line) content_start = i + 1 else: break # Build the content after labels remaining_lines = lines[content_start:] remaining_content = '\n'.join(remaining_lines) # Find title end in remaining content title_end = find_title_end(remaining_lines) if title_end >= 0: # Insert warning after title before_title = '\n'.join(remaining_lines[:title_end]) after_title = '\n'.join(remaining_lines[title_end:]) new_remaining = before_title + '\n' + warning_block + after_title else: # No title found, just add warning at the start of content print(f" WARNING: No title found in {filepath}") new_remaining = warning_block + remaining_content # Reconstruct: labels + meta + content with warning label_section = '\n'.join(label_lines) + '\n' if label_lines else '' new_content = label_section + META_BLOCK + new_remaining # Ensure file ends with newline if not new_content.endswith('\n'): new_content += '\n' with open(filepath, 'w') as f: f.write(new_content) print(f" OK: {filepath}") def main(): import argparse parser = argparse.ArgumentParser(description='Inject archive meta into .rst files') parser.add_argument('archive_dir', nargs='?', default='archive/mxnet-neuron', help='Directory containing .rst files to process') parser.add_argument('--framework', default='MXNet', help='Framework name for the deprecation warning (e.g., MXNet, TensorFlow)') args = parser.parse_args() archive_dir = args.archive_dir framework = args.framework rst_files = [] for root, dirs, files in os.walk(archive_dir): for fname in files: if fname.endswith('.rst'): rst_files.append(os.path.join(root, fname)) rst_files.sort() print(f"Processing {len(rst_files)} .rst files in {archive_dir}:") for filepath in rst_files: inject_meta_and_warning(filepath, framework=framework) print(f"\nDone. 
Processed {len(rst_files)} files.") if __name__ == '__main__': main() ================================================ FILE: _utilities/metadata_schema.yaml ================================================ # Metadata Schema for AWS Neuron SDK Setup Documentation # This schema defines the structured metadata fields used in setup documentation pages metadata_fields: # Core identification fields description: type: string required: true description: "SEO and AI agent description of the page content" example: "Install PyTorch Neuron using AWS Deep Learning AMI on Inf2, Trn1, Trn2, Trn3" keywords: type: array[string] required: true description: "Comma-separated search terms for discoverability" example: "pytorch, neuron, dlami, installation, inf2, trn1, trn2, trn3" date-modified: type: date required: true format: "YYYY-MM-DD" description: "ISO 8601 date of last modification" example: "2026-03-02" content-type: type: enum required: true values: - navigation-hub - framework-setup-hub - installation-guide - troubleshooting - legacy-guide description: "Type of documentation page" # Setup-specific fields framework: type: enum required_for: [installation-guide, framework-setup-hub] values: - pytorch - jax - tensorflow - mxnet description: "ML framework being documented" validation: "Must match parent directory name" instance-types: type: array[enum] required_for: [installation-guide, framework-setup-hub, navigation-hub] values: - inf1 - inf2 - trn1 - trn2 - trn3 description: "Supported AWS instance types" validation: "Cannot mix inf1 with inf2/trn1/trn2/trn3" installation-method: type: enum required_for: [installation-guide] values: - dlami - manual - container description: "Installation approach documented" os: type: array[enum] required_for: [installation-guide] values: - ubuntu-24.04 - ubuntu-22.04 - al2023 - rocky-9 description: "Supported operating systems" python-versions: type: array[string] required: false description: "Supported Python versions" example: "3.10, 3.11, 3.12" status: type: enum required: false values: - current - beta - legacy - deprecated description: "Status of the documented feature/hardware" validation: "Must be 'legacy' when instance-types contains only inf1" # AI agent hints task: type: string required: false description: "Task-based description for AI agents" example: "Install PyTorch on Trn1 using DLAMI" prerequisites: type: array[string] required: false description: "List of required knowledge/resources" estimated-time: type: string required: false description: "Estimated completion time" example: "5 minutes" # Validation Rules validation_rules: - rule: "inf1_separation" description: "inf1 cannot be mixed with inf2, trn1, trn2, or trn3" check: "If 'inf1' in instance-types, then len(instance-types) == 1" error_message: "Cannot mix inf1 with other instance types" - rule: "framework_directory_match" description: "framework metadata must match parent directory" check: "framework value must equal parent directory name" error_message: "Framework '{framework}' does not match directory '{directory}'" - rule: "legacy_status_for_inf1" description: "Pages with only inf1 must have legacy status" check: "If instance-types == ['inf1'], then status == 'legacy'" error_message: "Inf1-only pages must have status: legacy" - rule: "legacy_directory_location" description: "Legacy content must be in legacy-inf1 directory" check: "If status == 'legacy', then path contains '/legacy-inf1/'" warning_message: "Legacy content should be in /setup/legacy-inf1/ directory" - rule: 
"installation_guide_completeness" description: "Installation guides must have complete metadata" check: "If content-type == 'installation-guide', then framework, instance-types, installation-method, and os must be present" error_message: "Installation guide missing required metadata: {missing_fields}" - rule: "content_type_requirements" description: "Each content type has specific required fields" requirements: navigation-hub: [description, keywords, instance-types, content-type] framework-setup-hub: [description, keywords, framework, instance-types, content-type] installation-guide: [description, keywords, framework, instance-types, installation-method, os, content-type] troubleshooting: [description, keywords, content-type] legacy-guide: [description, keywords, instance-types, status, content-type] # Usage Examples examples: installation_guide: description: "Install PyTorch Neuron using AWS DLAMI on Inf2, Trn1, Trn2, Trn3" keywords: "pytorch, neuron, dlami, installation, inf2, trn1, trn2, trn3" framework: "pytorch" instance-types: "inf2, trn1, trn2, trn3" installation-method: "dlami" os: "ubuntu-24.04, ubuntu-22.04, al2023" content-type: "installation-guide" date-modified: "2026-03-02" framework_hub: description: "Install PyTorch for AWS Neuron on Inf2, Trn1, Trn2, Trn3 instances" keywords: "pytorch, neuron, installation, trn1, trn2, trn3, inf2" framework: "pytorch" instance-types: "inf2, trn1, trn2, trn3" content-type: "framework-setup-hub" date-modified: "2026-03-02" legacy_guide: description: "Legacy installation guide for AWS Inferentia 1 (Inf1) instances" keywords: "neuron, inf1, legacy, installation, inferentia" instance-types: "inf1" status: "legacy" content-type: "legacy-guide" date-modified: "2026-03-02" ================================================ FILE: _utilities/migrate_setup_content.py ================================================ #!/usr/bin/env python3 """ Setup Content Migration Script Maps old setup file paths to new framework-first paths and generates a migration report. This script does NOT move files — it produces a report of what references exist and where they should point. 
Usage: python3 _utilities/migrate_setup_content.py [--dry-run] [--fix] Options: --dry-run Show what would be changed without modifying files (default) --fix Apply changes to files """ import argparse import os import re import sys from pathlib import Path # Old path → new path mapping PATH_MAP = { "/setup/torch-neuronx": "/setup/pytorch/index", "/setup/jax-neuronx": "/setup/jax/index", "/setup/tensorflow-neuronx": "/frameworks/tensorflow/index", "/setup/setup-neuronx": "/setup/index", "/setup/setup-neuron": "/setup/index", "/setup/mxnet-neuron": "/archive/mxnet-neuron/index", } # External URL mapping (for hardcoded URLs in tutorials) URL_MAP = { "setup/torch-neuronx.html": "setup/pytorch/index.html", "setup/jax-neuronx.html": "setup/jax/index.html", } # Directories to scan SCAN_DIRS = [ "about-neuron", "frameworks", "libraries", "tools", "compiler", "containers", "devflows", "release-notes", "setup", "nki", "dlami", ] # Directories to skip SKIP_DIRS = {"_build", ".git", "__pycache__", ".venv", "node_modules"} def find_rst_files(base_dir: str) -> list[Path]: """Find all .rst files in scan directories.""" files = [] for scan_dir in SCAN_DIRS: dir_path = Path(base_dir) / scan_dir if dir_path.exists(): for rst_file in dir_path.rglob("*.rst"): if not any(skip in rst_file.parts for skip in SKIP_DIRS): files.append(rst_file) return sorted(files) def find_references(content: str, file_path: Path) -> list[dict]: """Find old setup path references in file content.""" refs = [] # Match :doc: references for old_path, new_path in PATH_MAP.items(): pattern = re.compile( rf":doc:`([^`]*<)?{re.escape(old_path)}(>)?`", re.IGNORECASE ) for match in pattern.finditer(content): line_num = content[: match.start()].count("\n") + 1 refs.append( { "file": str(file_path), "line": line_num, "old": match.group(0), "old_path": old_path, "new_path": new_path, "type": "doc_ref", } ) # Match :ref: references to old labels old_labels = { "setup-torch-neuronx": "pytorch-setup", "setup-jax-neuronx": "jax-setup", "setup-tensorflow-neuronx": "tensorflow-setup", } for old_label, new_label in old_labels.items(): pattern = re.compile(rf":ref:`([^`]*<)?{re.escape(old_label)}(>)?`") for match in pattern.finditer(content): line_num = content[: match.start()].count("\n") + 1 refs.append( { "file": str(file_path), "line": line_num, "old": match.group(0), "old_label": old_label, "new_label": new_label, "type": "ref_label", } ) # Match hardcoded URLs for old_url, new_url in URL_MAP.items(): if old_url in content: line_num = content[: content.index(old_url)].count("\n") + 1 refs.append( { "file": str(file_path), "line": line_num, "old_url": old_url, "new_url": new_url, "type": "url", } ) return refs def apply_fix(file_path: Path, refs: list[dict]) -> bool: """Apply reference fixes to a file.""" content = file_path.read_text() modified = False for ref in refs: if ref["type"] == "doc_ref": old = ref["old_path"] new = ref["new_path"] new_content = content.replace(old, new) if new_content != content: content = new_content modified = True elif ref["type"] == "url": old = ref["old_url"] new = ref["new_url"] new_content = content.replace(old, new) if new_content != content: content = new_content modified = True if modified: file_path.write_text(content) return modified def main(): parser = argparse.ArgumentParser(description="Setup content migration script") parser.add_argument( "--fix", action="store_true", help="Apply changes (default is dry-run)" ) args = parser.parse_args() base_dir = 
os.path.dirname(os.path.dirname(os.path.abspath(__file__))) rst_files = find_rst_files(base_dir) print(f"Scanning {len(rst_files)} .rst files...") print() all_refs = [] for rst_file in rst_files: content = rst_file.read_text() refs = find_references(content, rst_file) all_refs.extend(refs) if not all_refs: print("No old setup references found. Migration complete.") return # Group by file by_file = {} for ref in all_refs: by_file.setdefault(ref["file"], []).append(ref) print(f"Found {len(all_refs)} references in {len(by_file)} files:") print() for file_path, refs in sorted(by_file.items()): print(f" {file_path}:") for ref in refs: if ref["type"] == "doc_ref": print(f" L{ref['line']}: {ref['old_path']} → {ref['new_path']}") elif ref["type"] == "ref_label": print(f" L{ref['line']}: {ref['old_label']} → {ref['new_label']}") elif ref["type"] == "url": print(f" L{ref['line']}: {ref['old_url']} → {ref['new_url']}") print() if args.fix: fixed_count = 0 for file_path, refs in by_file.items(): if apply_fix(Path(file_path), refs): fixed_count += 1 print(f" ✓ Fixed: {file_path}") print(f"\nFixed {fixed_count} files.") else: print("Dry run — no files modified. Use --fix to apply changes.") if __name__ == "__main__": main() ================================================ FILE: _utilities/old-nki-apis.txt ================================================ https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.benchmark.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.profile.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.baremetal.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.simulate_kernel.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.sbuf.alloc.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.sbuf.mod_alloc.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.sbuf.auto_alloc.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.psum.alloc.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.psum.mod_alloc.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.psum.auto_alloc.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.skip_middle_end_transformations.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.enable_stack_allocator.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.compiler.force_auto_alloc.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.tensor.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.load.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.store.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.load_transpose2d.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.atomic_rmw.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.copy.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.broadcast_to.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.empty_like.html 
https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.zeros_like.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.ones.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.full.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.rand.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.random_seed.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.shared_constant.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.shared_identity_matrix.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.arange.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.mgrid.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.expand_dims.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.where.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.gather_flattened.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.all_reduce.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.par_dim.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.spmd_dim.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.nc.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.device_print.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.loop_reduce.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.fp32.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.add.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.subtract.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.multiply.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.divide.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.power.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.maximum.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.minimum.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.max.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.min.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.mean.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.var.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.sum.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.prod.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.all.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.abs.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.negative.html 
https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.sign.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.trunc.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.floor.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.ceil.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.mod.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.fmod.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.exp.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.log.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.cos.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.sin.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.tan.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.tanh.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.arctan.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.sqrt.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.rsqrt.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.sigmoid.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.relu.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.gelu.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.gelu_dx.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.gelu_apprx_tanh.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.gelu_apprx_sigmoid.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.silu.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.silu_dx.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.erf.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.erf_dx.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.softplus.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.mish.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.square.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.softmax.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.rms_norm.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.dropout.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.matmul.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.transpose.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.reciprocal.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.bitwise_and.html 
https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.bitwise_or.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.bitwise_xor.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.invert.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.left_shift.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.right_shift.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.equal.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.not_equal.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.greater.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.greater_equal.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.less.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.less_equal.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.logical_and.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.logical_or.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.logical_xor.html https://awsdocs-neuron.readthedocs-hosted.com/en/v2.26.1/nki/api/generated/nki.language.logical_not.html
================================================
FILE: _utilities/setup_jira_token.sh
================================================
#!/bin/bash
# Setup script to fetch Jira API token from AWS Secrets Manager
# and configure it for the Atlassian MCP server

set -e

echo "Setting up Jira API token..."

# Check if AWS CLI is available
if ! command -v aws &> /dev/null; then
    echo "Error: AWS CLI is not installed"
    echo "Install with: brew install awscli"
    exit 1
fi

# Check if ada is available
if ! command -v ada &> /dev/null; then
    echo "Error: ada credentials tool is not installed"
    echo "Install with: toolbox install ada"
    exit 1
fi

# Set AWS profile to kaena
export AWS_PROFILE=kaena

echo "Fetching Jira API token from AWS Secrets Manager..."

# Run the fetch inside the `if` condition so a failure is handled here
# instead of `set -e` aborting before the error message below can print.
if ! JIRA_TOKEN=$(aws secretsmanager get-secret-value \
    --secret-id NKI_JIRA_API_TOKEN \
    --region us-west-2 \
    --query SecretString \
    --output text 2>&1); then
    echo "Error: Failed to fetch Jira API token"
    echo "Make sure you have:"
    echo "  1. Run 'ada credentials setup' with account 621547421844, role Admin, profile kaena"
    echo "  2. Added kaena profile to ~/.aws/config with ada credential_process"
    echo "  3. Have IAM permissions to access the secret"
    echo ""
    echo "Error details:"
    echo "$JIRA_TOKEN"
    exit 1
fi

echo "✓ Successfully fetched Jira API token"

# Update the MCP config with the actual token
MCP_CONFIG="$HOME/.kiro/settings/mcp.json"

if [ ! -f "$MCP_CONFIG" ]; then
    echo "Error: MCP config not found at $MCP_CONFIG"
    exit 1
fi

# Rewrite the MCP config in place with the token substituted
python3 << EOF
import json
import os

config_path = os.path.expanduser('$MCP_CONFIG')

with open(config_path, 'r') as f:
    config = json.load(f)

# Update the Jira API token
if 'atlassian-jira' in config['mcpServers']:
    config['mcpServers']['atlassian-jira']['env']['JIRA_API_TOKEN'] = '''$JIRA_TOKEN'''
    with open(config_path, 'w') as f:
        json.dump(config, f, indent=2)
    print("✓ Updated MCP configuration with Jira API token")
else:
    print("Error: atlassian-jira server not found in MCP config")
    exit(1)
EOF

echo ""
echo "Setup complete! You can now use Jira tools in Kiro."
echo ""
echo "To use Jira MCP tools:"
echo "  1. Restart Kiro CLI"
echo "  2. Use Jira tools through the MCP server"
echo ""
echo "Example queries:"
echo "  - Search for NKI tickets"
echo "  - Get ticket details"
echo "  - Create new tickets"
================================================
FILE: about-neuron/amazonq-getstarted.rst
================================================
.. image:: /images/q-logo.png
   :scale: 30%
   :alt: Amazon Q
   :align: left
   :target: https://aws.amazon.com/q/

.. _amazon-q-dev:

Ask Amazon AI helper tools
===========================

Use Kiro, Quick, and Amazon Q in the AWS console as your Neuron experts for general Neuron technical guidance and to jumpstart your NKI kernel development.

.. card:: Ask Q on AWS apps and websites
   :link: https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/q-on-aws.html

.. card:: Ask Kiro IDE
   :link: https://kiro.dev/

.. card:: Ask Kiro CLI
   :link: https://kiro.dev/cli

.. card:: Ask Quick
   :link: https://aws.amazon.com/quick/

.. card:: Guidelines for Quality Results
   :link: amazon-q-dev-guidelines
   :link-type: ref

.. _amazon-q-dev-guidelines:

Guidelines for Quality Results
------------------------------

1. Be Specific: Clearly state the task, desired output, and any constraints.
2. Provide Context: Mention specific versions, strategies, and any relevant performance requirements.
3. Request Complete Code: Ask for full implementations including imports, decorators, and main functions. Remember to always review and test the generated code before using it in production.
4. Ask for Explanations: Request comments or separate explanations for complex parts of the code.
5. Iterate: If the initial response isn’t satisfactory, refine your prompt based on the output. If you encounter issues or inaccuracies, consider rephrasing your prompt or breaking down complex tasks into smaller, more specific questions.
6. Fact check: Use Q as a starting point and supplement its output with official documentation, the AWS NKI Samples repository, and your own expertise.

Example Prompts
~~~~~~~~~~~~~~~~~

.. note:: Amazon AI helper tools may not be fully synced with the latest Neuron features. Therefore, they may not always produce optimal or fully accurate results.

1. “Explain the key features and benefits of AWS Neuron Kernel Interface (NKI).”
2. "How do different parallelism strategies (data, pipeline, tensor) affect training performance on Neuron?"
3. “What are the best practices for optimizing matrix multiplication operations using Neuron Kernel Interface (NKI)?”
4. “Provide complete Neuron Kernel Interface (NKI) code for a matrix multiplication kernel, including imports, decorators, and explanations of key optimizations.
   Focus on efficient tiling and data movement strategies.”
================================================
FILE: about-neuron/announcements/index.rst
================================================
.. _announcements-main:

Announcements
=============

This page will be replaced by ABlog. It's here to make sure it's in the TOC.
================================================
FILE: about-neuron/announcements/neuron1.x/announce-eol-mx-before-1-5.rst
================================================
.. post:: May 01, 2023 01:00
   :language: en
   :tags: announce-eol mxnet-neuron

.. _announce-eol-mxnet-before-1-5:

Announcing end of support for ``mxnet-neuron`` version 1.5
-----------------------------------------------------------

:ref:`Neuron release 2.10 ` will be the last release to include ``mxnet-neuron`` version 1.5. Future Neuron releases will not include ``mxnet-neuron`` version 1.5. Current users of this version are advised to migrate to the latest ``mxnet-neuron`` version.
================================================
FILE: about-neuron/announcements/neuron1.x/announce-eol-pt-1-5.rst
================================================
.. post:: Mar 25, 2022
   :language: en
   :tags: announce-eol torch-neuron

.. _announce-eol-pt-1-5:

Announcing end of support for torch-neuron version 1.5 starting with Neuron 1.19.0 release
------------------------------------------------------------------------------------------

Starting with the *Neuron 1.19.0* release, *torch-neuron version 1.5* will no longer be supported. The last release of *torch-neuron version 1.5* will be issued as part of the *Neuron 1.18.0* release. Current users of this version are advised to migrate to the latest *torch-neuron* version.
================================================
FILE: about-neuron/announcements/neuron1.x/announce-eol-pt-before-1-8.rst
================================================
.. post:: Nov 22, 2022
   :language: en
   :tags: announce-eol torch-neuron

.. _announce-eol-pt-before-1-8:

Announcing end of support for ``torch-neuron`` versions 1.7 and 1.8
-------------------------------------------------------------------

:ref:`Neuron release 2.5 ` will be the last release to include ``torch-neuron`` versions 1.7 and 1.8. Future Neuron releases will not include ``torch-neuron`` versions 1.7 and 1.8. Current users of those versions are advised to migrate to the latest ``torch-neuron`` version.
================================================
FILE: about-neuron/announcements/neuron1.x/announce-eol-tf-before-2-5.rst
================================================
.. post:: Nov 22, 2022 01:00
   :language: en
   :tags: announce-eol tensorflow-neuron

.. _announce-eol-tf-before-2-5:

Announcing end of support for ``tensorflow-neuron`` versions 2.5 and 2.6
------------------------------------------------------------------------

:ref:`Neuron release 2.5 ` will be the last release to include ``tensorflow-neuron`` versions 2.5 and 2.6. Future Neuron releases will not include ``tensorflow-neuron`` versions 2.5 and 2.6. Current users of those versions are advised to migrate to the latest ``tensorflow-neuron`` version.
================================================
FILE: about-neuron/announcements/neuron1.x/announce-eol-tf-before-2-7.rst
================================================
.. post:: May 01, 2023 01:00
   :language: en
   :tags: announce-eol tensorflow-neuron

.. _announce-eol-tf-before-2-7:

Announcing end of support for ``tensorflow-neuron`` version 2.7
----------------------------------------------------------------

:ref:`Neuron release 2.10 ` will be the last release to include ``tensorflow-neuron`` version 2.7. Future Neuron releases will not include ``tensorflow-neuron`` version 2.7. Current users of this version are advised to migrate to the latest ``tensorflow-neuron`` version.
================================================
FILE: about-neuron/announcements/neuron1.x/announcements.rst
================================================
.. post:: Feb 17, 2022
   :language: en
   :tags: announcements

.. _prev-announcements:

Previous Announcements
======================

.. contents:: Table of contents
   :local:
   :depth: 1

.. _maintenance_tf21_tf24:

02/17/2022 - tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4 enter maintenance mode
------------------------------------------------------------------------------------

Starting with the *Neuron 1.17.2* release, *tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4* are entering maintenance mode. Future releases of *tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4* will address critical security issues only. Current users of those versions are advised to migrate to the latest *tensorflow-neuron* version.

10/27/2021 - Introducing Neuron Runtime 2.x (libnrt.so)
-------------------------------------------------------

Starting with the *Neuron 1.16.0* release, *Neuron Runtime 1.x* (``neuron-rtd``) is entering maintenance mode and is replaced by *Neuron Runtime 2.x*, a shared library named ``libnrt.so``. For more information on Runtime 1.x, see :ref:`Neuron Runtime 1.x enters maintenance mode `. For more information, please see :ref:`introduce-libnrt`.

.. _maintenance_rtd:

10/27/2021 - Neuron Runtime 1.x (``neuron-rtd``) enters maintenance mode
------------------------------------------------------------------------

Starting with the *Neuron 1.16.0* release, *Neuron Runtime 1.x* (``neuron-rtd``) is entering maintenance mode and is replaced by *Neuron Runtime 2.x*, a shared library named ``libnrt.so``. Future releases of *Neuron Runtime 1.x* (``neuron-rtd``) will address critical bug fixes and security issues only. Previous releases of *Neuron Runtime 1.x* (``neuron-rtd``) will continue to be available via ``rpm`` and ``deb`` packages.

For more information please see:

* :ref:`introduce-libnrt`
* :ref:`install-guide-index`
* :ref:`neuron-maintenance-policy`

.. _maintenance_mxnet_1_5:

10/27/2021 - Neuron support for *Apache MXNet 1.5* enters maintenance mode
--------------------------------------------------------------------------

Starting with *Neuron release 1.16.0*, Neuron support for *MXNet 1.5* is entering maintenance mode. Future releases of Neuron supporting *MXNet 1.5* will address critical bug fixes and security issues only. Previous releases of *Apache MXNet 1.5* will continue to be available via ``pip`` packages. Current users of *MXNet Neuron 1.5* can migrate their applications to *MXNet Neuron 1.8*. For more information about MXNet Neuron support and how to upgrade to the latest *MXNet Neuron 1.8*, please see :ref:`neuron-mxnet`.

.. _maintenance_neuron-cli:

10/27/2021 - ``neuron-cli`` enters maintenance mode
---------------------------------------------------

Starting with *Neuron release 1.16.0*, with the introduction of *Neuron Runtime 2.x*, ``neuron-cli`` is entering maintenance mode. ``neuron-cli`` functionality will be available only if *Neuron Runtime 1.x* (``neuron-rtd``) is being used by the application.
If the application is using the *Neuron Runtime 2.x* shared library (``libnrt.so``), ``neuron-cli`` functionality will not be available. If you have used ``neuron-cli`` in previous releases, and you are migrating to newer Neuron releases where applications require the *Neuron Runtime 2.x* shared library, please see the :ref:`neuron-cli-mntnce-faq` below. Future releases of ``neuron-cli`` will address critical bug fixes and security issues only. Previous releases of ``neuron-cli`` will continue to be available via ``rpm`` and ``deb`` packages.

.. _eol-ncg:

10/27/2021 - End of support for NeuronCore Groups (NCG)
-------------------------------------------------------

Before the introduction of *Neuron Runtime 2.x*, NeuronCore Group (NCG) was used by Neuron Runtime 1.x to define an execution group of one or more NeuronCores where models can be loaded and executed. It also provided separation between processes.

With the introduction of *Neuron Runtime 2.x*, the strict separation of NeuronCores into groups is no longer needed, and NeuronCore Groups (NCG) is deprecated. *Neuron Runtime 2.x* enables each process to own a set of NeuronCores, and within each process, Neuron Runtime 2.x supports loading and executing multiple models on separate, different, or overlapping sets of NeuronCores.

Please note that the ``NEURONCORE_GROUP_SIZES`` environment variable is in the process of being :ref:`unsupported `, and for a transition period ``NEURONCORE_GROUP_SIZES`` can be used to preserve the old NeuronCore Group behavior. The frameworks internally convert ``NEURONCORE_GROUP_SIZES`` to use the runtime's new mode of mapping models to NeuronCores. For more information, see the details about ``NEURON_RT_VISIBLE_CORES`` at :ref:`nrt-configuration` and :ref:`neuron-migrating-apps-neuron-to-libnrt`.

.. _eol-ncgs-env:

10/27/2021 - Announcing end of support for ``NEURONCORE_GROUP_SIZES``
---------------------------------------------------------------------

The ``NEURONCORE_GROUP_SIZES`` environment variable is in the process of being deprecated; future Neuron releases may no longer support it. Please start using ``NEURON_RT_VISIBLE_CORES`` instead. See :ref:`eol-ncg`, :ref:`nrt-configuration` and :ref:`neuron-migrating-apps-neuron-to-libnrt` for more information.
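For example, where an application previously relied on ``NEURONCORE_GROUP_SIZES``, it can instead be launched with an explicit core range. A minimal sketch (``myapp`` is an illustrative placeholder for your application, and the core counts are arbitrary):

.. code-block:: bash

   # Deprecated: implicit grouping via NEURONCORE_GROUP_SIZES
   # NEURONCORE_GROUP_SIZES=2,2 myapp

   # Preferred: give this process NeuronCores 0 through 3 explicitly
   NEURON_RT_VISIBLE_CORES=0-3 myapp

   # Alternatively, request a number of cores and let the runtime pick them
   NEURON_RT_NUM_CORES=4 myapp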

.. _neuron-cli-mntnce-faq:

Frequently Asked Questions (FAQ)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Is there another tool that provides the same functionality as ``neuron-cli list-model``?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes, please see :ref:`neuron-ls-ug` or :ref:`neuron-monitor-ug`.

Is there another tool that provides the same functionality as ``neuron-cli create-ncg``, ``neuron-cli destroy-ncg``, and ``neuron-cli list-ncg``?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

No, these functionalities are no longer needed with *Neuron Runtime 2.x*. NeuronCore Groups (NCG) :ref:`is deprecated ` and the ``NEURONCORE_GROUP_SIZES`` environment variable :ref:`is in the process of being deprecated `. Please start using ``NEURON_RT_VISIBLE_CORES`` instead. See :ref:`nrt-configuration` and :ref:`neuron-migrating-apps-neuron-to-libnrt` for more information.

Is there another tool that provides the same functionality as ``neuron-cli reset``?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

No, this functionality is no longer needed with *Neuron Runtime 2.x*. Before the introduction of ``libnrt.so``, in certain cases after an application crashed, models had to be unloaded manually by calling ``neuron-cli reset``. With ``libnrt.so``, applications run in the context of the ``libnrt.so`` shared library, and when an application exits, the Neuron driver frees all resources associated with the application.

For more information please see:

* :ref:`introduce-libnrt`
* :ref:`neuron-tools`
* :ref:`install-guide-index`
* :ref:`neuron-maintenance-policy`

.. _eol-conda-packages:

05/28/2021 - End of support for Neuron Conda packages in Deep Learning AMI starting Neuron 1.14.0
-------------------------------------------------------------------------------------------------

Starting with Neuron SDK 1.14.0, we will no longer support conda packages to install the Neuron SDK framework in DLAMI, and we will no longer update the conda packages used to install the Neuron SDK framework (Neuron conda packages) with new versions. Starting with Neuron SDK 1.14.0, pip packages (Neuron pip packages) will be used to install the Neuron SDK framework in the DLAMI conda environment. To upgrade the Neuron SDK framework, DLAMI users should use pip upgrade commands instead of conda update commands. Instructions are available in this blog and in Neuron SDK documentation (:ref:`setup-guide-index`).

Starting with Neuron SDK 1.14.0, run one of the following commands to upgrade to the latest Neuron framework of your choice:

* To upgrade PyTorch Neuron:

  .. code-block::

     source activate aws_neuron_pytorch_p36
     pip install --upgrade torch-neuron neuron-cc[tensorflow] torchvision --extra-index-url https://pip.repos.neuron.amazonaws.com

* To upgrade TensorFlow Neuron:

  .. code-block::

     source activate aws_neuron_tensorflow_p36
     pip install --upgrade tensorflow-neuron neuron-cc --extra-index-url https://pip.repos.neuron.amazonaws.com

* To upgrade MXNet Neuron:

  .. code-block::

     source activate aws_neuron_mxnet_p36
     pip install --upgrade mxnet-neuron neuron-cc --extra-index-url https://pip.repos.neuron.amazonaws.com

For more information please check the `blog `__.

.. _eol-ubuntu16:

05/01/2021 - End of support for Ubuntu 16 starting Neuron 1.14.0
----------------------------------------------------------------

Ubuntu 16.04 officially entered its end-of-life phase in April 2021 (see https://ubuntu.com/about/release-cycle) and will not receive any public software or security updates. Starting with Neuron SDK 1.14.0, Ubuntu 16 is no longer supported for Neuron; users who are using Ubuntu 16 are requested to migrate to Ubuntu 18 or Amazon Linux 2. Customers who choose to upgrade libc on Ubuntu 16 to work with Neuron v1.13.0 (or higher versions) are highly discouraged from doing so, since Ubuntu 16 will no longer receive public security updates.

.. _eol-classic-tensorboard:

05/01/2021 - End of support for classic TensorBoard-Neuron starting Neuron 1.13.0 and introducing Neuron Plugin for TensorBoard
-------------------------------------------------------------------------------------------------------------------------------

Starting with Neuron SDK 1.13.0, we are introducing the :ref:`Neuron Plugin for TensorBoard ` and we will no longer support classic TensorBoard-Neuron. Users are required to migrate to the Neuron Plugin for TensorBoard. Starting with Neuron SDK 1.13.0, if you are using TensorFlow-Neuron within a DLAMI Conda environment, attempting to run ``tensorboard`` with the existing version of TensorBoard will fail.
Please update the TensorBoard version before installing the Neuron plugin by running ``pip install TensorBoard --force-reinstall``; for installation instructions, see :ref:`neuron-plugin-tensorboard`. Users who are using Neuron SDK releases before 1.13.0 can find classic TensorBoard-Neuron documentation at `Neuron 1.12.2 documentation `__. For more information, see :ref:`neuron-tensorboard-rn` and :ref:`neuron-plugin-tensorboard`.

.. _eol_python_3_5:

02/24/2021 - End of support for Python 3.5
-------------------------------------------

As Python 3.5 reached end-of-life in October 2020, and many packages including TorchVision and Transformers have stopped supporting Python 3.5, we will begin to stop supporting Python 3.5 for frameworks, starting with PyTorch-Neuron version :ref:`neuron-torch-11170` in this release. You can continue to use older versions with Python 3.5.

11/17/2020 - End of support for ONNX
------------------------------------

ONNX support is limited, and from this version onwards we are not planning to add any additional capabilities to ONNX. We recommend running models in TensorFlow, PyTorch or MXNet for best performance and support.

07/16/2020 - End of support for PyTorch 1.3
--------------------------------------------

Starting with this release, we are ending support for PyTorch 1.3 and migrating to PyTorch 1.5.1. Customers are advised to migrate to PyTorch 1.5.1.
================================================
FILE: about-neuron/announcements/neuron1.x/eol-ncgs-env_2.rst
================================================
.. post:: Mar 25, 2022
   :language: en
   :tags: announce-eol

Announcing end of support for ``NEURONCORE_GROUP_SIZES`` starting with Neuron 1.20.0 release
--------------------------------------------------------------------------------------------

Starting with Neuron SDK 1.20.0, the ``NEURONCORE_GROUP_SIZES`` environment variable will no longer be supported. Setting the ``NEURONCORE_GROUP_SIZES`` environment variable will no longer affect application behavior. Current customers using the ``NEURONCORE_GROUP_SIZES`` environment variable are advised to use the ``NEURON_RT_VISIBLE_CORES`` or ``NEURON_RT_NUM_CORES`` environment variable instead. See :ref:`eol-ncg`, :ref:`nrt-configuration` and :ref:`neuron-migrating-apps-neuron-to-libnrt` for more information.
================================================
FILE: about-neuron/announcements/neuron1.x/eol-pt-15.rst
================================================
.. post:: Apr 29, 2022
   :language: en
   :tags: eol

.. _eol-pt-15:

End of support for torch-neuron version 1.5
-------------------------------------------

Starting with the *Neuron 1.19.0* release, *torch-neuron 1.5* will no longer be supported, and no further releases of *torch-neuron version 1.5* will be issued. Current users of torch-neuron version 1.5 are advised to migrate to the latest *torch-neuron* version.
================================================
FILE: about-neuron/announcements/neuron1.x/eol-tf-21-24.rst
================================================
.. post:: Mar 25, 2022
   :language: en
   :tags: eol

.. _eol-tf-21-24:

End of support for tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4
--------------------------------------------------------------------

Starting with the *Neuron 1.18.0* release, *tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4* will no longer be supported, and no further releases of *tensorflow-neuron versions 2.1, 2.2, 2.3 and 2.4* will be issued.
Current users of those versions are advised to migrate to the latest *tensorflow-neuron* version.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-component-change.rst
================================================
.. post:: December 21, 2023
   :language: en
   :tags: announce-name-change, neuron-component

.. _announce-component-name-change:

Announcing Name Change for Neuron Components
---------------------------------------------

Starting with :ref:`Neuron release 2.16 `, the names of the following Neuron components will change as follows:

======================= =================== ====================
Package name            Current Name        New Name
======================= =================== ====================
torch-neuronx           PyTorch Neuron      PyTorch NeuronX
tensorflow-neuronx      TensorFlow Neuron   TensorFlow NeuronX
neuronx-cc              Neuron Compiler     NeuronX Compiler
aws-neuronx-runtime-lib Neuron Runtime      NeuronX Runtime
transformers-neuronx    Transformers Neuron Transformers NeuronX
neuronx-distributed     Neuron Distributed  NeuronX Distributed
======================= =================== ====================
================================================
FILE: about-neuron/announcements/neuron2.x/announce-correction-neuron-driver-support-inf1.rst
================================================
.. post:: March 12, 2026
   :language: en
   :tags: announce-correction-neuron-driver-inf1, neuron-driver-version, inf1

.. _announce-correction-neuron-driver-inf1-support:

Correction: Neuron Driver support for Inf1 — version 2.24 (not 2.21)
---------------------------------------------------------------------

We are correcting a previous announcement regarding the last Neuron Driver version to support Inf1. The last supported version is 2.24. Neuron driver versions above 2.24 only support non-Inf1 instances (such as ``Trn1``, ``Inf2``, or other instance types). For ``Inf1`` instance users, only Neuron driver version 2.24 will remain supported with regular security patches. As part of this correction, Neuron Driver version **2.24.13.0** has been released as a patch for ``Inf1`` users, adding compatibility with Linux kernel 6.18.

``Inf1`` instance users are advised to pin the Neuron driver version to ``2.24.*`` in their installation script:

For Ubuntu:

.. code-block:: bash

   sudo apt-get install aws-neuronx-dkms=2.24.* -y

For Amazon Linux 2 / Amazon Linux 2023:

.. code-block:: bash

   sudo yum install aws-neuronx-dkms-2.24.* -y

Refer to the :ref:`Neuron Driver release notes ` for more details.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-deprecation-containers-rtd.rst
================================================
.. post:: December 20, 2023
   :language: en
   :tags: announce-deprecating-containers, runtime-rtd

.. _announce-update-containers:

Announcing end-of-support for Neuron Containers with Runtime 1.x
-----------------------------------------------------------------

:ref:`Neuron release 2.3 ` was the last release to support Neuron Runtime 1.x (neuron-rtd). Current users of Neuron DLC/DLAMI with Neuron Runtime 1.x are required to :ref:`update their image ` to support the latest Neuron Runtime versions. For instructions, see the :ref:`Setup Guide `.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-deprecation-nxd-path-trace-api.rst
================================================
.. post:: September 18, 2025
   :language: en
   :tags: announce-deprecation-nxd-path-trace-api, al2

.. _announce-deprecation-nxd-path-trace-api:

Announcing the deprecation of the NeuronX Deep Learning Inference API path_trace function
-------------------------------------------------------------------------------------------

:ref:`Neuron release 2.26.0 ` is the last release supporting ``parallel_model_trace``. This NxD Inference function will be deprecated in the next version of the Neuron SDK in favor of the ``ModelBuilder.trace()`` method, which provides a more robust and flexible approach for tracing and compiling models for Neuron devices, enabling more advanced features such as weight layout optimization support, as well as other quality-of-life and stability improvements for SPMD tracing.

Customers directly invoking ``parallel_model_trace`` can now use the ModelBuilderV2 APIs. For more details on these APIs, see :ref:`nxd-core-model-builder-v2`. Customers directly using models in NxDI are not impacted, since NxDI models are already built on MBv1, which is unaffected by this change.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-deprecation-transformer-flag.rst
================================================
.. post:: September 15, 2023
   :language: en
   :tags: announce-end-of-support, transformer-flag

.. _announce-end-of-support-transformer-flag:

Announcing end-of-support for ``--model-type=transformer-inference`` compiler flag
-----------------------------------------------------------------------------------

Starting with :ref:`Neuron release 2.14 `, the ``--model-type=transformer-inference`` compiler flag is deprecated. Neuron SDK users using the ``--model-type=transformer-inference`` compiler flag are highly encouraged to migrate to the ``--model-type=transformer`` compiler flag.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eol-megatron-lm.rst
================================================
.. post:: Aug 8, 2023
   :language: en
   :tags: announce-eol, trn1, trn1n

.. _announce-eol-megatronlm:

Announcing end of support for AWS Neuron reference for Megatron-LM
-------------------------------------------------------------------

:ref:`Neuron release 2.12 ` will be the last release to include support for `AWS Neuron reference for Megatron-LM `_. Future releases will not include Neuron support for Megatron-LM. Current Neuron Megatron-LM users are advised to migrate to `AWS Neuron reference for NeMo Megatron `_ or `Neuron Distributed `_.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eol-python-3-7.rst
================================================
.. post:: Jul 26, 2023 10:00
   :language: en
   :tags: announce-eol, python37

.. _announce-eol-python37:

Announcing end of support for ``Python 3.7``
---------------------------------------------

:ref:`Neuron release 2.12 ` will be the last release to include support for ``Python 3.7``. Future Neuron releases will not include support for ``Python 3.7``. Current users of ``Python 3.7`` are advised to migrate to the latest supported Python version (``Python 3.10``).
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eol-ubuntu-18.rst
================================================
.. post:: Jul 13, 2023 11:00
   :language: en
   :tags: announce-eol, ubuntu18

.. _announce-eol-ubuntu18:

Announcing end of support for ``Ubuntu 18``
-------------------------------------------

:ref:`Neuron release 2.12 ` will be the last release to include support for ``Ubuntu 18``. Future Neuron releases will not include support for ``Ubuntu 18``. Current users of ``Ubuntu 18`` are advised to migrate to ``Ubuntu 20``.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-al2.rst
================================================
.. post:: June 28, 2024
   :language: en
   :tags: announce-eos-al2, al2

.. _announce-eos-al2:

Announcing end of support for Neuron Runtime support of Amazon Linux 2 (AL2)
------------------------------------------------------------------------------

:ref:`Neuron release 2.19 ` will be the last release to include Neuron Runtime support for ``Amazon Linux 2``. Future Neuron releases will not include Neuron Runtime support for ``Amazon Linux 2``. Current users of ``Amazon Linux 2`` are advised to migrate to Amazon Linux 2023 (AL2023) or Ubuntu 20/22.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-beta-pytorch-neuroncore-placement-apis.rst
================================================
.. post:: June 24, 2025
   :language: en
   :tags: announce-no-longer-support-pytorch-neuroncore-placement

.. _announce-no-longer-support-beta-pytorch-neuroncore-placement-apis:

Announcing end of support for Beta PyTorch NeuronCore Placement APIs starting next release
--------------------------------------------------------------------------------------------

:ref:`Neuron Release 2.24 ` is the last release to support the Beta PyTorch NeuronCore Placement APIs. Customers using the Beta PyTorch NeuronCore Placement APIs are recommended to migrate to the generally available (GA) PyTorch NeuronCore Placement APIs. Please refer to the :ref:`PyTorch Neuron documentation ` for guidance on using the supported functionality. Any models using the beta APIs will need to be updated to use the generally available APIs.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-bf16-vars.rst
================================================
.. post:: June 24, 2025
   :language: en
   :tags: announce-no-longer-support-xla-env-vars

.. _announce-eos-longer-support-xla-bf16-vars:

Announcing end of support for XLA_USE_BF16 and XLA_DOWNCAST_BF16 environment variables starting next release
---------------------------------------------------------------------------------------------------------------

:ref:`Neuron Release 2.24 ` will be the last release to support the following environment variables:

- XLA_USE_BF16
- XLA_DOWNCAST_BF16

**I currently utilize these environment variables in my model code. What do I do?**

Customers are recommended to migrate to automatic mixed precision or use ``model.to(torch.bfloat16)`` to convert their model to the BF16 format. For detailed migration guidance, please refer to :ref:`migration_from_xla_downcast_bf16`.
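For instance, instead of exporting ``XLA_USE_BF16=1`` before launching a run, the conversion can be done directly in model code. A minimal sketch (the ``torch.nn.Linear`` layer stands in for your own module and is purely illustrative):

.. code-block:: python

   import torch

   # Stand-in for your model; any torch.nn.Module works the same way
   model = torch.nn.Linear(128, 128)

   # Convert parameters and buffers to bfloat16 in code, rather than
   # relying on the deprecated XLA_USE_BF16 / XLA_DOWNCAST_BF16 variables
   model = model.to(torch.bfloat16)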
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-block-dimension-nki.rst
================================================
.. post:: June 24, 2025
   :language: en
   :tags: announce-eos-block-dimension-nki

.. _announce-eos-block-dimension-nki:

Announcing end of support for NKI block dimension starting next release
--------------------------------------------------------------------------

:ref:`Neuron release 2.24 ` will be the last release to include support for the NKI block dimension in NKI tensor creation routines. Starting with this release, using the block dimension will generate EOS warnings. In the next release (Neuron Release 2.25), these warnings will be upgraded to errors. Customers are recommended to refer to the :ref:`nki_block_dimension_migration_guide` for detailed instructions on updating their code.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-dlami-ubuntu-22-04.rst
================================================
.. post:: December 18, 2025
   :language: en
   :tags: announce-eos-dlami-ubuntu-22-04

.. _announce-eos-dlami-ubuntu-22-04:

Announcing End of Support for Ubuntu 22.04 single framework DLAMIs for PyTorch and JAX in a future release
==========================================================================================================

Ubuntu 22.04 single framework DLAMIs for PyTorch and JAX will reach end of support in a future release. Customers are advised to use multi-framework or previously released DLAMIs for Ubuntu 22.04.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-dlami.rst
================================================
.. post:: April 24, 2024
   :language: en
   :tags: announce-eos-dlami, neuron-dlami

.. _announce-eos-dlami:

Announcing end of support for Neuron Release 2.18.0 Deep Learning AMIs
------------------------------------------------------------------------

We are announcing end of support for :ref:`Neuron release 2.18.0 ` Deep Learning AMIs. DLAMIs released between March 26, 2024 (2024-03-26) and April 10, 2024 (2024-04-10) were shipped without the audit package. The following are the affected DLAMIs:

- Deep Learning AMI Neuron (Ubuntu 22.04) 20240401
- Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) 20240328
- Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) 20240402
- Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) 20240409
- Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240328
- Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240402
- Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) 20240409
- Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2) 20240328
- Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2) 20240402
- Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2) 20240409
- Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 20.04) 20240328
- Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 20.04) 20240402
- Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 20.04) 20240409
- Deep Learning Base Neuron AMI (Amazon Linux 2) 20240401
- Deep Learning Base Neuron AMI (Amazon Linux 2) 20240408
- Deep Learning Base Neuron AMI (Ubuntu 20.04) 20240401
- Deep Learning Base Neuron AMI (Ubuntu 20.04) 20240408

Current users of the above :ref:`Neuron release 2.18 ` Deep Learning AMIs are required to upgrade to the latest DLAMIs in order to obtain images with the audit package installed. For instructions to upgrade to the latest AMI, see the :ref:`DLAMI User Guide ` or find the specific DLAMI image ID for the latest Neuron release with :ref:`SSM parameters `.
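For example, the latest DLAMI image ID can be retrieved from AWS Systems Manager with a query along the following lines (the parameter path shown is illustrative of the multi-framework Ubuntu 22.04 DLAMI; check the SSM parameters documentation for the exact paths that apply to your OS and framework):

.. code-block:: bash

   # Query SSM for the AMI ID of the latest Neuron multi-framework DLAMI
   aws ssm get-parameter \
       --region us-east-1 \
       --name /aws/service/neuron/dlami/multi-framework/ubuntu-22.04/latest/image_id \
       --query "Parameter.Value" \
       --output text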
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-inf1-virtual-environments.rst
================================================
.. post:: December 18, 2025
   :language: en
   :tags: announce-eos-inf1-virtual-environments

.. _announce-eos-inf1-virtual-environments:

Neuron no longer supports Inf1 virtual environments and AMIs starting with Neuron 2.27
======================================================================================

Starting with Neuron release 2.27, Neuron no longer supports Inf1 virtual environments and AMIs. If you are a customer who is currently using Inf1 virtual environments or AMIs, use Neuron DLAMIs with Neuron version 2.26.1 or earlier.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-jax-neuronx-nki-call.rst
================================================
.. post:: April 3, 2025
   :language: en
   :tags: announce-eos-jax-neuronx-features

.. _announce-eos-jax-neuronx-features-2:

Announcing end of support for the ``jax_neuronx.nki_call`` API in ``jax-neuronx`` starting next release
------------------------------------------------------------------------------------------------------------

Starting with :ref:`Neuron Release 2.23 `, Neuron will end support for the ``jax_neuronx.nki_call`` API in the ``jax-neuronx`` package. For a full list of features that require ``jax-neuronx``, please see :ref:`jax-neuron-known-issues`. Customers using the ``jax_neuronx.nki_call`` API are recommended to switch invocations to directly call functions annotated with ``@nki.jit``.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-megatronlm-2-13.rst
================================================
.. post:: Aug 28, 2023
   :language: en
   :tags: announce-eos, trn1, trn1n

.. _announce-eos-megatronlm:

AWS Neuron reference for Megatron-LM no longer supported
----------------------------------------------------------

:ref:`Neuron release 2.13 ` no longer includes support for `AWS Neuron reference for Megatron-LM `_. Current Neuron Megatron-LM users are required to migrate to `AWS Neuron reference for NeMo Megatron `_ or `Neuron Distributed `_.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-mllama-checkpoint.rst
================================================
.. post:: May 15, 2025
   :language: en
   :tags: announce-eos-mllama-checkpoint

.. _announce-eos-mllama-checkpoint:

Announcing end of support for mllama 3.2 Meta Checkpoint API starting next release
--------------------------------------------------------------------------------------

:ref:`Neuron Release 2.23 ` will be the last release to include support for the mllama 3.2 Meta checkpoint API. In the next release (Neuron 2.24), Neuron will end support. All previously converted checkpoints will continue to function without disruption, and customers' existing workflows and converted models remain fully operational. For new checkpoint conversions, the HuggingFace solution provides equivalent functionality. Customers are recommended to use HuggingFace's official conversion script, available here: `Hugging Face Conversion Script `_
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-multiframework-dlamis-inf1.rst
================================================
.. post:: April 24, 2024
   :language: en
   :tags: announce-eos-dlamis-inf1, dlami-inf1

.. _announce-update-multiframework-dlami:

Announcing end of support for Neuron virtual environments in AWS Deep Learning AMI (Amazon Linux 2)
----------------------------------------------------------------------------------------------------

:ref:`Neuron release 2.18.2 ` will be the last release to include support for the following virtual environments in AWS Deep Learning AMI (Amazon Linux 2):

- ``aws_neuron_pytorch_p38: PyTorch 1.13, Python 3.8``
- ``aws_neuron_tensorflow2_p38: TensorFlow 2.10, Python 3.8``

Future releases will not include Neuron support for these virtual environments. Current users of Neuron virtual environments in `AWS Deep Learning AMI (Amazon Linux 2) `_ are required to migrate to the `Neuron multi-framework DLAMI `_. To see a list of Neuron supported virtual environments, please refer to the :ref:`Neuron Multi Framework DLAMI User Guide `.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-nemo.rst
================================================
.. post:: April 3, 2025
   :language: en
   :tags: announce-eos-nemo-megatron

.. _announce-eos-nnm:

Announcing end of Neuron support for NeMo Megatron starting next release
-------------------------------------------------------------------------

Starting with Neuron Release 2.23, Neuron will end support for :ref:`NeMo Megatron `. We recommend that all users of :ref:`NeMo Megatron ` migrate their training workloads to :ref:`NxD Training `. Please refer to the :ref:`Neuron NeMo Megatron to NeuronX Distributed Training Migration Guide ` for guidance.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-neuron-det.rst
================================================
.. post:: December 20, 2024
   :language: en
   :tags: announce-eos-neuron-det

.. _announce-eos-neuron-det:

Announcing end of support for Neuron DET tool starting next release
-------------------------------------------------------------------

:ref:`Neuron Release 2.21 ` will be the last release to support the Neuron Distributed Event Tracing (NDET/neuron-det) tool. We recommend that all customers using the NDET tool for debugging runtime hangs/issues in large-scale settings transition to Neuron Profiler 2.0. This tool offers the same runtime function-level traces with improved ease of use and optimized performance. For more information on Neuron Profiler 2.0, please refer to the :ref:`neuron-profiler-2-0-guide`.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-neuron-driver-support-inf1.rst
================================================
.. post:: June 24, 2025
   :language: en
   :tags: announce-eos-neuron-driver-2.21-version, neuron-driver-version, inf1

.. _announce-upcoming-neuron-driver-2.21-version support changes for inf1 instance:

Upcoming changes to Neuron driver 2.21 support for Inf1 starting Neuron 2.26 release
------------------------------------------------------------------------------------

.. note:: This announcement has been superseded. The correct last supported Neuron driver version for ``Inf1`` is **2.24**, not 2.21. See :ref:`announce-correction-neuron-driver-inf1-support` for details.

Starting with Neuron Release 2.26, Neuron driver versions above 2.21 will only support non-Inf1 instances (such as ``Trn1``, ``Inf2``, or other instance types). For ``Inf1`` instance users, Neuron driver versions 2.21 and below will remain supported with regular security patches.
``Inf1`` instance users are advised to pin the Neuron driver version to ``2.21.*`` in their installation script. Refer to the :ref:`Neuron Driver release [2.22.2.0] ` for detailed instructions on pinning the Neuron Driver. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler-2.rst ================================================ .. post:: February 26, 2026 :language: en :tags: announce-eos-neuron-profiler .. _announce-eos-neuron-profiler-2: Neuron Explorer Replaces Neuron Profiler, Starting with Neuron 2.29 ------------------------------------------------------------------- Starting with Neuron 2.29, **Neuron Profiler and Profiler 2.0 (UI and CLI) will reach end of support** and be replaced by Neuron Explorer. If you are currently using the Neuron Profiler, migrate to Neuron Explorer before the Neuron 2.29 release. For migration guidance, see the :doc:`/tools/neuron-explorer/migration-faq`. What is Neuron Explorer? ~~~~~~~~~~~~~~~~~~~~~~~~ Neuron Explorer is the next-generation suite of tools, guiding developers through their development journey on Trainium. It enables ML performance engineers to: * **Trace execution end-to-end** — from source code down to hardware operations. * **Analyze model behavior at every layer of the stack** — with detailed breakdowns per operation, per core, and per device. * **Profile distributed workloads** — with native support for multi-node and multi-worker analysis at scale. For more details, see :doc:`/tools/neuron-explorer/index`. How does this impact current Neuron Profiler users? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. important:: Neuron strongly recommends migrating to Neuron Explorer **before** the Neuron 2.29 release. There are two things to be aware of when migrating: * **Existing NTFF profile files are supported**, but must be reprocessed before they can be viewed in the Neuron Explorer UI. * **New features require new profiles.** To access the full set of Neuron Explorer capabilities, you must recapture your profiles using the updated tooling. For detailed migration steps, see the :doc:`/tools/neuron-explorer/migration-faq` and the :ref:`Neuron Explorer FAQ `. What happens to Neuron Profiler after Neuron 2.29? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After Neuron 2.29, Neuron Profiler will: * **No longer receive** bug fixes, feature updates, or technical support. * **No longer be distributed** as part of the Neuron SDK. If you need to continue using Neuron Profiler temporarily, you must pin your environment to Neuron 2.28 or earlier. This is **not recommended**, as you will not receive any SDK updates or security fixes. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler-v230.rst ================================================ .. post:: March 31, 2026 :language: en :tags: announce-eos-neuron-profiler .. _announce-eos-neuron-profiler-v230: Neuron Explorer Replaces Neuron Profiler, Starting with Neuron 2.30.0 ---------------------------------------------------------------------- Starting with Neuron 2.30.0, Neuron Profiler and Profiler 2.0 (UI and CLI) will reach end of support and be replaced by Neuron Explorer. If you are currently using the Neuron Profiler, migrate to Neuron Explorer before the Neuron 2.30.0 release. For migration guidance, see the :doc:`/tools/neuron-explorer/migration-faq`. What is Neuron Explorer? 
~~~~~~~~~~~~~~~~~~~~~~~~ Neuron Explorer is the next-generation suite of tools, guiding developers through their development journey on Trainium. It enables ML performance engineers to: * **Trace execution end-to-end** — from source code down to hardware operations. * **Analyze model behavior at every layer of the stack** — with detailed breakdowns per operation, per core, and per device. * **Profile distributed workloads** — with native support for multi-node and multi-worker analysis at scale. For more details, see :doc:`/tools/neuron-explorer/index`. How does this impact current Neuron Profiler users? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. important:: Neuron strongly recommends migrating to Neuron Explorer **before** the Neuron 2.30.0 release. There are two things to be aware of when migrating: * **Existing NTFF profile files are supported**, but must be reprocessed before they can be viewed in the Neuron Explorer UI. * **New features require new profiles.** To access the full set of Neuron Explorer capabilities, you must recapture your profiles using the updated tooling. For detailed migration steps, see the :doc:`/tools/neuron-explorer/migration-faq` and the :ref:`Neuron Explorer FAQ `. What happens to Neuron Profiler after Neuron 2.30.0? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After Neuron 2.30.0, Neuron Profiler will: * **No longer receive** bug fixes, feature updates, or technical support. * **No longer be distributed** as part of the Neuron SDK. If you need to continue using Neuron Profiler temporarily, you must pin your environment to Neuron 2.28 or earlier. This is **not recommended**, as you will not receive any SDK updates or security fixes. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announce-eos-neuron-profiler .. _announce-eos-neuron-profiler: End of Support for Neuron Profiler and Neuron Profiler 2.0 UI and CLI coming in a future Neuron release -------------------------------------------------------------------------------------------------------- What's changing ^^^^^^^^^^^^^^^^ Neuron will end support for the legacy Neuron Profiler and Neuron Profiler 2.0 UI and CLI tools in a coming release (planned for v2.29.0). We launched Neuron Explorer in Neuron SDK 2.27, replacing these tools with a unified developer experience that will include device and system profiling in a single view, eager mode support, enhanced memory profiling, improved visualization capabilities, as well as support for the full developer lifecycle. Why are we making this change ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Consolidating to Neuron Explorer allows us to focus development efforts on a single, modern profiling solution while providing you with enhanced features and a better user experience. How does this impact you ^^^^^^^^^^^^^^^^^^^^^^^^^ If you are currently using the legacy Neuron Profiler UI or CLI, please do the following before Neuron 2.29: * Begin using Neuron Explorer (available since Neuron 2.27). See https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/get-started.html# * Reprocess your existing NTFF files for the new UI: see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/how-to-profile-workload.html Note: Neuron Explorer is backwards compatible with existing Profiler NTFF files, but they must be reprocessed to view in the new UI. 
For new features (eager mode, memory viewer, certain NKI tools), you'll need to recapture profiles.

After Neuron 2.29.0 releases (planned):

* The legacy UI will no longer receive bug fixes, updates, or technical support
* To continue using the legacy UI, you must pin to the last version that supports it (not recommended)
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-neurondevice-version.rst
================================================
.. post:: June 28, 2024
   :language: en
   :tags: announce-eos-neuron-device-version, neuron-device-version

.. _announce-eos-neuron-device-version:

Announcing end of support for 'neuron-device-version' field in neuron-monitor
-------------------------------------------------------------------------------

:ref:`Neuron release 2.19 ` will be the last release to include the field 'neuron-device-version' in neuron-monitor. In future releases, customers who are using the field 'neuron-device-version' will instead need to use the 'instance_type' field in the 'instance_info' section and the 'neuroncore_version' field to obtain Neuron device information. Please see :ref:`neuron-monitor-ug` for more details.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-neurondevice.rst
================================================
.. post:: June 28, 2024
   :language: en
   :tags: announce-eos-neuron-device, neuron-device

.. _announce-eos-neurondevice:

Announcing end of support for 'neurondevice' resource name in Neuron Device K8s plugin
----------------------------------------------------------------------------------------

:ref:`Neuron release 2.19 ` will be the last release to include the resource name 'neurondevice'. The Neuron device plugin is a Neuron software component that gets installed in Kubernetes environments. The resource name 'neurondevice' enables customers to allocate devices to the Neuron K8s container. In future releases, we will rename the resource name 'neurondevice' to 'neuron' to maintain consistency. Customers who are using the resource name 'neurondevice' in their YAML file will need to update it to use 'neuron'. Please see :ref:`k8s-neuron-device-plugin` for more details.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-nxd-examples.rst
================================================
.. post:: December 20, 2024
   :language: en
   :tags: announce-eos-nxd-examples

.. _announce-eos-nxd-examples:

Announcing migration of NxD Core examples from the NxD Core repository to the NxD Inference repository in the next release
---------------------------------------------------------------------------------------------------------------------------

:ref:`Neuron Release 2.21 ` will be the last release to include NxD Core inference examples under the NxD Core repository: https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference. Starting with :ref:`Neuron Release 2.21 `, the models and modules in the NxD Core inference examples are available through the NxD Inference package. We recommend that customers update their applications to use examples from the NxD Inference repository; see :ref:`nxdi-overview`. In Neuron Release 2.22, the NxD Core inference samples will only reside under the NxD Inference repository. Current users are advised to start using samples/tutorials under the NxD Inference repository: https://github.com/aws-neuron/neuronx-distributed-inference.

I currently utilize an inference sample from the NxD Core repository in my model code. What do I do?
======================================================================================================

If your applications depend on the inference examples from NxD Core, we recommend that you update your code to use the new NxD Inference package. With NxD Inference, you can import and use these models and modules in your applications. Any models compiled with inference code from the NxD Core repository will need to be re-compiled. Please refer to the :ref:`nxd-examples-migration-guide` for guidance.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-nxdt-nxd-core-training.rst
================================================
.. post:: February 26, 2026
   :language: en
   :tags: announce-eos-nxdt

.. _announce-eos-nxdt-nxd-core-training:

Announcing end of support for NxDT and NxD Core Training APIs starting with Neuron SDK release 2.29 (PyTorch 2.10)
-------------------------------------------------------------------------------------------------------------------

Neuron SDK release 2.28 (PyTorch 2.9) will be the last release to include the NeuronX Distributed Training (NxDT) library. Starting with Neuron SDK release 2.29 (PyTorch 2.10), the use of NxD Core training APIs and the PyTorch/XLA package for training will no longer be supported.

How does this impact you?
~~~~~~~~~~~~~~~~~~~~~~~~~~

Existing NxDT/NxD Core users should stay on Neuron SDK 2.28 (PyTorch 2.9) until they are ready to migrate to native PyTorch on Neuron. Native PyTorch on Neuron uses standard distributed primitives (DTensor, FSDP, DDP). A migration guide will be published in a coming release. See :doc:`Native PyTorch on Neuron Overview ` for more information.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-probuf.rst
================================================
.. post:: June 28, 2024
   :language: en
   :tags: announce-eos-probuf, probuf

.. _announce-eos-probuf319:

Announcing end of support for Protobuf versions <= 3.19 for PyTorch NeuronX, NeuronX Distributed, and Transformers NeuronX libraries
------------------------------------------------------------------------------------------------------------------------------------

:ref:`Neuron release 2.19 ` will be the last release to include Protobuf <= 3.19 support for the PyTorch NeuronX, NeuronX Distributed, and Transformers NeuronX libraries. Future Neuron releases will not include Protobuf <= 3.19 support for PyTorch NeuronX. Current PyTorch NeuronX, NeuronX Distributed, or Transformers NeuronX users using Protobuf <= 3.19 are advised to migrate to the latest supported Protobuf version.
================================================
FILE: about-neuron/announcements/neuron2.x/announce-eos-pt-versions.rst
================================================
.. post:: December 20, 2023
   :language: en
   :tags: announce-eos-pt, pt-versions

.. _announce-eos_pytorch110:

Announcing End of Support for PyTorch Neuron version 1.10
-----------------------------------------------------------

:ref:`Neuron release 2.16 ` will be the last release to include support for PyTorch Neuron version 1.10. Future Neuron releases will not include support for PyTorch Neuron version 1.10. Current users of PyTorch Neuron version 1.10 are advised to migrate to the latest supported PyTorch Neuron version.
================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-pt2.rst ================================================ .. post:: December 20, 2023 :language: en :tags: announce-eos-pt-two, pt-versions-two .. _announce-eos_pytorch2: Announcing End of Support for PyTorch NeuronX version 2.0 (beta) ----------------------------------------------------------------- :ref:`Neuron release 2.16 ` will be the last release that will include support for PyTorch NeuronX version 2.0 (beta). Future Neuron releases will not include support for PyTorch NeuronX version 2.0. Current users of PyTorch NeuronX version 2.0 are advised to upgrade to PyTorch NeuronX 2.1 (beta). ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-python38.rst ================================================ .. post:: December 20, 2024 :language: en :tags: announce-python-eos .. _announce-python-eos: Announcing end of support for Python 3.8 in future releases ----------------------------------------------------------- Due to Python 3.8 reaching its end-of-life status, future Neuron releases will no longer include support for this version. ========================= How does this impact me? ========================= I currently use Python 3.8. ============================ To avoid security issues and bugs, current users of Python 3.8 are advised to migrate to a Neuron supported Python version (3.9, 3.10, or 3.11), as Neuron will no longer support Python 3.8. For a list of supported Python versions by Neuron package, please see :ref:`latest-neuron-release-artifacts`. I currently use Ubuntu 20, which has Python 3.8 as the default version. Am I affected? ======================================================================================= Although Python 3.8 is the default version of Ubuntu 20.04, Neuron will continue to support Ubuntu 20.04 until April 2025, due to extended standard support of Python 3.8 in Ubuntu 20. Please see the :ref:`sdk-maintenance-policy` for more information. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-pytorch-1-1-3.rst ================================================ .. post:: December 20, 2024 :language: en :tags: announce-eos-pytorch-version .. _announce-eos-pytorch-eos-113: Announcing end of support for PyTorch 1.13 starting next release ---------------------------------------------------------------- :ref:`Neuron Release 2.21 ` is the last release to support PyTorch 1.13, its associated Deep Learning Containers (DLCs), and Deep Learning AMIs (DLAMIs) for Trn1, Trn2, and Inf2 instances. We recommend that all customers using PyTorch 1.13, related DLCs, and DLAMIs on Trn2, Trn1, and Inf2 instances upgrade to the latest supported PyTorch version. For more information on supported versions, please refer to :ref:`latest-neuron-release-artifacts`. Please note that PyTorch 1.13 will continue to be supported for Inf1 instances. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-pytorch-1-9.rst ================================================ .. post:: August 28, 2023 :language: en :tags: announce-eol, torch-neuron .. _announce-eol-pytorch19: Announcing end of support for ``torch-neuron`` version 1.9 ----------------------------------------------------------- :ref:`Neuron release 2.13 ` will be the last release that will include support for ``torch-neuron`` version 1.9.
Future Neuron releases will not include support for ``torch-neuron`` version 1.9. Current users of ``torch-neuron`` version 1.9 are advised to migrate to the latest supported ``torch-neuron`` version. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-1.rst ================================================ .. post:: December 20, 2024 :language: en :tags: announce-eos-pytorch-version .. _announce-eos-pytorch-2-1: Announcing end of support for PyTorch 2.1 starting next release --------------------------------------------------------------- :ref:`Neuron Release 2.21 ` is the last release to support PyTorch 2.1, its associated Deep Learning Containers (DLCs), and Deep Learning AMIs (DLAMIs). We recommend that all customers using PyTorch 2.1, related DLCs, and DLAMIs upgrade to the latest supported PyTorch version. For more information on supported versions, please refer to :ref:`latest-neuron-release-artifacts`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-7-2-8-v229.rst ================================================ .. post:: March 31, 2026 :language: en :tags: announce-eos-pytorch-version .. _announce-eos-pytorch-2-7-2-8-v229: Neuron no longer supports PyTorch versions 2.7 and 2.8 starting with Neuron 2.29 ---------------------------------------------------------------------------------- Starting with Neuron 2.29, Neuron no longer supports PyTorch versions 2.7 and 2.8. We recommend that all customers upgrade to the latest supported PyTorch version. Customers currently using PyTorch versions 2.7 and 2.8 must upgrade to a newer supported PyTorch version. For more information on supported versions, refer to :ref:`latest-neuron-release-artifacts`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-7-2-8.rst ================================================ .. post:: February 26, 2026 :language: en :tags: announce-eos-pytorch-version .. _announce-eos-pytorch-2-7-2-8: Announcing end of support for PyTorch versions 2.7 and 2.8 starting next release --------------------------------------------------------------------------------- :ref:`Neuron Release 2.28 ` is the last release to support PyTorch versions 2.7 and 2.8. Future Neuron releases will not include support for PyTorch versions 2.7 and 2.8. Current users of PyTorch version 2.7 or 2.8 are advised to upgrade to PyTorch 2.9. For more information on supported versions, refer to :ref:`latest-neuron-release-artifacts`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-pytorch-profiling-api.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announce-eos-pytorch-profling-api .. _announce-eos-pytorch-profling-api: End of Support for PyTorch Experimental Profiling API starting in a future release ------------------------------------------------------------------------------------ What's changing ^^^^^^^^^^^^^^^^ Neuron will end support for the ``torch_neuronx.experimental.profiler.profile`` API in a future release of Neuron (planned for v2.29.0). This experimental API will be replaced by native PyTorch profiling support using the standard ``torch.profiler.profile()`` API.
How does this impact you ^^^^^^^^^^^^^^^^^^^^^^^^^ If you are using ``torch_neuronx.experimental.profiler.profile``, before April/May 2026: * Update your code to use the native PyTorch profiling API:

.. code-block:: python

   # Before (Experimental API)
   from torch_neuronx.experimental import profiler

   with profiler.profile(output_path="/tmp/profile") as prof:
       output = model(input)

   # After (Native API)
   import torch.profiler

   with torch.profiler.profile(
       activities=[torch.profiler.ProfilerActivity.NEURON],
       on_trace_ready=torch.profiler.tensorboard_trace_handler("/tmp/profile")
   ) as prof:
       output = model(input)

After Neuron 2.29.0 releases (planned): * The experimental API will no longer be supported * To continue using the experimental API, you must pin to Neuron SDK 2.28 or earlier (not recommended) ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-tensorboard-tools.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announce-eos-tensorboard-tools .. _announce-eos-tensorboard-tools: Announcing End of Support for TensorBoard Plugin for Neuron Profiler in Neuron 2.27 ----------------------------------------------------------------------------------- Neuron 2.27 will be the last release to support the TensorBoard plugin. Future Neuron releases will not include support for the TensorBoard plugin. All customers using the TensorBoard plugin to visualize and analyze model performance are recommended to migrate to Neuron Explorer. To begin using Neuron Explorer (available since Neuron 2.27) for profiling, see :doc:`the Neuron Explorer documentation
`. Neuron Explorer was introduced with :doc:`the release of the AWS Neuron SDK version 2.27.0 `. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-tensorflow-2-8-9.rst ================================================ .. post:: April 3, 2025 :language: en :tags: announce-tensorflow-versions-eos .. _announce-tfx-2-8-9-eos: Announcing end of support for TensorFlow 2.8 and 2.9 starting next release ---------------------------------------------------------------------------- Starting with Neuron Release 2.23, Neuron will end support for TensorFlow 2.8 and 2.9. Future Neuron releases will not include support for TensorFlow-Neuron 2.8 and 2.9 versions. Current users of those versions are advised to migrate to the latest TensorFlow version (2.10). For a list of supported versions, please see :ref:`latest-neuron-release-artifacts`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-tensorflow-inf2.rst ================================================ .. post:: February 26, 2026 :language: en :tags: announce-eos-tensorflow .. _announce-eos-tensorflow-inf2: Announcing end of support for TensorFlow for Inferentia2 (Inf2) starting with Neuron 2.29 ------------------------------------------------------------------------------------------ :ref:`Neuron Release 2.28 ` is the last release to support TensorFlow for Inferentia2 (``Inf2``). Future Neuron releases will not include TensorFlow support for ``Inf2`` instances. Current Inf2 instance users are advised to use the latest PyTorch version (2.9). For a list of supported PyTorch versions, see :ref:`latest-neuron-release-artifacts`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-tensorflow1-x.rst ================================================ .. post:: June 28, 2024 :language: en :tags: announce-tensorflow-eos, tf-versions-1-x .. _announce-tfx-eos: Announcing end of support for TensorFlow-Neuron 1.x ----------------------------------------------------- :ref:`Neuron release 2.19 ` will be the last release to support TensorFlow-Neuron 1.x. Future Neuron releases will not include support for TensorFlow-Neuron 1.x versions. Current users of those versions are advised to migrate to the latest tensorflow-neuron version, 2.10.1. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-torch-neuron.rst ================================================ .. post:: September 16, 2024 :language: en :tags: announce-torch-neuron-eos, torch-neuron .. _announce-torch-neuron-eos: Announcing maintenance mode for torch-neuron 1.9 and 1.10 versions --------------------------------------------------------------------- Starting with :ref:`Neuron release 2.20 `, torch-neuron 1.9 and 1.10 versions will enter maintenance mode. Future Neuron releases will not include support for torch-neuron 1.9 and 1.10 versions. Current users of torch-neuron 1.9 and 1.10 versions are advised to migrate to torch-neuron 1.13. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-torch-neuronx-nki-jit.rst ================================================ .. post:: May 15, 2025 :language: en :tags: announce-eos-torch-neuronx-nki-jit ..
_announce-eos-torch-neuronx-nki-jit: Announcing end of support for ``torch_neuronx.nki_jit`` API in ``torch-neuronx`` starting next release --------------------------------------------------------------------------------------------------------- :ref:`Neuron Release 2.23 ` will be the last release to include support for the ``torch_neuronx.nki_jit`` API in the ``torch-neuronx`` package. Customers using the ``torch_neuronx.nki_jit`` API are recommended to switch invocations to directly call functions annotated with ``@nki.jit``. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-u20-dlamis.rst ================================================ .. post:: December 20, 2024 :language: en :tags: announce-u20-dlami-dlc-eos .. _announce-u20-dlami-dlc-eos: Announcing end of support for Ubuntu20 DLCs and DLAMIs ------------------------------------------------------ Starting with :ref:`Neuron Release 2.21 `, AWS Neuron will begin phasing out support for Ubuntu20 Deep Learning Containers (DLCs) and Deep Learning AMIs (DLAMIs). Neuron 2.21 will be the last release to provide bug fixes, and by Neuron 2.22, these offerings will no longer be available. We recommend that all customers using Ubuntu20 DLCs and DLAMIs migrate to newer versions based on Ubuntu22 or Amazon Linux 2023. Customers who need to continue using Ubuntu20 can create custom AMIs based on the Ubuntu20 base image and install Neuron components manually. Please see :ref:`container-faq` and :ref:`neuron-dlami-overview`. Please note that this does not affect support for the base Ubuntu20 operating system, which will continue to receive updates as per our standard support policy. For more information, please see :ref:`sdk-maintenance-policy`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-eos-xla-bf16.rst ================================================ .. post:: May 15, 2025 :language: en :tags: announce-eos-xla-bf .. _announce-eos-xla-bf: Announcing end of support for XLA_USE_BF16 and XLA_DOWNCAST_BF16 starting next release ---------------------------------------------------------------------------------------- Starting with :ref:`Neuron Release 2.23 `, Neuron will begin phasing out support for the ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` environment variables. In this release, usage of these variables will trigger warnings. Neuron will end support in a subsequent release, aligned with the torch-xla maintenance schedule. Customers are recommended to migrate to automatic mixed precision or use ``model.to(torch.bfloat16)`` to convert their model to BF16 format. For detailed migration guidance, please refer to :ref:`migration_from_xla_downcast_bf16`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-intent-eol-nemo-arg.rst ================================================ .. post:: Oct 26, 2023 :language: en :tags: announce-intent-end-of-support-nemo-arg, nemo-arg .. _announce-intent-deprecate-nemo-arg: Announcing End of Support for ``nemo`` option-argument ------------------------------------------------------- :ref:`Neuron release 2.15 ` will be the last release that will include support for the ``nemo`` option-argument of the existing ``--distribution_strategy`` :ref:`compiler option `. Future releases will not include Neuron support for the ``nemo`` option-argument. Users are advised to migrate to the new ``llm-training`` option-argument.
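For reference, here is a minimal sketch of adopting the replacement option-argument from a training script by way of the ``NEURON_CC_FLAGS`` environment variable, which is one common way to pass options through to the compiler (adapt this to however your workflow sets compiler flags):

.. code-block:: python

   import os

   # Append the new option-argument to any compiler flags already set,
   # mirroring the compiler option referenced above.
   existing = os.environ.get("NEURON_CC_FLAGS", "")
   os.environ["NEURON_CC_FLAGS"] = f"{existing} --distribution_strategy=llm-training".strip()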
================================================ FILE: about-neuron/announcements/neuron2.x/announce-intent-eos-opt.rst ================================================ .. post:: Oct 26, 2023 :language: en :tags: announce-intent-eos-opt, opt .. _announce-intent-eos-opt: Announcing End Of Support for the OPT example in Transformers NeuronX ------------------------------------------------------------------ :ref:`Neuron release 2.15 ` will be the last release that will include the OPT example in Transformers NeuronX. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-intent-eos-pt-version.rst ================================================ .. post:: June 24, 2025 :language: en :tags: announce-eos-pt-two-five .. _announce-eos_pytorch25: Announcing End of Support for PyTorch NeuronX version 2.5 starting next release --------------------------------------------------------------------------------- :ref:`Neuron release 2.24 ` will be the last release that will include support for PyTorch NeuronX version 2.5. Future Neuron releases will not include support for PyTorch NeuronX version 2.5. Current users of PyTorch NeuronX version 2.5 are advised to upgrade to PyTorch NeuronX 2.6 or 2.7. Please see the release artifacts for more details on supported versions. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-intent-eos-pt2-6.rst ================================================ .. post:: September 18, 2025 :language: en :tags: announce-eos-pt2-6 .. _announce-eos_pt2-6: Announcing End of Support for PyTorch NeuronX version 2.6 starting next release --------------------------------------------------------------------------------- :ref:`Neuron release 2.26 ` will be the last release that will include support for PyTorch NeuronX version 2.6. Future Neuron releases will not include support for PyTorch NeuronX version 2.6. Current users of PyTorch NeuronX version 2.6 are advised to upgrade to PyTorch NeuronX 2.7 or 2.8. See :ref:`Neuron release artifacts ` for more details on supported versions. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-intent-eos-tensorflow-tutorial-inf.rst ================================================ .. post:: June 24, 2025 :language: en :tags: announce-eos-tensorflow-tutorial .. _announce-eos-tensorflow-tutorial: Announcing End of Support for the TensorFlow Neuron Inf1 SSD300 tutorial starting next release -------------------------------------------------------------------------------------------- :ref:`Neuron release 2.24 ` will be the last release that will include support for the :ref:`TensorFlow Neuron Inf1 SSD300 ` tutorial. Future Neuron releases will not include support for the :ref:`TensorFlow Neuron Inf1 SSD300 ` tutorial due to security issues. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-intent-eos-tnx.rst ================================================ .. post:: June 24, 2025 :language: en :tags: announce-eos-tnx .. _announce-eos-tnx: Announcing end of support for Transformers NeuronX library starting in Neuron 2.26 release -------------------------------------------------------------------------------------------- Starting from :ref:`Neuron Release 2.24 `, the Transformers NeuronX library is in maintenance mode. ``transformers-neuronx`` releases will now only address critical security issues. In Neuron Release 2.26, Neuron will end support for ``transformers-neuronx``.
Current users of ``transformers-neuronx`` are advised to migrate to :ref:`NeuronX Distributed Inference `. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-intent-maintenance-tnx.rst ================================================ .. post:: May 15, 2025 :language: en :tags: announce-transformers-neuronx-maintenance, tnx .. _announce-tnx-maintenance: Announcing maintenance mode for Transformers NeuronX library starting next release ------------------------------------------------------------------------------------ Starting from Neuron release 2.24, the Transformers NeuronX library is entering maintenance mode. Future releases of ``transformers-neuronx`` will address critical security issues only, and we will gradually end support. Current users of ``transformers-neuronx`` are advised to migrate to :ref:`NeuronX Distributed Inference `. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-maintenance-mxnet.rst ================================================ .. post:: June 28, 2024 :language: en :tags: announce-mxnet-maintenance, mxnet .. _announce-mxnet-maintenance: Neuron support for MXNet enters maintenance mode --------------------------------------------------- Starting with :ref:`Neuron release 2.19 `, Neuron support for MXNet (``mxnet-neuron``) is entering maintenance mode. Future releases of ``mxnet-neuron`` will address critical security issues only, and we will gradually end support. Current users of ``mxnet-neuron`` are advised to migrate to PyTorch NeuronX or TensorFlow NeuronX. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-maintenance-nxdi-nxd-core-inference.rst ================================================ .. post:: March 31, 2026 :language: en :tags: announce-maintenance-nxdi .. _announce-maintenance-nxdi-nxd-core-inference: Announcing maintenance mode for NxD Inference and NxD Core Inference APIs starting next release ----------------------------------------------------------------------------------------------- Starting with Neuron 2.30.0, the NxD Inference library and NxD Core Inference APIs are entering maintenance mode. Future releases will address critical security issues only, and we will gradually end support. We are actively investing in an enhanced vLLM Neuron plugin that will not require a separate NxD Inference library. More information about the vLLM Neuron plugin enhancements and migration guidance will be shared in the upcoming release. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-maintenance-nxdt-nxd-core-training.rst ================================================ .. post:: March 31, 2026 :language: en :tags: announce-maintenance-nxdt .. _announce-maintenance-nxdt-nxd-core-training: Announcing maintenance mode for NxDT and NxD Core Training APIs starting next release ------------------------------------------------------------------------------------- Starting with Neuron 2.30.0, NxDT and NxD Core Training APIs are entering maintenance mode. Future releases will address critical security issues only, and we will gradually end support. How does this impact you? ~~~~~~~~~~~~~~~~~~~~~~~~~ Existing NxDT/NxD Core users should stay on Neuron 2.28 and PyTorch 2.9 until ready to migrate to native PyTorch on Neuron (starting PyTorch 2.10).
Customers are recommended to use native PyTorch with standard distributed primitives (DTensor, FSDP, DDP) and TorchTitan starting with Neuron 2.30.0 and PyTorch 2.10. A migration guide will be published in a coming release. See :doc:`/frameworks/torch/pytorch-native-overview` for more information. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-maintenance-tf.rst ================================================ .. post:: April 1, 2024 :language: en :tags: announce-tensorflow-maintenance, tf-versions .. _announce-tfx-maintenance: TensorFlow-Neuron 1.x enters maintenance mode ----------------------------------------------- Starting with :ref:`Neuron release 2.18 `, TensorFlow-Neuron 1.x is entering maintenance mode. Future releases of TensorFlow-Neuron 1.x will address critical security issues only, and we will gradually end support. Current users of those versions are advised to migrate to the latest tensorflow-neuron version, 2.10.1. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-moving-samples.rst ================================================ .. post:: December 20, 2023 :language: en :tags: announce-moving-nxd-samples, nxd-samples .. _announce-moving-samples: Announcing end-of-support for NeuronX Distributed Training Samples in Neuron Samples Repository ------------------------------------------------------------------------------------------------ :ref:`Neuron release 2.16 ` will be the last release to include support for NeuronX Distributed Training Samples (Llama-2, GPT-NeoX 20B, and GPT-NeoX 6.9B) under the `AWS Neuron Samples GitHub repository `_. In future releases, NeuronX Distributed samples will reside under the `NeuronX Distributed GitHub repository `_. Current users are advised to start using samples under the NeuronX Distributed repository for all NeuronX Distributed tutorials. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-nki-library-namespace-changes-2-28.rst ================================================ .. post:: February 26, 2026 :language: en :tags: announce-nki-library-changes .. _announce-nki-library-namespace-changes-2-28: NKI Library namespace changes starting with Neuron 2.28 -------------------------------------------------------- Starting with Neuron 2.28, the open source repository namespace has changed from ``nkilib_standalone.nkilib.*`` to ``nkilib.*``, providing a consistent namespace between the open source repository and the shipped version. If customers want to add or modify NKI Library kernels, they can build and install them to replace the default implementation without changing model imports. See :ref:`NKI Library ` for more information. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-nki-namespace-migration.rst ================================================ .. post:: March 31, 2026 :language: en :tags: announce-nki-namespace .. _announce-nki-namespace-migration: Announcing NKI Library Kernel Migration to New nki.* Namespace starting Neuron 2.29 ------------------------------------------------------------------------------------ Starting with Neuron 2.29, all NKI Library kernels have been migrated to the new ``nki.*`` namespace. The new ``nki.*`` namespace introduces changes to NKI APIs and language constructs that improve usability and performance.
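As an illustration, a kernel's imports might change along the following lines; the new module spellings are assumptions inferred from the namespace rename, so treat this as a sketch and consult the migration guide for the authoritative mapping:

.. code-block:: python

   # Before: legacy namespace shipped inside the compiler package
   # import neuronxcc.nki as nki
   # import neuronxcc.nki.language as nl

   # After: assumed equivalents under the new top-level namespace
   import nki                 # assumption: the @nki.jit decorator moves with the namespace
   import nki.language as nl  # assumption: language constructs keep their module name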
This transition ensures consistency across all NKI kernels and allows us to focus development efforts on a single, modern namespace. See the :doc:`/nki/deep-dives/nki-migration-guide` for more information. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-longer-support-neuron-det.rst ================================================ .. post:: April 3, 2025 :language: en :tags: announce-no-longer-support-neuron-det .. _announce-no-longer-support-neuron-det: Neuron no longer includes support for Neuron DET tool starting with this release --------------------------------------------------------------------------------- Starting with :ref:`Neuron Release 2.22 `, Neuron no longer supports the Neuron Distributed Event Tracing (NDET/neuron-det) tool. We recommend that customers transition to Neuron Profiler 2.0 for debugging runtime hangs and issues in large-scale settings. This tool offers the same runtime function-level traces with improved ease of use and optimized performance. For more information about Neuron Profiler 2.0, see :ref:`neuron-profiler-2-0-guide`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-longer-support-nxd-examples.rst ================================================ .. post:: May 15, 2025 :language: en :tags: announce-eol-nxd-examples .. _announce-eol-nxd-examples: Announcing migration of NxD Core inference examples from NxD Core repository to NxD Inference repository starting this release ================================================================================================================================== Starting with :ref:`Neuron Release 2.23 `, the following models and modules in NxD Core inference examples are now only available through the NxD Inference package: - Llama - Mixtral - DBRX I currently use one of the mentioned inference samples from the NxD Core repository in my model code. What do I do? ------------------------------------------------------------------------------------------------------------------------ For customers who want to deploy models out of the box, please use the NxD Inference model hub, which is the recommended option. With NxD Inference, you can import and use these models and modules in your applications. Customers will need to update their applications to use examples under the NxD Inference repository: https://github.com/aws-neuron/neuronx-distributed-inference. Any models compiled with inference code from the NxD Core repository will need to be re-compiled. Please refer to the :ref:`nxd-examples-migration-guide` for guidance and see :ref:`nxdi-overview` for more information. I would like to continue using NxD Core. What do I do? -------------------------------------------------------- For customers who want to continue using NxD Core without NxD Inference, please refer to the Llama 3.2 1B sample as a reference implementation: https://github.com/aws-neuron/neuronx-distributed/tree/main/examples/inference/llama ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-113.rst ================================================ .. post:: April 3, 2025 :language: en :tags: announce-no-longer-support-pytorch-version ..
_announce-no-longer-support-pytorch-113: Neuron no longer supports PyTorch 1.13 starting this release ------------------------------------------------------------- Starting with :ref:`Neuron Release 2.22 `, Neuron no longer supports PyTorch 1.13, its associated Deep Learning Containers (DLCs), and Deep Learning AMIs (DLAMIs) for Trn1, Trn2, and Inf2 instances. We recommend that all customers using PyTorch 1.13, related DLCs, and DLAMIs on Trn2, Trn1, and Inf2 instances upgrade to the latest supported PyTorch version. For more information on supported versions, please refer to :ref:`latest-neuron-release-artifacts`. Please note that PyTorch 1.13 will continue to be supported for Inf1 instances. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-2-1.rst ================================================ .. post:: April 3, 2025 :language: en :tags: announce-no-longer-support-pytorch-version .. _announce-no-longer-support-pytorch-2-1: Neuron no longer supports PyTorch 2.1 starting this release ------------------------------------------------------------ Starting with :ref:`Neuron Release 2.22 `, Neuron no longer includes support for PyTorch 2.1, its associated Deep Learning Containers (DLCs), and Deep Learning AMIs (DLAMIs). We recommend that all customers using PyTorch 2.1, related DLCs, and DLAMIs upgrade to the latest supported PyTorch version. For more information on supported versions, please refer to :ref:`latest-neuron-release-artifacts`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-2-7-2-8.rst ================================================ .. post:: March 30, 2026 :language: en :tags: announce-no-longer-support-pytorch-version .. _announce-no-longer-support-pytorch-2-7-2-8: Neuron no longer supports PyTorch versions 2.7 and 2.8 starting with Neuron 2.29 ---------------------------------------------------------------------------------- Starting with Neuron 2.29, Neuron no longer supports PyTorch versions 2.7 and 2.8. We recommend that all customers upgrade to the latest supported PyTorch version. Customers currently using PyTorch versions 2.7 and 2.8 must upgrade to a newer supported PyTorch version. For more information on supported versions, refer to :ref:`latest-neuron-release-artifacts`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-longer-support-tensorflow-inf2.rst ================================================ .. post:: March 30, 2026 :language: en :tags: announce-no-longer-support-tensorflow .. _announce-no-longer-support-tensorflow-inf2: Neuron no longer supports TensorFlow for Inferentia2 (Inf2) starting with Neuron 2.29 --------------------------------------------------------------------------------------- Starting with Neuron 2.29, Neuron no longer supports TensorFlow for Inferentia2 (Inf2). Current Inf2 instance users are advised to use the latest PyTorch version (2.9). For a list of supported PyTorch versions, see :doc:`/release-notes/releasecontent`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-longer-support-u20-dlc-dlami.rst ================================================ .. post:: April 3, 2025 :language: en :tags: announce-u20-dlami-dlc-no-longer-support ..
_announce-u20-dlami-dlc-no-longer-support: Neuron no longer includes support for Ubuntu20 DLCs and DLAMIs starting this release ------------------------------------------------------------------------------------- Starting with :ref:`Neuron Release 2.22 `, Neuron no longer includes offerings for Ubuntu20 Deep Learning Containers (DLCs) and Deep Learning AMIs (DLAMIs). Customers using Ubuntu20 DLCs and DLAMIs should migrate to newer versions based on Ubuntu22 or Amazon Linux 2023. Customers who need to continue using Ubuntu20 can create custom AMIs based on the Ubuntu20 base image and install Neuron components manually. Please see :ref:`container-faq` and :ref:`neuron-dlami-overview`. Please note that this does not affect support for the base Ubuntu20 operating system, which will continue to receive updates as per our standard support policy. For more information, please see :ref:`sdk-maintenance-policy`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-al2.rst ================================================ .. post:: September 16, 2024 :language: en :tags: end-support-al2 .. _eos-al2: Neuron Runtime no longer supports Amazon Linux 2 (AL2) ======================================================== Starting from :ref:`Neuron release 2.20 `, the Neuron Runtime (``aws-neuronx-runtime-lib``) will no longer support Amazon Linux 2 (AL2). The Neuron Driver (``aws-neuronx-dkms``) is now the only Neuron package that supports Amazon Linux 2. However, the Neuron Driver requires Linux kernel 5.10 or higher. Since default AL2 AMIs ship with kernel 4.14, you must upgrade your AL2 kernel to 5.10+ before installing driver versions 2.18 and later, or migrate to Amazon Linux 2023 or Ubuntu, which include compatible kernels by default. This change introduces the following constraint: customers cannot run their full Neuron-powered applications natively on an AL2-based Amazon Machine Image (AMI). To leverage Neuron functionality on an AL2 AMI, customers must containerize their applications using a Neuron supported container with a non-AL2 Linux distribution (e.g., Ubuntu 22.04, Amazon Linux 2023, etc.) and then deploy those containers on an AL2-based AMI that has the Neuron Driver (``aws-neuronx-dkms``) installed. How does this impact me? ------------------------ **I have an AL2 DLAMI** If you are using one of the following Amazon Linux 2 DLAMIs, please migrate to a supported DLAMI (e.g., Ubuntu 22.04, Amazon Linux 2023 (AL2023), etc.). Please see :ref:`neuron-dlami-overview` for a list of all supported DLAMIs to migrate to.

+-----------------+------------------+------------------------------------------------------------+
| Framework       | Operating System | DLAMI Name                                                 |
+=================+==================+============================================================+
| PyTorch 1.13    | Amazon Linux 2   | Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2)     |
+-----------------+------------------+------------------------------------------------------------+
| TensorFlow 2.10 | Amazon Linux 2   | Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2)  |
+-----------------+------------------+------------------------------------------------------------+

**I am using my own AL2 Container** If you are using your own AL2 container, please migrate to a Neuron supported container with a non-AL2 Linux distribution (e.g., Ubuntu 22.04, Amazon Linux 2023, etc.)
**I am using a base AL2 DLAMI** If you are using a base Amazon Linux 2 DLAMI, please ensure the Neuron Driver (``aws-neuronx-dkms``) is the only Neuron package installed. Please use non-AL2 containers (e.g., Ubuntu 22.04, Amazon Linux 2023, etc.) to run your Neuron applications. .. note:: Neuron does not support Linux kernel versions < 5.10. Customers using Linux kernel versions < 5.10 must migrate to >= 5.10. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-device-version.rst ================================================ .. post:: September 16, 2024 :language: en :tags: eos-neuron-device, neuron-device- .. _eos-neurondevice: 'neurondevice' resource name in Neuron Device K8s plugin no longer supported ------------------------------------------------------------------------------ Starting with :ref:`Neuron release 2.20 `, Neuron no longer supports the resource name 'neurondevice'. The Neuron device plugin is a Neuron software component that is installed in Kubernetes environments. The resource name 'neurondevice' enables customers to allocate devices to the Neuron K8s container. In this release, we renamed the resource name 'neurondevice' to 'neuron' to maintain consistency. Customers who are using the resource name 'neurondevice' in their YAML files need to update it to 'neuron'. Please see :ref:`k8s-neuron-device-plugin` for more details. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-jax-neuronx-nki-call.rst ================================================ .. post:: May 15, 2025 :language: en :tags: .. _announce-eos-jax-neuronx-features: Neuron no longer supports ``jax_neuronx.nki_call`` API in ``jax-neuronx`` starting this release ------------------------------------------------------------------------------------------------- :ref:`Neuron Release 2.23 ` no longer supports the ``jax_neuronx.nki_call`` API in the ``jax-neuronx`` package. For a full list of features that require ``jax-neuronx``, please see :ref:`jax-neuron-known-issues`. Customers using the ``jax_neuronx.nki_call`` API will need to switch invocations to directly call functions annotated with ``@nki.jit``. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-llama3-2-checkpoint.rst ================================================ .. post:: June 24, 2025 :language: en :tags: announce-no-longer-support-llama-checkpoint .. _announce-no-longer-support-llama-32-meta-checkpoint: Announcing end of support for Llama 3.2 Meta checkpoint --------------------------------------------------------- Starting with :ref:`Neuron Release 2.24 `, the mllama 3.2 Meta checkpoint API is no longer supported. **I currently use the mllama 3.2 Meta checkpoint in my applications. What do I do?** All previously converted checkpoints will continue to function without disruption. Customers' existing workflows and converted models remain fully operational. For new checkpoint conversions, customers are advised to use the Hugging Face solution, which provides equivalent functionality. Hugging Face's official conversion script is available here: `HuggingFace Conversion Script `_ ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-nemo-megatron.rst ================================================ .. post:: May 15, 2025 :language: en :tags: announce-no-support-nemo-megatron ..
_announce-no-support-nemo-megatron: Neuron no longer supports NeMo Megatron starting this release --------------------------------------------------------------- Starting with :ref:`Neuron release 2.23 `, Neuron no longer supports :ref:`NeMo Megatron `. All users of :ref:`nemo-megatron-index` are requested to migrate their training workloads to :ref:`NxD Training `. Please refer to the :ref:`Neuron NeMo Megatron to NeuronX Distributed Training Migration Guide ` for guidance. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-neurondevice.rst ================================================ .. post:: September 16, 2024 :language: en :tags: eos-neuron-device-version, neuron-device-version .. _eos-neuron-device-version: 'neuron-device-version' field in neuron-monitor no longer supported -------------------------------------------------------------------- Starting with :ref:`Neuron release 2.20 `, Neuron no longer supports the field 'neuron-device-version' in neuron-monitor. Customers who are using the field 'neuron-device-version' will instead need to use the 'instance_type' field in the 'instance_info' section and the 'neuroncore_version' field to obtain Neuron device information. Please see :ref:`neuron-monitor-ug` for more details. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-nki-jit-torch.rst ================================================ .. post:: June 24, 2025 :language: en :tags: announce-no-longer-support-nki-jit .. _announce-no-longer-support-nki-jit: Neuron no longer supports nki_jit API in PyTorch Neuron starting this release -------------------------------------------------------------------------------- Starting with :ref:`Neuron Release 2.24 `, the ``torch_neuronx.nki_jit`` API in the ``torch-neuronx`` package is no longer supported. **I currently use nki_jit in my PyTorch models. What do I do?** Customers using the ``torch_neuronx.nki_jit`` API are recommended to switch invocations to directly call functions annotated with ``@nki.jit``. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-tensorboard-plugin.rst ================================================ .. post:: February 26, 2026 :language: en :tags: announce-no-support-tensorboard .. _announce-no-support-tensorboard-plugin: Neuron no longer supports TensorBoard Plugin for Neuron Profiler starting with Neuron 2.28 ------------------------------------------------------------------------------------------- Starting with Neuron 2.28, Neuron no longer supports the TensorBoard plugin for Neuron Profiler. All customers using the TensorBoard plugin to visualize and analyze model performance are recommended to migrate to Neuron Explorer. To start using Neuron Explorer (available since Neuron 2.27) to profile your workloads, please see the :doc:`Neuron Explorer Getting Started guide `. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-tensorflow1-x.rst ================================================ .. post:: September 16, 2024 :language: en :tags: no-support-tensorflow-eos, tf-versions-1-x-no-support .. _announce-tfx-no-support: TensorFlow-Neuron 1.x no longer supported ------------------------------------------ Starting with :ref:`Neuron release 2.20 `, Neuron no longer supports TensorFlow-Neuron 1.x. Current users of those versions are advised to migrate to the latest tensorflow-neuron version, 2.10.1.
Please see :ref:`TensorFlow Neuron ` for more details. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-tensorflow2-10.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announce-no-support-tensorflow2-10 .. _announce-no-support-tensorflow2-10: Neuron no longer supports tensorflow_2_10 single framework DLAMI and virtual environment in multi-framework DLAMIs starting with Neuron 2.27 ---------------------------------------------------------------------------------------------------------------------------------------------- Starting with the release of Neuron 2.27.0, the ``tensorflow_2_10`` single framework Deep Learning AMI (DLAMI) and the TensorFlow 2.10 virtual environment in multi-framework DLAMIs are no longer supported. Users are advised to use previously released DLAMIs for TensorFlow 2.10 support, or migrate to newer supported TensorFlow versions. For more information on supported versions, refer to :doc:`the list of current Neuron-supported package and library versions `. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-tf-versions.rst ================================================ .. post:: May 15, 2025 :language: en :tags: announce-no-support-tensorflow-eos .. _announce-no-support-tensorflow-eos: Neuron no longer supports TensorFlow 2.8 and 2.9 starting this release ----------------------------------------------------------------------- Starting with :ref:`Neuron Release 2.23 `, Neuron no longer supports TensorFlow-Neuron versions 2.8 and 2.9. Current users of those versions are advised to migrate to the latest TensorFlow version (2.10). For a list of supported versions, please see :ref:`latest-neuron-release-artifacts`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-torch-neuron-versions.rst ================================================ .. post:: December 20, 2024 :language: en :tags: announce-no-support-torch-neuron .. _announce-no-support-torch-neuron: PyTorch Neuron versions 1.9 and 1.10 no longer supported ---------------------------------------------------------- Starting with :ref:`Neuron Release 2.21 `, Neuron no longer supports torch-neuron 1.9 and 1.10 versions. Current users of torch-neuron 1.9 and 1.10 versions are advised to migrate to the latest supported torch-neuron version. Please see :ref:`latest-neuron-release-artifacts`. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-ubuntu-20-base.rst ================================================ .. post:: May 15, 2025 :language: en :tags: announce-u20-base-no-support .. _announce-u20-base-no-support: Neuron no longer supports base Ubuntu 20 operating system starting this release -------------------------------------------------------------------------------- :ref:`Neuron Release 2.23 ` no longer includes support for the base Ubuntu 20.04 operating system. Customers using Ubuntu 20.04 are required to migrate their workloads to Ubuntu 22.04 or another supported operating system. Please refer to :ref:`neuron-dlami-overview` for guidance on Neuron supported operating systems. For more information on the Neuron operating system support policy, please see :ref:`sdk-maintenance-policy`.
================================================ FILE: about-neuron/announcements/neuron2.x/announce-no-support-vllm-v0.rst ================================================ .. post:: February 26, 2026 :language: en :tags: announce-no-support-vllm .. _announce-no-support-vllm-v0: Neuron no longer supports vLLM V0 starting with Neuron 2.28 ------------------------------------------------------------ Starting with the Neuron 2.28 release, vLLM V0 will no longer be supported. This includes the vLLM V0 Neuron forks in the AWS Neuron `upstreaming-to-vllm GitHub repo `__ and vLLM V0-based Neuron Inference Deep Learning Containers. Customers are recommended to use vLLM V1-based inference containers as documented in the :doc:`vLLM V1 user guide `. Additionally, Neuron will be updating existing vLLM-based tutorials to use vLLM V1 in the coming release. See :ref:`vLLM on Neuron ` for more information on vLLM V1 support. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-nxdi-changes.rst ================================================ .. post:: December 19, 2025 :language: en :tags: announce-nxdi-changes .. _announce-nxdi-changes: Announcing changes to NxDI in the upcoming releases ==================================================== As part of our transition to native PyTorch support, we are simplifying NxDI to provide a more streamlined developer experience. **What's changing:** In the upcoming releases, we will introduce NxDI v2, which will not use the NxDI ModelBuilder APIs. Instead, it will use ``torch.compile`` for model compilation. We will also simplify the NxDI modeling APIs to align with native PyTorch primitives. **Timeline and migration:** While we introduce these changes, we will maintain both NxDI v1 and NxDI v2 simultaneously to ensure a smooth migration path for our customers. We will provide detailed migration guidance, timelines, and updated documentation as we approach the transition. More information about the migration path and specific release dates will be shared in the next release (Neuron 2.28). ================================================ FILE: about-neuron/announcements/neuron2.x/announce-package-change.rst ================================================ .. post:: September 16, 2024 :language: en :tags: announce-nxdcore, neuron-component-nxdcore .. _announce-component-name-change-nxdcore: Announcing Name Change for Neuron Component --------------------------------------------- Starting with :ref:`Neuron release 2.20 `, the name of the following Neuron component will change as follows:

======================= ======================= ============================ ==================
Package name            Current Name            New Name                     Abbreviation
======================= ======================= ============================ ==================
neuronx-distributed     NeuronX Distributed     NeuronX Distributed Core     NxD Core
======================= ======================= ============================ ==================

================================================ FILE: about-neuron/announcements/neuron2.x/announce-python38-no-longer-support.rst ================================================ .. post:: April 3, 2025 :language: en :tags: announce-python-version-no-longer-support ..
_announce-python-no-longer-support: Neuron no longer includes Python 3.8 support starting this release ------------------------------------------------------------------- Starting with :ref:`Neuron Release 2.22 `, Neuron no longer includes support for Python 3.8, as it has reached its end-of-life status. ========================= How does this impact me? ========================= I currently use Python 3.8. ============================ To avoid security issues and bugs, current users of Python 3.8 are advised to migrate to a Neuron supported Python version (3.9, 3.10, or 3.11), as Neuron no longer supports Python 3.8. For a list of supported Python versions by Neuron package, please see :ref:`latest-neuron-release-artifacts`. I currently use Ubuntu 20, which has Python 3.8 as the default version. Am I affected? ======================================================================================= Although Python 3.8 is the default version of Ubuntu 20.04, Neuron will continue to support Ubuntu 20.04 until April 2025, due to extended standard support of Python 3.8 in Ubuntu 20. Please see the :ref:`sdk-maintenance-policy` for more information. ================================================ FILE: about-neuron/announcements/neuron2.x/announce-transition-pytorch-trainium.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announce-transition-pytorch-trainium .. _announce-transition-pytorch-trainium: Announcing Transition to PyTorch Native Support for AWS Trainium in the Next Neuron Release Supporting PyTorch 2.10 ------------------------------------------------------------------------------------------------------------------------ Starting with the introduction of Neuron support for PyTorch 2.10, AWS Neuron will begin a transition from PyTorch/XLA to native PyTorch support via TorchNeuron. PyTorch 2.9 will be the last version based on PyTorch/XLA. What's changing ^^^^^^^^^^^^^^^^ * If you are using PyTorch 2.9, it is the last version that uses the PyTorch/XLA backend in Neuron. * For PyTorch 2.10 and later, Neuron will provide native PyTorch support via TorchNeuron. Customers using PyTorch/XLA-based training should migrate to native PyTorch with TorchNeuron, which provides: * Native PyTorch eager execution mode * Standard distributed primitives (DTensor, FSDP, DDP) * ``torch.compile`` support * Compatibility with frameworks like TorchTitan (PyTorch Training Library) For more information about native PyTorch on Neuron and migration guidance, see :doc:`Native PyTorch for AWS Trainium `. ================================================ FILE: about-neuron/announcements/neuron2.x/announcement-end-of-support-neuronxcc-nki.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announcement-end-of-support-neuronxcc-nki .. _announcement-end-of-support-neuronxcc-nki: Announcing End of Support for neuronxcc.nki Namespace Starting with Neuron 2.28 -------------------------------------------------------------------------------- Neuron 2.27 will be the last release to include support for the ``neuronxcc.nki.*`` namespace. Starting with Neuron 2.28, this namespace will no longer be supported. The new ``nki.*`` namespace introduces changes to NKI APIs and language constructs. Existing kernels using ``neuronxcc.nki.*`` must migrate to the new ``nki.*`` namespace. A kernel migration guide is available in the Neuron 2.27 documentation. See :doc:`the NKI Kernel Migration Guide
` for more information. ================================================ FILE: about-neuron/announcements/neuron2.x/announcement-end-of-support-nxdt-nxd-core.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announcement-end-of-support-nxdt-nxd-core .. _announcement-end-of-support-nxdt-nxd-core: Announcing End of Support for NxDT and NxD Core Training APIs Starting with PyTorch 2.10 ----------------------------------------------------------------------------------------- The Neuron release supporting PyTorch 2.9 will be the last to include the NeuronX Distributed Training (NxDT) libraries, NxD Core training APIs, and PyTorch/XLA for training. Starting with Neuron support for PyTorch 2.10, these components will no longer be supported. How does this impact you ^^^^^^^^^^^^^^^^^^^^^^^^^ Existing NxDT/NxD Core users should stay on PyTorch 2.9 until ready to migrate to native PyTorch on Neuron (starting PyTorch 2.10). Customers are recommended to use native PyTorch with standard distributed primitives (DTensor, FSDP, DDP) and TorchTitan starting with Neuron 2.28 and PyTorch 2.10. A migration guide will be published in a coming release. See :doc:`Native PyTorch on Neuron Overview ` for more information. ================================================ FILE: about-neuron/announcements/neuron2.x/announcement-end-of-support-parallel-model-trace.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announcement-end-of-support-parallel-model-trace .. _announcement-end-of-support-parallel-model-trace: Neuron no longer supports parallel_model_trace API starting with Neuron 2.27 ----------------------------------------------------------------------------- Starting with the Neuron 2.27 release, the :ref:`parallel_model_trace API ` is no longer supported for inference. We introduced the :doc:`Model Builder V2 API ` in Neuron 2.25 as an alternative to the tracing API, and it is now the default API in Neuron for model tracing. Customers can migrate to the Model Builder V2 API by following the reference `Llama-3.2-1B inference sample `__. ================================================ FILE: about-neuron/announcements/neuron2.x/announcement-end-of-support-pytorch-2-6.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announcement-end-of-support-pytorch-2-6 .. _announcement-end-of-support-pytorch-2-6: Neuron no longer supports PyTorch 2.6 starting with Neuron 2.27 --------------------------------------------------------------- Starting with Neuron 2.27, Neuron no longer supports PyTorch 2.6. We recommend that all customers using PyTorch 2.6 upgrade to the latest supported PyTorch version. Customers currently using PyTorch 2.6 must upgrade to a newer supported PyTorch version. For more information on supported versions, refer to :doc:`the list of current Neuron-supported package and library versions `. ================================================ FILE: about-neuron/announcements/neuron2.x/announcement-end-of-support-vllm-v0.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announcement-end-of-support-vllm-v0 .. _announcement-end-of-support-vllm-v0: Announcing End of Support for vLLM V0 starting with Neuron 2.28 ---------------------------------------------------------------- Neuron Release 2.27 will be the last release to support vLLM V0.
In the Neuron 2.27 release, vLLM V1 support is introduced for Neuron using the ``vllm-neuron`` plugin. Review the sources in the `Neuron vLLM GitHub Repository `__. Starting with the Neuron 2.28 release, vLLM V0 will no longer be supported. Support will be dropped for the vLLM V0 Neuron forks of the `upstreaming-to-vllm `__ Neuron GitHub repo, along with vLLM V0-based Neuron Inference Deep Learning Containers. Customers should migrate to vLLM V1 using the :doc:`vLLM V1 user guide `. Customers are recommended to start using vLLM V1-based inference containers that are released with Neuron v2.27.0. We plan to update the existing vLLM-based tutorials to use vLLM V1 in the coming release. See :doc:`vLLM on Neuron ` for more information on vLLM V1. ================================================ FILE: about-neuron/announcements/neuron2.x/announcement-nki-library-kernel-migration.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announcement-nki-library-kernel-migration .. _announcement-nki-library-kernel-migration: Announcing NKI Library Kernel Migration to New nki.* Namespace in Neuron 2.28 ------------------------------------------------------------------------------ Some NKI Library kernels currently use the legacy ``neuronxcc.nki.*`` namespace. Starting with Neuron 2.28, all NKI Library kernels will migrate to the new ``nki.*`` namespace. The new ``nki.*`` namespace introduces changes to NKI APIs and language constructs that improve usability and performance. This transition ensures consistency across all NKI kernels and allows us to focus development efforts on a single, modern namespace. See :doc:`the NKI Kernel Migration Guide ` for more information. ================================================ FILE: about-neuron/announcements/neuron2.x/announcement-nki-library-namespace-changes.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announcement-nki-library-namespace-changes .. _announcement-nki-library-namespace-changes: Announcing NKI Library Namespace Changes in Neuron 2.28 -------------------------------------------------------- NKI Library kernels are published in the `NKI Library GitHub repository `__. In Neuron 2.27, these kernels are also shipped as part of neuronx-cc using the ``nkilib.*`` namespace. To avoid namespace conflicts when customers use kernels from the open source repository, the repository uses the ``nkilib_standalone.nkilib.*`` namespace. Starting with Neuron 2.28, the open source repository namespace will change from ``nkilib_standalone.nkilib.*`` to ``nkilib.*``, providing a consistent namespace between the open source repository and the shipped version. See :doc:`NKI Library ` for more information. ================================================ FILE: about-neuron/announcements/neuron2.x/announcement-python-3-9-eol.rst ================================================ .. post:: December 16, 2025 :language: en :tags: announcement-python-3-9-eol .. _announcement-python-3-9-eol: Neuron no longer supports Python 3.9 starting with Neuron version 2.27 ----------------------------------------------------------------------- Starting with Neuron Release 2.27, Neuron no longer includes support for Python 3.9, as it has reached its end-of-life status. If you currently use Python 3.9, you are advised to migrate to a Neuron supported Python version (3.10, 3.11, or 3.12) to avoid security issues and bugs.
For a list of supported Python versions by Neuron package, refer to :doc:`the list of current Neuron-supported package and library versions `.

================================================
FILE: about-neuron/announcements/neuron2.x/dlami-neuron-2.10.rst
================================================
.. post:: May 02, 2023 11:00
   :language: en
   :tags: dlami, pytorch, trn1, inf2, inf1

.. _announce-dlami-neuron-2.10:

AWS Deep Learning AMIs now available with Neuron 2.10 version
-------------------------------------------------------------

We are happy to announce that the following Deep Learning AMIs are now available with the latest Neuron version 2.10. These DLAMIs now support all the Neuron EC2 instances, including Inf1, Inf2, and Trn1/Trn1n.

You can access the AMIs at the following URLs:

* `AWS Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) `__
* `AWS Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) `__
* `AWS Deep Learning AMI Base Neuron (Ubuntu 20.04) `__
* `AWS Deep Learning AMI Base Neuron (Amazon Linux 2) `__

================================================
FILE: about-neuron/announcements/neuron2.x/dlami-neuron-2.12.rst
================================================
.. post:: July 26, 2023 11:00
   :language: en
   :tags: dlami, pytorch, trn1, inf2, inf1

.. _announce-dlami-neuron-2.12:

AWS Deep Learning AMIs now available with Neuron 2.12 version
-------------------------------------------------------------

We are happy to announce that the following Deep Learning AMIs are now available with the latest Neuron version 2.12.

You can learn more about the AMIs at the following URLs:

* `AWS Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04) `__
* `AWS Deep Learning AMI Neuron PyTorch 1.13 (Amazon Linux 2) `__
* `AWS Deep Learning AMI Neuron TensorFlow 2.10 (Ubuntu 20.04) `__
* `AWS Deep Learning AMI Neuron TensorFlow 2.10 (Amazon Linux 2) `__
* `AWS Deep Learning AMI Base Neuron (Ubuntu 20.04) `__
* `AWS Deep Learning AMI Base Neuron (Amazon Linux 2) `__

================================================
FILE: about-neuron/announcements/neuron2.x/dlami-pytorch-introduce.rst
================================================
.. post:: Nov 02, 2022 00:01
   :language: en
   :tags: dlami, pytorch

.. _announce-dlami-neuron-pytorch:

Introducing AWS Deep Learning AMI Neuron PyTorch
------------------------------------------------

We are happy to announce that a Deep Learning AMI (DLAMI) with pre-installed PyTorch Neuron (``torch-neuronx``) is now available. For more information, see:

* `AWS Deep Learning AMI Neuron PyTorch 1.11 \(Amazon Linux 2\) `_
* `AWS Deep Learning AMI Neuron PyTorch 1.11 \(Ubuntu 20.04\) `_

The Neuron Setup Guide will be updated soon to include the DLAMI PyTorch Neuron.

================================================
FILE: about-neuron/announcements/neuron2.x/end-of-support-pt2.rst
================================================
.. post:: February 2, 2024
   :language: en
   :tags: eos-pt-two, pt-two

.. _eos_pytorch2:

PyTorch NeuronX version 2.0 (Beta) no longer supported
-------------------------------------------------------

:ref:`Neuron release 2.17 ` no longer supports PyTorch NeuronX version 2.0 (Beta). Current users of PyTorch NeuronX version 2.0 are advised to migrate to PyTorch NeuronX 2.1 (Beta).

================================================
FILE: about-neuron/announcements/neuron2.x/github-changes.rst
================================================
.. post:: Oct 10, 2022 02:00
   :language: en
   :tags: github
.. _announce-aws-neuron-github-org:

Introducing New Neuron GitHub Repositories
------------------------------------------

Starting with Neuron release 2.3, Neuron GitHub repositories will be migrated to the new `AWS Neuron GitHub Organization `_. The new AWS Neuron GitHub Organization will include the `Neuron SDK GitHub `_ repository and the following additional new GitHub repositories:

.. list-table:: AWS Neuron GitHub Organization
   :widths: auto
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - New GitHub repository
     - Description
   * - `AWS Neuron Samples `_
     - Repository that hosts examples and scripts used in the Neuron documentation tutorials
   * - `AWS Neuron Reference for Megatron-LM `_
     - Repository that hosts Neuron support for Megatron-LM
   * - `AWS Neuron Samples for AWS ParallelCluster `_
     - Repository that hosts Neuron support for AWS ParallelCluster

================================================
FILE: about-neuron/announcements/neuron2.x/gpg-expiration.rst
================================================
.. post:: Nov 10, 2022 00:01
   :language: en
   :tags: dlami, pytorch

.. _announce-neuron-gpg-expiration:

Neuron GPG key for Ubuntu installation has expired
--------------------------------------------------

GPG, or GNU Privacy Guard, is a public key cryptography implementation. It allows for the secure transmission of information between parties and can be used to verify that the origin of a message is genuine.

The GPG key for the Neuron repository (https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB) is installed on the Ubuntu (Canonical) server. The key was originally uploaded with an expiry date of three (3) years and expired on 11/10/22.

Please see :ref:`gpg_key_update` for instructions on how to update the Neuron repository GPG keys.

================================================
FILE: about-neuron/announcements/neuron2.x/neuron-rtd-eol.rst
================================================
.. post:: Oct 10, 2022 01:00
   :language: en
   :tags: eol, neuron2.x

.. _announce-neuron-rtd-eol:

Announcing Neuron Runtime 1.x (``neuron-rtd``) end-of-support
-------------------------------------------------------------

Starting with Neuron release 2.3, Neuron components like Neuron System Tools and Neuron Driver will no longer support Neuron Runtime 1.x. In addition, starting with Neuron release 2.3, the `AWS Neuron Runtime Proto GitHub `_ and `AWS Neuron Driver GitHub `_ repositories will no longer be supported.

Why are we removing support for Neuron Runtime 1.x?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Neuron Runtime 1.x (``neuron-rtd``) entered :ref:`maintenance mode ` when Neuron 1.16.0 was released. While Neuron components like Neuron Driver and Neuron System Tools continued to support Neuron Runtime 1.x in addition to supporting Neuron Runtime 2.x, Neuron supported frameworks (e.g. PyTorch Neuron, TensorFlow Neuron, and MXNet Neuron) stopped supporting Neuron Runtime 1.x starting with Neuron 1.16.0. For detailed information see :ref:`introduce-libnrt`.

================================================
FILE: about-neuron/announcements/neuron2.x/neuron2-intro.rst
================================================
.. post:: Oct 10, 2022 04:00
   :language: en
   :tags: neuron2.x
.. _neuron2-intro:

Introducing the first release of Neuron 2.x enabling EC2 Trn1 General Availability (GA)
========================================================================================

Neuron release 2.3 is the first release of Neuron 2.x that enables GA of the new EC2 Trn1 instances. Neuron release 2.3 extends the latest release of Neuron 1.x (Neuron 1.19.2), adding support for Deep Learning training on the AWS Trainium chips.

Starting with Neuron release 2.3, developers can run Deep Learning training workloads on Trn1 instances, saving training costs by up to 50% over equivalent GPU-based EC2 instances, while achieving the highest training performance in the AWS cloud for popular NLP models. Neuron 2.x introduces new capabilities and major architectural updates to support training neural networks with the Trn1 instances.

In addition, starting with this release, Neuron introduces new packages, renames several packages, and updates Neuron installation and update instructions. This release also ends support for Neuron Runtime 1.x.

More about the release
----------------------

.. include:: /release-notes/templates/n2.x-trn1-ga-quick.txt

================================================
FILE: about-neuron/announcements/neuron2.x/neuron230-packages-changes.rst
================================================
.. post:: Oct 10, 2022 03:00
   :language: en
   :tags: neuron2.x

.. _neuron-packages-changes:

Introducing Packaging and installation changes
----------------------------------------------

Starting with Neuron release 2.3, Neuron introduces changes in Neuron packages and installation instructions.

.. contents:: Table of contents
   :local:
   :depth: 2

.. _neuron-new-packages:

New Neuron packages
^^^^^^^^^^^^^^^^^^^

Starting with Neuron release 2.3, Neuron introduces the following new packages:

.. list-table:: New Neuron packages
   :widths: auto
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - New Package
     - Package Type
     - Description
     - Supported Instances (At the time of releasing Neuron release 2.3)
   * - ``torch-neuronx``
     - .whl (pip)
     - PyTorch Neuron package using `PyTorch XLA `_
     - Trn1
   * - ``neuronx-cc``
     - .whl (pip)
     - Neuron Compiler with XLA front-end
     - Trn1
   * - ``aws-neuronx-runtime-lib``
     - .deb (apt), .rpm (yum)
     - Neuron Runtime library
     - Trn1
   * - ``aws-neuronx-collective``
     - .deb (apt), .rpm (yum)
     - Collective Communication library
     - Trn1
   * - ``aws-neuronx-tools``
     - .deb (apt), .rpm (yum)
     - Neuron System Tools
     - Trn1

.. note:: In upcoming releases, ``aws-neuronx-tools`` and ``aws-neuronx-runtime-lib`` will add support for Inf1.

Why are we introducing new Neuron packages?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To add Neuron support for training neural networks, Neuron 2.x introduces new capabilities and major architectural updates. For example, Neuron adds support for Collective Communication Operations in :ref:`new packages ` such as ``aws-neuronx-collective``.

In addition, some of those updates and new capabilities are not backward compatible; for example, the PyTorch Neuron package that adds support for training neural networks uses `PyTorch XLA `_ as a backend. To reduce the possibility of customers using features that are not backward compatible, the new capabilities are introduced in new Neuron packages. For example, PyTorch Neuron and Neuron Compiler will use different packages for Inf1 and for Trn1: ``torch-neuron`` and ``neuron-cc`` will support Inf1 instances, and ``torch-neuronx`` and ``neuronx-cc`` will support Trn1 instances.
.. _neuron-packages-renaming:

Renamed Neuron Packages
^^^^^^^^^^^^^^^^^^^^^^^

Starting with Neuron release 2.3, the following Neuron packages will change names:

.. list-table:: Neuron packages with changed names
   :widths: auto
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - New name
     - Old name (deprecated package)
     - Package Type
     - Description
     - Supported Instances
   * - ``aws-neuronx-oci-hooks``
     - ``aws-neuron-runtime-base``
     - .deb (apt), .rpm (yum)
     - OCI Hooks support
     - Trn1, Inf1
   * - ``aws-neuronx-dkms``
     - ``aws-neuron-dkms``
     - .deb (apt), .rpm (yum)
     - Neuron Driver
     - Trn1, Inf1
   * - ``aws-neuronx-k8-plugin``
     - ``aws-neuron-k8-plugin``
     - .deb (apt), .rpm (yum)
     - Neuron Kubernetes plugin
     - Trn1, Inf1
   * - ``aws-neuronx-k8-scheduler``
     - ``aws-neuron-k8-scheduler``
     - .deb (apt), .rpm (yum)
     - Neuron Scheduler plugin
     - Trn1, Inf1

Why are we changing package names?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To avoid situations where customers may accidentally install Neuron packages with features that are not backward compatible, we have introduced additional packages with different names for the same Neuron component.

.. _neuron-installation-instruction-change:

Updated installation and update instructions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Starting with Neuron release 2.3, Neuron installation and update instructions will include pinning of the major version of the Neuron package. For example, to install the latest Neuron tools package, call ``sudo apt-get install aws-neuronx-tools=2.*``, and to install the latest PyTorch Neuron package for Trn1, call ``pip install torch-neuronx==1.11.0.1.*``.

Why are we changing installation and update instructions?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Neuron installation and update instructions now guide customers to pin the major version of the different Neuron packages, as mentioned in :ref:`neuron-installation-instruction-change`. This is done to future-proof instructions for new, backwards-incompatible major version releases.

.. note:: The change of the installation and update instructions will not include instructions to install or update ``torch-neuron`` and ``neuron-cc``.

What do I need to do?
~~~~~~~~~~~~~~~~~~~~~

Please follow the :ref:`Neuron setup guide ` to update to the latest Neuron release.

================================================
FILE: about-neuron/announcements/neuron2.x/neuron250-packages-changes.rst
================================================
.. post:: Nov 22, 2022 03:00
   :language: en
   :tags: neuron2.x

.. _neuron250-packages-changes:

Introducing Neuron packaging and installation changes for Inf1 customers
------------------------------------------------------------------------

Starting with :ref:`Neuron release 2.5 `, Neuron introduces changes in Neuron packages and installation instructions for Inf1. The following Neuron packages will change names:
.. list-table:: Neuron packages with changed names for Inf1
   :widths: auto
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - New name
     - Old name (deprecated package)
     - Package Type
     - Description
     - Supported Instances
   * - ``aws-neuronx-tools``
     - ``aws-neuron-tools``
     - .deb (apt), .rpm (yum)
     - System Tools
     - Trn1, Inf1
   * - ``aws-neuronx-dkms``
     - ``aws-neuron-dkms``
     - .deb (apt), .rpm (yum)
     - Neuron Driver
     - Trn1, Inf1
   * - ``aws-neuronx-k8-plugin``
     - ``aws-neuron-k8-plugin``
     - .deb (apt), .rpm (yum)
     - Neuron Kubernetes plugin
     - Trn1, Inf1
   * - ``aws-neuronx-k8-scheduler``
     - ``aws-neuron-k8-scheduler``
     - .deb (apt), .rpm (yum)
     - Neuron Scheduler plugin
     - Trn1, Inf1
   * - ``tensorflow-model-server-neuronx``
     - ``tensorflow-model-server-neuron``
     - .deb (apt), .rpm (yum)
     - tensorflow-model-server
     - Trn1, Inf1

Please follow the :ref:`Neuron setup guide ` to update to the latest Neuron release.

================================================
FILE: about-neuron/announcements/neuron2.x/release-neuron2.4.rst
================================================

================================================
FILE: about-neuron/announcements/neuron2.x/sm-training-dlc-2.9.1.rst
================================================
.. post:: Apr 26, 2023 11:00
   :language: en
   :tags: sagemaker, pytorch, trn1, inf2

.. _announce-dlc-sm-neuron-2.9.1:

PyTorch 1.13 Deep Learning Container for Inf2 & Trn1/Trn1n now available for SageMaker
--------------------------------------------------------------------------------------

We are happy to announce that an updated Deep Learning Container that supports PyTorch 1.13 and Neuron 2.9.1 versions is now available for SageMaker Training. For more information see `Neuron Containers `_.

================================================
FILE: about-neuron/announcements/neuron2.x/sm-training-trn1-introduce.rst
================================================
.. post:: Nov 03, 2022 00:01
   :language: en
   :tags: sagemaker, pytorch, trn1

.. _announce-sm-trn1-training:

Amazon SageMaker now supports Trn1 training jobs
------------------------------------------------

We are happy to announce that Amazon SageMaker now supports running training jobs on ml.trn1 instance types. For more information see `Distributed Training with PyTorch Neuron on Trn1 instances `_.

The Neuron Developer Flows section will be updated soon.

================================================
FILE: about-neuron/appnotes/index.rst
================================================
.. _neuron-appnotes-index:
.. _neuron-appnotes:

.. meta::
   :description: AWS Neuron SDK application notes for support announcements, performance optimization, migration guides, and framework-specific implementations.
   :date-modified: 2025-10-03

Neuron application notes
========================

.. toctree::
   :maxdepth: 2
   :hidden:

   Neuron Runtime Library Performance
   Parallel execution
   PyTorch for Neuron
   PyTorch for NeuronX

Application notes provide specific documentation for support announcements, migration guides, performance optimization techniques, and framework-specific implementations for AWS Neuron SDK components.

Framework integration
---------------------

.. grid:: 1 1 2 2
   :gutter: 2

   .. grid-item-card::
      :link: torch-neuron-r-cnn-app-note
      :link-type: ref

      **PyTorch Neuron (Inf1)**
      ^^^
      R-CNN implementation and optimization techniques for PyTorch on ``Inf1``
   .. grid-item-card::
      :link: torch-neuronx-graph-partitioner-app-note
      :link-type: ref

      **PyTorch NeuronX Graph Partitioner**
      ^^^
      Advanced graph partitioning strategies for distributed training and inference

   .. grid-item-card::
      :link: torch-neuronx-dataparallel-app-note
      :link-type: ref

      **Data Parallel Inference on Torch NeuronX**
      ^^^
      Guide to using ``torch_neuronx.DataParallel`` for scalable inference

   .. grid-item-card::
      :link: torch-neuron-dataparallel-app-note
      :link-type: ref

      **Data Parallel Inference on Torch Neuron**
      ^^^
      Guide to using ``torch.neuron.DataParallel`` for scalable inference on ``Inf1``

   .. grid-item-card::
      :link: migration_from_xla_downcast_bf16
      :link-type: ref

      **Migrate from XLA_USE_BF16/XLA_DOWNCAST_BF16**
      ^^^
      Guide to migrating from deprecated XLA environment variables to recommended PyTorch mixed-precision options on NeuronX

   .. grid-item-card::
      :link: introduce-pytorch-2-9
      :link-type: ref

      **PyTorch 2.9 Support**
      ^^^
      New features and migration guide for PyTorch 2.9 on Neuron

================================================
FILE: about-neuron/appnotes/mxnet-neuron/flex-eg.rst
================================================
.. _flexeg:

Flexible Execution Group (FlexEG) in Neuron-MXNet
=================================================

Introduction
------------

Inf1 instances are available with different numbers of Inferentia chips. Each Inferentia chip comprises four NeuronCores, and an Inf1 instance includes 4 to 64 NeuronCores depending on the instance size. With Neuron Runtime 1.x (the ``neuron-rtd`` server), NeuronCores could be combined into NeuronCore Groups (NCGs), which were the basic scheduling units of compiled neural networks in Neuron. NCGs of the desired sizes were created at the start of the application and could not be modified afterwards.

Starting with Neuron SDK 1.16.0, and with the introduction of Neuron Runtime 2.x, MXNet Neuron 1.8 introduces the Flexible Execution Groups (FlexEG) feature. With FlexEG, you do not have to create NCGs at the start of the process. Instead, you set the index of the first NeuronCore you want to load models onto, and FlexEG enables the flexibility of loading models onto any available NeuronCores on the Inf1 instance, starting from the first NeuronCore you set. This guide will show you how to efficiently utilize NeuronCores using the FlexEG feature in Neuron MXNet.

FlexEG
------

With the introduction of FlexEG, you don't need to create NCGs and can load models onto a group of consecutive NeuronCores by providing the index of the first NeuronCore in the group. The Neuron runtime takes care of figuring out the number of NeuronCores required for the given compiled model and loads the model using the required number of cores (sequentially, starting with the NeuronCore index provided by the user).

For example, assuming that you have an inf1.6xlarge machine and there are 4 models A, B, C, D compiled to 2, 4, 3, and 4 NeuronCores respectively, you can map any model to any core with the context ``mx.neuron(neuron_core_index)``, where ``neuron_core_index`` is the NeuronCore index (0, 1, 2, 3, 4, ...). In the example below, you map model A to the ``mx.neuron(0)`` context, model B to the ``mx.neuron(2)`` context, model C to the ``mx.neuron(6)`` context, and model D to the ``mx.neuron(9)`` context.

.. figure:: /images/mx_FlexEG_arch_1.png
   :scale: 80 %

The above configuration is achieved by using application code similar to below:
.. code:: python

   # Load models (MXNet)
   # loaded onto the 2 cores starting with core 0
   sym, args, aux = mx.model.load_checkpoint(mx_model0_file, 0)
   model0 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')

   # loaded onto the 4 cores starting with core 2
   sym, args, aux = mx.model.load_checkpoint(mx_model1_file, 0)
   model1 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')

   # loaded onto the 3 cores starting with core 6
   sym, args, aux = mx.model.load_checkpoint(mx_model2_file, 0)
   model2 = sym.bind(ctx=mx.neuron(6), args=args, aux_states=aux, grad_req='null')

   # loaded onto the 4 cores starting with core 9
   sym, args, aux = mx.model.load_checkpoint(mx_model3_file, 0)
   model3 = sym.bind(ctx=mx.neuron(9), args=args, aux_states=aux, grad_req='null')

   # run inference by simply calling the loaded model
   results0 = model0.forward(data=inputs0)
   results1 = model1.forward(data=inputs1)
   results2 = model2.forward(data=inputs2)
   results3 = model3.forward(data=inputs3)

Since there is no NCG creation at the start of the process, you can load the same four models in a different configuration by changing the context used for inference. For example, you could map model C to the ``mx.neuron(0)`` context, model A to the ``mx.neuron(3)`` context, model D to the ``mx.neuron(5)`` context, and model B to the ``mx.neuron(9)`` context.

.. figure:: /images/mx_FlexEG_arch_2.png
   :scale: 80 %

Migration from NeuronCore Groups to FlexEG
------------------------------------------

NeuronCore Groups are defined by setting the environment variable ``NEURONCORE_GROUP_SIZES`` with a comma-separated list of the number of cores in each group. In this mode of operation, the NeuronCores (as defined in ``NEURONCORE_GROUP_SIZES``) are grouped together to create single entities. The ``NEURONCORE_GROUP_SIZES`` environment variable is set at runtime:

.. code:: bash

   #!/bin/bash
   export NEURONCORE_GROUP_SIZES=2,4,3,4
   python your_neuron_application.py

NeuronCore Groups are created once at the start of the application and cannot be modified or re-created while the application process runs. The above flow creates 4 NeuronCore Groups with 2, 4, 3, and 4 NeuronCores each. In order to get the same configuration as the example from before, you map model A to the ``mx.neuron(0)`` context, model B to the ``mx.neuron(1)`` context, model C to the ``mx.neuron(2)`` context, and model D to the ``mx.neuron(3)`` context.

.. figure:: /images/mx_FlexEG_arch_1.png
   :scale: 80 %

This can be achieved programmatically as shown below:
.. code:: python

   # Set Environment
   os.environ['NEURONCORE_GROUP_SIZES'] = '2,4,3,4'

   # Load models (MXNet)
   # loaded into the first group of NC0-NC1
   sym, args, aux = mx.model.load_checkpoint(mx_model0_file, 0)
   model0 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')

   # loaded into the second group of NC2-NC5
   sym, args, aux = mx.model.load_checkpoint(mx_model1_file, 0)
   model1 = sym.bind(ctx=mx.neuron(1), args=args, aux_states=aux, grad_req='null')

   # loaded into the third group of NC6-NC8
   sym, args, aux = mx.model.load_checkpoint(mx_model2_file, 0)
   model2 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')

   # loaded into the fourth group of NC9-NC12
   sym, args, aux = mx.model.load_checkpoint(mx_model3_file, 0)
   model3 = sym.bind(ctx=mx.neuron(3), args=args, aux_states=aux, grad_req='null')

   # run inference by simply calling the loaded model
   results0 = model0.forward(data=inputs0)
   results1 = model1.forward(data=inputs1)
   results2 = model2.forward(data=inputs2)
   results3 = model3.forward(data=inputs3)

Comparing the two approaches, we see that in the case of NCGs the neuron context takes the index of the execution group, while with FlexEG the neuron context takes the NeuronCore index of the first NeuronCore on which the model is to be loaded and executed. For example, with ``NEURONCORE_GROUP_SIZES='2,4,3,4'``, ``ctx=mx.neuron(1)`` loads the model on execution group 1, which effectively loads the model on the 2nd NCG, which has 4 NeuronCores.

Best practices when using FlexEG
--------------------------------

FlexEG gives the user the most flexibility in terms of accessing cores and loading models on specific cores. With this, users can effortlessly load and execute new models on NeuronCores without closing the application. Here we outline some best practices that should be kept in mind while using FlexEG.

Choosing starting core
~~~~~~~~~~~~~~~~~~~~~~

FlexEG tries to use the required number of cores (based on the input model) starting with the core index provided by the user. If the system does not have the required number of cores starting from the given core index, the model load will fail. For example: we have a model X which needs 2 cores and an inf1.xlarge machine with 4 NeuronCores (NeuronCore indexes 0, 1, 2, and 3). As the model needs at least 2 consecutive cores, the valid start indexes for this model are 0, 1, and 2. If the user instead gives 3 as the neuron context index, there are not 2 cores available starting from core 3, so the model load will fail.
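The arithmetic behind this check is simple enough to express in a few lines. The helper below is illustrative only (``valid_start_indices`` is a hypothetical name, not part of any Neuron API); it assumes you know the total NeuronCore count of your instance and how many cores the compiled model needs:

.. code:: python

   def valid_start_indices(total_cores, model_cores):
       """Return the NeuronCore indexes a model can start at.

       A model compiled to `model_cores` cores occupies that many
       consecutive cores, so the last valid start index is
       total_cores - model_cores.
       """
       return list(range(total_cores - model_cores + 1))

   # inf1.xlarge has 4 NeuronCores; a model compiled to 2 cores
   # can start at index 0, 1, or 2 -- starting at 3 would fail.
   print(valid_start_indices(4, 2))  # [0, 1, 2]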
Performance vs. Flexibility tradeoff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using the data parallel mode of operation (where models are executed in parallel), for optimal performance you should make sure that the models do not share any cores. That is because a NeuronCore can execute one model at a time: when two or more models are executed on the same core (assuming that they are already loaded), it executes the first model, stops it, starts the second model, and then executes it. This is called model switching; it involves additional overhead and prevents the models from executing in parallel.

For example, assuming that you have an inf1.6xlarge machine and there are 4 models A, B, C, D compiled to 2, 4, 3, and 4 NeuronCores respectively: loading model A to the ``mx.neuron(0)`` context, model B to the ``mx.neuron(2)`` context, model C to the ``mx.neuron(6)`` context, and model D to the ``mx.neuron(9)`` context is a good configuration, because no two models share NeuronCores and all can execute in parallel.

However, loading model A to the ``mx.neuron(0)`` context, model B to the ``mx.neuron(2)`` context, model C to the ``mx.neuron(5)`` context, and model D to the ``mx.neuron(9)`` context is not a good configuration, as models B and C share NeuronCore 5 and thus cannot execute in parallel.

.. figure:: /images/mx_FlexEG_arch_bad.png
   :scale: 80 %

================================================
FILE: about-neuron/appnotes/neuron-cc/mixed-precision.rst
================================================
.. _neuron-cc-training-mixed-precision:

Mixed precision and performance-accuracy tuning (``neuron-cc``)
===============================================================

.. contents:: Table of contents
   :local:
   :depth: 2

The Neuron Compiler supports machine learning models with FP32, FP16 and BF16 (Bfloat16) tensors and operators. The Neuron hardware supports a mix of 32 and 16 bit datatypes. The available auto-cast methods and their performance / accuracy trade-offs are explained in this document.

Neuron Hardware
---------------

The Neuron hardware supports matrix multiplication using FP16 or BF16 on its Matmult Engine, and accumulations using FP32. Similarly, operators such as activations or vector operations are supported using FP16, BF16 and FP32. Neuron supports tensor transpose in two ways: by fast matrix multiplication in FP16/BF16, or by slower byte-by-byte data movements.

Performance-accuracy tradeoffs for models trained in FP32
---------------------------------------------------------

Models that are trained using FP32 data types can be deployed on Neuron through ahead-of-time compilation using the :ref:`Neuron Compiler `.

.. important:: **By default, the Neuron Compiler casts FP32 weights and operations to BF16**. Only partial sums are left in FP32. The default casting will generate the highest performance for an FP32 trained model, but not necessarily the best accuracy.

Using the ``--fast-math`` CLI option, you can choose the right tradeoff between performance and accuracy. The tradeoff usually is between achieving high performance or optimal accuracy, and the decision on which settings to use will be application specific. It is recommended that you start by compiling the model for high performance (the default); you can then test the accuracy of the application and, if needed, try the next higher-precision casting option until the desired accuracy and performance are achieved. A typical flow can be:

1. Compile without options (default) or with ``--fast-math all``, which will optimize for performance.
2. If accuracy is not sufficient, try ``--fast-math fp32-cast-matmult``.
3. If accuracy is not sufficient, try ``--fast-math fp32-cast-matmult no-fast-relayout``.
4. If accuracy is not sufficient, try ``--fast-math none``, which will optimize for accuracy.

Between step 2 and step 3, and between step 3 and step 4, you have additional options that can provide different levels of accuracy; these are explained in the section below.

Note that the compiler has to preserve the input/output (i/o) tensor types requested by the Framework; therefore no casting is done on the i/o tensors. Additional speedup can be obtained by casting them in the Framework prior to compilation.
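For example, when tracing with PyTorch Neuron, these options can be forwarded at compile time through the ``compiler_args`` parameter of ``torch.neuron.trace``. The snippet below is a minimal sketch under that assumption; the model and flag combination shown are illustrative, not a prescribed configuration:

.. code:: python

   import torch
   import torch.neuron
   import torchvision.models as models

   # Example model; any traceable FP32 PyTorch model works the same way.
   model = models.resnet50(pretrained=True).eval()
   example = torch.rand(1, 3, 224, 224)

   # Step 2 of the flow above: restrict FP32->BF16 casting to matrix
   # multiplications only, trading some performance back for accuracy.
   model_neuron = torch.neuron.trace(
       model,
       example,
       compiler_args=['--fast-math', 'fp32-cast-matmult'],
   )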
To learn how to use compiler command line interface (CLI) options with your application's framework, please see :ref:`torch_neuron_trace_api`, :ref:`tensorflow-ref-neuron-compile-api` and :ref:`tensorflow-ref-neuron-tracing-api`.

Compiler casting options
------------------------

``--fast-math`` option
^^^^^^^^^^^^^^^^^^^^^^

The ``--fast-math`` option is intended to replace the ``--fp32-cast`` option. We recommend that you start using, or migrate to, the ``--fast-math`` option. The ``--fast-math`` option provides the same level of functionality as the ``--fp32-cast`` option, in addition to the following:

* The ``--fast-math`` option introduces the ``no-fast-relayout`` option to enable lossless transpose operations. This was not possible with the ``--fp32-cast`` option.
* The ``--fast-math`` option provides finer control than the ``--fp32-cast`` option. The transpose operation and the cast operation are controlled independently:

  - ``no-fast-relayout`` and ``fast-relayout`` provide control for the transpose operation.
  - ``fp32-cast-*`` provides control for casting.

See the detailed list of the options in :ref:`/compiler/neuron-cc/command-line-reference.rst`.

================================================
FILE: about-neuron/appnotes/neuron1x/important-neuronx-dkms.txt
================================================
.. important::

   Starting with Neuron version 2.3, the ``aws-neuron-dkms`` package name has been changed to ``aws-neuronx-dkms``. See :ref:`neuron2-intro`.

================================================
FILE: about-neuron/appnotes/neuron1x/introducing-libnrt.rst
================================================
.. _introduce-libnrt:

Introducing Neuron Runtime 2.x (libnrt.so)
==========================================

.. contents:: Table of contents
   :local:
   :depth: 2

What are we changing?
---------------------

Starting with the *Neuron 1.16.0* release, *Neuron Runtime 1.x* (``neuron-rtd``) is entering maintenance mode and is being replaced by *Neuron Runtime 2.x*, a shared library named ``libnrt.so``. For more information on Runtime 1.x see :ref:`maintenance_rtd`.

Upgrading to ``libnrt.so`` simplifies the Neuron installation and upgrade process, introduces new capabilities for allocating NeuronCores to applications, streamlines container creation, and deprecates tools that are no longer needed. This document describes the capabilities of *Neuron Runtime 2.x* in detail, provides information needed for successful installation and upgrade, and provides information needed for a successful upgrade of Neuron applications from *Neuron Runtime 1.x* (included in releases before *Neuron 1.16.0*) to *Neuron Runtime 2.x* (included in releases *Neuron 1.16.0* or newer).

.. _introduce-libnrt-why:

Why are we making this change?
------------------------------

Before *Neuron 1.16.0*, Neuron Runtime was delivered as a daemon (``neuron-rtd``) and communicated with Neuron framework extensions through a ``gRPC`` interface. ``neuron-rtd`` was packaged as an ``rpm`` or ``debian`` package (``aws-neuron-runtime``) and required a separate installation step.

Starting with *Neuron 1.16.0*, *Neuron Runtime 2.x* is delivered as a shared library (``libnrt.so``) and is directly linked to Neuron framework extensions. ``libnrt.so`` is packaged and installed as part of the Neuron framework extensions (e.g. TensorFlow Neuron, PyTorch Neuron or MXNet Neuron), and does not require a separate installation step.
Installing Neuron Runtime as part of the Neuron framework extensions simplifies installation and improves the user experience. In addition, since ``libnrt.so`` is directly linked to the Neuron framework extensions, faster communication between the Neuron Runtime and Neuron Frameworks is enabled by eliminating the ``gRPC`` interface overhead.

For more information see :ref:`introduce-libnrt-how-sdk` and :ref:`neuron-migrating-apps-neuron-to-libnrt`.

.. _libnrt-neuron-cmponents:
.. _introduce-libnrt-how-sdk:

How will this change affect the Neuron SDK?
-------------------------------------------

Neuron Driver
^^^^^^^^^^^^^

Use the latest Neuron Driver. For successful installation of, and upgrade to, *Neuron 1.16.0* or newer, you must install or upgrade to Neuron Driver (``aws-neuron-dkms``) *version 2.1.5.0* or newer. Neuron applications using *Neuron 1.16.0* will fail if they do not detect *Neuron Driver version 2.1.5.0* or newer. For installation and upgrade instructions see :ref:`install-guide-index`.

.. include:: ./important-neuronx-dkms.txt

To see details of Neuron component versions please see :ref:`latest-neuron-release-artifacts`.

.. important::

   For successful installation or update to Neuron 1.16.0 and newer from previous releases:

   * Stop the Neuron Runtime 1.x daemon (``neuron-rtd``) by running: ``sudo systemctl stop neuron-rtd``
   * Uninstall ``neuron-rtd`` by running: ``sudo apt remove aws-neuron-runtime`` or ``sudo dnf remove aws-neuron-runtime``
   * Install or upgrade to the latest Neuron Driver (``aws-neuron-dkms``) by following the :ref:`install-guide-index` instructions.
   * Starting with Neuron version 2.3, the ``aws-neuron-dkms`` package name has been changed to ``aws-neuronx-dkms``; see :ref:`neuron2-intro`.

Neuron Runtime
^^^^^^^^^^^^^^

* Installation

  Starting from *Neuron 1.16.0*, Neuron releases will no longer include the ``aws-neuron-runtime`` packages, and Neuron Runtime will be part of the Neuron framework extension of choice (TensorFlow Neuron, PyTorch Neuron or MXNet Neuron). Installing any Neuron framework package will install the Neuron Runtime library (``libnrt.so``).

  * For installation and upgrade instructions see :ref:`install-guide-index`.

* Configuring *Neuron Runtime*

  Before *Neuron 1.16.0*, *Neuron Runtime 1.x* was configured in configuration files (e.g. /opt/aws/neuron/config/neuron-rtd.config). Starting from *Neuron 1.16.0*, *Neuron Runtime 2.x* can be configured through environment variables. See :ref:`nrt-configuration` for details.

* Starting and Stopping *Neuron Runtime*

  Before introducing ``libnrt.so``, ``neuron-rtd`` ran as a daemon that communicated through a ``gRPC`` interface. Whenever ``neuron-rtd`` took ownership of a Neuron device, it continued owning that device until it was stopped. This created the need to stop ``neuron-rtd`` in certain cases. With the introduction of ``libnrt.so``, *Neuron Runtime* runs inside the context of the application. With *Neuron Runtime 2.x*, the act of starting and stopping a Neuron application causes ``libnrt.so`` to automatically claim or release ownership of the required Neuron devices.

* NeuronCore Groups (NCG) end-of-support

  Before the introduction of *Neuron Runtime 2.x*, a NeuronCore Group (NCG) was used to define an execution group of one or more NeuronCores where models could be loaded and executed. It also provided separation between processes. With the introduction of *Neuron Runtime 2.x*, strict separation of NeuronCores into groups is no longer necessary, and NeuronCore Groups (NCG) has been deprecated.
  See :ref:`eol-ncg` for more information.

* Running multiple *Neuron Runtimes*

  Before the introduction of ``libnrt.so``, it was necessary to run multiple ``neuron-rtd`` daemons, configured through configuration files, to allocate Neuron devices to each ``neuron-rtd``. After the introduction of ``libnrt.so``, it is no longer necessary to run multiple daemons to allocate Neuron devices to a specific Neuron application. With ``libnrt.so``, NeuronCores (a Neuron device includes multiple NeuronCores) are allocated to a particular application by using the ``NEURON_RT_VISIBLE_CORES`` or ``NEURON_RT_NUM_CORES`` environment variables, for example:

  .. code::

     NEURON_RT_VISIBLE_CORES=0-3 myapp1.py
     NEURON_RT_VISIBLE_CORES=4-11 myapp2.py

  Or:

  .. code::

     NEURON_RT_NUM_CORES=3 myapp1.py &
     NEURON_RT_NUM_CORES=4 myapp2.py &

  See :ref:`nrt-configuration` for details.

* Logging

  Similar to *Neuron Runtime 1.x*, *Neuron Runtime 2.x* logs to syslog (verbose logging). To make debugging easier, *Neuron Runtime 2.x* also logs to the console (error-only logging). Refer to :ref:`nrt-configuration` to see how to increase or decrease logging verbosity.

* Multi-process access to NeuronCores

  With the introduction of ``libnrt.so``, it is no longer possible to load models from multiple processes on the same NeuronCore. A NeuronCore can only be accessed from a single process. Instead, you can load models on a specific NeuronCore using multiple threads from the same process.

  .. note:: For optimal performance of multi-model execution, each NeuronCore executes a single model.

* Neuron Runtime architecture

  *Neuron Runtime 2.x* is delivered as a shared library (``libnrt.so``) and is directly linked to Neuron framework extensions. ``libnrt.so`` is packaged and installed as part of Neuron framework extensions (e.g. TensorFlow Neuron, PyTorch Neuron, or MXNet Neuron), and does not require a separate installation step. Installing Neuron Runtime as part of the Neuron framework extensions simplifies installation and improves the user experience. In addition, since ``libnrt.so`` is directly linked to Neuron framework extensions, it enables faster communication between Neuron Runtime and Neuron Frameworks by eliminating ``gRPC`` interface overhead.

Neuron framework extensions
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Starting from *Neuron 1.16.0*, Neuron framework extensions (TensorFlow Neuron, PyTorch Neuron, or MXNet Neuron) are packaged together with ``libnrt.so``. The ``aws-neuron-dkms`` Driver version 2.1.5.0 or newer is required for proper operation. The ``neuron-rtd`` daemon that was installed in previous releases no longer works starting with Neuron 1.16.0. To see details of Neuron component versions see :ref:`latest-neuron-release-artifacts`.

.. important::

   Starting with Neuron version 2.3, the ``aws-neuron-dkms`` package name is changed to ``aws-neuronx-dkms``; see :ref:`neuron2-intro`.

TensorFlow model server
^^^^^^^^^^^^^^^^^^^^^^^

Starting from *Neuron 1.16.0*, the TensorFlow Neuron model server is packaged together with ``libnrt.so`` and expects ``aws-neuron-dkms`` *version 2.1.5.0* or newer for proper operation.

.. note:: The TensorFlow Neuron model server included in *Neuron 1.16.0* runs from the directory in which it was installed and will not run properly if copied to a different location, due to its dependency on ``libnrt.so``.

.. include:: ./important-neuronx-dkms.txt

Neuron tools
^^^^^^^^^^^^

* ``neuron-cli`` - Starting from *Neuron 1.16.0*, ``neuron-cli`` enters maintenance mode.
  See :ref:`maintenance_neuron-cli` for more information.

* ``neuron-top`` - Starting from *Neuron 1.16.0*, ``neuron-top`` has a new user interface. See :ref:`neuron-top-ug` for more information.
* ``neuron-monitor`` - ``neuron-monitor`` was updated to support Neuron Runtime 2.x (``libnrt.so``).

  * See :ref:`neuron-monitor-ug` for an updated user guide of ``neuron-monitor``.
  * See the neuron-monitor upgrade notes for a list of changes between *Neuron Monitor 2.x* and *Neuron Monitor 1.0*.
  * See the neuron-monitor backward compatibility notes for instructions on using *Neuron Monitor 2.x* with *Neuron Runtime 1.x* (``neuron-rtd``).

.. _introduce-libnrt-how-user:

How will this change affect me?
-------------------------------

Neuron installation and upgrade
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As explained in ":ref:`libnrt-neuron-cmponents`", starting from *Neuron 1.16.0*, ``libnrt.so`` requires the latest Neuron Driver (``aws-neuron-dkms``). In addition, it is no longer necessary to install ``aws-neuron-runtime``.

To install Neuron or to upgrade to the latest Neuron version, follow the installation and upgrade instructions below:

* PyTorch Neuron

  * :ref:`install-neuron-pytorch`.
  * :ref:`update-neuron-pytorch`.

* TensorFlow Neuron

  * :ref:`install-neuron-tensorflow`.
  * :ref:`update-neuron-tensorflow`.

* MXNet Neuron

  * :ref:`install-neuron-mxnet`.
  * :ref:`update-neuron-mxnet`.

.. include:: ./important-neuronx-dkms.txt

.. _neuron-migrating-apps-neuron-to-libnrt:

Migrate your application to Neuron Runtime 2.x (libnrt.so)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For a successful migration of your application from previous releases to *Neuron 1.16.0* or newer, make sure you perform the following:

#. Prerequisite

   Read ":ref:`libnrt-neuron-cmponents`".

#. Make sure you are not using *Neuron Runtime 1.x* (``aws-neuron-runtime``)

   * Remove any code that installs ``aws-neuron-runtime`` from any CI/CD scripts.
   * Stop ``neuron-rtd`` by running ``sudo systemctl stop neuron-rtd``.
   * Uninstall ``neuron-rtd`` by running ``sudo apt remove aws-neuron-runtime`` or ``sudo dnf remove aws-neuron-runtime``.

#. Upgrade to your Neuron Framework of choice:

   * :ref:`update-neuron-pytorch`.
   * :ref:`update-neuron-tensorflow`.
   * :ref:`update-neuron-mxnet`.

#. If you have code that starts and/or stops ``neuron-rtd``

   Remove any code that starts or stops ``neuron-rtd`` from any CI/CD scripts.

#. Application running multiple ``neuron-rtd``

   If your application runs multiple processes and requires running multiple ``neuron-rtd`` daemons:

   * Remove the code that runs multiple ``neuron-rtd`` daemons.
   * Instead of allocating Neuron devices to ``neuron-rtd`` through configuration files, use the ``NEURON_RT_VISIBLE_CORES`` or ``NEURON_RT_NUM_CORES`` environment variables to allocate NeuronCores. See :ref:`nrt-configuration` for details. If your application uses ``NEURONCORE_GROUP_SIZES``, see the next item.

   .. note:: The ``NEURON_RT_VISIBLE_CORES`` and ``NEURON_RT_NUM_CORES`` environment variables enable you to allocate NeuronCores to an application. Allocating NeuronCores improves application granularity, because Neuron devices include multiple NeuronCores.

#. Application running multiple processes using ``NEURONCORE_GROUP_SIZES``

   * Consider using the ``NEURON_RT_VISIBLE_CORES`` or ``NEURON_RT_NUM_CORES`` environment variables instead of ``NEURONCORE_GROUP_SIZES``, which is being deprecated. See :ref:`nrt-configuration` for details.
   * If you are using TensorFlow Neuron (``tensorflow-neuron (TF2.x)``) and you are replacing ``NEURONCORE_GROUP_SIZES=AxB``, which enables auto multicore replication, see the new API :ref:`tensorflow-ref-auto-replication-python-api` for usage and documentation.
   * The behavior of your application will remain the same as before if you do not set ``NEURON_RT_VISIBLE_CORES`` and do not set ``NEURON_RT_NUM_CORES``.
   * If you are considering migrating to ``NEURON_RT_VISIBLE_CORES`` or ``NEURON_RT_NUM_CORES``:

     * ``NEURON_RT_VISIBLE_CORES`` takes precedence over ``NEURON_RT_NUM_CORES``.
     * If you are migrating to ``NEURON_RT_VISIBLE_CORES``:

       * For TensorFlow applications or PyTorch applications, make sure that ``NEURONCORE_GROUP_SIZES`` is unset, or that ``NEURONCORE_GROUP_SIZES`` allocates the same or a smaller number of NeuronCores as allocated by ``NEURON_RT_VISIBLE_CORES``.
       * For MXNet applications, setting the ``NEURONCORE_GROUP_SIZES`` and ``NEURON_RT_VISIBLE_CORES`` environment variables at the same time is not supported. Use ``NEURON_RT_VISIBLE_CORES`` only.
       * See :ref:`nrt-configuration` for more details on how to use ``NEURON_RT_VISIBLE_CORES``.

     * If you are migrating to ``NEURON_RT_NUM_CORES``:

       * Make sure that ``NEURONCORE_GROUP_SIZES`` is unset.
       * See :ref:`nrt-configuration` for more details on how to use ``NEURON_RT_NUM_CORES``.

#. Application running multiple processes accessing the same NeuronCore

   If your application accesses the same NeuronCore from multiple processes, this is no longer possible with ``libnrt.so``. Instead, modify your application to access the same NeuronCore from multiple threads.

   .. note:: Optimal performance of multi-model execution is achieved when each NeuronCore executes a single model.

#. Neuron Tools

   * If you are using Neuron Monitor, see the neuron-monitor upgrade notes for details.
   * If you are using ``neuron-cli``, remove any call to ``neuron-cli``. For more information, see :ref:`maintenance_neuron-cli`.

#. Containers

   If your application is running within a container, and it previously executed ``neuron-rtd`` within the container, you need to re-build your container so that it does not include or install ``aws-neuron-runtime``. See :ref:`neuron-containers` for details.

Troubleshooting
---------------

Application fails to start
^^^^^^^^^^^^^^^^^^^^^^^^^^

Description
~~~~~~~~~~~

Starting with the *Neuron 1.16.0* release, Neuron Runtime (``libnrt.so``) requires *Neuron Driver 2.0* or greater (``aws-neuron-dkms``). Neuron Runtime requires the Neuron Driver (``aws-neuron-dkms`` package) to access Neuron devices. If ``aws-neuron-dkms`` is not installed, the application will fail with an error message on the console and in syslog similar to the following:

.. code::

   NRT:nrt_init Unable to determine Neuron Driver version. Please check aws-neuron-dkms package is installed.

If an old ``aws-neuron-dkms`` is installed, the application will fail with an error message on the console and in syslog similar to the following:

.. code::

   NRT:nrt_init This runtime requires Neuron Driver version 2.0 or greater. Please upgrade aws-neuron-dkms package.

Solution
~~~~~~~~

Follow the installation steps in :ref:`install-guide-index` to install ``aws-neuron-dkms``.
.. include:: ./important-neuronx-dkms.txt

Application fails to start although I installed the latest ``aws-neuron-dkms``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Description
~~~~~~~~~~~

Starting from the *Neuron 1.16.0* release, Neuron Runtime (``libnrt.so``) requires *Neuron Driver 2.0* or greater (``aws-neuron-dkms``). If an old ``aws-neuron-dkms`` is installed, the application will fail. You may try to install ``aws-neuron-dkms`` and still face application failure, because the ``aws-neuron-dkms`` installation failed as a result of a ``neuron-rtd`` daemon that was still running.

Solution
~~~~~~~~

* Stop ``neuron-rtd`` by running: ``sudo systemctl stop neuron-rtd``
* Uninstall ``neuron-rtd`` by running: ``sudo apt remove aws-neuron-runtime`` or ``sudo dnf remove aws-neuron-runtime``
* Install ``aws-neuron-dkms`` by following the steps in :ref:`install-guide-index`

.. include:: ./important-neuronx-dkms.txt

Unexpected application behavior when upgrading to release *Neuron 1.16.0* or newer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Description
~~~~~~~~~~~

When upgrading to release *Neuron 1.16.0* or newer from previous releases, the OS may include two different versions of *Neuron Runtime*: the ``libnrt.so`` shared library and the ``neuron-rtd`` daemon. This can happen if the user did not stop the ``neuron-rtd`` daemon or did not make sure to uninstall the existing Neuron version before the upgrade. In this case the user application may behave unexpectedly.

Solution
~~~~~~~~

If the OS includes two different versions of *Neuron Runtime*, the ``libnrt.so`` shared library and the ``neuron-rtd`` daemon:

* Before running applications that use ``neuron-rtd``, restart ``neuron-rtd`` by calling ``sudo systemctl restart neuron-rtd``.
* Before running applications linked with ``libnrt.so``, stop ``neuron-rtd`` by calling ``sudo systemctl stop neuron-rtd``.

Unexpected application behavior when downgrading to releases before *Neuron 1.16.0* (from *Neuron 1.16.0* or newer)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Description
~~~~~~~~~~~

When upgrading to release *Neuron 1.16.0* or newer from previous releases, and then downgrading back to releases before *Neuron 1.16.0*, the OS may include two different versions of *Neuron Runtime*: the ``libnrt.so`` shared library and the ``neuron-rtd`` daemon. This can happen if the user did not make sure to uninstall the existing Neuron version before the upgrade or downgrade. In this case the user application may behave unexpectedly.

Solution
~~~~~~~~

If the OS includes two different versions of *Neuron Runtime*, the ``libnrt.so`` shared library and the ``neuron-rtd`` daemon:

* Before running applications that use ``neuron-rtd``, restart ``neuron-rtd`` by calling ``sudo systemctl restart neuron-rtd``.
* Before running applications linked with ``libnrt.so``, stop ``neuron-rtd`` by calling ``sudo systemctl stop neuron-rtd``.

NeuronCore is in use
^^^^^^^^^^^^^^^^^^^^

Description
~~~~~~~~~~~

A NeuronCore cannot be shared between two applications. If an application has started using a NeuronCore, all other applications trying to use the same NeuronCore will fail during runtime initialization with the following message in the console and in syslog:

.. code:: bash

   ERROR NRT:nrt_allocate_neuron_cores NeuronCore(s) not available - Requested:nc1-nc1 Available:0

Solution
~~~~~~~~

Terminate the process using the NeuronCore and then try launching the application.
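One way to avoid this contention up front is to give each process its own disjoint set of cores using the ``NEURON_RT_VISIBLE_CORES`` variable described earlier. The snippet below is a minimal sketch: the variable must be set before the framework extension initializes the runtime, and ``model_neuron.pt`` is a hypothetical compiled-model file name used for illustration:

.. code:: python

   import os

   # Must be set before the Neuron framework extension loads the runtime,
   # so this process claims only NeuronCores 0 and 1. A second process
   # could use, for example, '2-3' to avoid any overlap.
   os.environ['NEURON_RT_VISIBLE_CORES'] = '0-1'

   import torch
   import torch.neuron

   model = torch.jit.load('model_neuron.pt')  # hypothetical compiled model
   output = model(torch.rand(1, 3, 224, 224))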
Frequently Asked Questions (FAQ)
--------------------------------

Do I need to recompile my model to run it with Neuron Runtime 2.x (``libnrt.so``)?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

No.

Do I need to change my application launch command?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

No.

Can ``libnrt.so`` and ``neuron-rtd`` co-exist in the same environment?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Although we recommend upgrading to the latest Neuron release, we understand that for a transition period you may continue using ``neuron-rtd`` with old releases. If you are using a Neuron Framework (PyTorch, TensorFlow or MXNet) from releases before *Neuron 1.16.0*:

* Install the latest Neuron Driver (``aws-neuron-dkms``).

.. include:: ./important-neuronx-dkms.txt

* For development, we recommend using different environments for Neuron Frameworks (PyTorch, TensorFlow or MXNet) from releases before *Neuron 1.16.0* and for those from *Neuron 1.16.0* and newer. If that is not possible, make sure to stop ``neuron-rtd`` before executing models using a Neuron Framework from *Neuron 1.16.0* and newer.
* For deployment, when you are ready to upgrade, upgrade to a Neuron Framework (PyTorch, TensorFlow or MXNet) from *Neuron 1.16.0* or newer. See :ref:`neuron-migrating-apps-neuron-to-libnrt` for more information.

.. warning::

   Executing models using a Neuron Framework (PyTorch, TensorFlow or MXNet) from *Neuron 1.16.0* and newer in an environment where ``neuron-rtd`` is running may cause undefined behavior. Make sure to stop ``neuron-rtd`` before executing models using a Neuron Framework from *Neuron 1.16.0* and newer.

Are there Neuron framework versions that will not support Neuron Runtime 2.x (``libnrt.so``)?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All supported PyTorch Neuron and TensorFlow Neuron framework extensions, as well as the MXNet Neuron 1.8.0 framework extension, support Neuron Runtime 2.x. MXNet Neuron 1.5.1 does not support Neuron Runtime 2.x (``libnrt.so``) and has now entered maintenance mode. See :ref:`maintenance_mxnet_1_5` for details.

================================================
FILE: about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision.rst
================================================
.. _neuronx-cc-training-mixed-precision:

Mixed Precision and Performance-accuracy Tuning (``neuronx-cc``)
================================================================

.. contents:: Table of contents
   :local:
   :depth: 2

Overview
--------

The Neuron Compiler supports machine learning models with FP32, TF32, FP16 and BF16 (Bfloat16) tensors and operators. The Neuron hardware supports a mix of 32, 16, and 8 bit datatypes. This guide explains how to apply the available auto-cast methods and their performance / accuracy trade-offs when compiling a model with Neuron.

.. note:: Neuron Compiler support for INT8 is planned for a future Neuron SDK release. See `Neuron Compiler: Enable Neuron INT8 support `_ for details.

Neuron Hardware
---------------

The Neuron v2 hardware supports matrix multiplication using FP16, BF16, TF32, and FP32 on its matrix multiply ("matmult") engine, and accumulations using FP32. Operators such as activations or vector operations are supported using FP32, TF32, FP16, and BF16.
Supporting FP16 and BF16 allows Neuron to achieve significantly higher performance than executing everything in FP32.

Performance-accuracy tradeoffs
------------------------------

**By default**, the Neuron Compiler will **automatically cast FP32 matrix multiplication operations to BF16**. The remaining operations are performed in the data type specified by the model. The Neuron Compiler provides CLI options that direct the compiler to cast to other data types, giving you the ability to choose an accuracy-to-performance tradeoff in model execution. Deciding which CLI settings to use will be application specific and may require some experimentation. See the :ref:`Neuron Compiler CLI Reference Guide` for details.

What is the difference between Data Types?
-------------------------------------------

The NeuronCore v2 supports multiple data types (see :ref:`NeuronCore v2 Data Types`). Each data type provides benefits and drawbacks due to its dynamic range and numeric precision.

.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left

   * - Type
     - Minimum
     - Maximum
     - Strength
     - Weakness
   * - FP16
     - -65504
     - 65504
     - Numeric Precision, High granularity, Mid-range numbers
     - Low range, medium precision
   * - BF16
     - -3.40E+38
     - 3.40E+38
     - Dynamic Range, Extremely small/large numbers
     - Low precision
   * - TF32
     - -3.40E+38
     - 3.40E+38
     - Dynamic Range, Extremely small/large numbers
     - Medium precision
   * - FP32
     - -3.40E+38
     - 3.40E+38
     - N/A
     - Larger model size, potentially slower computation

* FP16 provides a high density of representable values that are neither extremely small nor extremely large. The density of representable values within the range is approximately an order of magnitude greater than BF16.

  * Conversion from FP32 to FP16 will perform well when values are relatively small but non-extreme (neither very small nor very large).
  * Conversion from FP32 to FP16 will perform badly if the original FP32 values are outside of the range of FP16. This will produce inf/-inf values and may result in NaN depending on the operation.

* BF16 provides a wider range of representable values, which includes both very small and very large values. However, the overall density of representable values is lower than FP16 for non-extreme values. The range is nearly identical to the range of FP32, but because the number of bits is halved, the individual values are sparse.

  * Conversion from FP32 to BF16 will perform well when the values are well-distributed throughout the range. Since BF16 covers the entire FP32 range, each original value can map to a relatively close downcast value.
  * Conversion from FP32 to BF16 will perform badly when fine granularity is needed. Since BF16 granularity is sacrificed for greater range, it will almost always map worse than FP16 to values that are within the FP16 range.
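The range-versus-granularity tradeoff is easy to observe directly in PyTorch; the short sketch below (not Neuron-specific, just standard tensor dtype conversion) shows FP16 overflowing where BF16 merely rounds:

.. code:: python

   import torch

   x = torch.tensor([1e-4, 1.0, 70000.0], dtype=torch.float32)

   # FP16 cannot represent values beyond +/-65504, so 70000.0 overflows
   # to inf, while small and mid-range values keep fine granularity.
   print(x.to(torch.float16))

   # BF16 keeps (nearly) the FP32 dynamic range, so nothing overflows,
   # but mid-range values round more coarsely (70000.0 lands near 70144).
   print(x.to(torch.bfloat16))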
Should I downcast operations to smaller Data Types?
---------------------------------------------------

This choice is driven entirely by the accuracy-versus-performance tradeoff. Casting operations to smaller 16-bit data types provides a significant performance benefit but may sacrifice accuracy. The compiler uses BF16 casting **by default** for matrix multiplication operations. The speedup from casting gives a significant performance boost, and the range of representable values in BF16 allows for more safety compared to FP16 when the possible numeric range of input values is unknown.

The Neuron Compiler's ``--auto-cast`` and ``--auto-cast-type`` CLI options are used to direct the compiler to perform alternate casting operations. See the detailed list of the options in :ref:`Neuron v2 Compiler CLI Reference Guide`. If the ``--auto-cast`` flag is not provided, the compiler applies its default of casting FP32 matrix multiplication operations to BF16 (equivalent to ``--auto-cast matmult --auto-cast-type bf16``); use ``--auto-cast none`` to disable all auto-casting.

The option combinations to consider in a typical flow are:

+-----------------------------------------------+---------------------------------------------------------------------------+----------------------------------------------------+-------------------------------------------------+
| Compiler autocast                             | Options Effect                                                            | Performance                                        | Accuracy                                        |
+===============================================+===========================================================================+====================================================+=================================================+
| ``--auto-cast none``                          | Disables all auto-casting, using the data types defined within the model | Lowest performance                                 | Highest accuracy                                |
+-----------------------------------------------+---------------------------------------------------------------------------+----------------------------------------------------+-------------------------------------------------+
| ``--auto-cast matmult --auto-cast-type tf32`` |                                                                           | Performance *increases* as you move down the table | Accuracy *decreases* as you move down the table |
+-----------------------------------------------+---------------------------------------------------------------------------+                                                    |                                                 |
| ``--auto-cast all --auto-cast-type tf32``     | Balance of performance, dynamic range, and precision                      |                                                    |                                                 |
+-----------------------------------------------+---------------------------------------------------------------------------+                                                    |                                                 |
| ``--auto-cast matmult --auto-cast-type fp16`` |                                                                           |                                                    |                                                 |
+-----------------------------------------------+---------------------------------------------------------------------------+                                                    |                                                 |
| ``--auto-cast all --auto-cast-type fp16``     | Best performance at the expense of dynamic range                          |                                                    |                                                 |
+-----------------------------------------------+---------------------------------------------------------------------------+                                                    |                                                 |
| ``--auto-cast matmult --auto-cast-type bf16`` | Best performance at the expense of precision                              |                                                    |                                                 |
+-----------------------------------------------+                                                                           +----------------------------------------------------+-------------------------------------------------+
| ``--auto-cast all --auto-cast-type bf16``     |                                                                           | Highest performance                                | Lowest accuracy                                 |
+-----------------------------------------------+---------------------------------------------------------------------------+----------------------------------------------------+-------------------------------------------------+
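For concreteness, a hedged sketch of selecting the casting behavior at compile time (the input file name, target, and output path are assumptions; the ``--auto-cast`` options are those described above):

.. code:: bash

   # Sketch: compile an XLA HLO graph, casting FP32 matmult operations to BF16
   neuronx-cc compile model.hlo \
       --framework XLA \
       --target trn1 \
       --auto-cast matmult \
       --auto-cast-type bf16 \
       --output model.neff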
Note that the compiler must preserve the input/output (i/o) tensor types requested by the framework, so no casting is done on the i/o tensors. Additional speedup can be obtained by casting them in the framework prior to compilation.

To learn how to configure the compiler options from within your application's framework, please see:

* :ref:`Developer Guide for Training with PyTorch Neuron `

================================================
FILE: about-neuron/appnotes/neuronx-distributed/introducing-nxd-inference.rst
================================================

.. _introduce-nxd-inference:

Introducing NeuronX Distributed (NxD) Inference
=================================================

.. contents:: Table of contents
   :local:
   :depth: 2

What are we introducing?
------------------------

Starting with the Neuron SDK 2.21 release, we are introducing NxD Inference, an open-source PyTorch-based inference library that simplifies deep learning model deployment on AWS Inferentia and Trainium instances. NxD Inference is designed for optimized inference, enabling quick onboarding of PyTorch models with minimal changes. It features a modular architecture that facilitates easy integration of HuggingFace PyTorch models and is compatible with serving engines like vLLM. See :ref:`nxdi-index` for an NxD Inference overview and documentation.

How can I install the NxD Inference library?
---------------------------------------------

Refer to :ref:`nxdi-setup` for installation instructions.

I am currently using the Transformers NeuronX library for inference. How does the NxD Inference library affect me?
--------------------------------------------------------------------------------------------------------------------

If you are using Transformers NeuronX (TNx) in production, you can continue doing so. However, if you are planning to onboard new models to Neuron for inference, NxD Inference offers several advantages to consider. NxD Inference is designed to enable easy onboarding of PyTorch models and comes with new features and enhanced support:

* **Hardware Support**: While TNx is not supported on Trn2, NxD Inference supports all platforms (Trn1, Inf2, and Trn2).
* **Simplified Interface**: With NxD Inference, you write modeling code using PyTorch with standard Python, rather than using PyHLO as in TNx.
* **Easy Migration**: NxD Inference was designed to provide seamless migration from TNx, especially if you are using it with vLLM. You can migrate your existing TNx inference scripts using the :ref:`migration guide `.
* **Enhanced Capabilities**: NxD Inference offers more comprehensive support for MoE models and multimodal models (such as Llama 3.2) compared to TNx.
* **Future Development**: New inference features and support for advanced model architectures (such as multi-modality/video models) will focus on NxD Inference.

I am currently using vLLM with the Transformers NeuronX library for inference. Does the NxD Inference library support vLLM?
----------------------------------------------------------------------------------------------------------------------------

Yes, the NxD Inference library supports the vLLM inference engine. Starting with the 2.21 release, the Neuron vLLM integration supports both the NxD Inference and Transformers NeuronX libraries. To use vLLM with the NxD Inference library, refer to the :ref:`nxdi-vllm-user-guide-v1`.
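For orientation only, a minimal hedged sketch of serving a model through vLLM's offline API; the model name, ``device`` argument, and parallelism degree are assumptions, and the user guide above documents the invocation supported by your Neuron SDK and vLLM versions:

.. code:: python

   from vllm import LLM, SamplingParams

   # Hypothetical invocation; consult the Neuron vLLM user guide for the
   # exact arguments supported by your SDK and vLLM versions.
   llm = LLM(model="meta-llama/Meta-Llama-3-8B", device="neuron", tensor_parallel_size=2)
   outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
   print(outputs[0].outputs[0].text)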
What features and models are available in Transformers NeuronX (TNx) but not yet in NeuronX Distributed Inference?
-------------------------------------------------------------------------------------------------------------------

While NxD Inference supports most features and models available in TNx, there are some differences in current support that users should be aware of.

**Features not yet supported in NxD Inference**: The following TNx features aren't supported yet in the NxD Inference library:

* Multi-Node Inference support

**Models not part of the NxD Inference Model Hub**: The following models are included in Transformers NeuronX but not currently in the NxD Inference library:

* Bloom
* GPT2
* GPT-J
* GPT-NeoX

If you need to use these models with NxD Inference, we encourage you to follow the :ref:`onboarding models developer guide `. The onboarding process in NxD Inference is more straightforward than in TNx due to its PyTorch-based architecture.

I currently use the Hugging Face TGI serving engine for deploying and serving Large Language Models (LLMs) on Neuron. How does the NxD Inference library affect me?
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

If you are currently using the Hugging Face TGI serving engine to deploy models on Neuron, the introduction of the NxD Inference library has no impact, and you can continue to run your existing inference workloads. Hugging Face TGI integrates with the Neuron SDK inference libraries in a way that abstracts the underlying library from the user.

I am new to Neuron and have inference workloads; what library should I use?
----------------------------------------------------------------------------

We recommend NxD Inference for your model inference workloads. To learn how to get started with NxD Inference, see the :ref:`nxdi-index` documentation.

Additional Resources
--------------------

* :ref:`nxdi-index`
* :ref:`nxdi-overview`
* :ref:`nxd-inference_rn`

================================================
FILE: about-neuron/appnotes/neuronx-distributed/introducing-nxdt-training.rst
================================================

.. _introduce-nxd-training:

Introducing NxD Training
===================================================

.. contents:: Table of contents
   :local:
   :depth: 2

What are we introducing?
------------------------

Starting with the Neuron 2.20 release, we are introducing NxD Training. In doing so, we are expanding the NeuronX Distributed library (previously called NxD, now called NxD Core) into NxD Training, adding data science/engineering modules and end-to-end examples. NxD Training is a PyTorch-based distributed training library that enables customers to train large-scale models. Key distributed strategies supported by NxD Training include 3D parallelism (data parallelism, tensor parallelism, and pipeline parallelism) and ZeRO-1 (where optimizer states are partitioned across workers). NxD Training supports model training workflows such as pretraining, supervised finetuning (SFT), and parameter-efficient finetuning (PEFT) using Low-Rank Adaptation (LoRA) techniques [#f1]_. For developers, NxD Training offers both API-level access through NxD Core and PyTorch Lightning, and an intuitive interface via YAML-based configuration files.
NxD Training takes a flexible approach: customers can adopt only the functionality that fits their workflow, and they can integrate their own machine learning training software at the appropriate level within NxD Training.

This is a beta preview version of NxD Training, and feedback from the developer community is strongly encouraged for upcoming releases.

.. _how-nxd-core-user-affected:

I currently use NeuronX Distributed (NxD Core). How does the NxD Training release affect me?
---------------------------------------------------------------------------------------------------------------

Existing NxD Core customers can continue to use the NxD Core APIs available under NxD Training. If workflows based on NxD Core meet your needs, you do not need to do anything differently with NxD Training's introduction. NxD Core APIs and functionality continue to be available to you as before. You can choose to :ref:`install NxD Core only ` and skip all subsequent installation steps for NxD Training. However, NxD Training adds support for YAML-based configuration, a model hub, and integration with PyTorch Lightning. If these capabilities are of interest to you, you may choose to evaluate and start using NxD Training.

.. _should_nnm_usage_continue:

Should current Neuron NeMo Megatron (NNM) users continue to use NNM?
------------------------------------------------------------------------------------------------

NxD Training offers the same capabilities as Neuron NeMo Megatron (NNM), and NNM will go into maintenance mode in the next release. If you are currently using NNM, the introduction of the NxD Training toolkit means that you should start evaluating NxD Training for your training needs. With its YAML interface, NxD Training is very close in usability to NNM and NeMo. Migrating from NNM to NxD Training should involve relatively minor effort; instructions for doing so are provided :ref:`here `.

.. _what_to_use_as_new_user:

I am new to Neuron and have training workloads; what toolkits or libraries should I use?
----------------------------------------------------------------------------------------

If you are starting with Neuron and looking for solutions to your model pretraining or finetuning needs, NxD Training is the recommended toolkit for you. Start from the :ref:`NxD Training page ` for overview, installation, and usage instructions.

Additional Resources
------------------------

NxD Training resources on getting started, usage, and support are listed below. If you encounter issues or have product-related questions, refer to the FAQs and troubleshooting guides. Additionally, feel free to reach out to us using the resources in the Support section.

* :ref:`How to get started `
* :ref:`Release notes `
* :ref:`Main section `
* :ref:`Troubleshooting `
* :ref:`Support `

.. [#f1] Supported through NxD Core.

================================================
FILE: about-neuron/appnotes/perf/neuron-cc/parallel-ncgs.rst
================================================

.. _parallel-exec-ncgs:

Parallel Execution using NEURON_RT_NUM_CORES
===============================================

.. important ::

   ``NEURONCORE_GROUP_SIZES`` will no longer be supported starting with the Neuron 1.19.0 release. If your application uses ``NEURONCORE_GROUP_SIZES``, see :ref:`neuron-migrating-apps-neuron-to-libnrt` and :ref:`eol-ncgs-env_2` for more details; a migration sketch follows below.
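As a quick orientation, an illustrative hedged sketch of the migration (group sizes and the script name are hypothetical): an application that previously requested groups of 2, 4, 3, and 4 NeuronCores would now request their total.

.. code :: bash

   # Before (deprecated):
   NEURONCORE_GROUP_SIZES=2,4,3,4 python your_neuron_application.py

   # After (Neuron 1.16.0 and newer): request the total number of cores (2+4+3+4)
   NEURON_RT_NUM_CORES=13 python your_neuron_application.py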
Introduction
------------

Inf1 instances are available with different numbers of Inferentia chips. Each Inferentia chip consists of 4 NeuronCores, and an Inf1 instance includes 4 to 64 NeuronCores, depending on the size of the instance. This guide shows you how to load one or more compiled models into different consecutive groups of NeuronCores using your framework of choice.

Data Parallel Execution
-----------------------

In PyTorch and TensorFlow, the same compiled model can run in parallel on an Inf1 instance by loading it multiple times, up to the total number of NeuronCores specified in NEURON_RT_NUM_CORES or NEURON_RT_VISIBLE_CORES. For more information about NEURON_RT_NUM_CORES and NEURON_RT_VISIBLE_CORES, refer to :ref:`Neuron Runtime Configuration `.

Running multiple models using single process
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To run multiple models using a single process, set the environment variable ``NEURON_RT_NUM_CORES`` to the total number of NeuronCores needed by all of the model groups. You can set the ``NEURON_RT_NUM_CORES`` environment variable at runtime:

.. code :: bash

   #!/bin/bash
   NEURON_RT_NUM_CORES=13 python your_neuron_application.py

Or from within the Python process running your models (NOTE: you can only set it once in the same process, at the beginning of the script):

.. code :: python

   #!/usr/bin/env python
   import os

   # Set Environment
   os.environ['NEURON_RT_NUM_CORES'] = '13'

   # Load models and run inferences ...

The following examples allow you to load 4 models into 4 groups of NeuronCores within one process. For example, if there are 4 models A, B, C, D compiled to 2, 4, 3, and 4 NeuronCores respectively, directly load the models A, B, C, D in sequence within your TensorFlow or PyTorch Neuron process. This example requires an inf1.6xlarge instance with 16 NeuronCores, as the total number of NeuronCores within the NeuronCore groups is 13.

In MXNet, the mapping from models to NeuronCores is controlled by the context ``mx.neuron(neuron_core_index)``, where ``neuron_core_index`` is the NeuronCore index at the start of the group. In the example above, map model A to the ``mx.neuron(0)`` context, model B to ``mx.neuron(2)``, model C to ``mx.neuron(6)``, and model D to ``mx.neuron(9)``. For further details, refer to :ref:`Flexible Execution Group (FlexEG) in Neuron-MXNet`.

For PyTorch, see :ref:`Data Parallel Inference on Torch Neuron` for more details.

For TensorFlow:

.. code :: python

   # Set Environment
   os.environ['NEURON_RT_NUM_CORES'] = '13'

   # Load models (TF2)
   model0 = tf.keras.models.load_model(model0_file)  # loaded into the first group of NC0-NC1
   model1 = tf.keras.models.load_model(model1_file)  # loaded into the second group of NC2-NC5
   model2 = tf.keras.models.load_model(model2_file)  # loaded into the third group of NC6-NC8
   model3 = tf.keras.models.load_model(model3_file)  # loaded into the fourth group of NC9-NC12

   # run inference by simply calling the loaded model
   results0 = model0(inputs0)
   results1 = model1(inputs1)
   results2 = model2(inputs2)
   results3 = model3(inputs3)
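For comparison, a minimal hedged PyTorch sketch of the same pattern (file names and inputs are hypothetical; see the Data Parallel Inference guide linked above for the full treatment). Each ``torch.jit.load`` call places a compiled model on the next unused group of NeuronCores of the size it was compiled for:

.. code :: python

   import os
   # Set Environment (must happen before loading models)
   os.environ['NEURON_RT_NUM_CORES'] = '13'

   import torch
   import torch_neuron

   # Hypothetical models compiled for 2, 4, 3, and 4 NeuronCores respectively
   model0 = torch.jit.load('model0_neuron.pt')  # first group, NC0-NC1
   model1 = torch.jit.load('model1_neuron.pt')  # second group, NC2-NC5
   model2 = torch.jit.load('model2_neuron.pt')  # third group, NC6-NC8
   model3 = torch.jit.load('model3_neuron.pt')  # fourth group, NC9-NC12

   results0 = model0(inputs0)  # run inference by calling the loaded model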
For MXNet 2.x:

.. code :: python

   # Set Environment
   os.environ['NEURON_RT_NUM_CORES'] = '13'

   # Load models (MXNet)
   # loaded into the first group of NC0-NC1
   sym, args, aux = mx.model.load_checkpoint(mx_model0_file, 0)
   model0 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')
   # loaded into the second group of NC2-NC5
   sym, args, aux = mx.model.load_checkpoint(mx_model1_file, 0)
   model1 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')
   # loaded into the third group of NC6-NC8
   sym, args, aux = mx.model.load_checkpoint(mx_model2_file, 0)
   model2 = sym.bind(ctx=mx.neuron(6), args=args, aux_states=aux, grad_req='null')
   # loaded into the fourth group of NC9-NC12
   sym, args, aux = mx.model.load_checkpoint(mx_model3_file, 0)
   model3 = sym.bind(ctx=mx.neuron(9), args=args, aux_states=aux, grad_req='null')

   # run inference by simply calling the loaded model
   results0 = model0.forward(data=inputs0)
   results1 = model1.forward(data=inputs1)
   results2 = model2.forward(data=inputs2)
   results3 = model3.forward(data=inputs3)

You can identify the NeuronCores used by each application with the ``neuron-top`` command line tool. For more information about the neuron-top user interface, see :ref:`Neuron Top User Guide `.

.. code :: bash

   $ neuron-top

.. figure:: /images/multi_1core_models_multi_processes.png
   :scale: 80 %

Running multiple models using multiple processes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can also run multiple models in parallel processes, when you set ``NEURON_RT_NUM_CORES`` per process:

.. code :: bash

   $ NEURON_RT_NUM_CORES=2 python your_1st_neuron_application.py
   $ NEURON_RT_NUM_CORES=2 python your_2nd_neuron_application.py

The first process automatically selects a first set of 2 unused NeuronCores for its new group. The second process automatically selects a new set of 2 unused NeuronCores for its new group.

.. figure:: /images/multi_2cores_models_multi_processes.png
   :scale: 80 %

Running multiple models on the same NeuronCore group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can load more than one model in a NeuronCore group within one process. The Neuron runtime handles switching from one model to the next within the NeuronCore group, when the next model is run within the application. In TensorFlow or PyTorch, simply load the additional models after the initial number of models have been loaded, to fill the NeuronCore groups associated with the process.

For PyTorch:

.. code :: python

   # Set Environment
   os.environ['NEURON_RT_NUM_CORES'] = '2'

   # Load models (PT)
   model0 = torch.jit.load(model0_file)  # loaded into the first group of NC0-NC1
   model1 = torch.jit.load(model1_file)  # loaded into the first group of NC0-NC1

   # run inference by simply calling the loaded model
   results0 = model0(inputs0)
   results1 = model1(inputs1)

For TensorFlow 2.x:

.. code :: python

   # Set Environment
   os.environ['NEURON_RT_NUM_CORES'] = '2'

   # Load models (TF2)
   model0 = tf.keras.models.load_model(model0_file)  # loaded into the first group of NC0-NC1
   model1 = tf.keras.models.load_model(model1_file)  # loaded into the first group of NC0-NC1

   # run inference by simply calling the loaded model
   results0 = model0(inputs0)
   results1 = model1(inputs1)

In MXNet, use the context ``mx.neuron(neuron_core_index)`` with the same NeuronCore start index for the additional models.
.. code :: python

   # Set Environment
   os.environ['NEURON_RT_NUM_CORES'] = '2'

   # Load models (MXNet)
   # loaded into the first group of NC0-NC1
   sym, args, aux = mx.model.load_checkpoint(mx_model0_file, 0)
   model0 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')
   # loaded into the first group of NC0-NC1
   sym, args, aux = mx.model.load_checkpoint(mx_model1_file, 0)
   model1 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')

   # run inference by simply calling the loaded model
   results0 = model0.forward(data=inputs0)
   results1 = model1.forward(data=inputs1)

The total ``NEURON_RT_NUM_CORES`` across all processes cannot exceed the number of NeuronCores available on the instance. For example, on an inf1.xlarge with default configurations, where the total number of NeuronCores visible to TensorFlow-Neuron is 4, you can launch one process with ``NEURON_RT_NUM_CORES=2`` (pipelined) and another process with ``NEURON_RT_NUM_CORES=2`` (data-parallel).

Examples using ``NEURON_RT_NUM_CORES`` include:

* :ref:`PyTorch example `
* :ref:`MXNet example `

Auto Model Replication in TensorFlow Neuron (``tensorflow-neuron``) (Beta)
----------------------------------------------------------------------------------

Refer to the following API documentation to see how to perform automatic replication on multiple cores. Note that auto-replication only works on models compiled with pipeline size 1, via ``--neuroncore-pipeline-cores=1``. If automatic replication is not enabled, the model will default to replicating on up to 4 cores.

* Python API (TF 2.x only): :ref:`tensorflow-ref-auto-replication-python-api`
* CLI API (TF 1.x and TF 2.x): :ref:`tensorflow-ref-auto-replication-cli-api`

Auto Model Replication (Being Deprecated)
-----------------------------------------

The Auto Model Replication feature in TensorFlow-Neuron enables you to load the model once; data-parallel replication then occurs automatically. This reduces framework memory usage, as the same model is not loaded multiple times. This feature is beta and available in TensorFlow-Neuron only.

To enable Auto Model Replication, set NEURONCORE_GROUP_SIZES to Nx1, where N is the desired replication count (the number of NeuronCore groups, each of size 1). For example, NEURONCORE_GROUP_SIZES=8x1 would automatically replicate the single-NeuronCore model 8 times.

.. code :: python

   os.environ['NEURONCORE_GROUP_SIZES'] = '4x1'

or

.. code :: bash

   NEURONCORE_GROUP_SIZES=4x1 python3 application.py

When NEURONCORE_GROUP_SIZES is not set, the default is 4x1, where a single-NeuronCore model is replicated 4 times on any size of Inf1 machine.

This feature is only available for models compiled with neuroncore-pipeline-cores set to 1 (the default). You will still need to use threads in the scaffolding code to feed the loaded, replicated model instance in order to achieve high throughput.

Example of auto model replication: :ref:`/src/examples/tensorflow/openpose_demo/openpose.ipynb`

FAQ
---

Can I mix data parallel and NeuronCore Pipelines?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes. You can compile the model using the neuroncore-pipeline-cores option. This tells the compiler to target the specified number of cores for :ref:`neuroncore-pipeline`. The Neuron Compiler returns a NEFF that fits within this limit. See the :ref:`neuron-compiler-cli-reference` for instructions on how to use this option. For example, on an inf1.2xlarge, you can load two model instances, each compiled with neuroncore-pipeline-cores set to 2, so that they run in parallel. The model instances can be loaded from different saved models or from the same saved model.
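A minimal hedged sketch of that combination in TensorFlow Neuron (paths and the input tensor are hypothetical; it reuses the ``tfn.saved_model.compile`` API shown in the performance tuning guide later in this document):

.. code :: python

   import numpy as np
   import tensorflow.neuron as tfn

   # Compile the model for a 2-NeuronCore pipeline (hypothetical paths)
   example_input = np.zeros([1, 224, 224, 3], dtype='float16')
   tfn.saved_model.compile("rn50_fp16", "rn50_fp16_pipe2/1",
                           model_feed_dict={'input_1:0': example_input},
                           compiler_args=['--neuroncore-pipeline-cores', '2'])

Loading this compiled model twice, in a process launched with ``NEURON_RT_NUM_CORES=4``, would then place the two pipelined instances on separate pairs of NeuronCores.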
Can I mix multiple models in one NeuronCore group with a single model in another NeuronCore group?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently, you can do this in MXNet by setting up two NeuronCore groups, then loading, for example, multiple models in one NCG using the context mx.neuron(0), and a single model in the second NCG using the context mx.neuron(2). You can also load a single model in the first NCG and multiple models in the second NCG. For example:

.. code :: python

   # Set Environment
   os.environ['NEURON_RT_NUM_CORES'] = '6'

   # Load models (MXNet)
   # loaded into the first group of NC0-NC1
   sym, args, aux = mx.model.load_checkpoint(mx_model0_file, 0)
   model0 = sym.bind(ctx=mx.neuron(0), args=args, aux_states=aux, grad_req='null')
   # loaded into the second group of NC2-NC5
   sym, args, aux = mx.model.load_checkpoint(mx_model1_file, 0)
   model1 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')
   # loaded into the second group of NC2-NC5
   sym, args, aux = mx.model.load_checkpoint(mx_model2_file, 0)
   model2 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')
   # loaded into the second group of NC2-NC5
   sym, args, aux = mx.model.load_checkpoint(mx_model3_file, 0)
   model3 = sym.bind(ctx=mx.neuron(2), args=args, aux_states=aux, grad_req='null')

   # run inference by simply calling the loaded model
   results0 = model0.forward(data=inputs0)
   results1 = model1.forward(data=inputs1)
   results2 = model2.forward(data=inputs2)
   results3 = model3.forward(data=inputs3)

Loading multiple models in one NCG and a single model in another NCG is currently not supported in TensorFlow and PyTorch.

================================================
FILE: about-neuron/appnotes/perf/neuron-cc/performance-tuning.rst
================================================

.. _appnote-performance-tuning:

Performance Tuning
==================

.. important ::

   NeuronCore Groups (NCG) have been deprecated. See :ref:`eol-ncg` and :ref:`neuron-migrating-apps-neuron-to-libnrt` for more details.

This guide is intended to provide the reader with an in-depth understanding of how to optimize neural network performance on Inferentia for both throughput and latency. For simplicity, the guide uses TensorFlow and a ResNet-50 model as a teaching example to show how to choose between different compile-time optimizations (e.g., Batching and NeuronCore Pipeline), as well as model-serving optimizations (e.g., multi-threading and dynamic-batching) to improve inference performance.

The following guides are considered to be prerequisites for this tutorial:

- :ref:`/src/examples/tensorflow/tensorflow_resnet50/resnet50.ipynb`
- TensorFlow Serving NeuronCore Group
- :ref:`neuron-batching`
- :ref:`neuroncore-pipeline`

Batching and pipelining (technical background)
----------------------------------------------

Neuron provides developers with various performance optimization features. Two of the most widely used features are batching and pipelining. Both techniques aim to keep the data close to the compute engines, but they achieve this data locality in different ways.
In batching, it is achieved by loading the model data into an on-chip cache and reusing it multiple times for multiple different model inputs, while in pipelining it is achieved by caching all model parameters in the on-chip cache across multiple NeuronCores and streaming the calculation across them.

As a general rule of thumb, batching is preferred for applications that aim to optimize throughput and cost at the expense of latency, while pipelining is preferred for applications with a high-throughput requirement under a strict latency budget.

Compiling for batching optimization
-----------------------------------

To enable batching optimization, the model must first be compiled for a target batch size. This is done by specifying the batch size in the input tensor's batch dimension during compilation. Users are encouraged to evaluate multiple batch sizes in order to determine the optimal latency/throughput deployment point, which is application-dependent.

For example, the code snippet below enables batching on a ResNet-50 model, with a batch size of 5:

.. code:: python

   import numpy as np
   import tensorflow.neuron as tfn

   # To change the batch size, change the first dimension in example_input
   batch_size = 5
   example_input = np.zeros([batch_size,224,224,3], dtype='float16')

   tfn.saved_model.compile("rn50_fp16", "rn50_fp16_compiled/1",
                           model_feed_dict={'input_1:0': example_input},
                           dynamic_batch_size=True)

.. note::

   Depending on the size of the neural network, Neuron has a maximum batch size that works optimally on Inferentia. If an unsupported batch size is used, an internal compiler error message will be displayed. A simple way to explore the optimal batch size for your specific model is to increment the batch size from 1 upward, one at a time, and test application performance.

Compiling for pipeline optimization
-----------------------------------

In NeuronCore Pipeline mode, Neuron stores the model parameters in the Inferentia chips' local caches and streams inference requests across the available NeuronCores, as specified by the ``--neuroncore-pipeline-cores`` compiler argument. For example, to compile the model to fit the pipeline size of four Inferentia devices (16 NeuronCores) available in the inf1.6xlarge instance size:

.. code:: python

   import numpy as np
   import tensorflow.neuron as tfn

   compiler_args = ['--neuroncore-pipeline-cores', '16']
   example_input = np.zeros([1,224,224,3], dtype='float16')

   tfn.saved_model.compile("rn50_fp16", "rn50_fp16_compiled/1",
                           model_feed_dict={'input_1:0': example_input},
                           compiler_args=compiler_args)

The minimum number of NeuronCores needed to run a compiled model can be found using the Neuron Check Model tool. See :ref:`neuron_check_model`.

Model-serving inference optimizations
-------------------------------------

To fully realize the maximum throughput of the compiled model (for either batching or pipelining), you need to launch multiple host CPU threads to feed inputs into the Neuron pipeline. The number of threads needs to be larger than the specified maximum number of NeuronCores.

Additionally, dynamic batching can be used to process a larger client-side inference batch size; the framework automatically breaks up the user batch into smaller batch sizes to match the compiled batch size. This technique increases the achievable throughput by hiding the framework-to-Neuron overhead and amortizing it over a larger batch size. To use dynamic batching, set the argument ``dynamic_batch_size=True`` during compilation and send an inference batch size (the user inference batch size) that is a multiple of the compiled batch size.
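As an illustration of the multi-threaded feeding pattern described above, a hedged sketch (``model_neuron`` and ``batches`` are hypothetical stand-ins for your compiled model and prepared input batches):

.. code:: python

   from concurrent.futures import ThreadPoolExecutor

   # Using more worker threads than NeuronCores keeps the hardware pipeline
   # fed while the host performs pre- and post-processing.
   NUM_THREADS = 8

   with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
       results = list(pool.map(model_neuron, batches))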
Both methods can be applied together if this improves performance. However, multi-threading is always needed as a first step to achieve high throughput. You need to experiment to find the optimal settings for your application.

By default, the framework sets the number of outstanding inference requests to the total number of NeuronCores plus three. This can be changed by setting the NEURON_MAX_NUM_INFERS environment variable. For example, if the compiled model includes CPU partitions (e.g., if the Neuron compiler decides that some operations are more efficient to execute on CPU), the number of threads needs to be increased to account for the additional compute performed on the CPU. Note that the available instance host memory size needs to be taken into consideration to prevent out-of-memory errors. As above, you need to experiment in order to find the optimal settings for your application.

.. note::

   By default, the framework allocates a NeuronCore Group size to match the size of the compiled model. The size of the model is the NeuronCore limit passed to the compiler during compilation (the ``--neuroncore-pipeline-cores`` option). For more information, see the TensorFlow Serving NeuronCore Group documentation.

Other considerations
--------------------

Mixed Precision
~~~~~~~~~~~~~~~

You can find more information about performance and accuracy trade-offs in :ref:`neuron-cc-training-mixed-precision`.

Operator support
~~~~~~~~~~~~~~~~

The Neuron Compiler maintains an evolving list of supported operators for each framework: :ref:`neuron-supported-operators`

AWS Neuron handles unsupported operators by partitioning the graph into subgraphs and executing them on different targets (e.g., NeuronCore partition, CPU partition). If the entire model can run on Inferentia (i.e., all operators are supported), then it will be compiled into a single subgraph, which will be executed by a NeuronCore Group.

Debug
~~~~~

You can examine the post-compiled model to view the compilation results using the Neuron plugin for TensorBoard. See :ref:`tensorboard-plugin-visualize-graph`.

ResNet-50 optimization example
------------------------------

For an example demonstrating the concepts described here, see :ref:`/src/examples/tensorflow/keras_resnet50/keras_resnet50.ipynb`

================================================
FILE: about-neuron/appnotes/torch-neuron/bucketing-app-note.rst
================================================

.. _bucketing_app_note:

Running inference on variable input shapes with bucketing
=========================================================

.. contents:: Table of contents
   :local:
   :depth: 2

Introduction
------------

With Inferentia, the shape of every input must be fixed at compile time. For applications that require multiple input sizes, we recommend using padding or bucketing techniques. Padding requires you to compile your model with the largest expected input size and pad every input to this maximum size. If the performance of your model using padding is not within your targets, you can consider implementing bucketing.

This guide introduces bucketing, a technique to run inference on inputs with variable shapes on Inferentia. The following sections explain how bucketing can improve the performance of inference workloads on Inferentia. They cover an overview of how bucketing works and provide examples of using bucketing in :ref:`computer vision ` and :ref:`natural language processing` applications.
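For reference, the padding approach on its own looks like the following hedged sketch (shapes are hypothetical; the model would be compiled once for the maximum ``[800, 800]`` input):

.. code:: python

   import torch

   MAX_H, MAX_W = 800, 800  # largest expected input size

   def pad_to_max(image):
       # Pad the bottom and right of the image up to the maximum size
       h, w = image.shape[-2:]
       return torch.nn.functional.pad(image, (0, MAX_W - w, 0, MAX_H - h), value=0)

   padded = pad_to_max(torch.rand(1, 3, 640, 480))  # -> shape [1, 3, 800, 800]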
Applications that benefit from bucketing
----------------------------------------

Bucketing refers to compiling your model multiple times with different target input shapes to create "bucketed models". :ref:`creating_buckets` provides an overview of selecting the input shapes that you use to create bucketed models. At inference time, each input is padded until its shape matches the next largest bucket shape. The padded input is then passed into the corresponding bucketed model for inference. By compiling the same model with multiple different input shapes, the amount of input padding is reduced compared to padding every input to the maximum size in your dataset. This minimizes the compute overhead and improves inference performance compared to padding every image to the maximum shape in your dataset.

Bucketing works best when multiple different bucketed models are created to efficiently cover the full range of input shapes. You can fine-tune the model performance by experimenting with different bucket sizes that correspond to the distribution of input shapes in your dataset.

Bucketing can only be used if there is an upper bound on the shape of the inputs. If necessary, an upper bound on the input shape can be enforced using resizing and other forms of preprocessing.

.. _num_buckets:

The upper bound on the number of bucketed models that you use is dictated by the total size of the compiled bucketed models. Each Inferentia chip has 8GB of DRAM, or 2GB of DRAM per NeuronCore. An inf1.xlarge and inf1.2xlarge have 1 Inferentia chip, an inf1.6xlarge has 4 Inferentia chips, and an inf1.24xlarge has 16 Inferentia chips. Thus, you should limit the total size of all bucketed models to around 8GB per Inferentia chip or 2GB per NeuronCore.

The following formula provides an approximation for the number of compiled bucketed models you can fit on each NeuronCore:

::

   number-of-buckets = round(10^9 / number-of-weights-in-model)

For example, a model with roughly 100 million weights supports on the order of ten bucketed models per NeuronCore.

We recommend using :ref:`neuron-top ` to monitor the memory usage on your inf1 instance as you load multiple bucketed models.

Implementing bucketing
-----------------------

Implementing bucketing consists of two main parts: creating multiple bucketed models at compile time and running inference using the bucketed models on (padded) inputs. The following sections describe how to implement bucketing to run inference in applications that have variable input shapes.

.. _creating_buckets:

Creating bucketed models
^^^^^^^^^^^^^^^^^^^^^^^^^

Before running inference, models should be compiled for different input shapes that are representative of the input dataset. The input shapes that are used to compile the models determine the bucket shapes that are used during inference. The bucket shapes should be chosen to minimize the amount of padding on each new input. Additionally, there should always be a bucket that's large enough to handle the maximum input shape in the dataset. The limit on the number of compiled bucketed models that can be used is described in this :ref:`section`.

Running inference with bucketing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At inference time, each input should be padded to match the size of the next largest bucket, such that the height and width (or sequence length) of the padded input equals the size of the bucket.
Then, the padded input should be passed into the corresponding bucket for inference. If necessary, it's important to remove and/or crop any aberrant predictions that occur in the padded region. For example, in object detection applications, bounding box predictions that occur in the padded regions should be removed to avoid erroneous predictions.

.. _bucketing_examples:

Examples
--------

The following sections provide examples of applying the bucketing technique to run inference in applications that have variable input shapes.

.. _bucketing_example_cv:

Computer vision bucketing
^^^^^^^^^^^^^^^^^^^^^^^^^^

As an example of implementing bucketing for computer vision models, consider an application where the height and width of images in the dataset are uniformly distributed between `[400, 400]` and `[800, 800]`. Given that every input shape between `[400, 400]` and `[800, 800]` is equally likely, it could make sense to create bucketed models that divide the range of input shapes into equally sized chunks. For example, we could create bucketed models for the input shapes `[500, 500]`, `[600, 600]`, `[700, 700]`, and `[800, 800]`.

As an example of running inference with bucketing, let's assume that we created bucketed models for the input shapes `[500, 500]`, `[600, 600]`, `[700, 700]`, and `[800, 800]`. If we receive an input with shape `[640, 640]`, we would pad the input to the next largest bucket, `[700, 700]`, and use this bucket for inference. If we receive an input with shape `[440, 540]`, we would pad the input to the bucket size `[600, 600]`, and use this bucket for inference.

As another example of creating bucketed models, consider a computer vision application where the dataset is not uniformly distributed. As before, let's assume the input shapes range from `[400, 400]` to `[800, 800]`. Now, let's assume the data shape distribution is bimodal, such that `[540, 540]` and `[720, 720]` are the two most common input shapes. In this example, it might make sense to create bucketed models for the input shapes `[540, 540]`, `[720, 720]`, and `[800, 800]` to target the most common shapes while still covering the entire range of input shapes.

End-to-end computer vision bucketing example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this example, we run inference in a computer vision application that has variable shaped images ranging from `[400, 400]` to `[800, 800]`. We create bucketed models for the input shapes `[500, 500]`, `[600, 600]`, `[700, 700]`, and `[800, 800]` to handle the variable input shapes.
.. code-block:: python

   import numpy as np
   import torch
   from torchvision import models
   import torch_neuron

   # Load the model and set it to evaluation mode
   model = models.resnet50(pretrained=True)
   model.eval()

   # Define the bucket sizes that will be used for compilation and inference
   bucket_sizes = [(500, 500), (600, 600), (700, 700), (800, 800)]

   # Create the bucketed models by compiling a model for each bucket size
   buckets = {}
   for bucket_size in bucket_sizes:
       # Create an example input that is the desired bucket size
       h, w = bucket_size
       image = torch.rand([1, 3, h, w])

       # Compile with the example input to create the bucketed model
       model_neuron = torch.neuron.trace(model, image)

       # Run a warm up inference to load the model into Inferentia memory
       model_neuron(image)

       # Add the bucketed model based on its bucket size
       buckets[bucket_size] = model_neuron

   def get_bucket_and_pad_image(image):
       # Determine which bucket size to use
       oh, ow = image.shape[-2:]
       target_bucket = None
       for bucket_size in bucket_sizes:
           # Choose a bucket that's larger in both the height and width dimensions
           if oh <= bucket_size[0] and ow <= bucket_size[1]:
               target_bucket = bucket_size
               break

       # Pad the image to match the size of the bucket
       h_delta = target_bucket[0] - oh
       w_delta = target_bucket[1] - ow
       b_pad = h_delta  # Bottom padding
       l_pad = 0        # Left padding
       t_pad = 0        # Top padding
       r_pad = w_delta  # Right padding

       # Pad the height and width of the image
       padding_amounts = (l_pad, r_pad, t_pad, b_pad)
       image_padded = torch.nn.functional.pad(image, padding_amounts, value=0)
       return image_padded, target_bucket

   # Run inference on inputs with different shapes
   for _ in range(10):
       # Create an image with a random height and width in range [400, 400] to [800, 800]
       h = int(np.random.uniform(low=400, high=800))
       w = int(np.random.uniform(low=400, high=800))
       image = torch.rand(1, 3, h, w)

       # Determine bucket and pad the image
       image_padded, target_bucket = get_bucket_and_pad_image(image)

       # Use the corresponding bucket to run inference
       output = buckets[target_bucket](image_padded)

.. _bucketing_example_nlp:

Natural language processing bucketing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As an example of implementing bucketing for natural language processing models, consider an application where the lengths of tokenized sequences in a dataset are uniformly distributed between 0 and 128 tokens. Given that every tokenized sequence length between 0 and 128 is equally likely, it might make sense to create bucketed models that divide the range of tokenized sequence lengths into equally sized chunks. For example, we could create bucketed models for tokenized sequence lengths 64 and 128.

As an example of running inference with bucketing, let's assume that we created bucketed models for the input tokenized sequence lengths 64 and 128. If we receive a tokenized sequence with length 55, we would pad it to the bucket size 64 and use this bucket for inference. If we receive a tokenized sequence with length 112, we would pad it to the bucket size 128 and use this bucket for inference.

End-to-end natural language processing bucketing example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this example, we run inference in a natural language processing application that has variable length tokenized sequences ranging from 0 to 128. We create bucketed models for lengths 64 and 128 to handle the variable input lengths.
.. code-block:: python

   import numpy as np
   import torch
   from transformers import AutoTokenizer, AutoModelForSequenceClassification
   import torch_neuron

   # Build tokenizer and model
   tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
   model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False)
   model.eval()

   # Define the bucket sizes that will be used for compilation and inference
   bucket_sizes = [64, 128]

   # Create the bucketed models by compiling a model for each bucket size
   buckets = {}
   for bucket_size in bucket_sizes:
       # Setup some example inputs
       sequence_0 = "The company HuggingFace is based in New York City"
       sequence_1 = "HuggingFace's headquarters are situated in Manhattan"

       # Create an example input that is the desired bucket size
       paraphrase = tokenizer.encode_plus(sequence_0,
                                          sequence_1,
                                          max_length=bucket_size,
                                          padding='max_length',
                                          truncation=True,
                                          return_tensors="pt")

       # Convert example inputs to a format that is compatible with TorchScript tracing
       example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']

       # Compile with the example input to create the bucketed model
       model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)

       # Run a warm up inference to load the model into Inferentia memory
       model_neuron(*example_inputs_paraphrase)

       # Add the bucketed model based on its bucket size
       buckets[bucket_size] = model_neuron

   def get_bucket_and_pad_paraphrase(paraphrase):
       # Determine which bucket size to use
       inputs = paraphrase['input_ids']
       attention = paraphrase['attention_mask']
       token_type = paraphrase['token_type_ids']
       paraphrase_len = inputs.shape[1]
       target_bucket = None
       for bucket_size in bucket_sizes:
           if paraphrase_len <= bucket_size:
               target_bucket = bucket_size
               break

       # Pad the paraphrase to match the size of the bucket
       delta = target_bucket - paraphrase_len
       zeros = torch.zeros([1, delta], dtype=torch.long)
       inputs = torch.cat([inputs, zeros], dim=1)
       attention = torch.cat([attention, zeros], dim=1)
       token_type = torch.cat([token_type, zeros], dim=1)
       paraphrase_padded = inputs, attention, token_type
       return paraphrase_padded, target_bucket

   # Create two sample sequences
   sequence_0 = ("The only other bear similar in size to the polar bear is the "
                 "Kodiak bear, which is a subspecies of the brown bear. Adult male "
                 "polar bears weigh 350–700 kg and measure 2.4–3 meters in total "
                 "length. All bears are short-tailed, the polar bear's tail is "
                 "relatively the shortest amongst living bears.")
   sequence_1 = ("Around the Beaufort Sea, however, mature males reportedly "
                 "average 450 kg. Adult females are roughly half the size of males "
                 "and normally weigh 150–250 kg, measuring 1.8–2.4 meters in length. "
                 "The legs are stocky and the ears and tail are small.")
   # Run inference on inputs with different shapes
   # We create the variable shapes by randomly cropping the sequences
   for _ in range(10):
       # Get random sequence lengths between 0 and 128
       paraphrase_len = int(np.random.uniform(0, 128))

       # Crop the paraphrase
       paraphrase_cropped = tokenizer.encode_plus(sequence_0,
                                                  sequence_1,
                                                  max_length=paraphrase_len,
                                                  padding='max_length',
                                                  truncation=True,
                                                  return_tensors="pt")

       # Determine bucket and pad the paraphrase
       paraphrase_padded, target_bucket = get_bucket_and_pad_paraphrase(paraphrase_cropped)

       # Use the corresponding bucket to run inference
       output = buckets[target_bucket](*paraphrase_padded)

================================================
FILE: about-neuron/appnotes/torch-neuron/index.rst
================================================

.. _torch-neuron-appnotes:

PyTorch Neuron Application Notes
=================================

.. toctree::
   :maxdepth: 1
   :hidden:

   bucketing-app-note
   rcnn-app-note
   torch-neuron-dataparallel-app-note

This section contains application notes specific to PyTorch Neuron (``torch-neuron``) for ``Inf1`` instances. These guides cover advanced optimization techniques, implementation patterns, and best practices for deploying PyTorch models on AWS Inferentia.

Application Notes
-----------------

.. grid:: 1 1 2 2
   :gutter: 2

   .. grid-item-card::
      :link: bucketing-app-note
      :link-type: doc

      **Dynamic Batching with Bucketing**
      ^^^
      Optimize inference performance using dynamic batching and bucketing strategies

   .. grid-item-card::
      :link: rcnn-app-note
      :link-type: doc

      **R-CNN Implementation Guide**
      ^^^
      Comprehensive guide for implementing and optimizing R-CNN models on Inferentia

   .. grid-item-card::
      :link: torch-neuron-dataparallel-app-note
      :link-type: doc

      **Data Parallel Inference**
      ^^^
      Scale inference workloads using ``torch.neuron.DataParallel`` for multi-core execution

================================================
FILE: about-neuron/appnotes/torch-neuron/rcnn-app-note.rst
================================================

.. _torch-neuron-r-cnn-app-note:

Running R-CNNs on Inf1
======================

This application note demonstrates how to compile and run `Detectron2 `__-based R-CNNs on Inf1. It also provides guidance on how to use profiling to improve the performance of R-CNN models on Inf1.

.. contents:: Table of contents
   :local:

R-CNN Model Overview
--------------------

Region-based CNN (R-CNN) models are commonly used for object detection and image segmentation tasks. A typical R-CNN architecture consists of the following components:

- **Backbone:** The backbone extracts features from input images. In some models, the backbone is a Feature Pyramid Network (FPN), which uses a top-down architecture with lateral connections to build an in-network feature pyramid from a single-scale input. The backbone is commonly a ResNet- or Vision Transformer-based network.
- **Region Proposal Network (RPN):** The RPN predicts region proposals with a wide range of scales and aspect ratios. RPNs are constructed using convolutional layers and anchor boxes that serve as references for multiple scales and aspect ratios.
- **Region of Interest (RoI):** The RoI component resizes the extracted features of varying size to the same size so that they can be consumed by a fully connected layer. RoI Align is typically used instead of RoI Pooling, because RoI Align provides better alignment.
The `Detectron2 `__ library provides many popular PyTorch R-CNN implementations, including R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN. This application note focuses on the Detectron2 R-CNN models.

R-CNN Limitations and Considerations on Inferentia (NeuronCore-v1)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

R-CNN models may have limitations and considerations on Inferentia (NeuronCore-v1). See the Model Architecture Fit Guidelines for more information. These limitations are not applicable to NeuronCore-v2.

Requirements
------------

The process described in this application note is intended to be run on an ``inf1.2xlarge``. In practice, R-CNN models can be run on any Inf1 instance size.

Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the `PyTorch Installation Guide `__. Select the kernel from the "Kernel -> Change Kernel" option at the top of the Jupyter notebook page.

Installation
------------

This process requires the following pip packages:

- ``torch==1.11.0``
- ``torch-neuron``
- ``neuron-cc``
- ``opencv-python``
- ``pycocotools``
- ``torchvision==0.12.0``
- ``detectron2==0.6``

The following section explains how to build ``torchvision`` from source and install the ``Detectron2`` package. It also reinstalls the Neuron packages, to ensure version compatibility. The ``torchvision`` ``roi_align_kernel.cpp`` kernel is modified to use OMP threading for multi-threaded inference on the CPU. This significantly improves the performance of RoI Align kernels on Inf1: compared with the default ``roi_align_kernel.cpp`` kernel configuration, OMP threading reduces RoI Align latency by a factor of two to three.

.. code:: ipython3

    # Install python3.7-dev for pycocotools (a Detectron2 dependency)
    !sudo apt install python3.7-dev -y

    # Install Neuron packages
    !pip uninstall -y torchvision
    !pip install --force-reinstall "protobuf==3.20.1" ninja opencv-python
    !pip install --force-reinstall torch-neuron==1.11.0.* neuron-cc[tensorflow] --extra-index-url https://pip.repos.neuron.amazonaws.com

    # Change cuda to 10.2 for Detectron2
    !sudo rm /usr/local/cuda
    !sudo ln -s /usr/local/cuda-10.2 /usr/local/cuda

    # Install Torchvision 0.12.0 from source
    !git clone -b release/0.12 https://github.com/pytorch/vision.git

    # Update the RoI Align kernel to use OMP multithreading
    with open('vision/torchvision/csrc/ops/cpu/roi_align_kernel.cpp', 'r') as file:
        content = file.read()

    # Enable OMP Multithreading and set the number of threads to 4
    old = "// #pragma omp parallel for num_threads(32)"
    new = "#pragma omp parallel for num_threads(4)"
    content = content.replace(old, new)

    # Re-write the file
    with open('vision/torchvision/csrc/ops/cpu/roi_align_kernel.cpp', 'w') as file:
        file.write(content)

    # Build Torchvision with OMP threading
    !cd vision && CFLAGS="-fopenmp" python setup.py bdist_wheel
    %pip install vision/dist/*.whl

    # Install Detectron2 release v0.6
    !python -m pip install 'git+https://github.com/facebookresearch/detectron2.git@v0.6'

Compiling an R-CNN for Inf1
---------------------------

By default, R-CNN models are not compilable on Inf1, because they cannot be traced with ``torch.jit.trace``, which is a prerequisite for inference on Inf1. The following section demonstrates techniques for compiling a Detectron2 R-CNN model for inference on Inf1. Specifically, this section explains how to create a standard Detectron2 R-CNN model, using a ResNet-101 backbone.
It demonstrates how to use profiling to identify the most compute-intensive parts of the R-CNN that need to be compiled for accelerated inference on Inf1. It then explains how to manually extract and compile the ResNet backbone (the dominant compute component) and inject the compiled backbone back into the full model for improved performance.

Create a Detectron2 R-CNN Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a Detectron2 R-CNN model using the ``COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml`` pretrained weights and config file. Download a sample image from the COCO dataset and run an example inference.

.. code:: ipython3

    from detectron2 import model_zoo
    from detectron2.engine import DefaultPredictor
    from detectron2.config import get_cfg

    def get_model():
        # Configure the R-CNN model
        CONFIG_FILE = "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"
        WEIGHTS_FILE = "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"
        cfg = get_cfg()
        cfg.merge_from_file(model_zoo.get_config_file(CONFIG_FILE))
        cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(WEIGHTS_FILE)
        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
        cfg.MODEL.DEVICE = 'cpu'  # Send to CPU for Neuron Tracing

        # Create the R-CNN predictor wrapper
        predictor = DefaultPredictor(cfg)
        return predictor

.. code:: ipython3

    import os
    import urllib.request

    # Define a function to get a sample image
    def get_image():
        filename = 'input.jpg'
        if not os.path.exists(filename):
            url = "http://images.cocodataset.org/val2017/000000439715.jpg"
            urllib.request.urlretrieve(url, filename)
        return filename

.. code:: ipython3

    import time
    import cv2

    # Create an R-CNN model
    predictor = get_model()

    # Get a sample image from the COCO dataset
    image_filename = get_image()
    image = cv2.imread(image_filename)

    # Run inference and print inference latency
    start = time.time()
    outputs = predictor(image)
    print(f'Inference time: {(time.time() - start):0.3f} s')

Profile the Model
~~~~~~~~~~~~~~~~~

Use the `PyTorch Profiler `__ to identify which operators contribute the most to the model's runtime on CPU. Ideally, you can compile these compute-intensive operators onto Inf1 for accelerated inference.

.. code:: ipython3

    import torch.autograd.profiler as profiler

    with profiler.profile(record_shapes=True) as prof:
        with profiler.record_function("model_inference"):
            predictor(image)
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=30))

We see that convolution operators (``aten::convolution``) contribute the most to inference time. By compiling these convolution operators to Inf1, you can improve the performance of the R-CNN model.

Print the R-CNN model architecture to see which layers contain the ``aten::convolution`` operators:

.. code:: ipython3

    print(predictor.model)

Note that the ResNet FPN backbone (`predictor.model.backbone `__ L17-L162) contains the majority of convolution operators in the model. The RPN (`predictor.model.proposal_generator `__ L181-L533) also contains several convolutions. Based on this, compile the ResNet backbone and RPN onto Inf1 to maximize performance.

Compiling the ResNet backbone to Inf1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section demonstrates how to compile the ResNet backbone to Inf1 and use it for inference. Extract the backbone by accessing it with ``predictor.model.backbone``. Compile the backbone using ``strict=False``, because the backbone outputs a dictionary. Use a fixed input shape (``800 x 800``) for compilation, as all inputs will be resized to this shape during inference.
This section also defines a basic preprocessing function (mostly derived from the Detectron2 R-CNN `DefaultPredictor `__ module L308-L318) that reshapes inputs to ``800 x 800``. Create a ``NeuronRCNN`` wrapper to inject the compiled backbone back into the model by dynamically replacing the ``predictor.model.backbone`` attribute with the compiled model.

.. code:: ipython3

    import torch
    import torch_neuron

    example = torch.rand([1, 3, 800, 800])

    # Use `with torch.no_grad():` to avoid a jit tracing issue in the ResNet backbone
    with torch.no_grad():
        neuron_backbone = torch_neuron.trace(predictor.model.backbone, example, strict=False)

    backbone_filename = 'backbone.pt'
    torch.jit.save(neuron_backbone, backbone_filename)

.. code:: ipython3

    from detectron2.modeling.meta_arch.rcnn import GeneralizedRCNN
    from torch.jit import ScriptModule

    class NeuronRCNN(torch.nn.Module):
        """
        Creates a `NeuronRCNN` wrapper that injects the compiled backbone
        into the R-CNN model. It also stores the `size_divisibility`
        attribute from the original backbone.
        """
        def __init__(self, model: GeneralizedRCNN, neuron_backbone: ScriptModule) -> None:
            super().__init__()

            # Keep track of the backbone variables
            size_divisibility = model.backbone.size_divisibility

            # Load and inject the compiled backbone
            model.backbone = neuron_backbone

            # Set backbone variables
            setattr(model.backbone, 'size_divisibility', size_divisibility)

            self.model = model

        def forward(self, x):
            return self.model(x)

.. code:: ipython3

    # Create the R-CNN with the compiled backbone
    neuron_rcnn = NeuronRCNN(predictor.model, neuron_backbone)
    neuron_rcnn.eval()

    # Print the R-CNN architecture to verify the backbone is now the
    # `neuron_backbone` (shows up as `RecursiveScriptModule`)
    print(neuron_rcnn)

.. code:: ipython3

    def preprocess(original_image, predictor):
        """
        A basic preprocessing function that sets the input height=800 and
        input width=800. The function is derived from the preprocessing
        steps in the Detectron2 `DefaultPredictor` module.
        """
        height, width = original_image.shape[:2]
        resize_func = predictor.aug.get_transform(original_image)
        resize_func.new_h = 800  # Override height
        resize_func.new_w = 800  # Override width
        image = resize_func.apply_image(original_image)
        image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
        inputs = {"image": image, "height": height, "width": width}
        return inputs

.. code:: ipython3

    # Get a resized input using the sample image
    inputs = preprocess(image, get_model())

    # Run inference and print inference latency
    start = time.time()
    for _ in range(10):
        outputs = neuron_rcnn([inputs])[0]
    print(f'Inference time: {((time.time() - start)/10):0.3f} s')

.. code:: ipython3

    with profiler.profile(record_shapes=True) as prof:
        with profiler.record_function("model_inference"):
            neuron_rcnn([inputs])
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=30))

By running the backbone on Inf1, the overall runtime is already significantly improved. The count and runtime of ``aten::convolution`` operators are also decreased. We now see a ``neuron::forward_v2`` operator, which is the compiled backbone.

Optimize the R-CNN model
------------------------

Compiling the RPN
~~~~~~~~~~~~~~~~~

Examine the profiling output and note that there are still several ``aten::convolution``, ``aten::linear``, and ``aten::addmm`` operators that significantly contribute to the model's overall latency.
By inspecting the model's architecture and code, we can determine that
the majority of these operators are contained in the RPN module
(`predictor.model.proposal_generator `__ L181-L533). To improve the
model's performance, extract the RPN Head and compile it on Inf1 to
increase the number of operators running on Inf1. Compile only the RPN
Head (rather than the entire RPN), because the RPN Anchor Generator
contains objects that are not traceable with ``torch.jit.trace``.

The RPN Head contains five layers that run inference on multiple
resized inputs. To compile the RPN Head, create a list of tensors that
contain the input ("``features``") shapes used by the RPN Head on each
layer. These tensor shapes can be determined by printing the input
shapes in the RPN Head ``forward`` function
(``predictor.model.proposal_generator.rpn_head.forward``).

Create a new ``NeuronRCNN`` wrapper that injects both the compiled
backbone and RPN Head into the R-CNN model.

.. code:: ipython3

    import math

    input_shape = [1, 3, 800, 800]  # Overall input shape at inference time

    # Create the example list of RPN inputs using the resizing logic from the RPN Head
    features = list()
    for i in [0, 1, 2, 3, 4]:
        ratio = 1 / (4 * 2**i)
        x_i_h = math.ceil(input_shape[2] * ratio)
        x_i_w = math.ceil(input_shape[3] * ratio)
        feature = torch.zeros(1, 256, x_i_h, x_i_w)
        features.append(feature)

.. code:: ipython3

    # Extract and compile the RPN Head
    neuron_rpn_head = torch_neuron.trace(predictor.model.proposal_generator.rpn_head, [features])

    rpn_head_filename = 'rpn_head.pt'
    torch.jit.save(neuron_rpn_head, rpn_head_filename)

.. code:: ipython3

    class NeuronRCNN(torch.nn.Module):
        """
        Creates a wrapper that injects the compiled backbone and RPN Head
        into the R-CNN model.
        """

        def __init__(self, model: GeneralizedRCNN, neuron_backbone: ScriptModule,
                     neuron_rpn_head: ScriptModule) -> None:
            super().__init__()

            # Keep track of the backbone variables
            size_divisibility = model.backbone.size_divisibility

            # Inject the compiled backbone
            model.backbone = neuron_backbone

            # Set backbone variables
            setattr(model.backbone, 'size_divisibility', size_divisibility)

            # Inject the compiled RPN Head
            model.proposal_generator.rpn_head = neuron_rpn_head

            self.model = model

        def forward(self, x):
            return self.model(x)

.. code:: ipython3

    # Create the R-CNN with the compiled backbone and RPN Head
    predictor = get_model()
    neuron_rcnn = NeuronRCNN(predictor.model, neuron_backbone, neuron_rpn_head)
    neuron_rcnn.eval()

    # Print the R-CNN architecture to verify the compiled modules show up
    print(neuron_rcnn)

.. code:: ipython3

    # Run inference and print inference latency
    start = time.time()
    for _ in range(10):
        outputs = neuron_rcnn([inputs])[0]
    print(f'Inference time: {((time.time() - start)/10):0.3f} s')

.. code:: ipython3

    with profiler.profile(record_shapes=True) as prof:
        with profiler.record_function("model_inference"):
            neuron_rcnn([inputs])
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=30))

By running the compiled backbone and RPN Head on Inf1, overall runtime
is improved. Once again, the number and runtime of
``aten::convolution`` operators are also decreased. There are now two
``neuron::forward_v2`` operators, which correspond to the compiled
backbone and RPN Head.

Fusing the Backbone and RPN Head
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is usually preferable to compile fewer independent models
("subgraphs") on Inf1.
Combining models and compiling them as a single subgraph enables the
Neuron compiler to perform additional optimizations and reduces I/O
data transfer between the CPU and NeuronCores at each subgraph
boundary.

In this section, the ResNet backbone and RPN Head are "fused" into a
single model to compile on Inf1. Create the
``NeuronFusedBackboneRPNHead`` wrapper as a compilable model that
contains both the ResNet backbone (`predictor.model.backbone `__
L17-L162) and RPN Head (`predictor.model.proposal_generator `__
L181-L533). Output the ``features`` to be used downstream by the RoI
Heads. Compile this ``NeuronFusedBackboneRPNHead`` wrapper as
``neuron_backbone_rpn_head``, then create a separate ``BackboneRPN``
wrapper to inject the ``neuron_backbone_rpn_head`` in place of the
original backbone and RPN Head. Copy the remainder of the RPN
``forward`` code (`predictor.model.proposal_generator.forward `__
L431-L480) to create a "fused" backbone + RPN module.

Lastly, re-write the ``NeuronRCNN`` wrapper to use the fused backbone +
RPN module. The ``NeuronRCNN`` wrapper also uses the
``predictor.model`` ``forward`` code to re-write the rest of the R-CNN
model forward function.

.. code:: ipython3

    class NeuronFusedBackboneRPNHead(torch.nn.Module):
        """
        Wrapper to compile the fused ResNet backbone and RPN Head.
        """

        def __init__(self, model: GeneralizedRCNN) -> None:
            super().__init__()
            self.backbone = model.backbone
            self.rpn_head = model.proposal_generator.rpn_head
            self.in_features = model.proposal_generator.in_features

        def forward(self, x):
            features = self.backbone(x)
            features_ = [features[f] for f in self.in_features]
            return self.rpn_head(features_), features

.. code:: ipython3

    # Create the wrapper with the combined backbone and RPN Head
    predictor = get_model()
    backbone_rpn_wrapper = NeuronFusedBackboneRPNHead(predictor.model)
    backbone_rpn_wrapper.eval()

    # Compile the wrapper
    example = torch.rand([1, 3, 800, 800])
    with torch.no_grad():
        neuron_backbone_rpn_head = torch_neuron.trace(
            backbone_rpn_wrapper, example, strict=False)

    backbone_rpn_filename = 'backbone_rpn.pt'
    torch.jit.save(neuron_backbone_rpn_head, backbone_rpn_filename)

.. code:: ipython3

    class BackboneRPN(torch.nn.Module):
        """
        Wrapper that uses the compiled `neuron_backbone_rpn_head` instead of
        the original backbone and RPN Head. We copy the remainder of the RPN
        `forward` code (`predictor.model.proposal_generator.forward`) to
        create a "fused" backbone + RPN module.
        """

        def __init__(self, model: GeneralizedRCNN) -> None:
            super().__init__()
            self.backbone_rpn_head = NeuronFusedBackboneRPNHead(model)
            self._rpn = model.proposal_generator
            self.in_features = model.proposal_generator.in_features

        def forward(self, images):
            preds, features = self.backbone_rpn_head(images.tensor)
            features_ = [features[f] for f in self.in_features]
            pred_objectness_logits, pred_anchor_deltas = preds
            anchors = self._rpn.anchor_generator(features_)
            # Transpose the Hi*Wi*A dimension to the middle:
            pred_objectness_logits = [
                # (N, A, Hi, Wi) -> (N, Hi, Wi, A) -> (N, Hi*Wi*A)
                score.permute(0, 2, 3, 1).flatten(1)
                for score in pred_objectness_logits
            ]
            pred_anchor_deltas = [
                # (N, A*B, Hi, Wi) -> (N, A, B, Hi, Wi) -> (N, Hi, Wi, A, B) -> (N, Hi*Wi*A, B)
                x.view(x.shape[0], -1, self._rpn.anchor_generator.box_dim, x.shape[-2], x.shape[-1])
                .permute(0, 3, 4, 1, 2)
                .flatten(1, -2)
                for x in pred_anchor_deltas
            ]
            proposals = self._rpn.predict_proposals(
                anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
            )
            return proposals, features
.. code:: ipython3

    class NeuronRCNN(torch.nn.Module):
        """
        Wrapper that uses the fused backbone + RPN module and re-writes the
        rest of the R-CNN `model` `forward` function.
        """

        def __init__(self, model: GeneralizedRCNN) -> None:
            super().__init__()

            # Use the fused Backbone + RPN
            self.backbone_rpn = BackboneRPN(model)

            self.roi_heads = model.roi_heads
            self.preprocess_image = model.preprocess_image
            self._postprocess = model._postprocess

        def forward(self, batched_inputs):
            images = self.preprocess_image(batched_inputs)
            proposals, features = self.backbone_rpn(images)
            results, _ = self.roi_heads(images, features, proposals, None)
            return self._postprocess(results, batched_inputs, images.image_sizes)

.. code:: ipython3

    # Create the new NeuronRCNN wrapper with the combined backbone and RPN Head
    predictor = get_model()
    neuron_rcnn = NeuronRCNN(predictor.model)
    neuron_rcnn.eval()

    # Inject the Neuron compiled models
    neuron_rcnn.backbone_rpn.backbone_rpn_head = neuron_backbone_rpn_head

    # Print the R-CNN architecture to verify the compiled modules show up
    print(neuron_rcnn)

.. code:: ipython3

    # Run inference and print inference latency
    start = time.time()
    for _ in range(10):
        outputs = neuron_rcnn([inputs])[0]
    print(f'Inference time: {((time.time() - start)/10):0.3f} s')

.. code:: ipython3

    with profiler.profile(record_shapes=True) as prof:
        with profiler.record_function("model_inference"):
            neuron_rcnn([inputs])
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=30))

By running the fused backbone + RPN Head on Inf1, overall runtime is
improved even more. We now see a single ``neuron::forward_v2`` operator
with a lower runtime than the previous combined runtime of the two
separate ``neuron::forward_v2`` operators.

Compiling the RoI Heads
~~~~~~~~~~~~~~~~~~~~~~~

This section describes how to extract and compile part of the RoI Heads
module (`predictor.model.roi_heads `__ L530-L778), which runs most of
the remaining ``aten::linear`` and ``aten::addmm`` operators on Inf1.
The entire RoI Heads module cannot be extracted, because it contains
unsupported operators. So you need to create a
``NeuronBoxHeadBoxPredictor`` wrapper that extracts specific parts of
the ``roi_heads`` for compilation. The example input for compilation is
the shape of the input into the ``self.roi_heads.box_head.forward``
function.

Write another wrapper, ``ROIHead``, that combines the compiled
``roi_heads`` into the rest of the RoI module. The ``_forward_box`` and
``forward`` functions are from the ``predictor.model.roi_heads``
module.

Lastly, re-write the ``NeuronRCNN`` wrapper to use the optimized RoI
Heads wrapper as well as the fused backbone + RPN module.

.. code:: ipython3

    class NeuronBoxHeadBoxPredictor(torch.nn.Module):
        """
        Wrapper that extracts the RoI Box Head and Box Predictor for
        compilation.
        """

        def __init__(self, model: GeneralizedRCNN) -> None:
            super().__init__()
            self.roi_heads = model.roi_heads

        def forward(self, box_features):
            box_features = self.roi_heads.box_head(box_features)
            predictions = self.roi_heads.box_predictor(box_features)
            return predictions

.. code:: ipython3

    # Create the NeuronBoxHeadBoxPredictor wrapper
    predictor = get_model()
    box_head_predictor = NeuronBoxHeadBoxPredictor(predictor.model)
    box_head_predictor.eval()

    # Compile the wrapper
    example = torch.rand([1000, 256, 7, 7])
    neuron_box_head_predictor = torch_neuron.trace(box_head_predictor, example)

    roi_head_filename = 'box_head_predictor.pt'
    torch.jit.save(neuron_box_head_predictor, roi_head_filename)
.. code:: ipython3

    class ROIHead(torch.nn.Module):
        """
        Wrapper that combines the compiled `roi_heads` into the rest of the
        RoI module. The `_forward_box` and `forward` functions are from the
        `predictor.model.roi_heads` module.
        """

        def __init__(self, model: GeneralizedRCNN) -> None:
            super().__init__()
            self.roi_heads = model.roi_heads
            self.neuron_box_head_predictor = NeuronBoxHeadBoxPredictor(model)

        def _forward_box(self, features, proposals):
            features = [features[f] for f in self.roi_heads.box_in_features]
            box_features = self.roi_heads.box_pooler(
                features, [x.proposal_boxes for x in proposals])
            predictions = self.neuron_box_head_predictor(box_features)
            pred_instances, _ = self.roi_heads.box_predictor.inference(
                predictions, proposals)
            return pred_instances

        def forward(self, images, features, proposals, targets=None):
            pred_instances = self._forward_box(features, proposals)
            pred_instances = self.roi_heads.forward_with_given_boxes(
                features, pred_instances)
            return pred_instances, {}

.. code:: ipython3

    class NeuronRCNN(torch.nn.Module):
        """
        Wrapper that uses the fused backbone + RPN module and the optimized
        RoI Heads wrapper.
        """

        def __init__(self, model: GeneralizedRCNN) -> None:
            super().__init__()

            # Create fused Backbone + RPN
            self.backbone_rpn = BackboneRPN(model)

            # Create Neuron RoI Head
            self.roi_heads = ROIHead(model)

            # Define pre and post-processing functions
            self.preprocess_image = model.preprocess_image
            self._postprocess = model._postprocess

        def forward(self, batched_inputs):
            images = self.preprocess_image(batched_inputs)
            proposals, features = self.backbone_rpn(images)
            results, _ = self.roi_heads(images, features, proposals, None)
            return self._postprocess(results, batched_inputs, images.image_sizes)

.. code:: ipython3

    # Initialize an R-CNN on CPU
    predictor = get_model()

    # Create the Neuron R-CNN on CPU
    neuron_rcnn = NeuronRCNN(predictor.model)
    neuron_rcnn.eval()

    # Inject the Neuron compiled models
    neuron_rcnn.backbone_rpn.backbone_rpn_head = neuron_backbone_rpn_head
    neuron_rcnn.roi_heads.neuron_box_head_predictor = neuron_box_head_predictor

.. code:: ipython3

    # Run inference and print inference latency
    start = time.time()
    for _ in range(10):
        outputs = neuron_rcnn([inputs])[0]
    print(f'CPU Inference time: {((time.time() - start)/10):0.3f} s')

.. code:: ipython3

    with profiler.profile(record_shapes=True) as prof:
        with profiler.record_function("model_inference"):
            neuron_rcnn([inputs])
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=30))

Although the overall latency did not change significantly, running more
of the model on Inf1 instead of CPU frees up CPU resources when
multiple models are running in parallel.

End-to-end Compilation and Inference
------------------------------------

This section provides standalone code that compiles and runs an
optimized Detectron2 R-CNN on Inf1. Most of the code in this section is
from the previous sections in this application note and is consolidated
here for easy deployment.

This section has the following main components:

- Preprocessing and compilation functions
- Wrappers that extract the R-CNN ResNet backbone, RPN Head, and RoI
  Head for compilation on Inf1
- A ``NeuronRCNN`` wrapper that creates an optimized end-to-end
  Detectron2 R-CNN model for inference on Inf1
- Benchmarking code that runs parallelized inference for optimized
  throughput on Inf1

Benchmarking
~~~~~~~~~~~~

The benchmarking section explains how to load multiple optimized RCNN
models and run them in parallel to maximize throughput.
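As a minimal sketch of the per-core loading pattern used in the
benchmark below (``backbone_rpn.pt`` is the compiled artifact saved
earlier; the placement API is a beta feature that is described next):

.. code:: python

    import torch
    import torch_neuron

    # Load a compiled TorchScript artifact onto a specific NeuronCore
    # (NeuronCore 0 here) using the beta placement context manager
    with torch_neuron.experimental.neuron_cores_context(0):
        neuron_module = torch.jit.load('backbone_rpn.pt')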
Use the beta NeuronCore placement API,
``torch_neuron.experimental.neuron_cores_context()``, to ensure all
compiled models in an optimized RCNN model are loaded onto the same
NeuronCore. Note that the functionality and API of
``torch_neuron.experimental.neuron_cores_context()`` might change in
future releases.

Define a simple benchmark function that loads a configurable number of
optimized RCNN models onto separate NeuronCores, runs multithreaded
inference, and calculates the corresponding latency and throughput.
Benchmark various numbers of loaded models to show the impact of
parallelism. Note that throughput increases (at the cost of latency)
when more models are run in parallel on Inf1. Increasing the number of
worker threads also improves throughput.

Other improvements
~~~~~~~~~~~~~~~~~~

There are many additional optimizations that can be applied to RCNN
models on Inf1 depending on the application:

For latency-sensitive applications:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Each of the five layers in the RPN Head can be parallelized to
  decrease overall latency.
- The number of OMP threads can be increased in the ROI Align kernel.

Both of these optimizations improve latency at the cost of decreasing
throughput.

For throughput-sensitive applications:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- The input batch size can be increased to improve NeuronCore
  utilization.

.. code:: ipython3

    import time
    import os
    import urllib.request
    from typing import Any, Union, Callable

    import cv2
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    import torch
    import torch_neuron

    from detectron2 import model_zoo
    from detectron2.engine import DefaultPredictor
    from detectron2.config import get_cfg
    from detectron2.modeling.meta_arch.rcnn import GeneralizedRCNN


    # -----------------------------------------------------------------------------
    # Helper functions
    # -----------------------------------------------------------------------------

    def get_model():
        # Configure the R-CNN model
        CONFIG_FILE = "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"
        WEIGHTS_FILE = "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"
        cfg = get_cfg()
        cfg.merge_from_file(model_zoo.get_config_file(CONFIG_FILE))
        cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(WEIGHTS_FILE)
        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
        cfg.MODEL.DEVICE = 'cpu'  # Send to CPU for Neuron Tracing

        # Create the R-CNN predictor wrapper
        predictor = DefaultPredictor(cfg)
        return predictor


    def get_image():
        # Get a sample image
        filename = 'input.jpg'
        if not os.path.exists(filename):
            url = "http://images.cocodataset.org/val2017/000000439715.jpg"
            urllib.request.urlretrieve(url, filename)
        return filename


    def preprocess(original_image, predictor):
        """
        A basic preprocessing function that sets the input height=800 and
        input width=800. The function is derived from the preprocessing
        steps in the Detectron2 `DefaultPredictor` module.
        """
        height, width = original_image.shape[:2]
        resize_func = predictor.aug.get_transform(original_image)
        resize_func.new_h = 800  # Override height
        resize_func.new_w = 800  # Override width
        image = resize_func.apply_image(original_image)
        image = torch.as_tensor(image.astype("float32").transpose(2, 0, 1))
        inputs = {"image": image, "height": height, "width": width}
        return inputs


    # -----------------------------------------------------------------------------
    # Neuron modules
    # -----------------------------------------------------------------------------

    class NeuronFusedBackboneRPNHead(torch.nn.Module):
        """
        Wrapper to compile the fused ResNet backbone and RPN Head.
""" def __init__(self, model: GeneralizedRCNN) -> None: super().__init__() self.backbone = model.backbone self.rpn_head = model.proposal_generator.rpn_head self.in_features = model.proposal_generator.in_features def forward(self, x): features = self.backbone(x) features_ = [features[f] for f in self.in_features] return self.rpn_head(features_), features class BackboneRPN(torch.nn.Module): """ Wrapper that uses the compiled `neuron_backbone_rpn` instead of the original backbone and RPN Head. We copy the remainder of the RPN `forward` code (`predictor.model.proposal_generator.forward`) to create a "fused" backbone + RPN module. """ def __init__(self, model: GeneralizedRCNN) -> None: super().__init__() self.backbone_rpn_head = NeuronFusedBackboneRPNHead(model) self._rpn = model.proposal_generator self.in_features = model.proposal_generator.in_features def forward(self, images): preds, features = self.backbone_rpn_head(images.tensor) features_ = [features[f] for f in self.in_features] pred_objectness_logits, pred_anchor_deltas = preds anchors = self._rpn.anchor_generator(features_) # Transpose the Hi*Wi*A dimension to the middle: pred_objectness_logits = [ # (N, A, Hi, Wi) -> (N, Hi, Wi, A) -> (N, Hi*Wi*A) score.permute(0, 2, 3, 1).flatten(1) for score in pred_objectness_logits ] pred_anchor_deltas = [ # (N, A*B, Hi, Wi) -> (N, A, B, Hi, Wi) -> (N, Hi, Wi, A, B) -> (N, Hi*Wi*A, B) x.view(x.shape[0], -1, self._rpn.anchor_generator.box_dim, x.shape[-2], x.shape[-1]) .permute(0, 3, 4, 1, 2) .flatten(1, -2) for x in pred_anchor_deltas ] proposals = self._rpn.predict_proposals( anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes ) return proposals, features class NeuronBoxHeadBoxPredictor(torch.nn.Module): """ Wrapper that extracts the RoI Box Head and Box Predictor for compilation. """ def __init__(self, model: GeneralizedRCNN) -> None: super().__init__() self.roi_heads = model.roi_heads def forward(self, box_features): box_features = self.roi_heads.box_head(box_features) predictions = self.roi_heads.box_predictor(box_features) return predictions class ROIHead(torch.nn.Module): """ Wrapper that combines the compiled `roi_heads` into the rest of the RoI module. The `_forward_box` and `forward` functions are from the `predictor.model.roi_heads` module. 
""" def __init__(self, model: GeneralizedRCNN) -> None: super().__init__() self.roi_heads = model.roi_heads self.neuron_box_head_predictor = NeuronBoxHeadBoxPredictor(model) def _forward_box(self, features, proposals): features = [features[f] for f in self.roi_heads.box_in_features] box_features = self.roi_heads.box_pooler( features, [x.proposal_boxes for x in proposals]) predictions = self.neuron_box_head_predictor(box_features) pred_instances, _ = self.roi_heads.box_predictor.inference( predictions, proposals) return pred_instances def forward(self, images, features, proposals, targets=None): pred_instances = self._forward_box(features, proposals) pred_instances = self.roi_heads.forward_with_given_boxes( features, pred_instances) return pred_instances, {} class NeuronRCNN(torch.nn.Module): """ Wrapper that uses the fused backbone + RPN module and the optimized RoI Heads wrapper """ def __init__(self, model: GeneralizedRCNN) -> None: super().__init__() # Create fused Backbone + RPN self.backbone_rpn = BackboneRPN(model) # Create Neuron RoI Head self.roi_heads = ROIHead(model) # Define pre and post-processing functions self.preprocess_image = model.preprocess_image self._postprocess = model._postprocess def forward(self, batched_inputs): images = self.preprocess_image(batched_inputs) proposals, features = self.backbone_rpn(images) results, _ = self.roi_heads(images, features, proposals, None) return self._postprocess(results, batched_inputs, images.image_sizes) # ----------------------------------------------------------------------------- # Compilation functions # ----------------------------------------------------------------------------- def compile( model: Union[Callable, torch.nn.Module], example_inputs: Any, filename: str, **kwargs ) -> torch.nn.Module: """ Compiles the model for Inf1 if it doesn't already exist and saves it as the provided filename. model: A module or function which defines a torch model or computation. example_inputs: An example set of inputs which will be passed to the `model` during compilation. filename: Name of the compiled model kwargs: Extra `torch_neuron.trace` kwargs """ if not os.path.exists(filename): with torch.no_grad(): compiled_model = torch_neuron.trace(model, example_inputs, **kwargs) torch.jit.save(compiled_model, filename) # ----------------------------------------------------------------------------- # Benchmarking function # ----------------------------------------------------------------------------- def benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=4, batch_size=1, n_threads=4, iterations=200): """ A simple benchmarking function that loads `n_models` optimized models onto separate NeuronCores, runs multithreaded inference, and calculates the corresponding latency and throughput. 
""" # Load models models = list() for i in range(n_models): with torch_neuron.experimental.neuron_cores_context(i): # Create the RCNN with the fused backbone + RPN Head and compiled RoI Heads # Initialize an R-CNN on CPU predictor = get_model() # Create the Neuron R-CNN on CPU neuron_rcnn = NeuronRCNN(predictor.model) neuron_rcnn.eval() # Inject the Neuron compiled models neuron_rcnn.backbone_rpn.backbone_rpn_head = torch.jit.load(backbone_rpn_filename) neuron_rcnn.roi_heads.neuron_box_head_predictor = torch.jit.load(roi_head_filename) models.append(neuron_rcnn) # Warmup for _ in range(8): for model in models: model([inputs]) latencies = [] # Thread task def task(i): start = time.time() models[i]([inputs]) finish = time.time() latencies.append((finish - start) * 1000) begin = time.time() with ThreadPoolExecutor(max_workers=n_threads) as pool: for i in range(iterations): pool.submit(task, i % n_models) end = time.time() # Compute metrics boundaries = [50, 95, 99] names = [f'Latency P{i} (ms)' for i in boundaries] percentiles = np.percentile(latencies, boundaries) duration = end - begin # Display metrics results = { 'Samples': iterations, 'Batch Size': batch_size, 'Models': n_models, 'Threads': n_threads, 'Duration (s)': end - begin, 'Throughput (inf/s)': (batch_size * iterations) / duration, **dict(zip(names, percentiles)), } print('-' * 80) pad = max(map(len, results)) for key, value in results.items(): if isinstance(value, float): print(f'{key + ":" :<{pad + 1}} {value:0.3f}') else: print(f'{key + ":" :<{pad + 1}} {value}') print() if __name__ == "__main__": # Create and compile the combined backbone and RPN Head wrapper backbone_rpn_filename = 'backbone_rpn.pt' predictor = get_model() backbone_rpn_wrapper = NeuronFusedBackboneRPNHead(predictor.model) backbone_rpn_wrapper.eval() example = torch.rand([1, 3, 800, 800]) compile(backbone_rpn_wrapper, example, backbone_rpn_filename, strict=False) # Create and compile the RoI Head wrapper roi_head_filename = 'box_head_predictor.pt' predictor = get_model() box_head_predictor = NeuronBoxHeadBoxPredictor(predictor.model) box_head_predictor.eval() example = torch.rand([1000, 256, 7, 7]) compile(box_head_predictor, example, roi_head_filename) # Download a sample image from the COCO dataset and read it image_filename = get_image() image = cv2.imread(image_filename) inputs = preprocess(image, get_model()) # Benchmark the Neuron R-CNN model for various numbers of loaded models benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=1, n_threads=1) benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=1, n_threads=2) benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=2, n_threads=2) benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=2, n_threads=4) benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=4, n_threads=4) benchmark(backbone_rpn_filename, roi_head_filename, inputs, n_models=4, n_threads=8) ================================================ FILE: about-neuron/appnotes/torch-neuron/torch-neuron-dataparallel-app-note.rst ================================================ .. _torch-neuron-dataparallel-app-note: Data Parallel Inference on Torch Neuron ======================================= .. contents:: Table of Contents :local: :depth: 2 Introduction ------------ This guide introduces :func:`torch.neuron.DataParallel`, a Python API that implements data parallelism on :class:`~torch.jit.ScriptModule` models created by the :doc:`Trace API `. 
The following sections explain how data parallelism can improve the
performance of inference workloads on Inferentia, including how
:func:`torch.neuron.DataParallel` uses dynamic batching to run
inference on variable input sizes. It provides an overview of the
:func:`torch.neuron.DataParallel` module and a few :ref:`example data
parallel applications `.

Data parallel inference
-------------------------

Data Parallelism is a form of parallelization across multiple devices
or cores, referred to as nodes. Each node contains the same model and
parameters, but data is distributed across the different nodes. By
distributing the data across multiple nodes, data parallelism reduces
the total execution time of large batch size inputs compared to
sequential execution. Data parallelism works best for smaller models in
latency-sensitive applications that have large batch size requirements.

torch.neuron.DataParallel
-------------------------

To fully leverage the Inferentia hardware, we want to use all available
NeuronCores. An inf1.xlarge and inf1.2xlarge have four NeuronCores, an
inf1.6xlarge has 16 NeuronCores, and an inf1.24xlarge has 64
NeuronCores. For maximum performance on Inferentia hardware, we can use
:func:`torch.neuron.DataParallel` to utilize all available NeuronCores.

:func:`torch.neuron.DataParallel` implements data parallelism at the
module level by replicating the Neuron model on all available
NeuronCores and distributing data across the different cores for
parallelized inference. This function is analogous to
:class:`~torch.nn.DataParallel` in PyTorch.
:func:`torch.neuron.DataParallel` requires PyTorch >= 1.8.

The following sections provide an overview of some of the features of
:func:`torch.neuron.DataParallel` that enable maximum performance on
Inferentia.

NeuronCore selection
^^^^^^^^^^^^^^^^^^^^

By default, DataParallel will try to use all NeuronCores allocated to
the current process to fully saturate the Inferentia hardware for
maximum performance. It is more efficient to make the batch dimension
divisible by the number of NeuronCores. This will ensure that
NeuronCores are not left idle during parallel inference and the
Inferentia hardware is fully utilized.

In some applications, it is advantageous to use a subset of the
available NeuronCores for DataParallel inference. DataParallel has a
``device_ids`` argument that accepts a list of :obj:`int` or ``'nc:#'``
that specify the NeuronCores to use for parallelization. See
:ref:`Specifying NeuronCores ` for an example of how to use the
``device_ids`` argument.

Batch dim
^^^^^^^^^

DataParallel accepts a ``dim`` argument that denotes the batch
dimension used to split the input data for distributed inference. By
default, DataParallel splits the inputs on ``dim = 0`` if the ``dim``
argument is not specified. For applications with a non-zero batch dim,
the ``dim`` argument can be used to specify the inference-time input
batch dimension. :ref:`DataParallel with dim != 0 ` provides an example
of data parallel inference on inputs with batch dim = 2.

.. _dynamic_batching_description:

Dynamic batching
^^^^^^^^^^^^^^^^

Batch size has a direct impact on model performance. The Inferentia
chip is optimized to run with small batch sizes. This means that a
Neuron-compiled model can outperform a GPU model, even if running
single-digit batch sizes. As a general best practice, we recommend
optimizing your model's throughput by compiling the model with a small
batch size and gradually increasing it to find the peak throughput on
Inferentia.
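For example, the following minimal sketch (assuming a standard
torchvision ResNet-50; any traceable model works the same way) compiles
the model with batch size 1, wraps it in
:func:`torch.neuron.DataParallel`, and then runs inference on a larger
batch that is split across the available NeuronCores:

.. code-block:: python

    import torch
    import torch_neuron
    from torchvision import models

    # Load a pretrained model and compile it with a small batch size
    model = models.resnet50(pretrained=True).eval()
    example = torch.rand([1, 3, 224, 224])
    model_neuron = torch_neuron.trace(model, example)

    # Replicate the compiled model across all available NeuronCores
    model_parallel = torch.neuron.DataParallel(model_neuron)

    # Run a larger batch; the input is split on the batch dimension
    # (dim=0 by default) and distributed across the NeuronCores
    batch = torch.rand([8, 3, 224, 224])
    output = model_parallel(batch)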
Dynamic batching is a feature that allows you to use tensor batch sizes
that the Neuron model was not originally compiled against. This is
necessary because the underlying Inferentia hardware will always
execute inferences with the batch size used during compilation. Fixed
batch size execution allows tuning the input batch size for optimal
performance. For example, batch size 1 may be best suited for an
ultra-low latency on-demand inference application, while batch size > 1
can be used to maximize throughput for offline inferencing.

Dynamic batching is implemented by slicing large input tensors into
chunks that match the batch size used during the
:func:`torch_neuron.trace` compilation call. The
:func:`torch.neuron.DataParallel` class automatically enables dynamic
batching on eligible models. This allows us to run inference in
applications that have inputs with a variable batch size without
needing to recompile the model. See :ref:`Dynamic batching ` for an
example of how DataParallel can be used to run inference on inputs with
a dynamic batch size without needing to recompile the model.

Dynamic batching using small batch sizes can result in sub-optimal
throughput because it involves slicing tensors into chunks and
iteratively sending data to the hardware. Using a larger batch size at
compilation time can use the Inferentia hardware more efficiently in
order to maximize throughput. You can test the tradeoff between
individual request latency and total throughput by fine-tuning the
input batch size.

Dynamic batching in the DataParallel module can be disabled using the
``disable_dynamic_batching()`` function as follows:

.. code-block:: python

    >>> model_parallel = torch.neuron.DataParallel(model_neuron)
    >>> model_parallel.disable_dynamic_batching()

If dynamic batching is disabled, the compile-time batch size must be
equal to the inference-time batch size divided by the number of
NeuronCores. :ref:`DataParallel with dim != 0 ` and :ref:`Dynamic
batching disabled ` provide examples of running DataParallel inference
with dynamic batching disabled.

Performance optimizations
^^^^^^^^^^^^^^^^^^^^^^^^^

The DataParallel module has a ``num_workers`` attribute that can be
used to specify the number of worker threads used for multithreaded
inference. By default, ``num_workers = 2 * number of NeuronCores``.
This value can be fine-tuned to optimize DataParallel performance.

DataParallel has a ``split_size`` attribute that dictates the size of
the input chunks that are distributed to each NeuronCore. By default,
``split_size = max(1, input.shape[dim] // number of NeuronCores)``.
This value can be modified to optimally match the inference input chunk
size with the compile-time batch size.

.. _data_paraellel_examples:

Examples
--------

The following sections provide example usages of the
:func:`torch.neuron.DataParallel` module.

.. _dataparallel_example_default:

Default usage
^^^^^^^^^^^^^

.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-default.rst

.. _dataparallel_example_specify_ncs:

Specifying NeuronCores
^^^^^^^^^^^^^^^^^^^^^^

.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-specify-ncs.rst

.. _dataparallel_example_dim_neq_zero:

DataParallel with dim != 0
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-dim-neq-zero.rst

.. _dataparallel_example_dynamic_batching:

Dynamic batching
^^^^^^^^^^^^^^^^

.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-dynamic-batching.rst
.. _dataparallel_example_disable_dynamic_batching:

Dynamic batching disabled
^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-disable-dynamic-batching.rst

Full tutorial with torch.neuron.DataParallel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For an end-to-end tutorial that uses DataParallel, see the
:ref:`PyTorch Resnet Tutorial `.

================================================
FILE: about-neuron/appnotes/torch-neuronx/index.rst
================================================

.. _torch-neuronx-appnotes:

PyTorch NeuronX Application Notes
==================================

.. toctree::
   :maxdepth: 1
   :hidden:

   introducing-pytorch-2-6
   introducing-pytorch-2-7
   introducing-pytorch-2-8
   introducing-pytorch-2-9
   introducing-pytorch-2-x
   migration-from-xla-downcast-bf16
   torch-neuronx-dataparallel-app-note
   torch-neuronx-graph-partitioner-app-note

This section contains application notes specific to PyTorch NeuronX
(``torch-neuronx``) for ``Trn1`` and ``Inf2`` instances. These guides
cover PyTorch version migrations, advanced features, optimization
techniques, and best practices for training and inference on AWS
Trainium and Inferentia2.

PyTorch Version Support
-----------------------

.. grid:: 1 1 2 2
   :gutter: 2

   .. grid-item-card::
      :link: introducing-pytorch-2-9
      :link-type: doc

      **PyTorch 2.9 Support**
      ^^^
      New features and migration guide for PyTorch 2.9 on Neuron

   .. grid-item-card::
      :link: introducing-pytorch-2-8
      :link-type: doc

      **PyTorch 2.8 Support**
      ^^^
      New features and migration guide for PyTorch 2.8 on Neuron

   .. grid-item-card::
      :link: introducing-pytorch-2-7
      :link-type: doc

      **PyTorch 2.7 Support**
      ^^^
      Features and improvements introduced with PyTorch 2.7 support

   .. grid-item-card::
      :link: introducing-pytorch-2-x
      :link-type: doc

      **PyTorch 2.x Overview**
      ^^^
      General guide to PyTorch 2.x series support and features

Advanced Features
-----------------

.. grid:: 1 1 2 2
   :gutter: 2

   .. grid-item-card::
      :link: torch-neuronx-graph-partitioner-app-note
      :link-type: doc

      **Graph Partitioner**
      ^^^
      Advanced graph partitioning strategies for distributed training and inference

   .. grid-item-card::
      :link: torch-neuronx-dataparallel-app-note
      :link-type: doc

      **Data Parallel Inference**
      ^^^
      Scale inference workloads using ``torch_neuronx.DataParallel`` for multi-core execution

   .. grid-item-card::
      :link: migration-from-xla-downcast-bf16
      :link-type: doc

      **XLA Migration Guide**
      ^^^
      Migrate from deprecated XLA environment variables to PyTorch mixed-precision options

================================================
FILE: about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-6.rst
================================================

.. _introduce-pytorch-2-6:

Introducing PyTorch 2.6 Support
===============================

.. contents:: Table of contents
   :local:
   :depth: 2

What are we introducing?
------------------------

Starting with the :ref:`Neuron 2.23 ` release, customers can now
upgrade to PyTorch NeuronX (``torch-neuronx``) with specific support
for PyTorch version 2.6. :ref:`setup-torch-neuronx` is updated to
include installation instructions for PyTorch NeuronX 2.6 for Amazon
Linux 2023 and Ubuntu 22.04. Note that PyTorch NeuronX 2.6 is supported
on Python 3.9, 3.10, and 3.11.

Review the :ref:`migration guide ` for possible changes to training
scripts. No code changes are required for inference scripts.

.. _how-pytorch-2.6-different:

How is PyTorch NeuronX 2.6 different compared to PyTorch NeuronX 2.5?
---------------------------------------------------------------------
PyTorch NeuronX 2.6 uses Torch-XLA 2.6, which has improved support for
Automatic Mixed Precision and buffer aliasing. Additionally:

* Reintroduced ``XLA_USE_32BIT_LONG`` to give customers the flexibility
  to use INT32 for their workloads. This flag was removed in v2.5.
* Added ``xm.xla_device_kind()`` to return the XLA device kind string
  (``'NC_v2'`` for Trainium1; ``'NC_v3'`` and ``'NC_v3d'`` for
  Trainium2). See :ref:`logical-neuroncore-config` for more info.

See the `Torch-XLA 2.6 release `__ for a full list. See
:ref:`migrate_to_pytorch_2.6` for changes needed to use PyTorch NeuronX
2.6.

.. note::

   GSPMD and Torch Dynamo (torch.compile) support in Neuron will be
   available in a future release.

.. _install_pytorch_neuron_2.6:

How can I install PyTorch NeuronX 2.6?
--------------------------------------

To install PyTorch NeuronX 2.6, follow the :ref:`setup-torch-neuronx`
guides for Amazon Linux 2023 and the Ubuntu 22.04 AMI. Refer to the
Neuron Multi-Framework DLAMI :ref:`setup guide ` for Ubuntu 22.04 with
a pre-installed virtual environment for PyTorch NeuronX 2.6 that you
can use to get started. PyTorch NeuronX 2.6 can be installed using the
following:

.. code::

   python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.6.* torchvision

.. note::

   PyTorch NeuronX 2.6 is currently available for Python 3.9, 3.10,
   and 3.11.

.. _migrate_to_pytorch_2.6:

Migrate your application to PyTorch 2.6
---------------------------------------

First, install PyTorch NeuronX 2.6 as described in the
:ref:`installation guide ` above.

Migrating training scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^

To migrate training scripts from PyTorch NeuronX 2.5 to PyTorch NeuronX
2.6, implement the following changes (a consolidated sketch of the API
replacements appears at the end of this section):

.. note::

   ``xm`` below refers to ``torch_xla.core.xla_model``, ``xr`` refers
   to ``torch_xla.runtime``, and ``xmp`` refers to
   ``torch_xla.distributed.xla_multiprocessing``.

* The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16``
  are deprecated (warnings are shown when used) and will be removed in
  an upcoming release. Switch to automatic mixed-precision or use the
  ``model.to(torch.bfloat16)`` command to convert the model to BF16
  format. (see :ref:`migration_from_xla_downcast_bf16`)
* The functions ``xm.xrt_world_size()``, ``xm.get_ordinal()``, and
  ``xm.get_local_ordinal()`` are deprecated (warnings are shown when
  used). Switch to ``xr.world_size()``, ``xr.global_ordinal()``, and
  ``xr.local_ordinal()`` respectively as replacements.
* The default behavior of the ``torch.load`` parameter ``weights_only``
  has changed from ``False`` to ``True``. Setting ``weights_only`` to
  ``True`` may cause issues with pickling custom objects.
* If using ``xmp.spawn``, the ``nprocs`` argument is limited to 1 or
  ``None`` since v2.1. Previously, passing a value > 1 would result in
  a warning. In torch-xla 2.6, passing a value > 1 will result in an
  error with an actionable message to use ``NEURON_NUM_DEVICES`` to set
  the number of NeuronCores to use.

See the :ref:`v2.5 migration guide ` for additional changes needed if
you are migrating from PyTorch NeuronX 2.1.

Migrating inference scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are no code changes required in the inference scripts.
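The following minimal sketch consolidates the deprecated-API
replacements described above for training scripts (assuming the code
runs inside a torch-xla process, for example one launched with
``torchrun``):

.. code:: python

    import torch_xla.runtime as xr

    # Deprecated in torch-xla 2.6 (warnings are shown when used):
    #   world_size = xm.xrt_world_size()
    #   rank       = xm.get_ordinal()
    #   local_rank = xm.get_local_ordinal()

    # Replacements:
    world_size = xr.world_size()
    rank = xr.global_ordinal()
    local_rank = xr.local_ordinal()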
Troubleshooting and Known Issues
--------------------------------

Tensor split on second dimension of 2D array not working
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, when using the tensor split operation on a 2D array in the
second dimension, the resulting tensors do not contain the expected
data (https://github.com/pytorch/xla/issues/8640). The workaround is to
set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use
``torch.tensor_split``.

Lower BERT pretraining performance with torch-neuronx 2.6 compared to torch-neuronx 2.5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, BERT pretraining performance is ~10% lower with
torch-neuronx 2.6 compared to torch-neuronx 2.5. This is due to a known
regression in the torch-xla library
(https://github.com/pytorch/xla/issues/9037) and may affect other
models with high graph tracing overhead. To work around this issue,
build the ``r2.6_aws_neuron`` branch of torch-xla as follows (see
:ref:`pytorch-neuronx-install-cxx11` for the C++11 ABI version):

.. code:: bash

   # Setup build env (make sure you are in a python virtual env). Replace "apt" with "yum" on AL2023.
   sudo apt install cmake
   pip install yapf==0.30.0
   wget https://github.com/bazelbuild/bazelisk/releases/download/v1.20.0/bazelisk-linux-amd64
   sudo cp bazelisk-linux-amd64 /usr/local/bin/bazel

   # Clone repos
   git clone --recursive https://github.com/pytorch/pytorch --branch v2.6.0
   cd pytorch/
   git clone --recursive https://github.com/pytorch/xla.git --branch r2.6_aws_neuron

   # Build torch; the pip wheel will be present in ./dist
   _GLIBCXX_USE_CXX11_ABI=0 python setup.py bdist_wheel

   # Build torch-xla; the pip wheel will be present in ./dist and can be
   # installed instead of the torch-xla released in pypi.org
   cd xla/
   CXX_ABI=0 python setup.py bdist_wheel

Lower BERT pretraining performance when switching to ``model.to(torch.bfloat16)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, BERT pretraining performance is approximately 11% lower when
switching to ``model.to(torch.bfloat16)`` as part of the migration away
from the deprecated environment variable ``XLA_DOWNCAST_BF16``, due to
https://github.com/pytorch/xla/issues/8545. As a workaround to recover
the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still
work in torch-neuronx 2.5 and 2.6, although there will be
end-of-support warnings (as noted below).

Warning "XLA_DOWNCAST_BF16 will be deprecated after the 2.6 release, please downcast your model directly"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are
deprecated (warnings are shown when used). Switch to automatic
mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast
the model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)

WARNING:root:torch_xla.core.xla_model.xrt_world_size() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.world_size instead.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a warning that ``torch_xla.core.xla_model.xrt_world_size()``
will be removed in a future release. Switch to using
``torch_xla.runtime.world_size`` instead.

WARNING:torch_xla.core.xla_model.get_ordinal() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.global_ordinal instead.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a warning that ``torch_xla.core.xla_model.get_ordinal()`` will
be removed in a future release. Switch to using
``torch_xla.runtime.global_ordinal`` instead.

WARNING:torch_xla.core.xla_model.get_local_ordinal() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.local_ordinal instead.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a warning that ``torch_xla.core.xla_model.get_local_ordinal()``
will be removed in a future release. Switch to using
``torch_xla.runtime.local_ordinal`` instead.

Socket Error: Socket failed to bind
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In PyTorch 2.6, there must be a socket available for both torchrun and
the ``init_process_group`` to bind. By default, both will be set to use
unused sockets. If you plan to use a ``MASTER_PORT`` environment
variable, this error may occur if the port you set it to is already in
use.

.. code::

   [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:2.600 (errno: 98 - Address already in use).
   [W socket.cpp:426] [c10d] The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).
   [E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.
   RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).

To resolve the issue, ensure you are setting ``MASTER_PORT`` to a port
value that is not used anywhere else in your scripts. Otherwise, you
can leave ``MASTER_PORT`` unset and torchrun will set the default port
for you.

``AttributeError: module 'torch' has no attribute 'xla'`` Failure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In PyTorch 2.6, training scripts might fail during activation
checkpointing with the error shown below.

.. code::

   AttributeError: module 'torch' has no attribute 'xla'

The solution is to use ``torch_xla.utils.checkpoint.checkpoint``
instead of ``torch.utils.checkpoint.checkpoint`` as the checkpoint
function while wrapping pytorch modules for activation checkpointing.
Refer to the pytorch/xla discussion regarding this `issue `_. Also set
``use_reentrant=True`` while calling the torch_xla checkpoint function.
Failure to do so will lead to the ``XLA currently does not support
use_reentrant==False`` error. For more details on checkpointing, refer
to the `documentation `_.

Error ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While using the HuggingFace Transformers Trainer API to train (i.e. the
:ref:`HuggingFace Trainer API fine-tuning tutorial`), you may see the
error "Attempted to access the data pointer on an invalid python
storage". This is a known `issue `_ and has been fixed in version
``4.37.3`` of HuggingFace Transformers.

``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` on Amazon Linux 2023
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

torch-xla version 2.6+ now requires the ``libcrypt.so.1`` shared
library.
Currently, Amazon Linux 2023 includes the ``libcrypt.so.2`` shared
library by default, so you may see ``ImportError: libcrypt.so.1: cannot
open shared object file: No such file or directory`` when using
torch-neuronx 2.1+ on Amazon Linux 2023. To install ``libcrypt.so.1``
on Amazon Linux 2023, run the following installation command (see also
https://github.com/amazonlinux/amazon-linux-2023/issues/182 for more
context):

.. code::

   sudo dnf install libxcrypt-compat

``FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'`` Failure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In PyTorch 2.6, users might face the error shown below due to
incompatible ``libneuronxla`` and ``torch-neuronx`` versions being
installed.

.. code::

   FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'

Check that the version of ``libneuronxla`` that supports PyTorch
NeuronX 2.6 is ``2.2.*``. If not, uninstall ``libneuronxla`` using
``pip uninstall libneuronxla`` and then reinstall the packages by
following the :ref:`installation guide `.

``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running Neuron Parallel Compile with the HF Trainer API, you may
see the errors ``Status: INVALID_ARGUMENT: Input dimension should be
either 1 or equal to the output dimension it is broadcasting into`` or
``IndexError: index out of range`` in Accelerator's
``pad_across_processes`` function. This is due to data-dependent
operations in evaluation metrics computation. Data-dependent operations
would result in undefined behavior with Neuron Parallel Compile trial
execution (which executes empty graphs with zero outputs). To work
around this error, disable ``compute_metrics`` when
``NEURON_EXTRACT_GRAPHS_ONLY`` is set to 1:

.. code:: python

   compute_metrics=None if os.environ.get("NEURON_EXTRACT_GRAPHS_ONLY") else compute_metrics

Compiler assertion error when running Stable Diffusion training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With PyTorch 2.6 (torch-neuronx), you may encounter the following
compiler assertion error with Stable Diffusion training when gradient
accumulation is enabled. This will be fixed in an upcoming release. For
now, if you want to run Stable Diffusion training, disable gradient
accumulation in torch-neuronx 2.6 by keeping the `default gradient
accumulation steps of 1 `__.

.. code:: bash

   ERROR 222163 [NeuronAssert]: Assertion failure in usr/lib/python3.9/concurrent/futures/process.py at line 239 with exception: too many partition dims! {{0,+,960}[10],+,10560}[10]

Frequently Asked Questions (FAQ)
--------------------------------

Do I need to recompile my models with PyTorch 2.6?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes.

Do I need to update my scripts for PyTorch 2.6?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the :ref:`migration guide `.

What environment variables will be changed with PyTorch NeuronX 2.6?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16``
are deprecated (warnings are shown when used). Switch to automatic
mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast
the model to BF16.
(see :ref:`migration_from_xla_downcast_bf16`)

What features will be missing with PyTorch NeuronX 2.6?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch NeuronX 2.6 has all of the supported features of PyTorch
NeuronX 2.5, with the known issues listed above and the unsupported
features listed in :ref:`pytorch_rn`.

Can I use Neuron Distributed and Transformers Neuron libraries with PyTorch NeuronX 2.6?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, the NeuronX Distributed, Transformers NeuronX, and AWS Neuron
Reference for NeMo Megatron libraries will work with PyTorch NeuronX
2.6.

Can I still use PyTorch 2.5 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.5 is supported for releases 2.21/2.22/2.23 and will reach
end-of-life in a future release. Additionally, the CVE
`CVE-2025-32434 `_ affects PyTorch version 2.5. We recommend upgrading
to the new version of Torch-NeuronX by following
:ref:`setup-torch-neuronx`.

Can I still use PyTorch 2.1 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.1 is supported for release 2.21 and has reached end-of-life
in release 2.22. Additionally, the CVEs `CVE-2024-31583 `_ and
`CVE-2024-31580 `_ affect PyTorch versions 2.1 and earlier. We
recommend upgrading to the new version of Torch-NeuronX by following
:ref:`setup-torch-neuronx`.

================================================
FILE: about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-7.rst
================================================

.. _introduce-pytorch-2-7:

Introducing PyTorch 2.7 Support
===============================

.. contents:: Table of contents
   :local:
   :depth: 2

What are we introducing?
------------------------

Starting with the :ref:`Neuron 2.24 ` release, customers can now
upgrade to PyTorch NeuronX (``torch-neuronx``) with specific support
for PyTorch version 2.7. :ref:`setup-torch-neuronx` is updated to
include installation instructions for PyTorch NeuronX 2.7 for Amazon
Linux 2023 and Ubuntu 22.04. Note that PyTorch NeuronX 2.7 is supported
on Python 3.9, 3.10, and 3.11.

Review the :ref:`migration guide ` for possible changes to training
scripts. No code changes are required for inference scripts.

.. _how-pytorch-2.7-different:

How is PyTorch NeuronX 2.7 different compared to PyTorch NeuronX 2.5?
---------------------------------------------------------------------

PyTorch NeuronX 2.7 uses Torch-XLA v2.7 and PyTorch v2.7, which have
the C++11 ABI enabled by default. Additionally, Torch-XLA v2.7 includes
a fix for the training performance issue
https://github.com/pytorch/xla/issues/9037.

See the `Torch-XLA 2.7 release `__ for a full list. See
:ref:`migrate_to_pytorch_2.7` for changes needed to use PyTorch NeuronX
2.7.

.. note::

   GSPMD and Torch Dynamo (torch.compile) support in Neuron will be
   available in a future release.

.. _install_pytorch_neuron_2.7:

How can I install PyTorch NeuronX 2.7?
--------------------------------------

To install PyTorch NeuronX 2.7, follow the :ref:`setup-torch-neuronx`
guides for Amazon Linux 2023 and the Ubuntu 22.04 AMI. Refer to the
Neuron Multi-Framework DLAMI :ref:`setup guide ` for Ubuntu 22.04 with
a pre-installed virtual environment for PyTorch NeuronX 2.7 that you
can use to get started. PyTorch NeuronX 2.7 can be installed using the
following:

.. code::

   python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.7.* torchvision

.. note::

   PyTorch NeuronX 2.7 is currently available for Python 3.9, 3.10,
   and 3.11.
.. _migrate_to_pytorch_2.7:

Migrate your application to PyTorch 2.7
---------------------------------------

First, install PyTorch NeuronX 2.7 as described in the
:ref:`installation guide ` above.

Migrating training scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^

To migrate training scripts from PyTorch NeuronX 2.5/2.6 to PyTorch
NeuronX 2.7, implement the following changes:

.. note::

   ``xm`` below refers to ``torch_xla.core.xla_model``, ``xr`` refers
   to ``torch_xla.runtime``, and ``xmp`` refers to
   ``torch_xla.distributed.xla_multiprocessing``.

* The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16``
  are deprecated (warnings are shown when used) and will be removed in
  an upcoming release. Switch to automatic mixed-precision or use the
  ``model.to(torch.bfloat16)`` command to convert the model to BF16
  format. (see :ref:`migration_from_xla_downcast_bf16`)
* The functions ``xm.xrt_world_size()``, ``xm.get_ordinal()``, and
  ``xm.get_local_ordinal()`` have been removed and now raise errors
  when used. Switch to ``xr.world_size()``, ``xr.global_ordinal()``,
  and ``xr.local_ordinal()`` respectively as replacements.
* The default behavior of the ``torch.load`` parameter ``weights_only``
  has changed from ``False`` to ``True``. Setting ``weights_only`` to
  ``True`` may cause issues with pickling custom objects.
* If using ``xmp.spawn``, the ``nprocs`` argument is limited to 1 or
  ``None`` since v2.1. Previously, passing a value > 1 would result in
  a warning. In torch-xla 2.6+, passing a value > 1 will result in an
  error with an actionable message to use ``NEURON_NUM_DEVICES`` to set
  the number of NeuronCores to use.

See the :ref:`v2.6 migration guide ` for additional changes needed if
you are migrating from PyTorch NeuronX 2.5. See the :ref:`v2.5
migration guide ` for additional changes needed if you are migrating
from PyTorch NeuronX 2.1.

Migrating inference scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are no code changes required in the inference scripts.

Troubleshooting and Known Issues
--------------------------------

Using the latest torch-xla v2.7 may result in an increase in host memory usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using the latest torch-xla v2.7 may result in an increase in host
memory usage compared to torch-xla v2.6. In one example, LLama2
pretraining with ZeRO1 and sequence length 16k could see an increase of
1.6% in host memory usage.

TypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

AdamW now has an additional argument ``decoupled_weight_decay``, which
defaults to ``False``. If you get "TypeError: AdamW.__init__() got an
unexpected keyword argument 'decoupled_weight_decay'" with NeuronX
Distributed, update NeuronX Distributed to the latest version.

Tensor split on second dimension of 2D array not working
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, when using the tensor split operation on a 2D array in the
second dimension, the resulting tensors do not contain the expected
data (https://github.com/pytorch/xla/issues/8640). The workaround is to
set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use
``torch.tensor_split``.
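As a minimal sketch of the ``torch.tensor_split`` workaround (the
tensor shape and split count here are arbitrary illustrative values):

.. code:: python

    import torch
    import torch_xla.core.xla_model as xm

    x = torch.arange(12).reshape(3, 4).to(xm.xla_device())

    # torch.split along the second dimension is affected by the issue
    # (https://github.com/pytorch/xla/issues/8640):
    #   parts = torch.split(x, 2, dim=1)

    # torch.tensor_split is an unaffected alternative that yields two
    # chunks along dim=1:
    parts = torch.tensor_split(x, 2, dim=1)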
Lower BERT pretraining performance when switching to ``model.to(torch.bfloat16)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, BERT pretraining performance is approximately 11% lower when switching to ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still work in torch-neuronx 2.5 and 2.7, although there will be end-of-support warnings (as noted below).

Warning "XLA_DOWNCAST_BF16 will be deprecated after the 2.6 release, please downcast your model directly"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast the model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)

``AttributeError: module 'torch' has no attribute 'xla'`` Failure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Training scripts might fail during activation checkpointing with the error shown below.

.. code::

   AttributeError: module 'torch' has no attribute 'xla'

The solution is to use ``torch_xla.utils.checkpoint.checkpoint`` instead of ``torch.utils.checkpoint.checkpoint`` as the checkpoint function while wrapping pytorch modules for activation checkpointing. Refer to the pytorch/xla discussion regarding this `issue `_. Also set ``use_reentrant=True`` while calling the torch_xla checkpoint function. Failure to do so will lead to the ``XLA currently does not support use_reentrant==False`` error. For more details on checkpointing, refer to the `documentation `_.

Error ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While using the HuggingFace Transformers Trainer API to train (e.g., the :ref:`HuggingFace Trainer API fine-tuning tutorial`), you may see the error "Attempted to access the data pointer on an invalid python storage". This is a known `issue `_ and has been fixed in version ``4.37.3`` of HuggingFace Transformers.

``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` on Amazon Linux 2023
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

torch-xla version 2.5+ now requires the ``libcrypt.so.1`` shared library. Currently, Amazon Linux 2023 includes the ``libcrypt.so.2`` shared library by default, so you may see ``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` when using torch-neuronx 2.1+ on Amazon Linux 2023. To install ``libcrypt.so.1`` on Amazon Linux 2023, run the following installation command (see also https://github.com/amazonlinux/amazon-linux-2023/issues/182 for more context):

.. code::

   sudo dnf install libxcrypt-compat

``FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'`` Failure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In PyTorch 2.7, users might face the error shown below due to incompatible ``libneuronxla`` and ``torch-neuronx`` versions being installed.

.. code::

   FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'

Check that the version of ``libneuronxla`` that supports PyTorch NeuronX 2.7 is ``2.2.*``.
If not, then uninstall ``libneuronxla`` using ``pip uninstall libneuronxla`` and then reinstall the packages following the :ref:`installation guide `.

``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running Neuron Parallel Compile with the HF Trainer API, you may see the errors ``Status: INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` in Accelerator's ``pad_across_processes`` function. This is due to data-dependent operations in evaluation metrics computation. Data-dependent operations result in undefined behavior with Neuron Parallel Compile trial execution (which executes empty graphs with zero outputs). To work around this error, disable ``compute_metrics`` when ``NEURON_EXTRACT_GRAPHS_ONLY`` is set to 1:

.. code:: python

   compute_metrics=None if os.environ.get("NEURON_EXTRACT_GRAPHS_ONLY") else compute_metrics

Compiler assertion error when running Stable Diffusion training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With PyTorch 2.7 (torch-neuronx), you may encounter the following compiler assertion error with Stable Diffusion training when gradient accumulation is enabled. This will be fixed in an upcoming release. For now, if you want to run Stable Diffusion training, disable gradient accumulation in torch-neuronx 2.7 by keeping the `default gradient accumulation steps of 1 `__.

.. code:: bash

   ERROR 222163 [NeuronAssert]: Assertion failure in usr/lib/python3.9/concurrent/futures/process.py at line 239 with exception: too many partition dims! {{0,+,960}[10],+,10560}[10]

Frequently Asked Questions (FAQ)
--------------------------------

Do I need to recompile my models with PyTorch 2.7?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes.

Do I need to update my scripts for PyTorch 2.7?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the :ref:`migration guide `.

What environment variables will be changed with PyTorch NeuronX 2.7?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast the model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)

What features will be missing with PyTorch NeuronX 2.7?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch NeuronX 2.7 has all of the supported features in PyTorch NeuronX 2.6, with known issues listed above, and unsupported features as listed in :ref:`pytorch_rn`.

Can I use Neuron Distributed and Transformers Neuron libraries with PyTorch NeuronX 2.7?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, NeuronX Distributed and Transformers NeuronX are supported by PyTorch NeuronX 2.7. AWS Neuron Reference for NeMo Megatron has reached end-of-support in release 2.23.

Can I still use PyTorch 2.6 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.6 is supported since release 2.23.

Can I still use PyTorch 2.5 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.5 is supported for releases 2.21 to 2.24 and will reach end-of-life in a future release. Additionally, the CVE `CVE-2025-32434 `_ affects PyTorch version 2.5. We recommend upgrading to the new version of Torch-NeuronX by following :ref:`setup-torch-neuronx`.

Can I still use PyTorch 2.1 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.1 is supported for release 2.21 and has reached end-of-life in release 2.22. Additionally, the CVEs `CVE-2024-31583 `_ and `CVE-2024-31580 `_ affect PyTorch versions 2.1 and earlier. We recommend upgrading to the new version of Torch-NeuronX by following :ref:`setup-torch-neuronx`.

================================================
FILE: about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-8.rst
================================================

.. _introduce-pytorch-2-8:

Introducing PyTorch 2.8 Support
===============================

.. contents:: Table of contents
   :local:
   :depth: 2

What are we introducing?
------------------------

Starting with the :ref:`Neuron 2.26 ` release, customers can now upgrade to PyTorch NeuronX (``torch-neuronx``) with specific support for PyTorch version 2.8. :ref:`setup-torch-neuronx` is updated to include installation instructions for PyTorch NeuronX 2.8 for Ubuntu 22.04. Note that PyTorch NeuronX 2.8 is supported on Python 3.10 and 3.11, with 3.12+ support coming in a future release.

Review the :ref:`migration guide ` for possible changes to training scripts. No code changes are required for inference scripts.

.. _how-pytorch-2.8-different:

How is PyTorch NeuronX 2.8 different compared to PyTorch NeuronX 2.7?
---------------------------------------------------------------------

See the `Torch-XLA 2.8 release `__ for a full list of changes. See :ref:`migrate_to_pytorch_2.8` for changes needed to use PyTorch NeuronX 2.8.

.. note::

   GSPMD and Torch Dynamo (torch.compile) support in Neuron will be available in a future release.

.. _install_pytorch_neuron_2.8:

How can I install PyTorch NeuronX 2.8?
--------------------------------------------

To install PyTorch NeuronX 2.8, follow the :ref:`setup-torch-neuronx` guide for the Ubuntu 22.04 AMI. Refer to the Neuron Multi-Framework DLAMI :ref:`setup guide ` for Ubuntu 22.04 with a pre-installed virtual environment for PyTorch NeuronX 2.8 that you can use to get started. PyTorch NeuronX 2.8 can be installed using the following:

.. code::

   python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.8.* torchvision

.. note::

   PyTorch NeuronX 2.8 is currently available for Python 3.10 and 3.11, with 3.12+ support coming in a future release.

.. note::

   To use PyTorch NeuronX 2.8 on Amazon Linux 2023, you will need to install Python 3.10 or 3.11.

.. _migrate_to_pytorch_2.8:

Migrate your application to PyTorch 2.8
---------------------------------------

First, install PyTorch NeuronX 2.8 as described above in the :ref:`installation guide `.

Migrating training scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^

There are no code changes required in the training scripts to move from PyTorch NeuronX 2.7 to PyTorch NeuronX 2.8. See the :ref:`v2.7 migration guide ` for additional changes needed if you are migrating from PyTorch NeuronX 2.6. See the :ref:`v2.6 migration guide ` for additional changes needed if you are migrating from PyTorch NeuronX 2.5.

Migrating inference scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are no code changes required in the inference scripts.
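As a quick post-upgrade sanity check (a minimal sketch, not part of the official setup steps), you can confirm the installed versions and run a small computation on a Neuron device:

.. code:: python

   import torch
   import torch_xla

   print(torch.__version__)      # expect a 2.8.x version
   print(torch_xla.__version__)  # expect a 2.8.x version

   device = torch_xla.device()   # current replacement for xm.xla_device()
   y = torch.ones(2, 2, device=device) + 1
   torch_xla.sync()              # current replacement for xm.mark_step()
   print(y)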
Troubleshooting and Known Issues
--------------------------------

[v2.8] Lower BERT/LLaMA performance with torch-xla 2.8.0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using the publicly released version of torch-xla 2.8.0 from public PyPI repositories results in lower performance for models like BERT and LLaMA (https://github.com/pytorch/xla/issues/9605). To fix this, switch to the updated torch-xla version 2.8.1 from public PyPI repositories.

Using the latest torch-xla 2.7/2.8 may result in an increase in host memory usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using torch-xla 2.7/2.8 may result in an increase in host memory usage compared to torch-xla 2.6. In one example, LLama2 pretraining with ZeRO1 and sequence length 16k could see an increase of 1.6% in host memory usage.

TypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

AdamW now has an additional argument ``decoupled_weight_decay`` which defaults to False. If you get ``TypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'`` with NeuronX Distributed, update to the latest version.

Tensor split on second dimension of 2D array not working
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors do not contain the expected data (https://github.com/pytorch/xla/issues/8640). The workaround is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use ``torch.tensor_split``.

Lower BERT pretraining performance when switching to ``model.to(torch.bfloat16)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, BERT pretraining performance is approximately 11% lower when switching to ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still work in torch-neuronx 2.5 to 2.8, although there will be end-of-support warnings (as noted below).

DeprecationWarning: Use torch_xla.device instead
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is an end-of-support warning when using ``torch_xla.core.xla_model.xla_device()``. Switch to ``torch_xla.device()`` instead.

DeprecationWarning: Use torch_xla.sync instead
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is an end-of-support warning when using ``torch_xla.core.xla_model.mark_step()``. Switch to ``torch_xla.sync()`` instead.

Warning "XLA_DOWNCAST_BF16 will be deprecated after the 2.6 release, please downcast your model directly"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast the model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)

``AttributeError: module 'torch' has no attribute 'xla'`` Failure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Training scripts might fail during activation checkpointing with the error shown below.

.. code::

   AttributeError: module 'torch' has no attribute 'xla'

The solution is to use ``torch_xla.utils.checkpoint.checkpoint`` instead of ``torch.utils.checkpoint.checkpoint`` as the checkpoint function while wrapping pytorch modules for activation checkpointing. Refer to the pytorch/xla discussion regarding this `issue `_. Also set ``use_reentrant=True`` while calling the torch_xla checkpoint function. Failure to do so will lead to the ``XLA currently does not support use_reentrant==False`` error. For more details on checkpointing, refer to the `documentation `_.
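A minimal sketch of the recommended checkpoint wrapping (the layer and input shapes are illustrative only):

.. code:: python

   import torch
   import torch_xla
   from torch_xla.utils.checkpoint import checkpoint  # not torch.utils.checkpoint

   device = torch_xla.device()
   layer = torch.nn.Linear(16, 16).to(device)
   x = torch.randn(4, 16, device=device, requires_grad=True)

   # use_reentrant=True is required; use_reentrant=False raises
   # "XLA currently does not support use_reentrant==False"
   out = checkpoint(layer, x, use_reentrant=True)
   out.sum().backward()
   torch_xla.sync()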
Error ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While using the HuggingFace Transformers Trainer API to train (e.g., the :ref:`HuggingFace Trainer API fine-tuning tutorial`), you may see the error "Attempted to access the data pointer on an invalid python storage". This is a known `issue `_ and has been fixed in version ``4.37.3`` of HuggingFace Transformers.

``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running Neuron Parallel Compile with the HF Trainer API, you may see the errors ``Status: INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` in Accelerator's ``pad_across_processes`` function. This is due to data-dependent operations in evaluation metrics computation. Data-dependent operations result in undefined behavior with Neuron Parallel Compile trial execution (which executes empty graphs with zero outputs). To work around this error, disable ``compute_metrics`` when ``NEURON_EXTRACT_GRAPHS_ONLY`` is set to 1:

.. code:: python

   compute_metrics=None if os.environ.get("NEURON_EXTRACT_GRAPHS_ONLY") else compute_metrics

Compiler assertion error when running Stable Diffusion training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With PyTorch 2.8 (torch-neuronx), you may encounter the following compiler assertion error with Stable Diffusion training when gradient accumulation is enabled. This will be fixed in an upcoming release. For now, if you want to run Stable Diffusion training, disable gradient accumulation in torch-neuronx 2.8 by keeping the `default gradient accumulation steps of 1 `__.

.. code:: bash

   ERROR 222163 [NeuronAssert]: Assertion failure in usr/lib/python3.9/concurrent/futures/process.py at line 239 with exception: too many partition dims! {{0,+,960}[10],+,10560}[10]

Frequently Asked Questions (FAQ)
--------------------------------

Do I need to recompile my models with PyTorch 2.8?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes.

Do I need to update my scripts for PyTorch 2.8?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the :ref:`migration guide `.

What environment variables will be changed with PyTorch NeuronX 2.8?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast the model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)

What features will be missing with PyTorch NeuronX 2.8?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch NeuronX 2.8 has all of the supported features in PyTorch NeuronX 2.7, with known issues listed above, and unsupported features as listed in :ref:`pytorch_rn`.

Can I use Neuron Distributed libraries with PyTorch NeuronX 2.8?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, NeuronX Distributed libraries are supported by PyTorch NeuronX 2.8. Transformers NeuronX has reached end-of-support in release 2.26.
AWS Neuron Reference for NeMo Megatron has reached end-of-support in release 2.23.

Can I still use PyTorch 2.7 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.7 is supported since release 2.24.

Can I still use PyTorch 2.6 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.6 is supported since release 2.23.

Can I still use PyTorch 2.5 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.5 reached end-of-support in release 2.25.

Can I still use Amazon Linux 2023?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes. You will need to install Python 3.10 or 3.11 to use PyTorch NeuronX 2.8.

================================================
FILE: about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-9.rst
================================================

.. _introduce-pytorch-2-9:

Introducing PyTorch 2.9 Support
===============================

.. contents:: Table of contents
   :local:
   :depth: 2

What are we introducing?
------------------------

Starting with the :ref:`Neuron 2.27 ` release, customers can now upgrade to PyTorch NeuronX (``torch-neuronx``) with specific support for PyTorch version 2.9. PyTorch NeuronX 2.9 adds support for AWS Trainium 3 (Trn3) instances, in addition to existing support for Trainium (Trn2/Trn1/Trn1n) and Inferentia (Inf2) instances. :ref:`setup-torch-neuronx` is updated to include installation instructions for PyTorch NeuronX 2.9 for Ubuntu 24.04. Note that PyTorch NeuronX 2.9 is supported on Python 3.10, 3.11, and 3.12.

Review the :ref:`migration guide ` for possible changes to training scripts. No code changes are required for inference scripts.

.. _how-pytorch-2.9-different:

How is PyTorch NeuronX 2.9 different compared to PyTorch NeuronX 2.8?
---------------------------------------------------------------------

See the `Torch-XLA 2.9 release `__ for a full list of changes. See :ref:`migrate_to_pytorch_2.9` for changes needed to use PyTorch NeuronX 2.9.

.. note::

   Torch Dynamo (torch.compile) support in Neuron will be available in a future release.

.. _install_pytorch_neuron_2.9:

How can I install PyTorch NeuronX 2.9?
--------------------------------------------

To install PyTorch NeuronX 2.9, follow the :ref:`setup-torch-neuronx` guide for the Ubuntu 24.04 AMI. Refer to the Neuron Multi-Framework DLAMI :ref:`setup guide ` for Ubuntu 24.04 with a pre-installed virtual environment for PyTorch NeuronX 2.9 that you can use to get started. PyTorch NeuronX 2.9 can be installed using the following:

.. code::

   python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.9.* torchvision

.. note::

   PyTorch NeuronX 2.9 is currently available for Python 3.10, 3.11, and 3.12.

.. note::

   To use PyTorch NeuronX 2.9 on Amazon Linux 2023, you will need to install Python 3.10, 3.11, or 3.12. See the `Amazon Linux 2023 Python documentation `_ for installation instructions.

.. _migrate_to_pytorch_2.9:

Migrate your application to PyTorch 2.9
---------------------------------------

First, install PyTorch NeuronX 2.9 as described above in the :ref:`installation guide `.

Migrating training scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^

There are no code changes required in the training scripts to move from PyTorch NeuronX 2.8 to PyTorch NeuronX 2.9. See the :ref:`v2.8 migration guide ` for additional changes needed if you are migrating from PyTorch NeuronX 2.7. See the :ref:`v2.7 migration guide ` for additional changes needed if you are migrating from PyTorch NeuronX 2.6.

Migrating inference scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are no code changes required in the inference scripts.
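For reference, a minimal training-step sketch using the current torch_xla APIs (the model, optimizer, and data below are illustrative only); a script of this shape runs unchanged on PyTorch NeuronX 2.8 and 2.9:

.. code:: python

   import torch
   import torch_xla

   device = torch_xla.device()
   model = torch.nn.Linear(8, 2).to(device)
   optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

   for step in range(3):
       x = torch.randn(16, 8, device=device)
       y = torch.randint(0, 2, (16,), device=device)
       loss = torch.nn.functional.cross_entropy(model(x), y)
       loss.backward()
       optimizer.step()
       optimizer.zero_grad()
       torch_xla.sync()  # cut and execute the lazily-built graph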
Troubleshooting and Known Issues
--------------------------------

GLIBC compatibility issue on Amazon Linux 2023
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running PyTorch NeuronX 2.9 on Amazon Linux 2023, you may encounter the following error:

.. code::

   ImportError: /usr/lib64/libm.so.6: version `GLIBC_2.35' not found (required by /opt/conda/lib/python3.12/site-packages/_XLAC.cpython-312-x86_64-linux-gnu.so)

This occurs because the PyTorch NeuronX 2.9 binaries require GLIBC 2.35, but Amazon Linux 2023 ships with an older version of GLIBC. Use the Ubuntu 24.04 AMI instead, which has the required GLIBC version. Follow the :ref:`setup-torch-neuronx` installation guide for Ubuntu 24.04.

Using the latest torch-xla 2.7/2.8/2.9 may result in an increase in host memory usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using the latest torch-xla v2.7/2.8/2.9 may result in an increase in host memory usage compared to torch-xla v2.6. In one example, LLama2 pretraining with ZeRO1 and sequence length 16k could see an increase of 1.6% in host memory usage.

TypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

AdamW now has an additional argument ``decoupled_weight_decay`` which defaults to False. If you get ``TypeError: AdamW.__init__() got an unexpected keyword argument 'decoupled_weight_decay'`` with NeuronX Distributed, update to the latest version.

Tensor split on second dimension of 2D array not working
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors do not contain the expected data (https://github.com/pytorch/xla/issues/8640). The workaround is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another workaround is to use ``torch.tensor_split``.

Lower BERT pretraining performance when switching to ``model.to(torch.bfloat16)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, BERT pretraining performance is approximately 11% lower when switching to ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a workaround to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which will still work in torch-neuronx 2.5 through 2.9, although there will be end-of-support warnings (as noted below).

DeprecationWarning: Use torch_xla.device instead
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is an end-of-support warning when using ``torch_xla.core.xla_model.xla_device()``. Switch to ``torch_xla.device()`` instead.

DeprecationWarning: Use torch_xla.sync instead
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is an end-of-support warning when using ``torch_xla.core.xla_model.mark_step()``. Switch to ``torch_xla.sync()`` instead.

Warning "XLA_DOWNCAST_BF16 will be deprecated after the 2.6 release, please downcast your model directly"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast the model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)

``AttributeError: module 'torch' has no attribute 'xla'`` Failure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Training scripts might fail during activation checkpointing with the error shown below.

.. code::

   AttributeError: module 'torch' has no attribute 'xla'

The solution is to use ``torch_xla.utils.checkpoint.checkpoint`` instead of ``torch.utils.checkpoint.checkpoint`` as the checkpoint function while wrapping pytorch modules for activation checkpointing. Refer to the pytorch/xla discussion regarding this `issue `_.
Also set ``use_reentrant=True`` while calling the torch_xla checkpoint function. Failure to do so will lead to the ``XLA currently does not support use_reentrant==False`` error. For more details on checkpointing, refer to the `documentation `_.

Error ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While using the HuggingFace Transformers Trainer API to train (e.g., the :ref:`HuggingFace Trainer API fine-tuning tutorial`), you may see the error "Attempted to access the data pointer on an invalid python storage". This is a known `issue `_ and has been fixed in version ``4.37.3`` of HuggingFace Transformers.

``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running Neuron Parallel Compile with the HF Trainer API, you may see the errors ``Status: INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` in Accelerator's ``pad_across_processes`` function. This is due to data-dependent operations in evaluation metrics computation. Data-dependent operations result in undefined behavior with Neuron Parallel Compile trial execution (which executes empty graphs with zero outputs). To work around this error, disable ``compute_metrics`` when ``NEURON_EXTRACT_GRAPHS_ONLY`` is set to 1:

.. code:: python

   compute_metrics=None if os.environ.get("NEURON_EXTRACT_GRAPHS_ONLY") else compute_metrics

Frequently Asked Questions (FAQ)
--------------------------------

Do I need to recompile my models with PyTorch 2.9?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes.

Do I need to update my scripts for PyTorch 2.9?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See the :ref:`migration guide `.

What environment variables will be changed with PyTorch NeuronX 2.9?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warnings are shown when used). Switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast the model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)

What features will be missing with PyTorch NeuronX 2.9?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch NeuronX 2.9 has all of the supported features in PyTorch NeuronX 2.8, with known issues listed above, and unsupported features as listed in :ref:`pytorch_rn`.

Can I use Neuron Distributed libraries with PyTorch NeuronX 2.9?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, NeuronX Distributed libraries are supported by PyTorch NeuronX 2.9. Transformers NeuronX has reached end-of-support in release 2.26. AWS Neuron Reference for NeMo Megatron has reached end-of-support in release 2.23.

Can I still use PyTorch 2.8 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.8 is supported since release 2.26.

Can I still use PyTorch 2.7 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.7 is supported since release 2.24.

.. note::

   PyTorch NeuronX 2.7 supports Python 3.10 and 3.11. Python 3.12 is not supported for PyTorch 2.7 and earlier versions.

Can I still use PyTorch 2.6 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.6 reached end-of-support in release 2.27.

Can I still use Amazon Linux 2023?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes. You will need to install Python 3.10, 3.11, or 3.12 to use PyTorch NeuronX 2.9.

================================================
FILE: about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-x.rst
================================================

.. _introduce-pytorch-2-5:

Introducing PyTorch 2.5 Support
===============================

.. contents:: Table of contents
   :local:
   :depth: 2

What are we introducing?
------------------------

Starting with the :ref:`Neuron 2.21 ` release, customers will be able to upgrade to PyTorch NeuronX (``torch-neuronx``) supporting ``PyTorch 2.5``. :ref:`setup-torch-neuronx` is updated to include installation instructions for PyTorch NeuronX 2.5 for Amazon Linux 2023 and Ubuntu 22. Note that PyTorch NeuronX 2.5 does not support Python 3.8, which is the default in Ubuntu 20. To use Ubuntu 20, customers will need to install Python 3.9+.

Please review the :ref:`migration guide ` for possible changes to training scripts. No code changes are required for inference scripts.

.. _how-pytorch-2-5-different:

How is PyTorch NeuronX 2.5 different compared to PyTorch NeuronX 2.1?
---------------------------------------------------------------------

PyTorch NeuronX 2.5 uses Torch-XLA 2.5, which has improved support for eager debug mode, Automatic Mixed Precision, PJRT device auto-detection, FP8, and others. See the `Torch-XLA 2.5 release `__ for a full list.

See :ref:`migrate_to_pytorch_2_5` for changes needed to use PyTorch NeuronX 2.5.

.. note::

   GSPMD and Torch Dynamo (torch.compile) support in Neuron will be available in a future release.

.. _install_pytorch_neuron_2_5:

How can I install PyTorch NeuronX 2.5?
--------------------------------------------

To install PyTorch NeuronX 2.5, please follow the :ref:`setup-torch-neuronx` guides for Amazon Linux 2023 and Ubuntu 22 AMI. Please also refer to the Neuron multi-framework DLAMI :ref:`setup guide ` for Ubuntu 22 with a pre-installed virtual environment for PyTorch NeuronX 2.5 that you can use to get started. PyTorch NeuronX 2.5 can be installed using the following:

.. code::

   python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.5.* torchvision

.. note::

   PyTorch NeuronX 2.5 is currently available for Python 3.9, 3.10, 3.11.

.. _migrate_to_pytorch_2_5:

Migrate your application to PyTorch 2.5
---------------------------------------

Please make sure you have first installed PyTorch NeuronX 2.5 as described above in the :ref:`installation guide `.

Migrating training scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^

To migrate training scripts from PyTorch NeuronX 2.1 to PyTorch NeuronX 2.5, implement the following changes:

.. note::

   ``xm`` below refers to ``torch_xla.core.xla_model`` and ``xr`` refers to ``torch_xla.runtime``.

* The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warning when used). Please switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to convert the model to BF16 format. (see :ref:`migration_from_xla_downcast_bf16`)
* The ``torch_xla.experimental.pjrt`` module, which was replaced by ``torch_xla.runtime`` in Torch-XLA 2.1, has been removed in Torch-XLA 2.5. Users should now utilize the ``torch_xla.runtime`` module as a replacement.
* ``torch_xla.runtime.using_pjrt`` is removed because PJRT is the sole Torch-XLA runtime.
* ``xm.all_reduce`` no longer operates in-place for single tensors.
To fix this, please convert the single tensor to an array (e.g., ``[single_tensor]``) or assign the output of ``xm.all_reduce`` to a variable.

* The functions ``xm.xrt_world_size()``, ``xm.get_ordinal()``, and ``xm.get_local_ordinal()`` are deprecated (warning when used). Please switch to ``xr.world_size``, ``xr.global_ordinal``, and ``xr.local_ordinal`` respectively as replacements.
* ``torch_xla.experimental.xla_sharding`` is now replaced by ``torch_xla.distributed.spmd.xla_sharding``.
* Class ``ZeroRedundancyOptimizer`` now has two new arguments that replace the optional boolean argument ``coalesce_cc``:

  * ``bucket_cap_mb_all_gather`` (int, Optional): Number of MegaBytes of the tensor bucket to fill before doing all-gather. Default: 0 (disable all-gather coalescing).
  * ``bucket_cap_mb_reduce_scatter`` (int, Optional): Number of MegaBytes of the tensor bucket to fill before doing reduce-scatter. Default: 0 (disable reduce-scatter coalescing).

Migrating inference scripts
^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are no code changes required in the inference scripts.

Troubleshooting and Known Issues
--------------------------------

Neuronx-Distributed Training Llama 3.1 70B 8-node tutorial failed with OSError when the Neuron Cache is placed on FSx mount
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, the Neuronx-Distributed Training Llama 3.1 70B 8-node tutorial fails with an OSError (Errno 61) when the Neuron Cache is placed on an FSx mount:

.. code:: bash

   [rank197]: RuntimeError: Bad StatusOr access: INVALID_ARGUMENT: RunNeuronCCImpl: error condition !(error != 400): : [Errno 61] No data available: '/fsxl/neuron_cache/neuronxcc-2.16.372.0+4a9b2326/MODULE_3540044791706521849+4eb52b03/model.neff' -> '/tmp/tmpx7bvfpmm/model.neff'

We found that the error is due to FSx failing during file copy when there are multiple readers (13 workers fail to copy out of 256). This issue doesn't affect simpler models like BERT. To work around the issue, please use the shared NFS mount (/home directory on a Parallel Cluster) instead of FSx to store the Neuron Cache. This will be fixed in an upcoming release.

Running in-place update operations (e.g. all_reduce) on 0-dimensional tensors result in buffer aliasing errors in torch 2.5 and earlier
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Torch's lazy tensor core has a feature where 0-dimensional tensors are stored in a device cache, so scalar constant values can be transferred once and then reused. The values in the device cache are supposed to be marked read-only and never participate in parameter aliasing. However, due to a bug in torch-xla 2.5 (`#8499 `_), sometimes the read-only flag can be dropped, allowing these tensors to be donated, resulting in aliasing errors later when the cached value is used again. A work-around is to avoid using 0-dimensional tensors by changing them to be a 1-D tensor of length 1 (`example `_). If modifying library code is not possible, disable XLA parameter aliasing by setting the environment variable ``XLA_ENABLE_PARAM_ALIASING=0``.

Tensor split on second dimension of 2D array not working
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, when using the tensor split operation on a 2D array in the second dimension, the resulting tensors don't have the expected data (https://github.com/pytorch/xla/issues/8640).
The work-around is to set ``XLA_DISABLE_FUNCTIONALIZATION=0``. Another work-around is to use ``torch.tensor_split``.

Import torch_xla crashed with ``TypeError: must be called with a dataclass type or instance`` with torch-xla 2.5 and torch 2.5.1+cpu (CPU flavor)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When using torch 2.5.1+cpu (CPU flavor) on Python 3.10, importing torch_xla crashes with ``TypeError: must be called with a dataclass type or instance`` due to the installed triton version 3.2.0 (https://github.com/pytorch/xla/issues/8560). To work around this, please remove the installed triton package, downgrade to triton==3.1.0, or use the regular torch 2.5.1 (GPU flavor).

Certain sequence of operations with ``xm.save()`` could corrupt tensors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When using the ``xm.save`` function to save tensors, please use ``xm.mark_step()`` before ``xm.save`` to avoid the error described in https://github.com/pytorch/xla/issues/8422, where parameter aliasing could corrupt other tensor values. This issue will be fixed in a future release. (Here ``xm`` is ``torch_xla.core.xla_model``, following the PyTorch/XLA convention.)

Lower BERT pretraining performance when switching to ``model.to(torch.bfloat16)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, BERT pretraining performance is ~11% lower when switching to ``model.to(torch.bfloat16)`` as part of migration away from the deprecated environment variable ``XLA_DOWNCAST_BF16`` due to https://github.com/pytorch/xla/issues/8545. As a work-around to recover the performance, you can set ``XLA_DOWNCAST_BF16=1``, which would still work in torch-neuronx 2.5 and 2.6, although there will be end-of-support warnings (as noted below).

Warning "XLA_DOWNCAST_BF16 will be deprecated after the 2.5 release, please downcast your model directly"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warning when used). Please switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast the model to BF16. (see :ref:`migration_from_xla_downcast_bf16`)

WARNING:root:torch_xla.core.xla_model.xrt_world_size() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.world_size instead.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a warning that ``torch_xla.core.xla_model.xrt_world_size()`` will be removed in a future release. Please switch to using ``torch_xla.runtime.world_size`` instead.

WARNING:torch_xla.core.xla_model.xla_model.get_ordinal() will be removed in release 2.7. is deprecated. Use torch_xla.runtime.global_ordinal instead.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a warning that ``torch_xla.core.xla_model.get_ordinal()`` will be removed in a future release. Please switch to using ``torch_xla.runtime.global_ordinal`` instead.
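A minimal before/after sketch of these runtime API replacements:

.. code:: python

   import torch_xla.core.xla_model as xm
   import torch_xla.runtime as xr

   # Deprecated (warnings in Torch-XLA 2.5, removed in later releases):
   # world_size = xm.xrt_world_size()
   # rank       = xm.get_ordinal()
   # local_rank = xm.get_local_ordinal()

   # Replacements:
   world_size = xr.world_size()
   rank       = xr.global_ordinal()
   local_rank = xr.local_ordinal()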
AttributeError: module 'torch_xla.runtime' has no attribute 'using_pjrt'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In Torch-XLA 2.5, ``torch_xla.runtime.using_pjrt`` is removed because PJRT is the sole Torch-XLA runtime. See the `commit PR `__.

Socket Error: Socket failed to bind
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In PyTorch 2.5, both torchrun and ``init_process_group`` need a socket available to bind to. By default, both select unused ports. If you set the ``MASTER_PORT`` environment variable to a port that is already in use, this error may occur:

.. code::

   [W socket.cpp:426] [c10d] The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use).
   [W socket.cpp:426] [c10d] The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).
   [E socket.cpp:462] [c10d] The server socket has failed to listen on any local network address.
   RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to ?UNKNOWN? (errno: 98 - Address already in use).

To resolve the issue, if you are setting ``MASTER_PORT``, please ensure that the port you set it to is not used anywhere else in your scripts. Otherwise, you can leave ``MASTER_PORT`` unset, and torchrun will set the default port for you.

``AttributeError: module 'torch' has no attribute 'xla'`` Failure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In PyTorch 2.5, training scripts might fail during activation checkpointing with the error shown below.

.. code::

   AttributeError: module 'torch' has no attribute 'xla'

The solution is to use ``torch_xla.utils.checkpoint.checkpoint`` instead of ``torch.utils.checkpoint.checkpoint`` as the checkpoint function while wrapping pytorch modules for activation checkpointing. Refer to the pytorch/xla discussion regarding this `issue `_. Also set ``use_reentrant=True`` while calling the torch_xla checkpoint function. Failure to do so will lead to the ``XLA currently does not support use_reentrant==False`` error. For more details on checkpointing, refer to the `documentation `_.

Error ``Attempted to access the data pointer on an invalid python storage`` when using HF Trainer API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While using the HuggingFace Transformers Trainer API to train (e.g., the :ref:`HuggingFace Trainer API fine-tuning tutorial`), you may see the error "Attempted to access the data pointer on an invalid python storage". This is a known `issue `_ and has been fixed in version ``4.37.3`` of HuggingFace Transformers.

``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` on Amazon Linux 2023
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

torch-xla version 2.5+ now requires the ``libcrypt.so.1`` shared library. Currently, Amazon Linux 2023 includes the ``libcrypt.so.2`` shared library by default, so you may see ``ImportError: libcrypt.so.1: cannot open shared object file: No such file or directory`` when using torch-neuronx 2.1+ on Amazon Linux 2023. To install ``libcrypt.so.1`` on Amazon Linux 2023, please run the following installation command (see also https://github.com/amazonlinux/amazon-linux-2023/issues/182 for more context):
.. code::

   sudo dnf install libxcrypt-compat

``FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'`` Failure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In PyTorch 2.5, users might face the error shown below due to incompatible ``libneuronxla`` and ``torch-neuronx`` versions being installed.

.. code::

   FileNotFoundError: [Errno 2] No such file or directory: 'libneuronpjrt-path'

Check that the version of ``libneuronxla`` that supports PyTorch NeuronX 2.5 is ``2.1.*``. If not, then uninstall ``libneuronxla`` using ``pip uninstall libneuronxla`` and then reinstall the packages following the :ref:`installation guide `.

GlibC error on Amazon Linux 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If using Torch-NeuronX 2.5 on Amazon Linux 2, you will see the GlibC error below. Please switch to a newer supported OS such as Ubuntu 22 or Amazon Linux 2023.

.. code:: bash

   ImportError: /lib64/libc.so.6: version `GLIBC_2.27' not found (required by /tmp/debug/_XLAC.cpython-38-x86_64-linux-gnu.so)

``Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` error during Neuron Parallel Compile
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running Neuron Parallel Compile with the HF Trainer API, you may see the error ``Status: INVALID_ARGUMENT: Input dimension should be either 1 or equal to the output dimension it is broadcasting into`` or ``IndexError: index out of range`` in Accelerator's ``pad_across_processes`` function. This is due to data-dependent operations in evaluation metrics computation. Data-dependent operations would result in undefined behavior with Neuron Parallel Compile trial execution (which executes empty graphs with zero outputs). To work around this error, please disable ``compute_metrics`` when ``NEURON_EXTRACT_GRAPHS_ONLY`` is set to 1:

.. code:: python

   compute_metrics=None if os.environ.get("NEURON_EXTRACT_GRAPHS_ONLY") else compute_metrics

Compiler assertion error when running Stable Diffusion training
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, with PyTorch 2.5 (torch-neuronx), we are seeing the following compiler assertion error with Stable Diffusion training when gradient accumulation is enabled. This will be fixed in an upcoming release. For now, if you would like to run Stable Diffusion training with Neuron SDK release 2.21/2.22, please disable gradient accumulation in torch-neuronx 2.5.

.. code:: bash

   ERROR 222163 [NeuronAssert]: Assertion failure in usr/lib/python3.9/concurrent/futures/process.py at line 239 with exception: too many partition dims! {{0,+,960}[10],+,10560}[10]

Frequently Asked Questions (FAQ)
--------------------------------

Do I need to recompile my models with PyTorch 2.5?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes.

Do I need to update my scripts for PyTorch 2.5?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Please see the :ref:`migration guide `.

What environment variables will be changed with PyTorch NeuronX 2.5?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The environment variables ``XLA_DOWNCAST_BF16`` and ``XLA_USE_BF16`` are deprecated (warning when used). Please switch to automatic mixed-precision or use the ``model.to(torch.bfloat16)`` command to cast the model to BF16.
(see :ref:`migration_from_xla_downcast_bf16`)

What features will be missing with PyTorch NeuronX 2.5?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch NeuronX 2.5 now has most of the supported features in PyTorch NeuronX 2.1, with known issues listed above, and unsupported features as listed in :ref:`pytorch_rn`.

Can I use Neuron Distributed and Transformers Neuron libraries with PyTorch NeuronX 2.5?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, the NeuronX Distributed, Transformers NeuronX, and AWS Neuron Reference for NeMo Megatron libraries will work with PyTorch NeuronX 2.5.

Can I still use PyTorch 2.1 version?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch 2.1 is supported for release 2.21 and will reach end-of-life in a future release. Additionally, the CVEs `CVE-2024-31583 `_ and `CVE-2024-31580 `_ affect PyTorch versions 2.1 and earlier. We recommend upgrading to the new version of Torch-NeuronX by following :ref:`setup-torch-neuronx`.

================================================
FILE: about-neuron/appnotes/torch-neuronx/migration-from-xla-downcast-bf16.rst
================================================

.. _migration_from_xla_downcast_bf16:

Migration From ``XLA_USE_BF16``/``XLA_DOWNCAST_BF16``
=====================================================

Introduction
------------

The environment variables ``XLA_USE_BF16`` and ``XLA_DOWNCAST_BF16`` were created to provide an easy cast-to-BF16 option before automatic mixed precision and ``model.to(torch.bfloat16)`` became available in Torch-XLA. Now that both automatic mixed precision and ``model.to(torch.bfloat16)`` are available in Torch-XLA, ``XLA_USE_BF16`` and ``XLA_DOWNCAST_BF16`` are redundant and can be replaced with these options, which provide an experience familiar from other platforms such as CPUs and GPUs. Using them in Torch-XLA 2.5+ causes end-of-support warnings to be displayed. While they are still functional, their functionality will be removed in a future release (Torch-XLA 2.8), so the recommended changes below serve as replacements. NeuronX Distributed Training has been updated to use some of the options below. Please see :ref:`standard_mixed_precision` for more information.

The changes recommended below can best be made to scripts running with Torch-XLA 2.5+. The same recommendations are also available in :ref:`pytorch-neuronx-programming-guide`.

.. note::

   This guide recommends the options below as replacements for ``XLA_USE_BF16`` and ``XLA_DOWNCAST_BF16``. Do not set ``XLA_USE_BF16=1`` or ``XLA_DOWNCAST_BF16=1`` when using the options below on Neuron devices. Using them will override the per-operator precision settings provided by the options and thus cause more operators to execute in bfloat16.

Full BF16 with stochastic rounding enabled
------------------------------------------

Previously, on torch-neuronx 2.1 and earlier, the environment variables ``XLA_USE_BF16`` or ``XLA_DOWNCAST_BF16`` provided full casting to BF16 with stochastic rounding enabled by default. These environment variables are deprecated in torch-neuronx 2.5, although they are still functional with warnings. To replace ``XLA_USE_BF16`` or ``XLA_DOWNCAST_BF16`` with stochastic rounding on Neuron, set ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1`` and use the ``torch.nn.Module.to`` method to cast model floating-point parameters and buffers to data-type BF16 as follows:
.. code:: python

   os.environ["NEURON_RT_STOCHASTIC_ROUNDING_EN"] = "1"

   # model is created
   model.to(torch.bfloat16)

Stochastic rounding is needed to enable faster convergence for a full BF16 model. If the loss is to be kept in FP32, initialize it with ``dtype=torch.float`` as follows:

.. code:: python

   running_loss = torch.zeros(1, dtype=torch.float).to(device)

Similarly, if the optimizer states are to be kept in FP32, convert the gradients to FP32 before optimizer computations:

.. code:: python

   grad = p.grad.data.float()

For a full example, please see the :ref:`PyTorch Neuron BERT Pretraining Tutorial (Data-Parallel) `, which has been updated to use ``torch.nn.Module.to`` instead of ``XLA_DOWNCAST_BF16``.

BF16 in GPU-compatible mode without stochastic rounding enabled
---------------------------------------------------------------

Full BF16 training in GPU-compatible mode enables faster convergence without the need for stochastic rounding, but requires an FP32 copy of the weights/parameters to be saved and used in the optimizer. To enable BF16 in GPU-compatible mode without stochastic rounding enabled, use the ``torch.nn.Module.to`` method to cast model floating-point parameters and buffers to data-type bfloat16 as follows, without setting ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1``:

.. code:: python

   # model is created
   model.to(torch.bfloat16)

In the initializer of the optimizer, for example AdamW, you can add code like the following snippet to make an FP32 copy of the weights:

.. code:: python

   # keep a copy of weights in high precision
   self.param_groups_highprec = []
   for group in self.param_groups:
       params = group['params']
       param_groups_highprec = [p.data.float() for p in params]
       self.param_groups_highprec.append({'params': param_groups_highprec})

From then on, you can use the usual gradients but update the FP32 copy of the weights instead:

.. code:: python

   for group, group_highprec in zip(self.param_groups, self.param_groups_highprec):
       for p, p_highprec in zip(group['params'], group_highprec['params']):
           # convert gradients to FP32 before computing the exponential average
           grad = p.grad.data.float()
           # compute the exponential average and denominator using grad
           ...
           # update the FP32 copy of the weights
           p_highprec.data.addcdiv_(exponential_avg, denominator, value=-step_size)

In the :ref:`PyTorch Neuron BERT Pretraining Tutorial (Data-Parallel) `, this mode can be enabled by passing the ``--optimizer=AdamW_FP32ParamsCopy`` option to ``dp_bert_large_hf_pretrain_hdf5.py`` and setting ``NEURON_RT_STOCHASTIC_ROUNDING_EN=0`` (or leaving it unset).

BF16 automatic mixed precision using PyTorch Autocast
-----------------------------------------------------

By default, the compiler automatically casts internal FP32 operations to BF16. You can disable this and allow PyTorch's BF16 automatic mixed precision function (``torch.autocast``) to do the casting of certain operations to operate in BF16. To enable PyTorch's BF16 mixed-precision, first turn off the Neuron compiler auto-cast:

.. code:: python

   os.environ["NEURON_CC_FLAGS"] = "--auto-cast=none"

Next, per the recommendation from the official PyTorch `torch.autocast documentation `__, place only the forward-pass of the training step in the ``torch.autocast`` scope with the ``xla`` device type:

.. code:: python

   with torch.autocast(dtype=torch.bfloat16, device_type='xla'):
       # forward pass

The device type is XLA because we are using PyTorch-XLA's autocast backend.
The PyTorch-XLA `autocast mode source code `_ lists which operations are cast to lower-precision BF16 ("lower precision fp cast policy" section), which are maintained in FP32 ("fp32 cast policy"), and which are promoted to the widest input types ("promote" section).

.. note::

   If an operation is not part of any policy in the `autocast mode source code `_, the data type of the inputs will be used for the computation of the operation.

Example showing the original training code snippet:

.. code:: python

   def train_loop_fn(train_loader):
       for i, data in enumerate(train_loader):
           inputs = data[0]
           labels = data[3]
           outputs = model(inputs, labels=labels)
           loss = outputs.loss / flags.grad_acc_steps
           loss.backward()
           optimizer.step()
           xm.mark_step()

The following shows the training loop modified to use BF16 autocast:

.. code:: python

   os.environ["NEURON_CC_FLAGS"] = "--auto-cast=none"

   def train_loop_fn(train_loader):
       for i, data in enumerate(train_loader):
           torch.cuda.is_bf16_supported = lambda: True
           with torch.autocast(dtype=torch.bfloat16, device_type='xla'):
               inputs = data[0]
               labels = data[3]
               outputs = model(inputs, labels=labels)
               loss = outputs.loss / flags.grad_acc_steps
           loss.backward()
           optimizer.step()
           xm.mark_step()

For a full example of BF16 mixed-precision, see the :ref:`PyTorch Neuron BERT Pretraining Tutorial (Data-Parallel) `. See the official PyTorch documentation for more details about `torch.autocast `__.

================================================
FILE: about-neuron/appnotes/torch-neuronx/torch-neuronx-dataparallel-app-note.rst
================================================

.. _torch-neuronx-dataparallel-app-note:

Data Parallel Inference on torch_neuronx
=========================================

.. contents:: Table of Contents
   :local:
   :depth: 2

Introduction
------------

This guide introduces :func:`torch_neuronx.DataParallel`, a Python API that implements data parallelism on :class:`~torch.jit.ScriptModule` models created by the :ref:`torch_neuronx_trace_api`. The following sections explain how data parallelism can improve the performance of inference workloads on Inferentia, including how :func:`torch_neuronx.DataParallel` uses dynamic batching to run inference on variable input sizes. It covers an overview of the :func:`torch_neuronx.DataParallel` module and provides a few :ref:`example data parallel applications `.

Data parallel inference
-----------------------

Data parallelism is a form of parallelization across multiple devices or cores, referred to as nodes. Each node contains the same model and parameters, but data is distributed across the different nodes. By distributing the data across multiple nodes, data parallelism reduces the total execution time of large batch size inputs compared to sequential execution. Data parallelism works best for smaller models in latency-sensitive applications that have large batch size requirements.

torch_neuronx.DataParallel
--------------------------

To fully leverage the Inferentia hardware, we want to use all available NeuronCores. An inf2.xlarge and inf2.8xlarge have two NeuronCores, an inf2.24xlarge has 12 NeuronCores, and an inf2.48xlarge has 24 NeuronCores. For maximum performance on Inferentia hardware, we can use :func:`torch_neuronx.DataParallel` to utilize all available NeuronCores. :func:`torch_neuronx.DataParallel` implements data parallelism at the module level by replicating the Neuron model on all available NeuronCores and distributing data across the different cores for parallelized inference.
This function is analogous to :class:`~torch.nn.DataParallel` in PyTorch. :func:`torch_neuronx.DataParallel` requires PyTorch >= 1.8. The following sections provide an overview of some of the features of :func:`torch_neuronx.DataParallel` that enable maximum performance on Inferentia.

NeuronCore selection
^^^^^^^^^^^^^^^^^^^^

By default, DataParallel will try to use all NeuronCores allocated to the current process to fully saturate the Inferentia hardware for maximum performance. It is more efficient to make the batch dimension divisible by the number of NeuronCores. This will ensure that NeuronCores are not left idle during parallel inference and the Inferentia hardware is fully utilized.

In some applications, it is advantageous to use a subset of the available NeuronCores for DataParallel inference. DataParallel has a ``device_ids`` argument that accepts a list of :obj:`int` or ``'nc:#'`` that specify the NeuronCores to use for parallelization. See :ref:`Specifying NeuronCores ` for an example of how to use the ``device_ids`` argument.

Batch dim
^^^^^^^^^

DataParallel accepts a ``dim`` argument that denotes the batch dimension used to split the input data for distributed inference. By default, DataParallel splits the inputs on ``dim = 0`` if the ``dim`` argument is not specified. For applications with a non-zero batch dim, the ``dim`` argument can be used to specify the inference-time input batch dimension. :ref:`DataParallel with dim != 0 ` provides an example of data parallel inference on inputs with batch dim = 2.

.. _dynamic_batching_description_torch_neuronx:

Dynamic batching
^^^^^^^^^^^^^^^^

Batch size has a direct impact on model performance. The Inferentia chip is optimized to run with small batch sizes. This means that a Neuron compiled model can outperform a GPU model, even when running single-digit batch sizes. As a general best practice, we recommend optimizing your model's throughput by compiling the model with a small batch size and gradually increasing it to find the peak throughput on Inferentia.

Dynamic batching is a feature that allows you to use tensor batch sizes that the Neuron model was not originally compiled against. This is necessary because the underlying Inferentia hardware will always execute inferences with the batch size used during compilation. Fixed batch size execution allows tuning the input batch size for optimal performance. For example, batch size 1 may be best suited for an ultra-low latency on-demand inference application, while batch size > 1 can be used to maximize throughput for offline inferencing. Dynamic batching is implemented by slicing large input tensors into chunks that match the batch size used during the :func:`torch_neuronx.trace` compilation call.

The :func:`torch_neuronx.DataParallel` class automatically enables dynamic batching on eligible models. This allows us to run inference in applications that have inputs with a variable batch size without needing to recompile the model. See :ref:`Dynamic batching ` for an example of how DataParallel can be used to run inference on inputs with a dynamic batch size without needing to recompile the model.

Dynamic batching using small batch sizes can result in sub-optimal throughput because it involves slicing tensors into chunks and iteratively sending data to the hardware. Using a larger batch size at compilation time can use the Inferentia hardware more efficiently in order to maximize throughput.
You can test the tradeoff between individual request latency and total throughput by fine-tuning the input batch size. Dynamic batching in the DataParallel module can be disabled using the ``disable_dynamic_batching()`` function as follows: .. code-block:: python >>> model_parallel = torch_neuronx.DataParallel(model_neuron) >>> model_parallel.disable_dynamic_batching() If dynamic batching is disabled, the compile-time batch size must be equal to the inference-time batch size divided by the number of NeuronCores. :ref:`DataParallel with dim != 0 ` and :ref:`Dynamic batching disabled ` provide examples of running DataParallel inference with dynamic batching disabled. Performance optimizations ^^^^^^^^^^^^^^^^^^^^^^^^^ The DataParallel module has a ``num_workers`` attribute that can be used to specify the number of worker threads used for multithreaded inference. By default, ``num_workers = 2 * number of NeuronCores``. This value can be fine-tuned to optimize DataParallel performance. DataParallel has a ``split_size`` attribute that dictates the size of the input chunks that are distributed to each NeuronCore. By default, ``split_size = max(1, input.shape[dim] // number of NeuronCores)``. This value can be modified to optimally match the inference input chunk size with the compile-time batch size. .. _data_parallel_examples_torch_neuronx: Examples -------- The following sections provide example usages of the :func:`torch_neuronx.DataParallel` module. .. _dataparallel_example_default_torch_neuronx: Default usage ^^^^^^^^^^^^^ .. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-default.rst .. _dataparallel_example_specify_ncs_torch_neuronx: Specifying NeuronCores ^^^^^^^^^^^^^^^^^^^^^^ .. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-specify-ncs.rst .. _dataparallel_example_dim_neq_zero_torch_neuronx: DataParallel with dim != 0 ^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dim-neq-zero.rst .. _dataparallel_example_dynamic_batching_torch_neuronx: Dynamic batching ^^^^^^^^^^^^^^^^ .. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dynamic-batching.rst .. _dataparallel_example_disable_dynamic_batching_torch_neuronx: Dynamic batching disabled ^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-disable-dynamic-batching.rst ================================================ FILE: about-neuron/appnotes/torch-neuronx/torch-neuronx-graph-partitioner-app-note.rst ================================================ .. _torch-neuronx-graph-partitioner-app-note: Graph Partitioner on torch_neuronx ================================== .. contents:: Table of Contents :local: :depth: 2 Introduction ------------ This guide introduces the graph partitioner for torch-neuronx. The following sections explain the purpose of the graph partitioner, how it works, and go over a few examples. The Purpose of the Graph Partitioner ------------------------------------ While ``neuronx-cc`` is very sophisticated and can compile most operators, there are some operator configurations that are not supported by the compiler. In a model that contains unsupported operators, these usually make up only a small fraction of the graph, while the rest of the model can still benefit from the acceleration that Neuron offers.
With this in mind, we developed a graph partitioner that will partition out unsupported operators to be executed on CPU, while compiling and executing the supported operators on Neuron. How it Works ------------ Determining Unsupported Operators ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Operator support is determined by the ``neuronx-cc`` compiler frontend. Querying the compiler gives us more flexibility than a static list, because a specific operator configuration may be supported while another configuration is not. For example, the square root operator is supported, but not with a ``C64`` data type. To check operator support, we use the :func:`torch_neuronx.analyze` API, which queries the compiler for the device placement (Neuron or CPU) of each operator, giving the graph partitioner a base graph to start partitioning from. The below image shows the flow of the graph partitioner: |torch-neuronx-graph-partitioner-flow-diagram| .. |torch-neuronx-graph-partitioner-flow-diagram| image:: /images/torch-neuronx-graph-partitioner-flow-diagram.png Customizability ^^^^^^^^^^^^^^^ The graph partitioner has a wide range of customizability for a variety of situations. The customization options include: 1. **Minimum Operator Support:** Only partition the model if a minimum percentage of operators are supported. 2. **Minimum Subgraph Size:** The minimum number of operators in any given subgraph. This can be useful if having compute chokepoints with single operator subgraphs is not desired. 3. **Maximum Subgraph Count:** The maximum number of subgraphs. Too many subgraphs can fragment the computation graph, causing performance degradation. 4. **Ops to Partition:** Additional operators to partition to CPU beyond the unsupported operators. This can be useful for guiding the graph partitioner toward a more balanced graph. Furthermore, compiler flags/args can be passed into all Neuron subgraphs through the graph partitioner. For the API reference, visit :func:`torch_neuronx.trace` and :class:`torch_neuronx.PartitionerConfig`. .. note:: Dynamic batching is supported case-by-case with partitioned models, because it depends heavily on what the final partition scheme looks like. Examples -------- The following sections provide example usages of the graph partitioner. Default Usage ^^^^^^^^^^^^^ The below model is a simple MLP model with sorted log softmax output. The sort operator, ``torch.sort()`` or ``aten::sort``, is not supported by ``neuronx-cc`` at this time, so the graph partitioner will partition out the sort operator to CPU.
.. code-block:: python import torch import torch_neuronx import torch.nn as nn import logging # adjust logger level to see what the partitioner is doing logger = logging.getLogger("Neuron") class MLP(nn.Module): def __init__( self, input_size=28 * 28, output_size=10, layers=[4096, 2048] ): super(MLP, self).__init__() self.fc1 = nn.Linear(input_size, layers[0]) self.fc2 = nn.Linear(layers[0], layers[1]) self.fc3 = nn.Linear(layers[1], output_size) self.relu = nn.ReLU() def forward(self, x): f1 = self.fc1(x) r1 = self.relu(f1) f2 = self.fc2(r1) r2 = self.relu(f2) f3 = self.fc3(r2) out = torch.log_softmax(f3, dim=1) sort_out, _ = torch.sort(out) return sort_out n = MLP() n.eval() inputs = torch.rand(32, 784) # Configure the graph partitioner with the default values partitioner_config = torch_neuronx.PartitionerConfig() # Trace a neural network with graph partitioner enabled neuron_net = torch_neuronx.trace(n, inputs, partitioner_config=partitioner_config) # Run inference on the partitioned model output = neuron_net(inputs) Specifying requirements ^^^^^^^^^^^^^^^^^^^^^^^ This example is very similar to the previous one, but with two differences. First, the unsupported sort operator is sandwiched between the ReLU activation after the first linear layer and the second linear layer. Second, we specify a max subgraph count of 2. .. code-block:: python import torch import torch_neuronx import torch.nn as nn import logging # adjust logger level to see what the partitioner is doing logger = logging.getLogger("Neuron") class MLP(nn.Module): def __init__( self, input_size=28 * 28, output_size=10, layers=[4096, 2048] ): super(MLP, self).__init__() self.fc1 = nn.Linear(input_size, layers[0]) self.fc2 = nn.Linear(layers[0], layers[1]) self.fc3 = nn.Linear(layers[1], output_size) self.relu = nn.ReLU() def forward(self, x): f1 = self.fc1(x) r1 = self.relu(f1) sort_r1, _ = torch.sort(r1) f2 = self.fc2(sort_r1) r2 = self.relu(f2) f3 = self.fc3(r2) out = torch.log_softmax(f3, dim=1) return out n = MLP() n.eval() inputs = torch.rand(32, 784) # Configure the graph partitioner with a maximum subgraph count of 2 partitioner_config = torch_neuronx.PartitionerConfig(max_subgraph_count=2) # This trace will fail since the max_subgraph_count requirement can't be satisfied by the graph partitioner neuron_net = torch_neuronx.trace(n, inputs, partitioner_config=partitioner_config) Output: .. code-block:: ValueError: The partitioner has found 3 subgraphs which exceeds the specified max subgraph count of 2. This example fails because the sort operator placement generates 3 subgraphs, which exceeds the specified maximum of 2. Specifying additional operators to partition ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This example shows a situation where we want to partition out the log_softmax operator despite it being supported. We also specify an 80% support percentage threshold.
.. code-block:: python import torch import torch_neuronx import torch.nn as nn import logging # adjust logger level to see what the partitioner is doing logger = logging.getLogger("Neuron") logger.setLevel(logging.INFO) class MLP(nn.Module): def __init__( self, input_size=28 * 28, output_size=10, layers=[4096, 2048] ): super(MLP, self).__init__() self.fc1 = nn.Linear(input_size, layers[0]) self.fc2 = nn.Linear(layers[0], layers[1]) self.fc3 = nn.Linear(layers[1], output_size) self.relu = nn.ReLU() def forward(self, x): f1 = self.fc1(x) r1 = self.relu(f1) f2 = self.fc2(r1) r2 = self.relu(f2) f3 = self.fc3(r2) out = torch.log_softmax(f3, dim=1) sort_out, _ = torch.sort(out) return sort_out n = MLP() n.eval() inputs = torch.rand(32, 784) # Configure the graph partitioner with an 80% support threshold, and also partition out aten::log_softmax partitioner_config = torch_neuronx.PartitionerConfig(min_operator_percentage_threshold=0.8, ops_to_partition=set(["aten::log_softmax"])) # This trace succeeds neuron_net = torch_neuronx.trace(n, inputs, partitioner_config=partitioner_config) Key Output logs: .. code-block:: ... Neuron: The following operations are currently supported: Neuron: aten::linear Neuron: aten::relu Neuron: aten::log_softmax Neuron: The following operations are currently not supported: Neuron: aten::sort, unsup.py(28): ... Neuron: 85.71% of arithmetic operations (6 of 7) are supported Neuron: Num Partitions: 2 Neuron: Creating Partition #1 for device: Device.NEURON Neuron: The following operators will be included in this partition: Neuron: prim::GetAttr:9 Neuron: aten::linear:3 Neuron: aten::relu:2 ... Neuron: Creating Partition #2 for device: Device.CPU Neuron: The following operators will be included in this partition: Neuron: prim::Constant:4 Neuron: aten::sort:1 Neuron: aten::log_softmax:1 Notice that ``aten::log_softmax`` is still reported as supported, but it is placed in Partition #2, which is for ``Device.CPU``. ================================================ FILE: about-neuron/appnotes/transformers-neuronx/generative-llm-inference-with-neuron.rst ================================================ .. _neuron_llm_inference: Generative LLM inference with Neuron ==================================== .. contents:: Table of contents :local: :depth: 2 Background ---------- Large Language Models (LLMs) generate human-like text through a process known as generative inference. Fundamentally, given an input prompt, generative LLM inference generates text outputs by iteratively predicting the next token in a sequence. These models typically take a sequence of integers as input, which represent a sequence of tokens (words/subwords), and generate a prediction for the next token to be emitted. Below is a simple example that illustrates this in code: .. code-block:: python import numpy as np # Vocabulary of tokens the model can parse. The position of each token in the # vocabulary is used as the token_id (an integer representing that token) vocab = ["having", "I", "fun", "am", "learning", ".", "Neuron"] # input token_ids: list of integers that represent the input tokens, in this # case: "I", "am", "having", "fun" input_token_ids = [1, 3, 0, 2] # The LLM gets a vector of input token_ids, and generates a probability-distribution # for what the output token_id should be (with a probability score for each token_id # in the vocabulary) output = LLM(input_token_ids) # by taking argmax on the output, we effectively perform a 'greedy sampling' process, # i.e. we choose the token_id with the highest probability.
# Other sampling techniques also exist, e.g. Top-K. By choosing a probabilistic sampling method # we enable the model to generate different outputs when called multiple times with the same input. next_token_id = np.argmax(output) # map the token_id back into an output token next_token = vocab[next_token_id] To generate entire sentences, the application iteratively invokes the LLM to generate the next token's prediction, and at each iteration we append the predicted token back into the input: .. code-block:: python def generate(input_token_ids, n_tokens_to_generate): for _ in range(n_tokens_to_generate): # decode loop output = LLM(input_token_ids) # model forward pass next_token_id = np.argmax(output) # greedy sampling if next_token_id == EOS_TOK_ID: break # stop if the End Of Sentence (EOS) token is generated # append the prediction to the input, and continue to the next out_token input_token_ids.append(int(next_token_id)) return input_token_ids[-n_tokens_to_generate:] # only return generated token_ids input_token_ids = [1, 3] # "I" "am" output_token_ids = generate(input_token_ids, 4) # output_token_ids = [0, 2, 4, 6] output_tokens = [vocab[i] for i in output_token_ids] # "having" "fun" "learning" "Neuron" This process, of predicting a future value (regression) and adding it back into the input (auto), is sometimes referred to as autoregression. For more details, Jay Mody's `GPT in 60 Lines of NumPy `__ is an excellent writeup on GPTs (Generative Pre-trained Transformers). Performance optimizations ------------------------- The sheer size of state-of-the-art LLMs, as well as the sequential nature of text generation, poses multiple challenges for efficient generative LLM deployment. First, the model is typically sharded across multiple devices, in order to fit the model in device memory. This creates communication overhead and complexity among devices. Second, certain deployments have strict application-level latency bounds, thus requiring substantial latency optimizations. This is especially challenging, due to the sequential nature of token-by-token generation. Finally, generating one token at a time often leads to poor device utilization, due to low arithmetic intensity, which can be improved via batching (see :ref:`what_batch_size_to_use`). The Neuron SDK provides several built-in optimizations, allowing you to extract optimal performance when deploying LLM models, including: KV-caching: ^^^^^^^^^^^ The `transformers-neuronx `__ library implements KV-cache optimization, which saves compute resources by reusing previously calculated SelfAttention key-value pairs, instead of recalculating them for each generated token. To illustrate this concept, see the inner workings of the MaskedSelfAttention operator in the figure below. At each token generation step, the Query vector of the single current token is multiplied by the Key vectors of all previous tokens in the sequence to create attention scores, and these scores are further multiplied by the Value vectors of all previous tokens. .. image:: /images/masked-self-attention-operator.png The core idea behind this optimization is that instead of re-computing the Key and Value vectors for all previous tokens at each token generation step, Neuron can perform only incremental computation for the current token and re-use previously computed Key/Value vectors from the KV-cache. The Key/Value vector of the current token is also appended to the KV-cache, for the next token generation step.
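To make the bookkeeping concrete, here is a schematic, framework-agnostic sketch of a KV-cache (plain NumPy, not Neuron code; the shapes and names are illustrative):

.. code-block:: python

   import numpy as np

   d = 64                          # head dimension (illustrative)
   k_cache = np.zeros((0, d))      # cached Key vectors, one row per token
   v_cache = np.zeros((0, d))      # cached Value vectors

   def attend(q, k_new, v_new):
       """One generation step: append the current token's K/V to the cache,
       then attend over the whole cache instead of recomputing past K/V."""
       global k_cache, v_cache
       k_cache = np.vstack([k_cache, k_new])    # incremental update only
       v_cache = np.vstack([v_cache, v_new])
       scores = (k_cache @ q) / np.sqrt(d)      # scores against all cached keys
       weights = np.exp(scores - scores.max())
       weights /= weights.sum()
       return weights @ v_cache                 # weighted sum of cached values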
.. image:: /images/kv-cache-optimization.png Note that the first token in the output sequence is unique in two ways: .. container:: - No KV-cache is available at this point. - Neuron needs to compute the entire KV-cache for all tokens in the input prompt, rather than one incremental KV-cache entry. This means that first-token latency is typically higher than that of the following tokens. Model sharding: ^^^^^^^^^^^^^^^ Neuron enables you to shard the model across devices via Tensor Parallelism, Pipeline Parallelism (coming soon), or a combination of the two (coming soon). Tensor Parallelism shards each layer across multiple devices, enabling you to achieve the optimal latency. Pipeline Parallelism places different layers on different devices and creates a pipeline between them (as the name suggests), and is useful mainly when optimizing throughput and/or cost-per-inference. To find the optimal Tensor/Pipeline parallelism configuration for your model, see the :ref:`model_partitioning` section. Computation/communication overlap: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The Neuron compiler automatically fuses Collective Communication primitives (e.g., AllReduce) with the following computation (e.g., GEMM) in the compute graph. This helps minimize any overhead caused by sharding the model across devices. Compact data-types: ^^^^^^^^^^^^^^^^^^^ Neuron supports INT8 and FP8 (coming soon), which can significantly reduce the model's memory bandwidth and capacity requirements. This is especially useful for Generative LLM inference, which is typically memory-bound. Therefore, using a compact data-type can improve the overall LLM inference performance with lower latency and higher throughput. Bucketing: ^^^^^^^^^^ The transformers-neuronx library automatically uses bucketing to process the input prompt and output tokens. Bucketing makes it possible to handle variable sequence lengths, without requiring support for dynamic shapes. Using multiple progressively larger buckets helps minimize the portion of the KV-cache that needs to be read for each token. .. _model_partitioning: Model partitioning ------------------ How many NeuronCores do I need? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Transformer models are typically defined via a hyper-parameter configuration, such as the following: .. code-block:: python { "n_vocab": 50257, # number of tokens in our vocabulary "n_ctx": 2048, # maximum possible sequence length of the input "n_embd": 9216, # embedding dimension (determines the "width" of the network) "n_head": 72, # number of attention heads (n_embd must be divisible by n_head) "n_layer": 64 # number of layers (determines the "depth" of the network) } To determine the number of NeuronCores needed to fit the model, perform the following calculation:

.. code-block:: python

   weight_mem_footprint = 12 x <n_layer> x <n_embd>^2 x <dtype-size>
   KV_cache_mem_footprint = <batch-size> x <n_layer> x <n_ctx> x <n_embd> x 2 x <dtype-size>
   # <dtype-size> is 2 for BF16/FP16, or 1 for FP8/INT8

   mem_footprint = weight_mem_footprint + KV_cache_mem_footprint

And from here, determining the number of NeuronCores is straightforward:

.. code-block:: python

   num_neuron_cores = ceil_to_closest_supported_size(mem_footprint / <mem-per-NeuronCore>, <target-instance-family>)
   # <mem-per-NeuronCore> is 16GiB per Inferentia2/Trainium1 NeuronCore

For example, when running OPT-66B on Inf2, with a batch-size of 16, the number of required NeuronCores can be computed as follows.
 
.. code-block:: python

   # OPT-66B example (BF16, Inf2)
   # n_layer=64, n_ctx=2048, n_embd=9216, batch=16
   weight_mem_footprint = 12 x 64 x 9216^2 x 2 = 121.5 GiB
   KV_cache_mem_footprint = 16 x 64 x 2048 x 9216 x 2 x 2 = 72 GiB
   mem_footprint = 121.5GiB + 72GiB = 193.5 GiB

   num_neuron_cores = ceil_to_closest_supported_size(193.5GiB / 16GiB, Inf2)
                    = ceil_to_closest_supported_size(12.1) = 24
   ## Currently, the Neuron runtime supports tensor-parallelism degrees 2, 8, and 32 on Trn1
   ## and supports tensor-parallelism degrees 2, 4, 8, 12 and 24 on Inf2.

Use the :ref:`neuron_calculator` to compute the number of cores needed for a custom hyper-parameter configuration. Which parallelism technique should I use? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Tensor parallelism improves latency, at the expense of increased intra-layer communication. Thus, as a general rule, it is recommended to use the smallest tensor parallelism degree that meets your latency requirement, and then use pipeline/data parallelism from that point on. If latency is not a major concern in your application (e.g., model evaluation) and the primary goal is to maximize throughput (i.e., minimize total cost per token), then it is most efficient to use pipeline parallelism and increase the batch-size as much as possible. .. _what_batch_size_to_use: What batch-size should I use? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Due to the serial token generation nature of generative LLM inference, this workload tends to be extremely memory bound. This means that throughput (and thus cost per inference) improves significantly with batching. As a general rule, we recommend increasing the batch-size to the maximum amount that fits within the latency budget (up to batch=256; larger batch sizes typically do not improve performance further). Note that the KV-cache grows linearly with the batch-size, and can grow until the device runs out of memory (typically referred to as OOM). If the latency budget allows, we recommend increasing the batch-size to the maximum value that does not result in OOM. Users may also consider pipelining the model beyond what is necessary to fit model parameters / KV-cache on devices, in order to free up device-memory space and thus allow the batch-size to increase without causing OOM issues. ================================================ FILE: about-neuron/arch/glossary.rst ================================================ .. _neuron_hw_glossary: Neuron Glossary =============== .. contents:: Table of contents :local: :depth: 2 Terms ----- Neuron Devices (Accelerated Machine Learning chips) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. list-table:: :widths: auto :header-rows: 1 :align: left * - Term - Description * - .. glossary:: Inferentia - AWS first generation accelerated machine learning chip supporting inference only * - .. glossary:: Trainium/Inferentia2 - AWS second generation accelerated machine learning chip supporting training and inference * - .. glossary:: Trainium2 - AWS third generation accelerated machine learning chip supporting training and inference * - .. glossary:: Neuron Device - Accelerated machine learning chip (e.g. Inferentia or Trainium) Neuron powered Instances ^^^^^^^^^^^^^^^^^^^^^^^^ .. list-table:: :widths: auto :header-rows: 1 :align: left * - Term - Description * - .. glossary:: Inf1 - Inferentia powered accelerated compute EC2 instance * - .. glossary:: Trn1 - Trainium powered accelerated compute EC2 instance * - .. glossary:: Inf2 - Inferentia2 powered accelerated compute EC2 instance * - ..
glossary:: Trn2 - Trainium2 powered accelerated compute EC2 instance NeuronCore terms ^^^^^^^^^^^^^^^^ .. list-table:: :widths: auto :header-rows: 1 :align: left * - Term - Description * - .. glossary:: NeuronCore - The machine learning compute cores within Inferentia/Trainium * - .. glossary:: NeuronCore-v1 - Neuron Core within Inferentia * - .. glossary:: NeuronCore-v2 - Neuron Core within Trainium1/Inferentia2 * - .. glossary:: NeuronCore-v3 - Neuron Core within Trainium2 * - .. glossary:: Tensor Engine - 2D systolic array (within the NeuronCore), used for matrix computations * - .. glossary:: Scalar Engine - A scalar engine within each NeuronCore, which can accelerate element-wise operations (e.g. GELU, ReLU, reciprocal, etc.) * - .. glossary:: Vector Engine - A vector engine within each NeuronCore, which can accelerate spatial operations (e.g. layerNorm, TopK, pooling, etc.) * - .. glossary:: GPSIMD Engine - Embedded general-purpose SIMD cores, within each NeuronCore, to accelerate custom operators * - .. glossary:: Sync Engine - The SP engine, which is integrated inside the NeuronCore. Used for synchronization and DMA triggering. * - .. glossary:: Collective Communication Engine - Dedicated engine for collective communication, which allows overlapping computation and communication * - .. glossary:: High Bandwidth Memory - `High Bandwidth Memory `_, used as device memory for NeuronCore-v2 and beyond. * - .. glossary:: State Buffer - The main software-managed on-chip memory in NeuronCore-v1 and beyond. * - .. glossary:: Partial Sum Buffer - A second software-managed on-chip memory in NeuronCore-v1 and beyond, with near-memory accumulation support for TensorE output data. * - .. glossary:: NeuronLink - Interconnect between NeuronCores * - .. glossary:: NeuronLink-v1 - Interconnect between NeuronCores in the Inferentia device * - .. glossary:: NeuronLink-v2 - Interconnect between NeuronCores in the Trainium1/Inferentia2 device * - .. glossary:: NeuronLink-v3 - Interconnect between NeuronCores in the Trainium2 device Neuron SDK terms ^^^^^^^^^^^^^^^^ .. list-table:: :widths: auto :header-rows: 1 :align: left * - Term - Description * - .. glossary:: Neuron Kernel Interface - A bare-metal language and compiler for directly programming Neuron devices, available on AWS Trainium/Inferentia2 and later devices. Abbreviations ------------- .. list-table:: :widths: auto :header-rows: 1 :align: left * - Abbreviation - Description * - .. glossary:: NxD Core - NeuronX Distributed Core Library * - .. glossary:: NxD Training - NeuronX Distributed Training Library * - .. glossary:: NxD Inference - NeuronX Distributed Inference Library * - .. glossary:: NC - Neuron Core * - .. glossary:: NeuronCore - Neuron Core * - .. glossary:: ND - Neuron Device * - .. glossary:: NeuronDevice - Neuron Device * - .. glossary:: TensorE - Tensor Engine * - .. glossary:: ScalarE - Scalar Engine * - .. glossary:: VectorE - Vector Engine * - .. glossary:: GpSimdE - GpSimd Engine * - .. glossary:: CCE - Collective Communication Engine * - .. glossary:: HBM - High Bandwidth Memory * - .. glossary:: SBUF - State Buffer * - .. glossary:: PSUM - Partial Sum Buffer * - .. glossary:: FP32 - Float32 * - .. glossary:: TF32 - TensorFloat32 * - .. glossary:: FP16 - Float16 * - .. glossary:: BF16 - Bfloat16 * - .. glossary:: cFP8 - Configurable Float8 * - .. glossary:: RNE - Round Nearest Even * - .. glossary:: SR - Stochastic Rounding * - .. glossary:: NKI - Neuron Kernel Interface * - .. glossary:: CustomOps - Custom Operators * - ..
glossary:: RT - Neuron Runtime * - .. glossary:: DP - Data Parallel * - .. glossary:: DPr - Data Parallel degree * - .. glossary:: TP - Tensor Parallel * - .. glossary:: TPr - Tensor Parallel degree * - .. glossary:: PP - Pipeline Parallel * - .. glossary:: PPr - Pipeline Parallel degree ================================================ FILE: about-neuron/arch/index.rst ================================================ .. _neuron-architecture-index: .. meta:: :description: Explore the hardware architecture of AWS Neuron instances, including EC2 Trn and Inf instance types, AWS Inferentia and Trainium chips, and NeuronCore processing units. Learn about system specifications, memory hierarchies, interconnect topologies, and architectural considerations for machine learning workloads. :date-modified: 2025-10-03 AWS Neuron architecture guides ============================== Review and understand the hardware architecture of AWS Neuron instances, including AWS Elastic Compute Cloud (EC2) ``Trn`` and ``Inf`` instance types, AWS Inferentia and Trainium chips, and NeuronCore processing units. The documentation covers system specifications, memory hierarchies, interconnect topologies, and architectural considerations for machine learning workloads. About Neuron Hardware ---------------------- AWS Neuron hardware consists of custom-designed machine learning accelerators optimized for deep learning workloads. This section covers the architecture and capabilities of AWS Inferentia and Trainium chips, their NeuronCore processing units, and the EC2 instances that host them. Trainium Architecture ---------------------- .. grid:: 2 :gutter: 2 .. grid-item-card:: AWS Trainium3 :link: neuron-hardware/trainium3 :link-type: doc :class-body: sphinx-design-class-title-small Third-generation training accelerator chip .. grid-item-card:: AWS Trainium2 :link: neuron-hardware/trainium2 :link-type: doc :class-body: sphinx-design-class-title-small Second-generation training accelerator chip .. grid-item-card:: AWS Trainium :link: neuron-hardware/trainium :link-type: doc :class-body: sphinx-design-class-title-small First-generation training accelerator chip Inferentia Architecture ------------------------ .. grid:: 2 :gutter: 2 .. grid-item-card:: AWS Inferentia2 :link: neuron-hardware/inferentia2 :link-type: doc :class-body: sphinx-design-class-title-small Second-generation inference accelerator chip .. grid-item-card:: AWS Inferentia :link: neuron-hardware/inferentia :link-type: doc :class-body: sphinx-design-class-title-small First-generation inference accelerator chip NeuronCore Architecture ------------------------ NeuronCores are fully independent, heterogeneous compute units that power the Trainium, Trainium2, Inferentia, and Inferentia2 chips. .. grid:: 2 :gutter: 2 .. grid-item-card:: NeuronCore v4 :link: neuron-hardware/neuron-core-v4 :link-type: doc :class-body: sphinx-design-class-title-small Processing unit architecture for Trainium3 .. grid-item-card:: NeuronCore v3 :link: neuron-hardware/neuron-core-v3 :link-type: doc :class-body: sphinx-design-class-title-small Processing unit architecture for Trainium2 .. grid-item-card:: NeuronCore v2 :link: neuron-hardware/neuron-core-v2 :link-type: doc :class-body: sphinx-design-class-title-small Processing unit architecture for Inferentia2 and Trainium ..
grid-item-card:: NeuronCore v1 :link: neuron-hardware/neuron-core-v1 :link-type: doc :class-body: sphinx-design-class-title-small Processing unit architecture for Inferentia Neuron AWS EC2 Platform Architecture ------------------------------------- Overviews of the AWS Inf and Trn instance and UltraServer architectures. .. grid:: 2 :gutter: 2 .. grid-item-card:: Inf1 Architecture :link: neuron-hardware/inf1-arch :link-type: doc :class-body: sphinx-design-class-title-small Inf1 instance architecture and specifications .. grid-item-card:: Inf2 Architecture :link: neuron-hardware/inf2-arch :link-type: doc :class-body: sphinx-design-class-title-small Inf2 instance architecture and specifications .. grid-item-card:: Trn1 Architecture :link: neuron-hardware/trn1-arch :link-type: doc :class-body: sphinx-design-class-title-small Trn1 instance architecture and specifications .. grid-item-card:: Trn2 Architecture :link: neuron-hardware/trn2-arch :link-type: doc :class-body: sphinx-design-class-title-small Trn2 instance architecture and specifications .. grid-item-card:: Trn3 Architecture :link: neuron-hardware/trn3-arch :link-type: doc :class-body: sphinx-design-class-title-small Trn3 instance architecture and specifications .. toctree:: :maxdepth: 1 :hidden: AWS Inferentia AWS Inferentia2 AWS Trainium AWS Trainium2 AWS Trainium3 NeuronCore v1 NeuronCore v2 NeuronCore v3 NeuronCore v4 Inf1 Architecture Inf2 Architecture Trn1 Architecture Trn2 Architecture Trn3 Architecture ================================================ FILE: about-neuron/arch/neuron-features/custom-c++-operators.rst ================================================ .. _feature-custom-c++-operators: Neuron Custom C++ Operators =========================== .. include:: /neuron-customops/customops-intro.txt For more details, see :ref:`neuron_c++customops` ================================================ FILE: about-neuron/arch/neuron-features/data-types.rst ================================================ .. _neuron-data-types: Data Types ========== .. contents:: Table of contents :local: :depth: 2 Introduction ------------ Inferentia and Trainium NeuronDevices include different NeuronCore versions, which support different data-types. This section describes what data-types are supported in each NeuronCore version. NeuronCore v1 Data Types ------------------------ Neuron Data-Types ^^^^^^^^^^^^^^^^^ Neuron enables developers to choose from multiple data-types. The supported data-types are FP32, FP16, and BF16. Developers can train their models on their platform of choice (e.g. EC2 P3 instances), and then easily move their trained models to EC2 Inf1 for execution.
.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left

   * - Data Type
     - Sign
     - Range (exponent bits)
     - Precision (mantissa bits)
   * - FP32
     - 1
     - 8 bits
     - 23 bits
   * - BF16
     - 1
     - 8 bits
     - 7 bits
   * - FP16
     - 1
     - 5 bits
     - 10 bits
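To relate these layouts to each other, the following illustrative NumPy sketch (ours, not part of the Neuron SDK) shows that a BF16 value is effectively an FP32 value with the low 16 mantissa bits dropped:

.. code-block:: python

   import numpy as np

   x = np.float32(3.14159265)
   bits = np.frombuffer(x.tobytes(), dtype=np.uint32)[0]

   # BF16 keeps FP32's sign bit and 8 exponent bits but only the top 7
   # mantissa bits, so truncating the low 16 bits of an FP32 value yields
   # its (round-toward-zero) BF16 value:
   bf16_bits = np.uint32(bits & 0xFFFF0000)
   bf16_as_fp32 = np.frombuffer(bf16_bits.tobytes(), dtype=np.float32)[0]

   print(x, bf16_as_fp32)  # 3.1415927 3.140625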
FP16/BF16 models ~~~~~~~~~~~~~~~~ Models natively trained in FP16/BF16 will be executed in their trained data-types. This is a straightforward migration from the training platform to Inf1. FP32 models ~~~~~~~~~~~ The Neuron SDK supports **automatic model conversion** from FP32 to BF16 by default. This capability allows developers to train their models using the FP32 format for the highest accuracy, and achieve performance benefits without having to worry about low-precision training (e.g. no need for loss-scaling during training). ML models are typically robust to FP32 to BF16 conversion, with minimal to no impact on accuracy. The conversion accuracy is model dependent; therefore, users are encouraged to benchmark the accuracy of the auto-converted model against the original FP32 trained model. When the compiler is supplied with an unmodified FP32 model input, it will automatically compile the model to run as BF16 on Inferentia. During inference, the FP32 input data will be auto-converted internally by Inferentia to BF16, and the output will be converted back to the FP32 data-type. For explicit FP16 inferencing, either use an FP16 trained model, or use an external tool (like AMP) to make the explicit conversions. .. _neuron-data-types-v2: NeuronCore v2 Data Types ------------------------ The NeuronCore v2 supports the following data types: * 32 and 16-bit Floating Point (FP32 / FP16) * TensorFloat-32 (TF32) * Brain Floating Point (BFloat16) * 8-bit Floating Point with configurable range and precision (cFP8) * Unsigned 8-bit Integer (UINT8) The layout for these is as follows:
.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left

   * - Data Type
     - Sign
     - Range (exponent bits)
     - Precision (mantissa bits)
   * - FP32
     - 1
     - 8 bits
     - 23 bits
   * - TF32
     - 1
     - 8 bits
     - 10 bits
   * - BF16
     - 1
     - 8 bits
     - 7 bits
   * - FP16
     - 1
     - 5 bits
     - 10 bits
   * - FP8_e5m2
     - 1
     - 5 bits
     - 2 bits
   * - FP8_e4m3
     - 1
     - 4 bits
     - 3 bits
   * - FP8_e3m4
     - 1
     - 3 bits
     - 4 bits
   * - UINT8
     - --
     - --
     - 8 bits (unsigned integer)
Model Type Conversion ^^^^^^^^^^^^^^^^^^^^^ The Neuron SDK supports automatic model conversion from FP32 to BF16 by default. This capability allows developers to train their models using the FP32 format for the highest accuracy, and then achieve run-time performance benefits without having to worry about low-precision training (e.g. no need for loss-scaling during training). ML models are typically robust to FP32 to BF16 conversion, with minimal to no impact on accuracy. Since conversion accuracy is model dependent, users are encouraged to benchmark the accuracy of the auto-converted model against the original FP32 trained model. See :ref:`Mixed Precision and Performance-accuracy Tuning for Training` for more details on supported data types and their properties. The Neuron compiler offers the ``--auto-cast`` and ``--auto-cast-type`` options to specify automatic casting of FP32 tensors to other data types to address performance and accuracy tradeoffs. See the :ref:`Neuron Compiler CLI Reference Guide` for a description of these options. NeuronCore v2 Rounding Modes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Because floating point values are represented by a finite number of bits, they cannot represent all real numbers accurately. Floating point calculations that exceed their defined data type size are rounded. By default, the NeuronCore v2 uses the Round-to-Nearest, ties-to-Even (RNE) algorithm. It also provides a new Stochastic Rounding mode. When Stochastic Rounding is enabled, the hardware will round the floating point value up or down using a proportional probability. This could lead to improved model convergence. Use the ``NEURON_RT_STOCHASTIC_ROUNDING_EN`` environment variable to select a rounding mode. ================================================ FILE: about-neuron/arch/neuron-features/index.rst ================================================ .. _neuron-features-index: Neuron Features =============== Neuron features provide insights into Neuron capabilities that enable high performance and improve the usability of developing and deploying deep learning acceleration on top of Inferentia and Trainium based instances. .. grid:: 2 :gutter: 2 .. grid-item-card:: Custom C++ operators :link: custom-c++-operators :link-type: doc :class-body: sphinx-design-class-title-small Framework for implementing custom operators in C++ to extend Neuron's built-in operation support. .. grid-item-card:: Data types :link: data-types :link-type: doc :class-body: sphinx-design-class-title-small Supported numerical data types including FP32, FP16, BF16, and INT8 for efficient model execution. .. grid-item-card:: Logical NeuronCore configuration :link: logical-neuroncore-config :link-type: doc :class-body: sphinx-design-class-title-small Configuration options for grouping and managing NeuronCores as logical units for workload distribution. .. grid-item-card:: Neuron persistent cache :link: neuron-caching :link-type: doc :class-body: sphinx-design-class-title-small Persistent caching system for compiled models to reduce compilation time across sessions. .. grid-item-card:: NeuronCore batching :link: neuroncore-batching :link-type: doc :class-body: sphinx-design-class-title-small Batching strategies to maximize throughput by processing multiple inputs simultaneously on NeuronCores. .. grid-item-card:: NeuronCore pipeline :link: neuroncore-pipeline :link-type: doc :class-body: sphinx-design-class-title-small Pipeline execution model that overlaps computation and data movement for improved performance. ..
grid-item-card:: Rounding modes :link: rounding-modes :link-type: doc :class-body: sphinx-design-class-title-small Configurable numerical rounding modes for controlling precision and accuracy in computations. .. toctree:: :maxdepth: 1 :hidden: Custom C++ operators Data types Logical NeuronCore configuration Neuron persistent cache NeuronCore batching NeuronCore pipeline Rounding modes ================================================ FILE: about-neuron/arch/neuron-features/logical-neuroncore-config.rst ================================================ .. _logical-neuroncore-config: ################################ Logical NeuronCore configuration ################################ Logical NeuronCore configuration (LNC) is a set of compiler and runtime settings for instances powered by AWS Trainium2 that determines the number of NeuronCores exposed to your machine learning (ML) applications. LNC configuration works by combining the compute and memory resources of multiple physical NeuronCores into a single logical NeuronCore. You can configure these settings to reduce the number of worker processes needed for training and deployment of large-scale models. .. important:: LNC can only be set to **1** or **2**. These are the only supported values. On Trn2, each chip has 8 physical NeuronCores. With LNC=2 (the default), these are grouped into 4 logical NeuronCores. With LNC=1, all 8 physical cores are treated as individual logical NeuronCores. LNC applies only to Trn2 and Trn3 instances. .. contents:: Concepts :depth: 1 :local: :backlinks: none =================== Logical NeuronCores =================== A logical NeuronCore is a grouping of physical NeuronCores that the Neuron Compiler, Neuron Runtime, Neuron Tools, and Frameworks handle as a single unified NeuronCore. Every Trainium2 device contains eight physical NeuronCore-v3. ============================= Compiler and runtime settings ============================= LNC configuration is controlled with the following runtime and compiler settings: | **Neuron Runtime** | The ``NEURON_LOGICAL_NC_CONFIG`` runtime environment variable controls how many physical NeuronCores are grouped to make up a logical NeuronCore. | **Neuron compiler flags** | The ``--logical-nc-config`` or ``-lnc`` command-line options control the degree of model sharding the compiler performs on an input graph. You must compile your models to use the LNC configuration set by the Neuron Runtime environment variable. AWS Neuron currently doesn't support setting the compiler flag to a different LNC configuration than the Neuron Runtime environment variable. ================================= Logical NeuronCore configurations ================================= AWS Neuron supports the following Logical NeuronCore configurations: .. tab-set:: .. tab-item:: LNC = 2 A Logical NeuronCore configuration (LNC) of two is the default setting on Trainium2 devices. It combines two physical NeuronCore-v3 into a logical NeuronCore with the software id ``NC_v3d``. When you set the Logical NeuronCore configuration to two, it directs Trainium2 devices to expose four ``NC_v3d`` to your machine learning applications. On this setting, a ``Trn2.48xlarge`` instance presents 64 available NeuronCores. The following high-level diagram shows a ``Trn2.48xlarge`` instance, connected in a 2D torus topology, with the Logical NeuronCore configuration set to two. .. image:: /images/architecture/Trn2/trn2_lnc2.png :align: center :width: 750 | Trainium2 devices contain four 24GB HBM banks.
Each bank is shared by two physical NeuronCore-v3. When LNC=2, the two physical NeuronCores share a single address space. Workers on each of the two physical NeuronCores can access tensors and perform local collective operations without accessing the network. The following diagram shows how a logical NeuronCore is presented to the software under this configuration. .. image:: /images/architecture/NeuronCore/lnc_2.png :align: center :width: 450 | To set the Logical NeuronCore configuration to two, use the following runtime and compiler flag combination: | **Runtime environment variable:** | ``NEURON_LOGICAL_NC_CONFIG`` = 2 | **Compiler flag:** | ``-lnc`` = 2 | .. tab-item:: LNC = 1 When you set the Logical NeuronCore configuration to one, it assigns each physical NeuronCore-v3 to a single logical NeuronCore with the software id ``NC_v3``. This directs Trainium2 devices to expose eight ``NC_v3`` to your machine learning applications. On this setting, a ``Trn2.48xlarge`` instance presents 128 available NeuronCores. The following high-level diagram shows a ``Trn2.48xlarge`` instance, connected in a 2D torus topology, with the Logical NeuronCore configuration set to one. .. image:: /images/architecture/Trn2/trn2_lnc1.png :align: center :width: 750 | Trainium2 devices contain four 24GB HBM banks. Each bank is shared by two physical NeuronCore-v3. When the Logical NeuronCore configuration is set to one, both physical NeuronCores have access to the entire 24GB HBM bank. The following diagram shows how logical NeuronCores are presented to the software under this configuration. .. image:: /images/architecture/NeuronCore/lnc_1.png :align: center :width: 475 | To set the Logical NeuronCore configuration to one, use the following runtime and compiler flag combination: | **Runtime environment variable:** | ``NEURON_LOGICAL_NC_CONFIG`` = 1 | **Compiler flag:** | ``-lnc`` = 1 | ================================================ FILE: about-neuron/arch/neuron-features/neuron-caching.rst ================================================ .. _neuron-caching: Neuron Persistent Cache ======================= PyTorch Neuron (``torch-neuronx``) uses ``torch-xla``, and ``torch-xla`` operates in lazy mode. In other words, every operation in the training script is recorded in a graph. The graph is executed only when the results are requested by the user, e.g. via ``print`` or ``xm.mark_step``. Requesting results tells ``torch-xla`` that the recorded graph needs to be executed. Before executing the graph on a Neuron device, ``torch-xla`` calls the Neuron Compiler (``neuronx-cc``) to compile the graph into a Neuron-specific graph. Then the graph is executed on the NeuronCores. Compiling the graph involves running optimizations that can make use of the NeuronCores efficiently. Running these optimizations can be expensive and can result in long compile times. To save users from compiling these graphs at every iteration, ``torch-xla`` maintains an in-memory cache called the Just in Time (JIT) cache. When the user re-runs the same graph (e.g. the 2nd iteration of the training run), ``torch-xla`` checks this JIT cache and re-uses the cached compilation result, thereby avoiding the wait times. Since the JIT cache is an in-memory cache, it needs to be constructed every time the training script is run. Hence, if the user re-runs the training script, a new JIT cache is created. This causes a compilation for the first training graph.
To avoid such compilations across training runs, PyTorch Neuron (``torch-neuronx``) has built an on-disk ``Neuron Persistent Cache``. Since this cache is on disk, it is persistent across training runs. Now, when a graph is compiled for the first time, the compilation result is saved in the ``Neuron Persistent Cache``. When the user re-runs the training script, since the JIT cache is not ready, the graph is sent for compilation. PyTorch Neuron (``torch-neuronx``) then checks if the compiled result is present in the ``Neuron Persistent Cache``; if yes, it returns the compiled result. This on-disk cache thereby avoids compilations across training runs. This cache is enabled by default for Neuron's PyTorch/XLA flow (training) as well as the transformers-neuronx LLM inference package. The default cache path is the directory ``/var/tmp/neuron-compile-cache``. The diagram below shows the end-to-end flow: |Image:| As seen from the diagram, the operations are recorded in a graph in lazy mode, and only when a mark_step is hit is the graph executed. Before execution, the graph passes through two caches to check if we have compiled the graph sometime in the past. If yes, we reuse the compilation result and execute with it. This avoids duplicate compilations. Note that the JIT cache and the Neuron Persistent Cache are complementary to each other: the JIT cache prevents duplicate compilations within a run, and the Neuron Persistent Cache prevents duplicate compilations across training runs. For example, within a training script, we have a training loop that iterates through the dataset. The first iteration would trace a unique graph, and the following iterations would trace graphs that are similar to the first one. In this case, the subsequent iterations would hit the JIT cache and reuse the result. However, to save users from compiling the first iteration graph, the ``Neuron Persistent Cache`` would be used. In this case, the very first time the script is run, the ``Neuron Persistent Cache`` would be updated. Going forward, when we re-run the training script, compilation results from the ``Neuron Persistent Cache`` would be used. To better understand how the ``Neuron Persistent Cache`` works, consider the example below: .. code:: python import torch import torch_xla import torch_xla.core.xla_model as xm device = xm.xla_device() t1 = torch.randn(3, 3).to(device) t2 = t1 / 0.5 x = t2.cpu() Running the above example produces the following logs: .. code:: bash 2023-08-25 21:51:36.000433: INFO ||NCC_WRAPPER||: Compile cache path: /var/tmp/neuron-compile-cache . Compiler status PASS Re-running the above script would fetch the graph from the Neuron cache, and you would see logs as follows: .. code:: bash 2023-08-25 21:52:23.000451: INFO ||NCC_WRAPPER||: Compile cache path: /var/tmp/neuron-compile-cache 2023-08-25 21:52:23.000453: INFO ||NCC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.8.0.25+a3ad0f342/MODULE_198775565831884870+d41d8cd9/model.neff. Exiting with a successfully compiled graph. As you can see, the next run picks the compiled graph from the cache, thereby saving compilation time. The cache uses a hash of the Neuron compiler flags and the XLA graph as the key. If the Neuron compiler version or the XLA graph changes, you will see recompilation.
Examples of changes that would cause an XLA graph change include: - Model type and size - Batch size - Optimizer and optimizer hyperparameters - Location of xm.mark_step() To keep the cache size small and to enable weights/parameters updates without recompilation, only the compute graphs are cached when using transformers-neuronx (weights/parameters are inputs to the compute graphs) and the training flow using torch-neuronx's XLA (weights/parameters are inputs and outputs of the compute graphs). Note that this caching mechanism doesn't apply to the torch-neuronx trace API, where the weights/parameters are frozen and converted to constants, then compiled together with the compute operations (traced graphs with frozen weights/parameters are not cached). All compilation results are saved in the cache. To disable the cache, you can pass the ``--no_cache`` option via NEURON_CC_FLAGS: .. code:: python os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --no_cache' The default cache path is the directory ``/var/tmp/neuron-compile-cache``. To change the cache's location, pass the ``cache_dir=<cache URL>`` option via ``NEURON_CC_FLAGS``, or set the ``NEURON_COMPILE_CACHE_URL=<cache URL>`` environment variable: .. code:: python os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --cache_dir=<cache URL>' .. code:: python os.environ['NEURON_COMPILE_CACHE_URL'] = '<cache URL>' The cache URL specified using ``--cache_dir`` is prioritized over that specified using ``NEURON_COMPILE_CACHE_URL`` if both are set. If ``<cache URL>`` starts with ``s3://``, it will use the AWS S3 URL as the cache location, provided that the corresponding S3 bucket exists and is both readable and writable. You can change the verbosity of the compiler by setting ``log_level`` to either ``WARNING``, ``INFO`` or ``ERROR``. This can be done as follows: .. code:: python os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --log_level=INFO' A graph compilation can fail because of a compilation error or an environment issue (for example, compilation is interrupted by ctrl-C). The graph would be marked as failed, and a subsequent rerun would encounter a message like the one below: .. code:: bash INFO ||NCC_WRAPPER||: Got a cached failed neff at /var/tmp/neuron-compile-cache/neuronxcc-2.8.0.25+a3ad0f342/MODULE_12486829708343293975+d41d8cd9/model.neff. Will skip compilation, please set --retry_failed_compilation for recompilation. To retry compilation, add ``--retry_failed_compilation`` to the ``NEURON_CC_FLAGS`` environment variable. When the script is rerun, all the previously failed compilations are recompiled, and fresh results are saved in the cache. .. code:: python os.environ['NEURON_CC_FLAGS'] = os.environ.get('NEURON_CC_FLAGS', '') + ' --retry_failed_compilation' Note that all flags demonstrated above are parsed by a tool called ``neuron_cc_wrapper``, which is a wrapper over the Neuron Compiler CLI that provides the caching mechanism. These flags are not passed on to the Neuron Compiler CLI. .. |Image:| image:: ./images/NeuronCaching.png ================================================ FILE: about-neuron/arch/neuron-features/neuroncore-batching.rst ================================================ .. _neuron-batching: Neuron Batching =============== Batching refers to the process of grouping multiple samples together and processing them as a group (i.e. passing them together through the neural network). Batching is typically used as an optimization for improving throughput at the expense of higher latency (and potentially higher memory footprint).
Batching considerations are slightly different between inference and training workloads, and we thus cover them separately below. .. contents:: Table of contents :local: :depth: 2 Batching in inference workloads ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ What is batched inference? ^^^^^^^^^^^^^^^^^^^^^^^^^^ Batched inference is illustrated conceptually below, with a single NeuronCore performing batched computation of a 3-layer neural network with a batch-size of 4. The NeuronCore reads the parameters for a certain layer from the external memory, and then performs the corresponding computations for all 4 inference-requests, before reading the next set of parameters (thus, performing more compute for every parameter read from memory). .. image:: /images/batched-inference.png What are the benefits of batched inference? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For inference, batching is typically used as a trade-off knob between throughput and latency: a higher batch-size typically leads to better hardware utilization and thus higher throughput, but at the same time batching requires performing more computation before the first results are available, and hence leads to higher latency. .. image:: /images/tradeoffs.png To understand why batching tends to improve throughput (up to a certain max value), it is useful to consider an intuitive visual performance model called 'the roofline model', which provides a theoretical bound on the system's performance: .. image:: /images/memoryvscompute.png The X-axis indicates the arithmetic intensity (AI) of the workload, which is the ratio between the number of operations and the number of bytes read-from/written-to memory. The Y-axis indicates the theoretical extractable performance. For small AI values the workload is expected to be memory bound, while for large AI values it is expected to be compute bound. For inference workloads, AI is often approximated by dividing the model's number of operations by its memory footprint (#params x dtype_size). To a first-order approximation, the AI value is linearly dependent on the batch-size, which means that the workload's performance (throughput) is expected to increase with the batch-size. To understand this more intuitively: for a larger batch size, Neuron can better amortize the cost of reading parameters from the external memory, and thus improve the overall hardware efficiency. It should be noted that while the roofline model can be very useful, it is not perfectly accurate (e.g. it doesn't take into account spills/fills from/to on-chip SRAM memories), and thus users are encouraged to use it as a tool for **estimating** the optimal batch-size for their workloads. How to determine the optimal batch-size for inference workloads? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The optimal batch size is dependent on the application-level requirements: some applications require strict latency guarantees (in which case, check out the :ref:`neuroncore-pipeline` technology), while other applications strictly aim to maximize throughput. We thus encourage our users to try out multiple batch-sizes, and compare performance between them. A good starting point for batch-size exploration can be identified using the roofline model: we can choose a batch-size that achieves an Arithmetic Intensity which is at the edge of the compute-bound region. By doing that, we aim to achieve max throughput with a minimal batch-size, and thus minimal impact to latency.
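One way to pick this starting point is to compute the roofline knee directly (a sketch only; the helper name is ours, and the inputs correspond to the terms of the equation below):

.. code-block:: python

   import math

   def roofline_starting_batch_size(peak_flops, mem_bw, model_flops,
                                    n_dense_params, dtype_size):
       machine_ai = peak_flops / mem_bw                            # FLOPs per byte at the knee
       model_ai_b1 = model_flops / (n_dense_params * dtype_size)   # model AI at batch 1
       return math.ceil(0.5 * machine_ai / model_ai_b1)

   # BERT-Large (SeqLen=128) on Inferentia: prints 6, matching the table below
   print(roofline_starting_batch_size(64e12, 50e9, 77.3e9, 302e6, 2))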
.. image:: /images/memoryvscompute2.png This rule can be expressed via the following equation: ``batch-size(Inference) = ceiling[0.5 x (<PeakFLOPS> / <MemBW>) / (<model-FLOPs> / (<#model-dense-params> x <dtype-size>))]`` (for NeuronDevice PeakFLOPS and MemBW, see the :ref:`trainium-arch`, :ref:`inferentia-arch` and :ref:`inferentia2-arch` pages.) For example, a BF16 BERT-Large model, with a sequence length of 128, will have the following approximated batch sizes: .. list-table:: :widths: auto :header-rows: 1 :stub-columns: 1 :align: left * - Model - NeuronDevice - Peak TFLOPS (BF16) - MemBW (GB/sec) - Model GFLOPs - Model Dense Params (Millions) - Data-type size (BF16) - Approximated optimal batch-size * - BERT-Large (SeqLen=128) - Inferentia - 64 - 50 - 77.3 - 302 - 2 - 6 * - BERT-Large (SeqLen=128) - Trainium - 210 - 820 - 77.3 - 302 - 2 - 2 * - ResNet-50 - Inferentia - 64 - 50 - 7.8 - 25 - 2 - 5 * - ResNet-50 - Trainium - 210 - 820 - 7.8 - 25 - 2 - 1 We recommend evaluating multiple batch sizes and comparing the performance between them, in order to determine the optimal latency/throughput deployment point. How to set the batch-size? ^^^^^^^^^^^^^^^^^^^^^^^^^^ The Neuron compiler takes a model and its sample input as inputs for the compilation process. For example, the code snippet below will compile a model with a batch-size of 4: .. code:: import torch import torch_neuron from torchvision import models # Load the model and set it to evaluation mode model = models.resnet50(pretrained=True) model.eval() # Compile with an example input of batch size 4 image = torch.rand([4, 3, 224, 224]) model_neuron = torch.neuron.trace(model, image, dynamic_batch_size=True) # Execute with a batch of 12 images batch = torch.rand([12, 3, 224, 224]) results = model_neuron(batch) For ahead-of-time compiled inference graphs (i.e. Inf1), dynamic batching can be used (as shown in the above code snippet) to process a larger client-side inference batch-size, and allow the framework to automatically break up the user-batch (12 in our case) into smaller batch sizes, to match the compiled batch-size (4 in our case). This technique increases the achievable throughput by hiding the framework-to-neuron overhead, and amortizing it over a larger batch size. .. seealso:: - :ref:`torch-neuronx-dynamic-batching` in ``torch-neuronx`` - :ref:`tensorflow-neuronx-special-flags` in ``tensorflow-neuronx``. Batching in training workloads ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Unlike inference workloads, training is inherently an offline process, and thus doesn't have latency requirements. This means that training is almost always batched to some degree. Batch-size naming ^^^^^^^^^^^^^^^^^ For distributed processing, defining the batch size depends on the observation level. There are multiple terms you should be aware of when running a distributed training job, especially global batch size (GBS) and micro-batch. Knowing the batch size in advance is crucial for precompiling the computational graph and for setting the hyperparameters. micro-batch size The smallest number of samples processed in a single step on the accelerator. For very large models, it is frequently chosen to be 1. gradient accumulation The process of iterating over multiple micro-batches and summing up the gradients before an optimizer update. This can happen in a dedicated loop for gradient accumulation or as part of multiple iterations of samples in pipeline parallelism. See :ref:`pp_developer_guide` for more details on pipeline parallelism.
How to set the batch-size?
^^^^^^^^^^^^^^^^^^^^^^^^^^

The Neuron compiler takes a model and a sample input as inputs to the compilation process. For example, the code snippet below compiles a model with a batch-size of 4:

.. code:: python

   import torch
   import torch_neuron
   from torchvision import models

   # Load the model and set it to evaluation mode
   model = models.resnet50(pretrained=True)
   model.eval()

   # Compile with an example input of batch size 4
   image = torch.rand([4, 3, 224, 224])
   model_neuron = torch.neuron.trace(model, image, dynamic_batch_size=True)

   # Execute with a batch of 12 images
   batch = torch.rand([12, 3, 224, 224])
   results = model_neuron(batch)

For ahead-of-time compiled inference graphs (i.e., on Inf1), dynamic batching can be used (as shown in the code snippet above) to process a larger client-side inference batch-size, and allow the framework to automatically break up the user batch (12 in our case) into smaller batches that match the compiled batch-size (4 in our case). This technique increases the achievable throughput by hiding the framework-to-Neuron overhead and amortizing it over a larger batch size.

.. seealso::

   - :ref:`torch-neuronx-dynamic-batching` in ``torch-neuronx``
   - :ref:`tensorflow-neuronx-special-flags` in ``tensorflow-neuronx``

Batching in training workloads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Unlike inference workloads, training is inherently an offline process, and thus doesn't have latency requirements. This means that training is almost always batched to some degree.

Batch-size naming
^^^^^^^^^^^^^^^^^

For distributed processing, the definition of the batch size depends on the level of observation. There are multiple terms you should be aware of when running a distributed training job, especially global batch size (GBS) and micro-batch. Knowing the batch size in advance is crucial for precompiling the computational graph and for setting the hyperparameters. The relationships between these terms are summarized in the sketch after the definitions below.

micro-batch size
   The smallest number of samples processed in a single step on the accelerator. For very large models, it is frequently chosen to be 1.

gradient accumulation
   The process of iterating over multiple micro-batches and summing up the gradients before an optimizer update. This can happen in a dedicated gradient-accumulation loop or as part of multiple iterations of samples in pipeline parallelism. See :ref:`pp_developer_guide` for more details on pipeline parallelism.

data-parallel size (or DP degree)
   Number of model replicas that process different portions of data in parallel. Each replica maintains a complete copy of the model while processing unique data chunks, after which their gradients are synchronized for the optimizer update. See :ref:`neuron_hw_glossary` for more details.

global batch-size
   Number of total samples used for an update of the optimizer. This includes all the respective gradients that get added up from data-parallel processing or gradient accumulation. :literal:`global_batch_size = micro_batch_size * data_parallel_size * gradient_accumulation_steps`

mini-batch or replica-batch size
   Number of samples that contribute to a gradient within one data-parallel rank. A mini-batch gradient is obtained by aggregating multiple micro-batch gradients, with or without a pipeline (i.e., gradient accumulation). :literal:`mini_batch_size = micro_batch_size * gradient_accumulation_steps`

worker batch
   The portion of mini-batch samples processed by a single worker. One worker (node) might host only a subset of the data-parallel ranks, and the worker batch captures how much data is processed by that worker.
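As a quick, self-contained illustration of how these definitions compose (the configuration values below are hypothetical):

.. code:: python

   # Hypothetical distributed-training configuration
   micro_batch_size = 1              # samples per step per accelerator
   gradient_accumulation_steps = 8   # micro-batch gradients summed per update
   data_parallel_size = 32           # number of model replicas

   # Samples contributing to the gradient within one data-parallel rank
   mini_batch_size = micro_batch_size * gradient_accumulation_steps   # -> 8

   # Samples consumed per optimizer update across all replicas
   global_batch_size = mini_batch_size * data_parallel_size           # -> 256

   print(mini_batch_size, global_batch_size)

Each data-parallel rank contributes ``mini_batch_size`` samples per optimizer update, while the optimizer itself consumes ``global_batch_size`` samples.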
How to determine the optimal batch-size for training workloads?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Determining the optimal batch-size for training workloads can be a non-trivial task. In most cases, we'd want to choose the largest batch-size that we can get away with. This is trickier than in inference, for two reasons:

1. *Higher memory footprint:* Training workloads have a higher memory footprint than inference, as they require saving more tensors aside from the model parameters, such as gradients, intermediate activations (passed between the forward-pass and backward-pass), and optimizer state. If the batch-size is increased beyond a certain point, one can run out of device memory (indicated by an 'Out of device memory' error, typically abbreviated as OOM).

2. *Arithmetic intensity estimation:* Arithmetic intensity is harder to estimate for training workloads than for inference workloads, as the majority of the external memory accesses are due to reads/writes of intermediate activation state (rather than parameters), which requires lower-level familiarity with the model to estimate correctly.

To estimate the memory footprint of a model, we look at the different contributors:

1. Weights and gradients: typically 2B each, thus 4B per parameter
2. Optimizer state: typically 4B - 12B per parameter
3. Intermediate activations: the sum of all tensor sizes in the forward pass; for example, for a transformer neural network this is roughly ``16 x <#layers> x <sequence-length> x <hidden-size> x <dtype-size> x <batch-size>``, i.e., about 100MB x <batch-size> for BERT-Large (SeqLen=128)

A good first-order approximation for the optimal batch-size in a training workload is the largest one that can fit in the device's memory (i.e., won't lead to an OOM error):

``batch-size(Training) = 0.6 x (<TP-rank> x <PP-rank> x <NeuronCore-device-memory>) / (<#model-dense-params> x <model-state-bytes-per-param>)``

.. note::

   TP-rank stands for Tensor-Parallelism rank, i.e., how many NeuronCores participate in a single Tensor-Parallelism group. Similarly, PP-rank stands for Pipeline-Parallelism rank, i.e., how many NeuronCores participate in a single Pipeline-Parallelism group.

For example, for BERT-Large Ph1 training, with a model state of 4B per parameter (2B weights, 2B gradients), 16GB of memory per NeuronCore, and TP-rank = PP-rank = 1, the approximated optimal per-NeuronCore training batch-size would be:

``batch-size(Training/Trainium) = 0.6 x (1 x 1 x 16e+9) / (300e+6 x 4) = 8``

================================================
FILE: about-neuron/arch/neuron-features/neuroncore-pipeline.rst
================================================

.. _neuroncore-pipeline:

NeuronCore Pipeline
===================

NeuronCore Pipeline is a Neuron software feature that shards a compute-graph across multiple NeuronCores, caches the model parameters in each core's on-chip memory (cache), and then streams inference requests across the cores in a pipelined manner. Based on the number of NeuronCores selected, the model might get seamlessly sharded across up to 16 Inferentia devices (i.e., 64 NeuronCores). This enables users to optimize for both throughput and latency, as it enables the NeuronCores to process neural networks with locally cached data and avoid the cost of accessing external memory.

|Image:|

One benefit of this approach is that NeuronCore Pipeline can typically reach maximal hardware efficiency without the need for batching (e.g., BERT, ResNet50). For maximal performance, users should choose an instance size that can cache the entire model by using sufficient NeuronCores. Inf1 instance types have different numbers of Inferentia devices, each of which has 4 NeuronCores, as shown here: https://aws.amazon.com/ec2/instance-types/inf1/

To enable the NeuronCore Pipeline optimization, the compiler should be invoked with the following flag: ``--neuroncore-pipeline-cores N``. The number of NeuronCores is typically chosen to be the minimal number that can fit the entire model, which is currently done through a trial-and-error process (compiling for different numbers of cores and looking for a compilation success/failure message). This process will be automated in the future.

A simple formula to help define the number of NeuronCores that may be an appropriate choice is:

::

   neuroncore-pipeline-cores = 4 * round( number-of-weights-in-model/(2 * 10^7) )

This allocates a set of NeuronCores based on the size of the given model's weights and normalizes to multiples of 4 so it uses full Inferentia devices.
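In code, the heuristic is straightforward (a hypothetical helper for illustration, not part of the Neuron tooling):

.. code:: python

   def neuroncore_pipeline_cores(num_weights: float) -> int:
       """Heuristic starting point for --neuroncore-pipeline-cores:
       round to multiples of 4 so that full Inferentia devices are used."""
       return 4 * round(num_weights / (2 * 10**7))

   # ResNet-50 has roughly 25.5M weights -> 4 NeuronCores (one Inferentia device)
   print(neuroncore_pipeline_cores(25.5e6))

Treat the result only as a starting point for the trial-and-error process described above.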
The code snippet below shows how to compile a model with NeuronCore Pipeline for 16 NeuronCores (instance size inf1.6xlarge):

::

   import numpy as np
   import tensorflow.neuron as tfn

   example_input = np.zeros([1,224,224,3], dtype='float16')
   tfn.saved_model.compile("rn50_fp16", "rn50_fp16_compiled/1",
                           model_feed_dict={'input_1:0': example_input},
                           compiler_args=['--neuroncore-pipeline-cores', '16'])

.. |Image:| image:: ./images/NeuronCorePipelining.png

================================================
FILE: about-neuron/arch/neuron-features/rounding-modes.rst
================================================

.. _neuron-rounding-modes:

Neuron Rounding Modes
=====================

.. contents:: Table of contents
   :local:
   :depth: 1

.. _neuron-rounding-mode-rne:

Round Nearest, ties to Even (RNE)
---------------------------------

When the exact result of a floating point operation cannot be exactly represented as a floating point value, it must be rounded. The IEEE 754-2008 standard defines the default rounding mode to be 'Round Nearest, ties to Even' (RNE for short). Under this scheme, numbers are rounded to the nearest representable value, and in case of a 'tie' (i.e., the number is exactly between the two nearest representable values) numbers are rounded to the nearest even number. All NeuronCore generations support the RNE rounding scheme, which is the most commonly used rounding scheme for Machine Learning workloads. Below is an illustration of the RNE rounding scheme:

.. image:: /images/rne1.png
   :width: 700

.. image:: /images/rne2.png
   :width: 700

.. image:: /images/rne3.png
   :width: 700

.. _neuron-rounding-mode-sr:

Stochastic Rounding (SR)
------------------------

One downside of the RNE rounding scheme (and the other rounding schemes described in the IEEE 754-2008 standard) is that when adding floating point values of significantly different magnitudes, rounding can squash small values and prevent them from accumulating over time. To improve this, starting from the second generation of the NeuronCore (NeuronCore-v2), customers can choose between the RNE rounding scheme described above and a second rounding scheme called 'Stochastic Rounding' (SR for short). Stochastic rounding prevents the computation precision-loss described above by performing the rounding operations in a probabilistic manner, according to the relative distance from the two nearest representable values, as illustrated below:

.. image:: /images/sr.png
   :width: 700

By performing the rounding in a probabilistic manner, this scheme allows small increments to accumulate over time, even when added to numbers of significantly higher magnitude, which leads to more precise results when performing large floating point computations (as done for machine learning).

Quick Tests
-----------

As an example, we examine the code snippet below:

::

   import torch
   import torch_xla
   import torch_xla.core.xla_model as xm

   device = xm.xla_device()

   a = torch.tensor(1024.0).half().to(device)
   for i in range(2048):
       a = a + 0.5
   xm.mark_step()
   print(a)

This code shows that rounding can significantly impact the calculation's precision over time. To use standard RNE rounding, set the environment variable ``NEURON_RT_STOCHASTIC_ROUNDING_EN=0``. To enable stochastic rounding, set the environment variable ``NEURON_RT_STOCHASTIC_ROUNDING_EN=1``.

.. note::

   Stochastic rounding mode is enabled by default in PyTorch-Neuron when ``XLA_USE_BF16=1``.

The first test continues to show 1024: at this magnitude the spacing between representable FP16 values is 1.0, so every addition of 0.5 creates a tie that RNE rounds back down to the (even) value 1024. The second test shows a result close to the expected value of 2048.

::

   $ NEURON_RT_STOCHASTIC_ROUNDING_EN=0 python3 rounding_mode_test.py
   tensor(1024., device='xla:1', dtype=torch.float16)

   $ NEURON_RT_STOCHASTIC_ROUNDING_EN=1 python3 rounding_mode_test.py
   tensor(2056., device='xla:1', dtype=torch.float16)
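You can observe the RNE tie-breaking behavior on the host as well, without any Neuron hardware; a minimal sketch using NumPy's float16:

.. code:: python

   import numpy as np

   a = np.float16(1024.0)
   # 1024.5 sits exactly between 1024 and 1025; RNE picks the even
   # neighbor (1024), so the small increment is lost:
   print(a + np.float16(0.5))   # -> 1024.0
   # A full-ULP increment is exactly representable and does accumulate:
   print(a + np.float16(1.0))   # -> 1025.0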
================================================
FILE: about-neuron/arch/neuron-hardware/inf1-arch.rst
================================================

.. _aws-inf1-arch:

Amazon EC2 Inf1 Architecture
============================

On this page, we provide an architectural overview of the Amazon EC2 Inf1 instance and the corresponding :ref:`Inferentia <inferentia-arch>` NeuronChips that power them (Inferentia chips from here on).

.. contents:: Table of Contents
   :local:
   :depth: 2

.. _inf1-arch:

Inf1 Architecture
-----------------

The EC2 Inf1 instance is powered by up to 16 :ref:`Inferentia <inferentia-arch>` chips, allowing customers to choose between four instance sizes:

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * - Instance size
     - # of Inferentia chips
     - vCPUs
     - Host Memory (GiB)
     - FP16/BF16 TFLOPS
     - INT8 TOPS
     - Device Memory (GiB)
     - Device Memory bandwidth (GiB/sec)
     - NeuronLink-v1 chip-to-chip bandwidth (GiB/sec/chip)
     - EFA bandwidth (Gbps)
   * - Inf1.xlarge
     - 1
     - 4
     - 8
     - 64
     - 128
     - 8
     - 50
     - N/A
     - up to 25
   * - Inf1.2xlarge
     - 1
     - 8
     - 16
     - 64
     - 128
     - 8
     - 50
     - N/A
     - up to 25
   * - Inf1.6xlarge
     - 4
     - 24
     - 48
     - 256
     - 512
     - 32
     - 200
     - 32
     - 25
   * - Inf1.24xlarge
     - 16
     - 96
     - 192
     - 1024
     - 2048
     - 128
     - 800
     - 32
     - 100

Inf1 offers a direct chip-to-chip interconnect called NeuronLink-v1, which enables co-optimizing latency and throughput via the :ref:`NeuronCore Pipeline <neuroncore-pipeline>` technology.

.. image:: /images/inf1-server-arch.png

================================================
FILE: about-neuron/arch/neuron-hardware/inf2-arch.rst
================================================

.. _aws-inf2-arch:

Amazon EC2 Inf2 Architecture
============================

On this page, we provide an architectural overview of the Amazon EC2 Inf2 instances and the corresponding Inferentia2 NeuronChips that power them (Inferentia2 chips from here on).

Inf2 Architecture
-----------------

The EC2 Inf2 instance is powered by up to 12 :ref:`Inferentia2 chips <inferentia2-arch>`, and allows customers to choose between four instance sizes:

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * - Instance size
     - # of Inferentia2 chips
     - vCPUs
     - Host Memory (GiB)
     - FP8/FP16/BF16/TF32 TFLOPS
     - FP32 TFLOPS
     - Device Memory (GiB)
     - Instance Memory Bandwidth (GiB/sec)
     - NeuronLink-v2 chip-to-chip (GiB/sec/chip)
   * - Inf2.xlarge
     - 1
     - 4
     - 16
     - 190
     - 47.5
     - 32
     - 820
     - N/A
   * - Inf2.8xlarge
     - 1
     - 32
     - 128
     - 190
     - 47.5
     - 32
     - 820
     - N/A
   * - Inf2.24xlarge
     - 6
     - 96
     - 384
     - 1140
     - 285
     - 192
     - 4920
     - 192
   * - Inf2.48xlarge
     - 12
     - 192
     - 768
     - 2280
     - 570
     - 384
     - 9840
     - 192

Inf2 offers a low-latency, high-bandwidth chip-to-chip interconnect called NeuronLink-v2, which enables high-performance collective communication operations (e.g., AllReduce and AllGather). This allows sharding large models across Inferentia2 chips (e.g., via Tensor Parallelism), thus optimizing latency and throughput. This capability is especially useful when deploying Large Generative Models.

.. image:: /images/inf2-topology.png

================================================
FILE: about-neuron/arch/neuron-hardware/inferentia.rst
================================================

.. _inferentia-arch:

Inferentia Architecture
-----------------------

At the heart of each Inf1 instance are up to sixteen Inferentia chips, each with four :ref:`NeuronCore-v1 <neuroncores-v1-arch>` cores, as depicted below:
.. image:: /images/inferentia-neurondevice.png

Each Inferentia chip consists of:

.. list-table::
   :widths: auto
   :header-rows: 0
   :stub-columns: 1
   :align: left

   * - Compute
     - Four :ref:`NeuronCore-v1 <neuroncores-v1-arch>` cores, delivering 128 INT8 TOPS and 64 FP16/BF16 TFLOPS
   * - Device Memory
     - 8 GiB of device DRAM memory (for storing parameters and intermediate state), with 50 GiB/sec of bandwidth
   * - NeuronLink
     - Enables co-optimization of latency and throughput via the :ref:`NeuronCore Pipeline <neuroncore-pipeline>` technology

================================================
FILE: about-neuron/arch/neuron-hardware/inferentia2.rst
================================================

.. _inferentia2-arch:

Inferentia2 Architecture
------------------------

At the heart of each Inf2 instance are up to twelve Inferentia2 chips (each with two :ref:`NeuronCore-v2 <neuroncores-v2-arch>` cores). Inferentia2 is the second-generation AWS purpose-built Machine Learning inference accelerator. The Inferentia2 chip architecture is depicted below:

.. image:: /images/inferentia2.png

Each Inferentia2 chip consists of:

.. list-table::
   :widths: auto
   :header-rows: 0
   :stub-columns: 1
   :align: left

   * - Compute
     - Two :ref:`NeuronCore-v2 <neuroncores-v2-arch>` cores, delivering 380 INT8 TOPS, 190 FP16/BF16/cFP8/TF32 TFLOPS, and 47.5 FP32 TFLOPS.
   * - Device Memory
     - 32 GiB of high-bandwidth device memory (HBM) (for storing model state), with 820 GiB/sec of bandwidth.
   * - Data Movement
     - 1 TB/sec of DMA bandwidth, with inline memory compression/decompression.
   * - NeuronLink
     - NeuronLink-v2 for chip-to-chip interconnect enables high-performance collective compute for co-optimization of latency and throughput.
   * - Programmability
     - Inferentia2 supports dynamic shapes and control flow, via ISA extensions of NeuronCore-v2, and custom operators via the deeply embedded GPSIMD engines.

For a more detailed description of all the hardware engines, see :ref:`NeuronCore-v2 <neuroncores-v2-arch>`.

================================================
FILE: about-neuron/arch/neuron-hardware/neuron-core-v1.rst
================================================

.. _neuroncores-v1-arch:

NeuronCore-v1 Architecture
--------------------------

NeuronCore-v1 is the first-generation NeuronCore engine, powering the Inferentia chips. Each NeuronCore-v1 is a fully-independent heterogeneous compute unit, with three main engines (Tensor/Vector/Scalar Engines) and on-chip software-managed SRAM memory (compiler-managed, for maximum data locality and optimized data prefetch).

.. image:: /images/nc-v1.png

The ScalarEngine is optimized for scalar computations, in which every element of the output is dependent on one element of the input, e.g., non-linearities such as GELU, SIGMOID, or EXP. The ScalarEngine is highly parallelized, and can process 512 floating point operations per cycle.
It can handle various data types, including FP16, BF16, FP32, INT8, INT16, and INT32.

The VectorEngine is optimized for vector computations, in which every element of the output is dependent on multiple input elements. Examples include 'axpy' operations (Z=aX+Y), Layer Normalization, Pooling operations, and many more. The VectorEngine is also highly parallelized, and can perform 256 floating point operations per cycle. It can handle various data types, including FP16, BF16, FP32, INT8, INT16, and INT32.

The TensorEngine is based on a power-optimized systolic array, which is highly optimized for tensor computations (e.g., GEMM, CONV, Reshape, Transpose), and supports mixed-precision computations (FP16/BF16/INT8 inputs, FP32/INT32 outputs). Each NeuronCore-v1 TensorEngine delivers 16 TFLOPS of FP16/BF16 tensor computations.

================================================
FILE: about-neuron/arch/neuron-hardware/neuron-core-v2.rst
================================================

.. _neuroncores-v2-arch:

NeuronCore-v2 Architecture
--------------------------

NeuronCore-v2 is the second generation of the NeuronCore engine, powering the Trainium chips. Each NeuronCore-v2 is a fully-independent heterogeneous compute unit, with 4 main engines (Tensor/Vector/Scalar/GPSIMD Engines) and on-chip software-managed SRAM memory (compiler-managed, for maximum data locality and optimized data prefetch).

.. image:: /images/nc-v2.png

Just like in NeuronCore-v1, the ScalarEngine is optimized for scalar computations, in which every element of the output is dependent on one element of the input. The ScalarEngine is highly parallelized, and delivers 2.9 TFLOPS of FP32 computations (3x speedup relative to NeuronCore-v1). The NeuronCore-v2 ScalarEngine can handle various data types, including cFP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32.

The VectorEngine is optimized for vector computations, in which every element of the output is dependent on multiple input elements. Examples include 'axpy' operations (Z=aX+Y), Layer Normalization, Pooling operations, and many more. The VectorEngine is also highly parallelized, and delivers 2.3 TFLOPS of FP32 computations (10x speedup vs. NeuronCore-v1). The NeuronCore-v2 VectorEngine can handle various data types, including cFP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32.

The TensorEngine is based on a power-optimized systolic array, which is highly optimized for tensor computations (e.g., GEMM, CONV, Transpose), and supports mixed-precision computations (cFP8 / FP16 / BF16 / TF32 / FP32 / INT8 inputs, FP32 / INT32 outputs). Each NeuronCore-v2 TensorEngine delivers over 90 TFLOPS of FP16/BF16 tensor computations (6x speedup from NeuronCore-v1).

NeuronCore-v2 also introduces a new engine called the GPSIMD-Engine, which consists of eight fully-programmable 512-bit wide vector processors that can execute general-purpose C code and access the embedded on-chip SRAM memory. With these cores, customers can implement custom operators and execute them directly on the NeuronCores.

NeuronCore-v2 also adds support for control flow, dynamic shapes, and programmable :ref:`rounding mode <neuron-rounding-modes>` (RNE & Stochastic Rounding).

================================================
FILE: about-neuron/arch/neuron-hardware/neuron-core-v3.rst
================================================

.. _neuroncores-v3-arch:

NeuronCore-v3 Architecture
--------------------------

NeuronCore-v3 is the third-generation NeuronCore that powers Trainium2 chips.
It is a fully-independent heterogeneous compute unit consisting of 4 main engines: Tensor, Vector, Scalar, and GPSIMD, with on-chip software-managed SRAM memory to maximize data locality and optimize data prefetch. The following diagram shows a high-level overview of the NeuronCore-v3 architecture.

.. image:: /images/architecture/NeuronCore/nc-v3.png
   :align: center
   :width: 250

|

NeuronCore-v3 is made up of the following components:

On-chip SRAM
""""""""""""

Each NeuronCore-v3 has a total of 28MB of on-chip SRAM. NeuronCore-v3 on-chip SRAM is software-managed to maximize data locality and optimize data prefetch.

Tensor Engine
"""""""""""""

Tensor Engines are based on a power-optimized systolic array. They are highly optimized for tensor computations such as GEMM, CONV, and Transpose. Tensor Engines support mixed-precision computations, including cFP8, FP16, BF16, TF32, and FP32 inputs and outputs. A NeuronCore-v3 Tensor Engine delivers 158 cFP8 TFLOPS, and 79 BF16/FP16/TF32 TFLOPS of tensor computations. Like NeuronCore-v2, NeuronCore-v3 supports control flow, dynamic shapes, and programmable rounding mode (RNE & Stochastic Rounding). NeuronCore-v3 also supports adjustable exponent biasing for the cFP8 data type.

The NeuronCore-v3 Tensor Engine also supports Structured Sparsity, delivering up to 316 TFLOPS of cFP8/FP16/BF16/TF32 compute. This is useful when one of the input tensors to a matrix multiplication exhibits an M:N sparsity pattern, where only M elements out of every N contiguous elements are non-zero. NeuronCore-v3 supports several sparsity patterns, including 4:16, 4:12, 4:8, 2:8, 2:4, 1:4, and 1:2.
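To make the M:N constraint concrete, here is a small illustrative check (plain Python, not a Neuron API) that a row of values satisfies a given pattern:

.. code:: python

   def satisfies_m_of_n(values, m: int, n: int) -> bool:
       """Check that at most m elements are non-zero in every group of
       n contiguous elements (the M:N structured sparsity constraint)."""
       return all(
           sum(1 for v in values[i:i + n] if v != 0) <= m
           for i in range(0, len(values), n)
       )

   row = [0.5, 0.0, 0.0, -1.2, 0.0, 0.0, 0.7, 0.0]   # follows a 2:4 pattern
   print(satisfies_m_of_n(row, m=2, n=4))            # -> True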
Vector Engine
"""""""""""""

Optimized for vector computations, in which every element of the output is dependent on multiple input elements. Examples include 'axpy' operations (Z=aX+Y), Layer Normalization, and Pooling operations. Vector Engines are highly parallelized, and deliver a total of 1 TFLOPS of FP32 computations. NeuronCore-v3 Vector Engines can handle various data types, including cFP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32.

Scalar Engine
"""""""""""""

Optimized for scalar computations, in which every element of the output is dependent on one element of the input. Scalar Engines are highly parallelized, and deliver a total of 1.2 TFLOPS of FP32 computations. NeuronCore-v3 Scalar Engines support multiple data types, including cFP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32.

GPSIMD Engine
"""""""""""""

Each GPSIMD engine consists of eight fully-programmable 512-bit wide vector processors. They can execute general-purpose C code and access the embedded on-chip SRAM, allowing you to implement custom operators and execute them directly on the NeuronCores.

================================================
FILE: about-neuron/arch/neuron-hardware/neuron-core-v4.rst
================================================

.. meta::
   :description: "NeuronCore-v4 architecture overview and components."
   :date-modified: 12/02/2025

.. _neuroncores-v4-arch:

NeuronCore-v4 Architecture
==========================

NeuronCore-v4 is the fourth-generation NeuronCore that powers Trainium3 chips. It is a fully-independent heterogeneous compute unit consisting of 4 main engines: Tensor, Vector, Scalar, and GPSIMD, with on-chip software-managed SRAM memory to maximize data locality and optimize data prefetch. The following diagram shows a high-level overview of the NeuronCore-v4 architecture.

.. image:: /images/architecture/trn3/neuroncore-v4.png
   :align: center

Like previous generations of NeuronCore, NeuronCore-v4 supports control flow, dynamic shapes, and programmable rounding mode (RNE & Stochastic Rounding).

NeuronCore-v4 is made up of the following components:

On-chip SRAM
------------

Each NeuronCore-v4 has a total of 32MiB of on-chip SRAM. The on-chip SRAM is software-managed to maximize data locality and optimize data prefetch. NeuronCore-v4 SRAM also introduces a new near-memory accumulation feature, which allows DMA engines to perform a read-add-write operation on existing SRAM data via a single transfer.

Tensor Engine
-------------

Tensor Engines are based on a power-optimized systolic array. They are highly optimized for tensor computations such as GEMM, CONV, and Transpose. Tensor Engines support mixed-precision computations, including MXFP8/MXFP4, FP16, BF16, TF32, and FP32 inputs. The output data type can be either FP32 or BF16. A NeuronCore-v4 Tensor Engine delivers 315 MXFP8/MXFP4 TFLOPS, where MXFP8/MXFP4 are OCP (Open Compute Project) compliant data type formats. MXFP4 data types are converted to MXFP8 before the Tensor Engine computation logic, using an arbitrary programmer-defined mapping. Besides quantized data types, a NeuronCore-v4 Tensor Engine also delivers 79 BF16/FP16/TF32 and 20 FP32 TFLOPS of tensor computations.

The NeuronCore-v4 Tensor Engine also supports Structured Sparsity, delivering up to 315 TFLOPS of FP16/BF16/TF32 compute. This is useful when one of the input tensors to a matrix multiplication exhibits an M:N sparsity pattern, where only M elements out of every N contiguous elements are non-zero. NeuronCore-v4 supports several sparsity patterns, including 4:16, 4:12, 4:8, 2:8, 2:4, 1:4, and 1:2.

Vector Engine
-------------

Optimized for vector computations, in which every element of the output is dependent on multiple input elements. Examples include 'axpy' operations (Z=aX+Y), Layer Normalization, and Pooling operations. Vector Engines are highly parallelized, and deliver a total of 1.2 TFLOPS of FP32 computations. NeuronCore-v4 Vector Engines can handle various data types, including FP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32.

In addition, the NeuronCore-v4 Vector Engine supports two new features:

1. Data quantization into MXFP8 data type formats from BF16/FP16, which is particularly useful for online data quantization between MLP (multi-layer perceptron) layers.
2. Fast exponential function evaluation, at 4x higher throughput than the exponential on the Scalar Engine, which is particularly useful in self-attention acceleration.

Scalar Engine
-------------

Optimized for scalar computations, in which every element of the output is dependent on one element of the input. Scalar Engines are highly parallelized, and deliver a total of 1.2 TFLOPS of FP32 computations. NeuronCore-v4 Scalar Engines support multiple data types, including FP8, FP16, BF16, TF32, FP32, INT8, INT16, and INT32.

GPSIMD Engine
-------------

Each GPSIMD engine consists of eight fully-programmable 512-bit wide vector processors. They can execute general-purpose C/C++ code and access the embedded on-chip SRAM, allowing you to implement custom operators and execute them directly on the NeuronCores.

================================================
FILE: about-neuron/arch/neuron-hardware/trainium.rst
================================================
.. _trainium-arch:

Trainium Architecture
---------------------

At the heart of the Trn1 instance are 16 x Trainium chips (each Trainium chip includes 2 x :ref:`NeuronCore-v2 <neuroncores-v2-arch>` cores). Trainium is the second-generation purpose-built Machine Learning accelerator from AWS. The Trainium chip architecture is depicted below:

.. image:: /images/trainium-neurondevice.png

Each Trainium chip consists of:

.. list-table::
   :widths: auto
   :header-rows: 0
   :stub-columns: 1
   :align: left

   * - Compute
     - Two :ref:`NeuronCore-v2 <neuroncores-v2-arch>` cores, delivering 380 INT8 TOPS, 190 FP16/BF16/cFP8/TF32 TFLOPS, and 47.5 FP32 TFLOPS.
   * - Device Memory
     - 32 GiB of device memory (for storing model state), with 820 GiB/sec of bandwidth.
   * - Data Movement
     - 1 TB/sec of DMA bandwidth, with inline memory compression/decompression.
   * - NeuronLink
     - NeuronLink-v2 for chip-to-chip interconnect enables efficient scale-out training, as well as memory pooling between the different Trainium chips.
   * - Programmability
     - Trainium supports dynamic shapes and control flow, via ISA extensions of NeuronCore-v2. In addition, Trainium also allows for user-programmable :ref:`rounding mode <neuron-rounding-modes>` (Round Nearest Even or Stochastic Rounding), and custom operators via the deeply embedded GPSIMD engines.

For a detailed description of all the hardware engines, see :ref:`NeuronCore-v2 <neuroncores-v2-arch>`.

================================================
FILE: about-neuron/arch/neuron-hardware/trainium2.rst
================================================

.. _trainium2-arch:

######################
Trainium2 Architecture
######################

Trainium2 is the third-generation purpose-built Machine Learning chip from AWS. Every Trainium2 chip contains eight NeuronCore-v3 cores. Beginning with Trainium2, AWS Neuron adds support for Logical NeuronCore Configuration (LNC), which lets you combine the compute and memory resources of multiple physical NeuronCores into a single logical NeuronCore. The following diagram shows the architecture overview of a Trainium2 chip.

.. image:: /images/architecture/Trainium2/trainium2.png
   :align: center
   :width: 400

=========================
Trainium2 chip components
=========================

Each Trainium2 chip consists of the following components:
.. list-table::
   :widths: auto
   :header-rows: 0
   :stub-columns: 1
   :align: left

   * - Compute
     - Eight NeuronCore-v3 cores that collectively deliver:

       * 1,299 FP8 TFLOPS
       * 667 BF16/FP16/TF32 TFLOPS
       * 2,563 FP8/FP16/BF16/TF32 sparse TFLOPS
       * 181 FP32 TFLOPS
   * - Device Memory
     - 96 GiB of device memory with 2.9 TB/sec of bandwidth.
   * - Data Movement
     - 3.5 TB/sec of DMA bandwidth, with inline memory compression and decompression.
   * - NeuronLink
     - NeuronLink-v3 for chip-to-chip interconnect provides 1.28 TB/sec bandwidth per chip. It allows for efficient scale-out training and inference, as well as memory pooling between Trainium2 chips.
   * - Programmability
     - Trainium2 supports dynamic shapes and control flow via NeuronCore-v3 ISA extensions. Trainium2 also allows for user-programmable :ref:`rounding mode <neuron-rounding-modes>` (Round Nearest Even or Stochastic Rounding), and custom operators via deeply embedded GPSIMD engines.
   * - Collective communication
     - 16 CC-Cores orchestrate collective communication among Trainium2 chips within and across instances.

==================================
Trainium2 performance improvements
==================================

The following set of tables offers a comparison between Trainium and Trainium2 chips.

Compute
"""""""

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * -
     - Trainium
     - Trainium2
     - Improvement factor
   * - FP8 (TFLOPS)
     - 191
     - 1299
     - 6.7x
   * - BF16/FP16/TF32 (TFLOPS)
     - 191
     - 667
     - 3.4x
   * - FP32 (TFLOPS)
     - 48
     - 181
     - 3.7x
   * - FP8/FP16/BF16/TF32 Sparse (TFLOPS)
     - Not applicable
     - 2563
     - Not applicable

Memory
""""""

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * -
     - Trainium
     - Trainium2
     - Improvement factor
   * - HBM Capacity (GiB)
     - 32
     - 96
     - 3x
   * - HBM Bandwidth (TB/sec)
     - 0.8
     - 2.9
     - 3.6x
   * - SBUF Capacity (MiB)
     - 48
     - 224
     - 4.7x
   * - Memory Pool Size
     - Up to 16 chips
     - Up to 64 chips
     - 4x

Interconnect
""""""""""""

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * -
     - Trainium
     - Trainium2
     - Improvement factor
   * - Inter-chip Interconnect (GB/sec/chip)
     - 384
     - 1280
     - 3.3x

Data movement
"""""""""""""

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * -
     - Trainium
     - Trainium2
     - Improvement factor
   * - CC Cores
     - 6
     - 16
     - 3.3x
   * - DMA barriers
     - Write-after-write
     - Strong-order-write
     - \>1x (Benefit DMA-size dependent)
   * - SBUF memory layout
     - Row-major
     - Row-major, Col-major-2B, Col-major-4B
     - Not applicable

====================
Additional resources
====================

For a detailed description of NeuronCore-v3 hardware engines, instances powered by AWS Trainium2, and Logical NeuronCore configuration, see the following resources:

* :ref:`NeuronCore-v3 architecture <neuroncores-v3-arch>`
* :ref:`Amazon EC2 Trn2 architecture <aws-trn2-arch>`
* Logical NeuronCore configuration

================================================
FILE: about-neuron/arch/neuron-hardware/trainium3.rst
================================================

.. meta::
   :description: "Neuron Trainium3 (Trn3) architecture overview."
   :date-modified: 12/02/2025

.. _trainium3-arch:

Trainium3 Architecture
======================

Trainium3 is the fourth-generation purpose-built Machine Learning chip from AWS. A Trainium3 device contains eight NeuronCore-v4 cores. Similar to Trainium2, AWS Neuron supports Logical NeuronCore Configuration (LNC), which lets you combine the compute and memory resources of multiple physical NeuronCores into a single logical NeuronCore.
The following diagram shows the architecture overview of a Trainium3 chip.

.. image:: /images/architecture/trn3/neuroncore-v4-overview.png
   :align: center

NeuronCore-v4
-------------

Each Trainium3 chip consists of the following components:

.. list-table::
   :widths: auto
   :header-rows: 0
   :stub-columns: 1
   :align: left

   * - Compute
     - Eight NeuronCore-v4 cores that collectively deliver:

       * 2,517 MXFP8/MXFP4 TFLOPS
       * 671 BF16/FP16/TF32 TFLOPS
       * 2,517 FP16/BF16/TF32 sparse TFLOPS
       * 183 FP32 TFLOPS
   * - Device memory
     - 144 GiB of device memory, with 4.9 TB/sec of bandwidth.
   * - Data movement
     - 4.9 TB/sec of DMA bandwidth, with inline computation.
   * - NeuronLink
     - NeuronLink-v4 for device-to-device interconnect provides 2.56 TB/sec bandwidth per device. It enables efficient scale-out training, as well as memory pooling between the different Trainium3 devices.
   * - Programmability
     - Trainium3 supports dynamic shapes and control flow, via ISA extensions of NeuronCore-v4. Trainium3 also allows for user-programmable rounding mode (Round Nearest Even or Stochastic Rounding), and custom operators via the deeply embedded GPSIMD engines.
   * - Collective communication
     - 16 CC-Cores orchestrate collective communication among Trainium3 devices, both within a server and across servers.

Trainium3 performance improvements
----------------------------------

The following set of tables offers a comparison between Trainium2 and Trainium3 chips.

Compute
"""""""

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * -
     - Trainium2
     - Trainium3
     - Improvement factor
   * - MXFP4 (TFLOPS)
     - Not applicable
     - 2517
     - Not applicable
   * - FP8 (TFLOPS)
     - 1299
     - 2517
     - 2x
   * - BF16/FP16/TF32 (TFLOPS)
     - 667
     - 671
     - 1x
   * - FP32 (TFLOPS)
     - 181
     - 183
     - 1x

Memory
""""""

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * -
     - Trainium2
     - Trainium3
     - Improvement factor
   * - HBM Capacity (GiB)
     - 96
     - 144
     - 1.5x
   * - HBM Bandwidth (TB/sec)
     - 2.9
     - 4.9
     - 1.7x
   * - SBUF Capacity (MiB)
     - 224
     - 256
     - 1.14x

Interconnect
""""""""""""

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * -
     - Trainium2
     - Trainium3
     - Improvement factor
   * - Inter-chip Interconnect (GB/sec/chip)
     - 1280
     - 2560
     - 2x

Data movement
"""""""""""""

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * -
     - Trainium2
     - Trainium3
     - Improvement factor
   * - DMA Bandwidth (TB/sec)
     - 3.5
     - 4.9
     - 1.4x

Additional resources
--------------------

For a detailed description of NeuronCore-v4 hardware engines, instances powered by AWS Trainium3, and Logical NeuronCore configuration, see the following resources:

* :ref:`NeuronCore-v4 architecture <neuroncores-v4-arch>`

================================================
FILE: about-neuron/arch/neuron-hardware/trn1-arch.rst
================================================

.. _aws-trn1-arch:

Amazon EC2 Trn1/Trn1n Architecture
==================================

On this page, we provide an architectural overview of the AWS Trn1/Trn1n instances, and the corresponding :ref:`Trainium <trainium-arch>` NeuronChips that power them (Trainium chips from here on).

.. contents:: Table of contents
   :local:
   :depth: 2

.. _trn1-arch:

Trn1/Trn1n Architecture
-----------------------

An EC2 Trn1/Trn1n instance is powered by up to 16 :ref:`Trainium <trainium-arch>` chips.
.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * - Instance size
     - # of Trainium chips
     - vCPUs
     - Host Memory (GiB)
     - FP8/FP16/BF16/TF32 TFLOPS
     - FP32 TFLOPS
     - Device Memory (GiB)
     - Device Memory Bandwidth (GiB/sec)
     - NeuronLink-v2 chip-to-chip bandwidth (GiB/sec/chip)
     - EFA bandwidth (Gbps)
   * - Trn1.2xlarge
     - 1
     - 8
     - 32
     - 190
     - 47.5
     - 32
     - 820
     - N/A
     - up to 25
   * - Trn1.32xlarge
     - 16
     - 128
     - 512
     - 3,040
     - 760
     - 512
     - 13,120
     - 384
     - 800
   * - Trn1n.32xlarge
     - 16
     - 128
     - 512
     - 3,040
     - 760
     - 512
     - 13,120
     - 768
     - 1,600

The Trn1.2xlarge instance size allows customers to train their models on a single Trainium chip, which is useful for small model training, as well as for model experimentation. The Trn1.32xlarge and Trn1n.32xlarge instance sizes come with a high-bandwidth and low-latency NeuronLink-v2 chip-to-chip interconnect, which utilizes a 2D Torus topology. This is useful for collective communication between the Trainium chips during scale-out training, as well as for pooling the memory capacity of all Trainium chips, making it directly addressable from each of the chips.

In a Trn1/Trn1n server, the Trainium chips are connected in a 2D Torus topology, as depicted below:

.. image:: /images/trn1-topology.png

The Trn1/Trn1n instances are also available in an EC2 UltraCluster, which enables customers to scale Trn1/Trn1n instances to over 100,000 Trainium chips, and leverage the AWS-designed, non-blocking, petabit-scale EFA networking infrastructure.

.. image:: /images/ultracluster-1.png

================================================
FILE: about-neuron/arch/neuron-hardware/trn2-arch.rst
================================================

.. _aws-trn2-arch:

############################
Amazon EC2 Trn2 Architecture
############################

Trn2 is an Amazon EC2 accelerated computing instance, purpose-built for high-performance deep learning training and inference. This page provides an architecture overview of the trn2.48xlarge and trn2u.48xlarge instances, and the Trn2 UltraServer.

.. contents:: Topics
   :local:
   :depth: 2

.. _trn2-arch:

Trn2 instance sizes
===================

Trn2 instances and UltraServers are available in the following sizes and configurations:

* trn2.48xlarge
* trn2u.48xlarge
* Trn2 UltraServer

.. _trn2-instance:

trn2.48xlarge / trn2u.48xlarge
""""""""""""""""""""""""""""""

Trn2 instances are powered by 16 Trainium2 chips connected using a high-bandwidth, low-latency NeuronLink-v3 chip-to-chip interconnect. The NeuronLink-v3 chip-to-chip interconnect enables collective communication between Trainium2 chips during distributed training and inference. It also allows for the pooling of memory resources from all 16 Trainium2 chips. In a trn2.48xlarge or trn2u.48xlarge instance, the 16 Trainium2 chips are connected using a 4x4, 2D Torus topology. The following diagram shows the intra-instance connections of a trn2.48xlarge or trn2u.48xlarge instance.

.. image:: /images/architecture/Trn2/trn2.48xlarge.png
   :align: center
   :width: 650

|

.. _trn2-ultraserver:

Trn2 UltraServer
""""""""""""""""

A Trn2 UltraServer comprises four trn2u.48xlarge instances connected together via the NeuronLink-v3 chip-to-chip interconnect. This allows for a total of 64 Trainium2 chips to be interconnected within a Trn2 UltraServer. Trainium2 chips with the same coordinates in each Trn2 instance are connected in a ring topology. The following figure shows the inter-instance ring connection between Trainium2 chips.
.. image:: /images/architecture/Trn2/u-trn2x64.png
   :align: center
   :width: 650

|

Trn2 instance specifications
============================

The following table shows the performance metrics for Trainium2-based instances.

.. list-table::
   :widths: auto
   :header-rows: 1
   :stub-columns: 1
   :align: left

   * - Performance specification
     - trn2.48xlarge / trn2u.48xlarge
     - Trn2 UltraServer
   * - # of Trainium2 chips
     - 16
     - 64
   * - vCPUs
     - 192
     - 768
   * - Host Memory (GiB)
     - 2,048
     - 8,192
   * - FP8 PFLOPS
     - 20.8
     - 83.2
   * - FP16/BF16/TF32 PFLOPS
     - 10.7
     - 42.8
   * - FP8/FP16/BF16/TF32 Sparse PFLOPS
     - 41
     - 164
   * - FP32 PFLOPS
     - 2.9
     - 11.6
   * - Device Memory (GiB)
     - 1,536
     - 6,144
   * - Device Memory Bandwidth (TB/sec)
     - 46.4
     - 185.6
   * - Intra-instance NeuronLink-v3 bandwidth (GB/sec/chip)
     - 1,024
     - 1,024
   * - Inter-instance NeuronLink-v3 bandwidth (GB/sec/chip)
     - Not applicable
     - 256
   * - EFAv3 bandwidth (Gbps)
     - 3,200
     - 3,200

================================================
FILE: about-neuron/arch/neuron-hardware/trn3-arch.rst
================================================

.. _aws-trn3-arch:

############################
Amazon EC2 Trn3 Architecture
############################

Amazon EC2 **Trn3** instances are accelerated computing instances powered by Trainium3 AI chips, purpose-built for high-performance deep learning training and inference. Trn3 is available in two UltraServer scale-up configurations: Gen1 with 64 Trainium3 chips per UltraServer, and Gen2 with 144 chips per UltraServer. Both configurations use NeuronSwitch-v1 interconnect technology to enable all-to-all connectivity between chips, which is especially well suited to workloads that leverage all-to-all communication patterns, such as Mixture of Experts models and autoregressive inference serving.

=====================
Trn3 Gen1 UltraServer
=====================

The EC2 Trn3 Gen1 UltraServers deliver 161 PetaFLOPS of dense MXFP8 compute, 314 TB/s of HBM bandwidth, and 9 TB of HBM capacity. Each UltraServer consists of four servers with 16 Trainium3 devices per server. Therefore, the UltraServer integrates a total of 64 Trainium3 devices into a single scale-up domain, interconnected via our latest-generation NeuronLink-v4 and the newly introduced NeuronSwitch-v1. The chip-to-chip topology features an all-to-all connectivity design, replacing the previous 2D-torus architecture. This all-to-all topology is optimized for workloads that require efficient all-to-all communication patterns or ultra-low-latency collectives, including Mixture of Experts models and autoregressive inference serving. The following diagram illustrates the Trn3 Gen1 UltraServer connectivity.

.. image:: /images/architecture/trn3/trn3-ultraserver-gen1.png
   :align: center

=====================
Trn3 Gen2 UltraServer
=====================

The EC2 Trn3 Gen2 UltraServers deliver 362 PetaFLOPS of dense MXFP8 compute, 706 TB/s of HBM bandwidth, and 20 TB of HBM capacity. Each UltraServer consists of 36 servers with 4 Trainium3 devices per server. Trainium3 devices within the same server are connected via a first-level NeuronSwitch-v1, while devices across servers are connected via two second-level NeuronSwitch-v1 and NeuronLink-v4. Therefore, the UltraServer integrates 144 Trainium3 devices into a single scale-up domain. Like Gen1, the chip-to-chip topology features an all-to-all connectivity design optimized for Mixture of Experts models and autoregressive inference serving. The following diagram illustrates the Trn3 Gen2 UltraServer connectivity.
.. image:: /images/architecture/trn3/trn3-ultraserver-gen2.png
   :align: center

==========================================
Trn3 Gen1/Gen2 UltraServer specifications
==========================================

The following table shows the performance metrics for Trainium3-based instances.

.. list-table::
   :header-rows: 2
   :stub-columns: 1
   :widths: 30 20 20

   * -
     - Trn3 Gen1 UltraServer
     - Trn3 Gen2 UltraServer
   * - Configuration
     -
     -
   * - # of Trainium3 devices
     - 64
     - 144
   * - Host vCPUs
     - 768
     - 2304
   * - Host Memory (GiB)
     - 8,192
     - 27,648
   * - **Compute**
     -
     -
   * - MXFP8/MXFP4 TFLOPS
     - 161,088
     - 362,448
   * - FP16/BF16/TF32 TFLOPS
     - 42,944
     - 96,624
   * - FP32 TFLOPS
     - 11,712
     - 26,352
   * - **Memory**
     -
     -
   * - Device Memory (GiB)
     - 9,216
     - 20,736
   * - Device Memory Bandwidth (TB/sec)
     - 313.6
     - 705.6
   * - **Interconnect**
     -
     -
   * - NeuronLink-v4 bandwidth (GiB/sec/device)
     - 2,048
     - 2,048
   * - EFA bandwidth (Gbps)
     - 12,800
     - 28,800

============================================
Trn3 UltraServer Connectivity and Networking
============================================

Trn3 UltraServers use a PCIe switch-based interconnect architecture for all chip-to-chip communication, both within and across servers. This replaces the point-to-point NeuronLink topology used in previous generations (Trn1, Trn2) with a switched fabric that enables flexible, all-to-all connectivity across the entire UltraServer domain.

Intra-server connectivity
-------------------------

Each server (sled) contains 4 Trainium3 chips connected through an intra-server PCIe switch. Each chip provides four PCIe Gen6 x8 links to this switch, delivering a total of 256 GB/s of bidirectional bandwidth between chips within the same server. This local switch enables low-latency communication for operations like tensor parallelism and data-parallel gradient synchronization within a server.

Inter-server connectivity
-------------------------

All servers within a rack are connected through inter-server PCIe switches. Each Trainium3 chip provides five PCIe Gen6 x8 links to the inter-server switch, delivering 320 GB/s of bidirectional bandwidth per chip for cross-server communication. This enables collective operations such as all-reduce and all-gather to span all servers in a rack without requiring host CPU involvement.

Inter-rack connectivity
-----------------------

For multi-rack configurations, Trainium3 chips in corresponding positions across racks are connected via dedicated direct PCIe links. Each chip provides two PCIe Gen6 x8 links for inter-rack communication, delivering 128 GB/s of bidirectional bandwidth per chip between racks. This direct-link design avoids additional switch hops for cross-rack traffic.

Bandwidth summary
-----------------

.. list-table::
   :header-rows: 1
   :widths: 30 30 40

   * - Connectivity level
     - Bandwidth per chip
     - Link configuration
   * - Intra-server (within sled)
     - 256 GB/s
     - 4 × PCIe Gen6 x8 via intra-server switch
   * - Inter-server (within rack)
     - 320 GB/s
     - 5 × PCIe Gen6 x8 via inter-server switch
   * - Inter-rack
     - 128 GB/s
     - 2 × PCIe Gen6 x8 direct links

Routing and address-based switching
-----------------------------------

Unlike Trn1 and Trn2, where NeuronLink connections are point-to-point and require no intermediate routing, Trn3's PCIe switch fabric uses address-based routing to direct transactions to the correct destination chip. Each Trainium3 chip in the system is identified by a tuple of (rack, server, chip), and this identity is encoded in the upper bits of the PCIe address used for outbound transactions.
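Conceptually, the encoding resembles the sketch below. The helper and the bit positions are purely hypothetical, chosen only to illustrate the idea; the real address layout and switch configuration are managed by the Neuron Runtime.

.. code:: python

   # Hypothetical bit positions for packing a (rack, server, chip) identity
   # into the upper bits of a 64-bit PCIe address (illustration only).
   RACK_SHIFT, SERVER_SHIFT, CHIP_SHIFT = 56, 50, 46

   def encode_remote_address(rack: int, server: int, chip: int, offset: int) -> int:
       """Tag a local memory offset with the destination chip's identity."""
       return (rack << RACK_SHIFT) | (server << SERVER_SHIFT) | (chip << CHIP_SHIFT) | offset

   addr = encode_remote_address(rack=1, server=3, chip=2, offset=0x1000)
   print(hex(addr))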
The PCIe switches use BAR (Base Address Register) address matching to determine the correct output port for each transaction. This routing is transparent to ML workloads. The Neuron Runtime and compiler handle all address encoding and switch configuration automatically. From the developer's perspective, collective operations and direct memory access between chips work the same way as on previous Trainium generations. Semaphore-based synchronization ------------------------------- Trn3 uses hardware semaphores to synchronize data transfers across the switched fabric. When a chip writes data to a remote chip's HBM, a follow-up semaphore write signals completion to the receiving chip. The system guarantees that data and its associated semaphore always traverse the same physical path through the switch fabric, ensuring correct ordering without additional software synchronization overhead. ================================================ FILE: about-neuron/benchmarks/index.rst ================================================ .. _benchmark: .. meta:: :description: Explore AWS Neuron performance benchmarks for Inf1, Inf2, and Trn1 instances. Find detailed inference and training performance data across NLP, CV, and recommender models to optimize your machine learning workloads. :date-modified: 2025-10-03 Neuron performance ================== The Neuron performance pages provide comprehensive benchmarks and performance data for AWS Neuron SDK across different Trainium and Inferentia instance types. These benchmarks cover various open-source models for Natural Language Processing (NLP), Computer Vision (CV), and Recommender systems. Each benchmark includes detailed setup instructions and reproducible test configurations to help you evaluate performance for your specific use cases. Inference performance --------------------- .. grid:: 1 1 2 2 :gutter: 2 .. grid-item-card:: :link: appnote-performance-benchmark :link-type: ref **Inf1 Inference Performance** ^^^ Comprehensive inference benchmarks for ``Inf1`` instances across NLP, CV, and recommender models .. grid-item-card:: :link: inf2-performance :link-type: ref **Inf2 Inference Performance** ^^^ Latest inference performance data for ``Inf2`` instances with improved throughput and latency metrics .. grid-item-card:: :link: trn1-inference-performance :link-type: ref **Trn1 Inference Performance** ^^^ Inference benchmarks for ``Trn1`` instances showcasing versatile training and inference capabilities Training performance -------------------- .. grid:: 1 :gutter: 2 .. grid-item-card:: :link: trn1-training-performance :link-type: ref **Trn1 Training Performance** ^^^ Training performance benchmarks for ``Trn1`` instances with distributed training metrics and scalability data .. toctree:: :maxdepth: 1 :hidden: inf1/index inf2/inf2-performance trn1/trn1-inference-performance trn1/trn1-training-performance ================================================ FILE: about-neuron/benchmarks/inf1/data.csv ================================================ Name,Model,Model details,Framework,Application Type,Run Mode,Inst. Type,Num. 
Cores,Batch Size,Avg Throughput (/sec),Max Throughput,Threads,Ops in Inferentia,Latency P50 (ms),Latency P90 (ms),Latency P95 (ms),Latency P99 (ms),Latency P100 (ms),Neuron Version,Application,Tutorial "YOLOv4-PT(fp32,b1,c4)",YOLO v4,fp32,PyTorch 1.13,Real Time,Data Parallel,inf1.2xlarge,4,1,180.2,,8,,40.1,,,52,,2.15.0,CV,:ref:`Evaluate YOLO v4 on Inferentia ` "Resnet50-PT(fp32,b5,c4)",Resnet-50,fp32,PyTorch 1.13,Batch,Data Parallel,inf1.xlarge,4,5,923,,4,,22,,,23,,2.15.0,CV,:ref:`Resnet50 model for Inferentia ` "Resnet50-TF(fp16,b5,c4)",Resnet-50,fp16,Tensorflow 1.15,Batch,Data Parallel,inf1.xlarge,4,10,2207,,8,,17.8,,,22.7,,2.12.0,CV,:ref:`ResNet-50 optimization example ` "OpenPose-TF(fp16,b1,c4)",OpenPose,fp16,Tensorflow 1.15,Real Time,Data Parallel,inf1.xlarge,4,1,57.5,,4,,60.3,,,67.4,,2.12.0,CV,:ref:`Running OpenPose on Inferentia ` "BERT-base-PT(fp32,b6,c4)",BERT base,"fp32, bert-base-cased-finetuned-mrpc, sequence-length=128",PyTorch 1.13,Batch,Data Parallel,inf1.xlarge,4,6,966,,4,,21,,,22,,2.15.0,NLP,:ref:`HuggingFace Pretrained BERT ` "BERT-base-PT(fp32,b1,c16)",BERT base,"fp32, bert-base-uncased, sequence-length=128",PyTorch 1.13,Real Time,Model Pipeline,inf1.6xlarge,16,1,1988.8,,12,,6,,,6.3,,2.15.0,NLP,:ref:`Using NeuronCore Pipeline ` "BERT-base-TF(fp32,b128,c16)",BERT base,"fp32, distilbert-base-uncased-finetuned-sst-2-english, sequence-length=128",Tensorflow 2.8,Batch,Data Parallel,inf1.6xlarge,16,16,2114.8,,,,30.1,,,33,,2.15.0,NLP,:ref:`HuggingFace distilBERT with Tensorflow2 ` ================================================ FILE: about-neuron/benchmarks/inf1/index.rst ================================================ .. _appnote-performance-benchmark: Inf1 Inference Performance =========================== .. important:: The benchmark scripts linked on this page are provided for historical reference only and are not tested with recent versions of the Neuron SDK. They have been moved to the `archive folder `_. .. contents:: Table of contents :local: The following tables contain the reference inference performance for models in the tutorials. Follow the links on each row to replicate similar results in your own environment. Refer to :ref:`ec2-then-ec2-setenv` documentation to create a new environment based on the latest Neuron release. *Last update: September 16th, 2024* .. _NLP: Encoder Models -------------- .. tab-set:: .. tab-item:: Throughput optimized .. df-table:: :header-rows: 1 df = pd.read_csv('throughput_data_encoder.csv') df_prices = pd.read_csv('instance_prices.csv') df = pd.merge(df,df_prices,on='Inst. Type') df['Cost per 1M inferences'] = ((1.0e6 / df['Avg Throughput (/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format) cols_to_show = ['Model', 'Scripts', 'Framework', 'Inst. Type', 'Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Model details' ] df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences']) int_cols = ['Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)'] df[int_cols] = df[int_cols].round(0).astype('int',copy=True) .. tab-item:: Latency optimized .. df-table:: :header-rows: 1 df = pd.read_csv('latency_data_encoder.csv') df_prices = pd.read_csv('instance_prices.csv') df = pd.merge(df,df_prices,on='Inst. 
Type')
   df['Cost per 1M inferences'] = ((1.0e6 / df['Avg Throughput (/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)
   cols_to_show = ['Model', 'Scripts', 'Framework', 'Inst. Type', 'Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Model details' ]
   df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])
   int_cols = ['Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)']
   df[int_cols] = df[int_cols].round(0).astype('int',copy=True)

.. note:: Throughput and latency numbers in this table were computed using NeuronPerf_. To reproduce these results, install NeuronPerf and run the provided scripts.

.. _NeuronPerf: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuronperf/index.html

Convolutional Neural Networks (CNN) Models
------------------------------------------

.. df-table::
   :header-rows: 1

   df = pd.read_csv('throughput_data_cnn.csv')
   df_prices = pd.read_csv('instance_prices.csv')
   df = pd.merge(df,df_prices,on='Inst. Type').query('`Application`=="CV"')
   df['Cost per 1M inferences'] = ((1.0e6 / df['Avg Throughput (/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format)
   cols_to_show = ['Model', 'Tutorial', 'Framework', 'Inst. Type', 'Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size', 'Model details' ]
   df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences']).groupby('Model').head(2)
   int_cols = ['Avg Throughput (/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)']
   df[int_cols] = df[int_cols].round(0).astype('int',copy=True)

.. note:: Throughput and latency numbers in this table were generated using Neuron Tutorials.

.. note:: **Cost per 1M inferences** is calculated using US East (N. Virginia) RI-Effective hourly rate. **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.

================================================
FILE: about-neuron/benchmarks/inf1/instance_prices.csv
================================================

Inst. Type,RI-Effective hourly rate
inf1.xlarge,0.110
inf1.2xlarge,0.174
inf1.6xlarge,0.567
inf1.24xlarge,2.269

================================================
FILE: about-neuron/benchmarks/inf1/latency_data_encoder.csv
================================================

Model,Scripts,Source,Framework,Inst. Type,Num Cores,Seq.
Length,Avg Throughput (/sec),Max Throughput,Threads,Latency P50 (ms),Latency P90 (ms),Latency P95 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,N Models,Workers per Model,Model details BERT base (bert-base-cased),:compile-pt:`Compile ` + :benchmark-pt:`Benchmark `,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,125.7,,8,7.9,,,8.0,Real Time,2.20.0,Data Parallel,1,1,1,"fp32, sequence-length=128" BERT base (bert-base-uncased),:compile-pt:`Compile ` + :benchmark-pt:`Benchmark `,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,284.7,,8,10.5,,,10.7,Real Time,2.20.0,Data Parallel,3,1,1,"fp32, sequence-length=128" DistilBERT base (distilbert-base-uncased-finetuned-sst-2-english),:compile-pt:`Compile ` + :benchmark-pt:`Benchmark `,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,593.4,,8,10.0,,,10.7,Real Time,2.20.0,Data Parallel,5,1,1,"fp32, sequence-length=128" DistilBERT base (distilbert-base-uncased),:compile-pt:`Compile ` + :benchmark-pt:`Benchmark `,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,538.2,,8,11.1,,,11.5,Real Time,2.20.0,Data Parallel,6,1,1,"fp32, sequence-length=128" DistilRoBERTa base (distilroberta-base),:compile-pt:`Compile ` + :benchmark-pt:`Benchmark `,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,417.0,,8,7.0,,,7.8,Real Time,2.20.0,Data Parallel,3,1,1,"fp32, sequence-length=128" ================================================ FILE: about-neuron/benchmarks/inf1/throughput_data_cnn.csv ================================================ Name,Model,Model details,Framework,Application Type,Run Mode,Inst. Type,Num. Cores,Batch Size,Avg Throughput (/sec),Max Throughput,Threads,Ops in Inferentia,Latency P50 (ms),Latency P90 (ms),Latency P95 (ms),Latency P99 (ms),Latency P100 (ms),Neuron Version,Application,Tutorial "YOLOv4-PT(fp32,b1,c4)",YOLO v4,fp32,PyTorch 1.13,Real Time,Data Parallel,inf1.2xlarge,4,1,180.3,,8,,40.0,,,50.8,,2.20.0,CV,:ref:`Evaluate YOLO v4 on Inferentia ` "Resnet50-PT(fp32,b5,c4)",Resnet-50,fp32,PyTorch 1.13,Batch,Data Parallel,inf1.xlarge,4,5,921.5,,4,,21.6,,,22.9,,2.20.0,CV,:ref:`Resnet50 model for Inferentia ` "Resnet50-TF(fp16,b5,c4)",Resnet-50,fp16,Tensorflow 1.15,Batch,Data Parallel,inf1.xlarge,4,10,2207,,8,,17.8,,,22.7,,2.12.0,CV,:ref:`ResNet-50 optimization example ` "OpenPose-TF(fp16,b1,c4)",OpenPose,fp16,Tensorflow 1.15,Real Time,Data Parallel,inf1.xlarge,4,1,57.5,,4,,60.3,,,67.4,,2.12.0,CV,:ref:`Running OpenPose on Inferentia ` ================================================ FILE: about-neuron/benchmarks/inf1/throughput_data_encoder.csv ================================================ Model,Scripts,Source,Framework,Inst. Type,Num Cores,Seq. 
BERT base (bert-base-cased),:compile-pt:`Compile ` + :benchmark-pt:`Benchmark `,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,1095.4,,8,58.3,,,65.0,Batch,2.20.0,Data Parallel,8,4,2,"fp32, sequence-length=128"
BERT base (bert-base-uncased),:compile-pt:`Compile ` + :benchmark-pt:`Benchmark `,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,1180.7,,8,40.6,,,45.0,Batch,2.20.0,Data Parallel,6,4,2,"fp32, sequence-length=128"
DistilBERT base (distilbert-base-uncased-finetuned-sst-2-english),:compile-pt:`Compile ` + :benchmark-pt:`Benchmark `,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,1875.3,,8,33.7,,,54.1,Batch,2.20.0,Data Parallel,8,4,2,"fp32, sequence-length=128"
DistilBERT base (distilbert-base-uncased),:compile-pt:`Compile ` + :benchmark-pt:`Benchmark `,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,1876.7,,8,33.7,,,53.2,Batch,2.20.0,Data Parallel,8,4,2,"fp32, sequence-length=128"
DistilRoBERTa base (distilroberta-base),:compile-pt:`Compile ` + :benchmark-pt:`Benchmark `,HuggingFace,PyTorch 1.13.1,inf1.xlarge,4,128,1512.9,,8,15.0,,,25.9,Batch,2.20.0,Data Parallel,6,4,1,"fp32, sequence-length=128"
BERT base,:ref:`HuggingFace Pretrained BERT `,,PyTorch 1.13,inf1.xlarge,,,1056,,,20,,,21,Batch,2.20.0,Data Parallel,4,,,"fp32, bert-base-cased-finetuned-mrpc, sequence-length=128"
BERT base,:ref:`Using NeuronCore Pipeline `,,PyTorch 1.13,inf1.6xlarge,,,2009.1,,,5.9,,,6.3,Real Time,2.20.0,Model Pipeline,1,,,"fp32, bert-base-uncased, sequence-length=128"
BERT base,:ref:`HuggingFace distilBERT with Tensorflow2 `,,Tensorflow 2.10,inf1.6xlarge,,,2123.4,,,30.0,,,32.2,Batch,2.20.0,Data Parallel,16,,,"fp32, distilbert-base-uncased-finetuned-sst-2-english, sequence-length=128"

================================================
FILE: about-neuron/benchmarks/inf2/inf2-performance.rst
================================================
.. _inf2-performance:

Inf2 Inference Performance
==========================

.. important::
   The benchmark scripts linked on this page are provided for historical reference only and are not tested with recent versions of the Neuron SDK. They have been moved to the `archive folder `_.

.. contents:: Table of contents
   :local:
   :depth: 1

*Last update: Feb 26th, 2026*

.. _inf2_inference_perf:

Encoder Models
--------------

.. tab-set::

   .. tab-item:: Throughput optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('throughput_data_encoder.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (inference/second)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (inference/second)', 'Latency P50 (ms)', 'Latency P99 (ms)',
                         'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Sequence Length', 'Model Data Type',
                         'Compilation Autocast Data Type', 'OS Type']
         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])
         df['Throughput (inference/second)'] = df['Throughput (inference/second)'].round(2).astype('float', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

   .. tab-item:: Latency optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('latency_data_encoder.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (inference/second)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (inference/second)', 'Latency P50 (ms)', 'Latency P99 (ms)',
                         'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Sequence Length', 'Model Data Type',
                         'Compilation Autocast Data Type', 'OS Type']
         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])
         df['Throughput (inference/second)'] = df['Throughput (inference/second)'].round(2).astype('float', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

Encoder-Decoder Models
----------------------

.. tab-set::

   .. tab-item:: Throughput optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('throughput_data_encoder_decoder.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (tokens/second)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (tokens/second)', 'Latency per Token P50 (ms)',
                         'Latency per Token P99 (ms)', 'Cost per 1M inferences', 'Application Type',
                         'Neuron Version', 'Run Mode', 'TP Degree', 'DP Degree', 'Batch Size',
                         'Sequence Length', 'Input Length', 'Output Length', 'Model Data Type',
                         'Compilation Autocast Data Type']
         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])
         df['Throughput (tokens/second)'] = df['Throughput (tokens/second)'].round(2).astype('float', copy=True)
         int_cols = ['Latency per Token P50 (ms)', 'Latency per Token P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

      .. note::
         Only for Encoder-Decoder:

         **Throughput (tokens/second)** counts both input and output tokens.

         **Latency per Token** counts both input and output tokens.

   .. tab-item:: Latency optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('latency_data_encoder_decoder.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (tokens/second)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (tokens/second)', 'Latency per Token P50 (ms)',
                         'Latency per Token P99 (ms)', 'Cost per 1M inferences', 'Application Type',
                         'Neuron Version', 'Run Mode', 'TP Degree', 'DP Degree', 'Batch Size',
                         'Sequence Length', 'Input Length', 'Output Length', 'Model Data Type',
                         'Compilation Autocast Data Type']
         df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences'])
         df['Throughput (tokens/second)'] = df['Throughput (tokens/second)'].round(2).astype('float', copy=True)
         int_cols = ['Latency per Token P50 (ms)', 'Latency per Token P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

      .. note::
         **Throughput (tokens/second)** counts both input and output tokens.

         **Latency per Token** counts both input and output tokens.
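Because these per-token metrics count input as well as output tokens, a batch-size-1 row can be sanity-checked with a couple of lines of arithmetic. A minimal sketch (the helper is ours; the t5-3b figures come from the tables above, so small rounding differences are expected):

.. code-block:: python

   # Per-token accounting as described in the notes above:
   # both input and output tokens are counted.
   def per_token_latency_ms(total_latency_ms, input_len, output_len):
       return total_latency_ms / (input_len + output_len)

   # t5-3b row: 128 input + 84 output tokens at ~9.25 ms per token
   lat = per_token_latency_ms(9.25 * (128 + 84), 128, 84)
   print(round(lat, 2), 'ms/token,', round(1.0e3 / lat, 2), 'tokens/sec')
   # -> 9.25 ms/token, 108.11 tokens/sec (the table reports 108.18 at batch size 1)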

Vision Transformers Models
--------------------------

.. tab-set::

   .. tab-item:: Throughput optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('throughput_data_vision_transformers.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Image Size', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)',
                         'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Model Data Type', 'Compilation Autocast Data Type']
         df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])
         df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

   .. tab-item:: Latency optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('latency_data_vision_transformers.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Image Size', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)',
                         'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Model Data Type', 'Compilation Autocast Data Type']
         df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])
         df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

Convolutional Neural Networks (CNN) Models
------------------------------------------

.. tab-set::

   .. tab-item:: Throughput optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('throughput_data_vision_cnn.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Image Size', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)',
                         'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Model Data Type', 'Compilation Autocast Data Type']
         df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])
         df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

   .. tab-item:: Latency optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('latency_data_vision_cnn.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Image Size', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)',
                         'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Model Data Type', 'Compilation Autocast Data Type']
         df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])
         df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

Stable Diffusion Models
-----------------------

.. tab-set::

   .. tab-item:: Throughput optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('throughput_data_vision_sd.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Image Size', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)',
                         'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Model Data Type', 'Compilation Autocast Data Type']
         df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])
         df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

      .. note::
         **Cost per 1M images** is calculated using RI-Effective hourly rate.

         **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.

   .. tab-item:: Latency optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('latency_data_vision_sd.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Image Size', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)',
                         'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Model Data Type', 'Compilation Autocast Data Type']
         df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])
         df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

      .. note::
         **Cost per 1M images** is calculated using RI-Effective hourly rate.

         **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.
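For the batch-size-1 (**Real Time**) rows in these tables, throughput and P50 latency are near reciprocals of each other, which makes a quick consistency check possible. A sketch against the Stable Diffusion 1.5 entry, using numbers from the table above:

.. code-block:: python

   # For batch size 1, throughput (images/sec) ~= 1000 / P50 latency (ms).
   # Stable Diffusion 1.5 row: 0.494 images/sec at 2023.741 ms P50.
   print(round(1.0e3 / 2023.741, 3))  # -> 0.494, matching the reported throughput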

Diffusion Transformer Models
----------------------------

.. tab-set::

   .. tab-item:: Throughput optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('throughput_data_vision_dit.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Image Size', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)',
                         'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Model Data Type', 'Compilation Autocast Data Type']
         df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])
         df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

      .. note::
         **Cost per 1M images** is calculated using RI-Effective hourly rate.

         **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.

   .. tab-item:: Latency optimized

      .. df-table::
         :header-rows: 1

         df = pd.read_csv('latency_data_vision_dit.csv')
         df_prices = pd.read_csv('inf2_instance_prices.csv')
         df = pd.merge(df, df_prices, on='Inst. Type')
         df['Cost per 1M images'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3)).map('${:,.3f}'.format)
         cols_to_show = ['Model', 'Image Size', 'Scripts', 'Framework', 'Inst. Type', 'Task',
                         'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)',
                         'Cost per 1M images', 'Application Type', 'Neuron Version', 'Run Mode',
                         'Batch Size', 'Model Data Type', 'Compilation Autocast Data Type']
         df = df[cols_to_show].sort_values(['Model', 'Image Size', 'Cost per 1M images'])
         df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float', copy=True)
         int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)']
         df[int_cols] = df[int_cols].round(2).astype('float', copy=True)

      .. note::
         **Cost per 1M images** is calculated using RI-Effective hourly rate.

         **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference.

.. note::
   See :ref:`neuron_hw_glossary` for abbreviations and terms.
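One caveat worth noting if you reuse the table-building pattern on this page: the cost column is formatted to a ``'$…'`` string *before* ``sort_values`` runs, so that sort key is lexicographic rather than numeric. A small standalone illustration, not part of the page's own code:

.. code-block:: python

   import pandas as pd

   # '${:,.3f}'-formatted strings sort lexicographically, not numerically:
   s = pd.Series([2.0, 10.0]).map('${:,.3f}'.format)
   print(list(s.sort_values()))  # ['$10.000', '$2.000'] -- '1' < '2' as characters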

================================================
FILE: about-neuron/benchmarks/inf2/inf2_instance_prices.csv
================================================
Inst. Type,RI-Effective hourly rate
Inf2.xlarge,0.328
Inf2.48xlarge,5.608
Inf2.24xlarge,2.804
Inf2.8xlarge,0.850

================================================
FILE: about-neuron/benchmarks/inf2/latency_data_decoder.csv
================================================
Model,Scripts,Framework,Inst. Type,Task,Output Token Throughput (tokens/sec),TTFT Latency P50 (ms),TTFT Latency P99 (ms),TPOT Latency P50 (ms),TPOT Latency P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type,Weight Storage Data Type
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,144.98,29.47,42.05,7.41,7.68,Real Time,2.18.1,Tensor Parallel,24,1,8192,128,8064,FP16,Matmult-BF16,int8
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,133.63,209.37,232.02,7.47,7.57,Real Time,2.18.1,Tensor Parallel,24,1,8192,4096,4096,FP16,Matmult-BF16,int8
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,161.67,25,25.88,6.42,6.58,Real Time,2.18.1,Tensor Parallel,24,1,4096,128,3968,FP16,Matmult-BF16,int8
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,153.58,101.81,110.6,6.5,6.6,Real Time,2.18.1,Tensor Parallel,24,1,4096,2048,2048,FP16,Matmult-BF16,int8
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.84,745.49,749.48,34.67,35.06,Real Time,2.18.1,Tensor Parallel,24,1,4096,2048,2048,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.81,312.86,322.56,33.81,34.13,Real Time,2.18.1,Tensor Parallel,24,1,3072,1024,2048,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.16,310.18,315.23,33.14,34.29,Real Time,2.18.1,Tensor Parallel,24,1,2048,1024,1024,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.82,80,100.47,32.47,33.03,Real Time,2.18.1,Tensor Parallel,24,1,1152,128,1024,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.9,99.37,142.62,32.48,32.86,Real Time,2.18.1,Tensor Parallel,24,1,512,256,256,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,31.28,77.81,78.52,32.2,33.02,Real Time,2.18.1,Tensor Parallel,24,1,256,128,128,FP16,Matmult-BF16,bf16
Llama-2-7b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,156.1281689,27.63772011,33.7741375,6.46972656,7.07960129,Real Time,2.18.0,Tensor Parallel,24,1,4096,128,3968,FP16,Matmult-BF16,bf16
Llama-2-7b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,145.1665497,29.20985222,33.39338303,7.34019279,7.80153275,Real Time,2.18.0,Tensor Parallel,24,1,8192,128,8064,FP16,Matmult-BF16,bf16
Llama-2-13b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,112.520024,25.85077286,26.89838409,9.16552544,9.33074951,Real Time,2.18.0,Tensor Parallel,24,1,4096,128,3968,FP16,Matmult-BF16,bf16
Llama-2-13b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,97.41527724,333.7800503,340.9907818,10.17355919,10.37788391,Real Time,2.18.0,Tensor Parallel,24,1,8192,4096,4096,FP16,Matmult-BF16,bf16
Llama-2-13b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,73.16747525,994.1797257,999.7954369,13.49759102,13.97609711,Real Time,2.18.0,Tensor Parallel,24,1,16384,8192,8192,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.06356,76.59531,77.12364,32.89557,33.42032,Real Time,2.18.0,Tensor Parallel,24,1,256,128,128,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.92419,96.4396,98.47379,33.13422,33.45966,Real Time,2.18.0,Tensor Parallel,24,1,512,256,256,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.07017,76.33042,86.52544,33.15115,34.0786,Real Time,2.18.0,Tensor Parallel,24,1,1152,128,1024,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.426,277.01592,280.12586,33.73241,34.01256,Real Time,2.18.0,Tensor Parallel,24,1,2048,1024,1024,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.91353,275.96617,284.77097,34.81936,35.43973,Real Time,2.18.0,Tensor Parallel,24,1,3072,1024,2048,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.32725,810.43696,814.87799,34.90329,35.14242,Real Time,2.18.0,Tensor Parallel,24,1,4096,2048,2048,FP16,Matmult-BF16,bf16
Mistral-7B-Instruct-v0.2,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,149.7363908,27.34160423,29.20722961,6.86240196,7.07960129,Real Time,2.18.0,Tensor Parallel,24,1,4096,128,3968,FP16,Matmult-BF16,bf16
Mistral-7B-Instruct-v0.2,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,81.7034129,557.9631329,562.8581047,7.86566734,11.64746284,Real Time,2.18.0,Tensor Parallel,24,1,8192,4096,4096,FP16,Matmult-BF16,bf16
Mistral-7B-Instruct-v0.2,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,95.99325977,539.5913124,557.1010113,10.32972336,10.61367989,Real Time,2.18.0,Tensor Parallel,24,1,16384,8192,8192,FP16,Matmult-BF16,bf16
CodeLlama-13b-hf,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,112.7050057,27.0178318,33.24627876,9.12380219,9.38177109,Real Time,2.18.0,Tensor Parallel,24,1,4096,128,3968,FP16,Matmult-BF16,bf16
CodeLlama-13b-hf,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,97.52121418,338.6683464,340.4603005,10.15138626,10.55026054,Real Time,2.18.0,Tensor Parallel,24,1,8192,4096,4096,FP16,Matmult-BF16,bf16
CodeLlama-13b-hf,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,73.67826681,989.4962311,1000.655413,13.43631744,13.85569572,Real Time,2.18.0,Tensor Parallel,24,1,16384,8192,8192,FP16,Matmult-BF16,bf16

================================================
FILE: about-neuron/benchmarks/inf2/latency_data_encoder.csv
================================================
Model,Scripts,Framework,Inst. Type,Task,Throughput (inference/second),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Sequence Length,Model Data Type,Compilation Autocast Data Type,OS Type
albert-base-v2,:benchmark-pt:`Benchmark `,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),2119.78480993,0.93722343,1.00183487,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
bert-base-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),1998.20950133,0.99897385,1.04045868,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
bert-large-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.7,Inf2.xlarge,Raw Output (AutoModel),738.64502335,2.69365311,2.77733803,Real Time,2.25.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
distilbert-base-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),3401.96550351,0.57864189,0.67734718,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
google/electra-base-discriminator,:benchmark-pt:`Benchmark `,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),2020.45540243,0.9958744,1.04618073,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
roberta-base,:benchmark-pt:`Benchmark `,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),1989.26102482,0.99945068,1.09100342,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
roberta-large,:benchmark-pt:`Benchmark `,PyTorch 2.8,Inf2.xlarge,Raw Output (AutoModel),738.88441011,2.69317627,2.77304649,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
xlm-roberta-base,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.48xlarge,Raw Output (AutoModelForMaskedLM),48.80198341,40.66610336,51.05760336,Real Time,2.22.0,Data Parallel,1,128,FP32,Matmult-BF16,U22

================================================
FILE: about-neuron/benchmarks/inf2/latency_data_encoder_decoder.csv
================================================
Model,Scripts,Framework,Inst. Type,Task,Throughput (tokens/second),Latency per Token P50 (ms),Latency per Token P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,DP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type
t5-3b,`Tutorial `_,NeuronX Distributed,Inf2.24xlarge,Text Generation,108.18,9.25,9.26,Real Time,2.18.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16
google/flan-t5-xl,`Tutorial `_,NeuronX Distributed,Inf2.24xlarge,Text Generation,117.6,8.5,8.53,Real Time,2.18.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/latency_data_vision.csv
================================================
Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type
deepmind/multimodal-perceiver,16x224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Multimodal Autoencoding,0.83,1250,1271,Real Time,2.18.0,Data Parallel,1,FP32,None
deepmind/vision-perceiver-learned,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,99.6,18.6,18.7,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
deepmind/vision-perceiver-fourier,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,67.9,29.5,29.68,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
deepmind/vision-perceiver-conv,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,126.5,14.14,14.2,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
google/vit-base-patch16-224,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,709.468,1.406,1.431,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16
openai/clip-vit-base-patch32,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,163.444,6.113,6.143,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16
openai/clip-vit-large-patch14,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,61.812,16.172,16.216,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16
resnet18,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,1385.04,0.72,0.75,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16
resnet34,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,1187.64,0.83,0.88,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16
resnet50,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,1044.93,0.95,0.98,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16
resnet101,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,882.61,1.13,1.15,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16
resnet152,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,736.91,1.35,1.39,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion 1.5,512x512,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.421,2369.6,2406.8,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion 2.1,512x512,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.549,1794.5,2103.7,Real Time,2.17.0,Data Parallel,1,"FP32, BF16",Matmult-BF16
Stable Diffusion 2.1,768x768,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.188,5306.7,5368.6,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion 2 Inpainting,936x624,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.15,6701.4,6737.4,Real Time,2.17.0,Data Parallel,1,"FP32, BF16",Matmult-BF16
Stable Diffusion XL Base,1024x1024,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.073,13431.7,15739.0,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion XL Base & Refiner,1024x1024,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.8xlarge,Image Generation,0.078,12651.9,15053.9,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16
UNet,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Segmentation,420.16,2.37,2.41,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16
vgg11,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,524.10,1.90,1.96,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16
vgg16,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,435.54,2.29,2.33,Real Time,2.14.0,Data Parallel,1,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/latency_data_vision_cnn.csv
================================================
Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type
resnet18,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,1669.796,0.596,0.613,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
resnet34,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,1394.211,0.718,0.726,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
resnet50,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,1218.875,0.83,0.846,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
resnet101,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,994.691,1.007,1.024,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
resnet152,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,837.784,1.185,1.219,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
UNet,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Segmentation,447.094,2.232,2.253,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
vgg11,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,629.189,1.59,1.605,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
vgg16,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,508.665,1.956,1.995,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/latency_data_vision_dit.csv
================================================
Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type
PixArt Alpha,256x256,:benchmark-pt:`Benchmark `,PyTorch 2.1,Inf2.xlarge,Image Generation,1.975,502.587,537.258,Real Time,2.20,Data Parallel,1,"FP32, BF16",Matmult-BF16
PixArt Alpha,512x512,:benchmark-pt:`Benchmark `,PyTorch 2.1,Inf2.xlarge,Image Generation,0.565,1769.756,1775.697,Real Time,2.20,Data Parallel,1,"FP32, BF16",Matmult-BF16
PixArt Sigma,256x256,:benchmark-pt:`Benchmark `,PyTorch 2.1,Inf2.xlarge,Image Generation,1.86,540.832,548.41,Real Time,2.20,Data Parallel,1,"FP32, BF16",Matmult-BF16
PixArt Sigma,512x512,:benchmark-pt:`Benchmark `,PyTorch 2.1,Inf2.xlarge,Image Generation,0.543,1841.882,1850.683,Real Time,2.20,Data Parallel,1,"FP32, BF16",Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/latency_data_vision_sd.csv
================================================
Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type
Stable Diffusion 1.5,512x512,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Generation,0.494,2023.741,2031.705,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion 2.1,512x512,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Generation,0.596,1679.805,1685.442,Real Time,2.21.0,Data Parallel,1,"FP32, BF16",Matmult-BF16
Stable Diffusion 2.1,768x768,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Generation,0.187,5337.509,5357.361,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion 2 Inpainting,936x624,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Generation,0.133,7546.004,7550.984,Real Time,2.21.0,Data Parallel,1,"FP32, BF16",Matmult-BF16
Stable Diffusion XL Base,1024x1024,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Generation,0.083,12048.659,12102.431,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion XL Base & Refiner,1024x1024,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.8xlarge,Image Generation,0.095,10546.45,10704.566,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/latency_data_vision_transformers.csv
================================================
Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type
deepmind/multimodal-perceiver,16x224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Multimodal Autoencoding,0.853,1170.045,1232.056,Real Time,2.21.0,Data Parallel,1,FP32,None
deepmind/vision-perceiver-learned,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,99.6,18.6,18.7,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
deepmind/vision-perceiver-fourier,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,67.9,29.5,29.68,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
deepmind/vision-perceiver-conv,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,126.5,14.14,14.2,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
google/vit-base-patch16-224,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,746.139,1.322,1.378,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
openai/clip-vit-base-patch32,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,161.047,6.213,6.246,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
openai/clip-vit-large-patch14,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,73.261,13.643,13.685,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/throughput_data_decoder.csv
================================================
Model,Scripts,Framework,Inst. Type,Task,Output Token Throughput (tokens/sec),TTFT Latency P50 (ms),TTFT Latency P99 (ms),TPOT Latency P50 (ms),TPOT Latency P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type,Weight Storage Data Type
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,649.17,68.95,99.28,15.22,15.48,Batch,2.18.1,Tensor Parallel,24,8,8192,128,8064,FP16,Matmult-BF16,int8
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,521.96,1992.59,2016.73,15.31,15.64,Batch,2.18.1,Tensor Parallel,24,8,8192,4096,4096,FP16,Matmult-BF16,int8
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,859.09,66.02,75.73,10.45,10.76,Batch,2.18.1,Tensor Parallel,24,8,4096,128,3968,FP16,Matmult-BF16,int8
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,759.15,823.53,832.84,10.5,11.02,Batch,2.18.1,Tensor Parallel,24,8,4096,2048,2048,FP16,Matmult-BF16,int8
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.84,745.49,749.48,34.67,35.06,Batch,2.18.1,Tensor Parallel,24,1,4096,2048,2048,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.81,312.86,322.56,33.81,34.13,Batch,2.18.1,Tensor Parallel,24,1,3072,1024,2048,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.16,310.18,315.23,33.14,34.29,Batch,2.18.1,Tensor Parallel,24,1,2048,1024,1024,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.82,80,100.47,32.47,33.03,Batch,2.18.1,Tensor Parallel,24,1,1152,128,1024,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.9,99.37,142.62,32.48,32.86,Batch,2.18.1,Tensor Parallel,24,1,512,256,256,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,31.28,77.81,78.52,32.2,33.02,Batch,2.18.1,Tensor Parallel,24,1,256,128,128,FP16,Matmult-BF16,bf16
Llama-2-7b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,725.82805,77.36206,87.27574,12.10523,13.05699,Batch,2.18.0,Tensor Parallel,24,8,4096,128,3968,FP16,Matmult-BF16,bf16
Llama-2-7b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,577.97078,80.11794,89.68878,16.39295,17.81178,Batch,2.18.0,Tensor Parallel,24,8,8192,128,8064,FP16,Matmult-BF16,bf16
Llama-2-13b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,589.88712,108.80947,113.89017,14.89663,15.79142,Batch,2.18.0,Tensor Parallel,24,8,4096,128,3968,FP16,Matmult-BF16,bf16
Llama-2-13b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,351.75817,7083.72855,7158.32424,20.9856,21.80099,Batch,2.18.0,Tensor Parallel,24,8,8192,4096,4096,FP16,Matmult-BF16,bf16
Llama-2-13b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,178.56973,5141.32094,5160.92515,21.70897,22.74466,Batch,2.18.0,Tensor Parallel,24,4,16384,8192,8192,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.06356,76.59531,77.12364,32.89557,33.42032,Batch,2.18.0,Tensor Parallel,24,1,256,128,128,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.92419,96.4396,98.47379,33.13422,33.45966,Batch,2.18.0,Tensor Parallel,24,1,512,256,256,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,30.07017,76.33042,86.52544,33.15115,34.0786,Batch,2.18.0,Tensor Parallel,24,1,1152,128,1024,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,29.426,277.01592,280.12586,33.73241,34.01256,Batch,2.18.0,Tensor Parallel,24,1,2048,1024,1024,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.91353,275.96617,284.77097,34.81936,35.43973,Batch,2.18.0,Tensor Parallel,24,1,3072,1024,2048,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,28.32725,810.43696,814.87799,34.90329,35.14242,Batch,2.18.0,Tensor Parallel,24,1,4096,2048,2048,FP16,Matmult-BF16,bf16
Mistral-7B-Instruct-v0.2,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,761.88605,77.62027,86.62724,11.63864,12.49599,Batch,2.18.0,Tensor Parallel,24,8,4096,128,3968,FP16,Matmult-BF16,bf16
Mistral-7B-Instruct-v0.2,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,450.37555,4740.11564,4783.75316,16.54649,17.52925,Batch,2.18.0,Tensor Parallel,24,8,8192,4096,4096,FP16,Matmult-BF16,bf16
Mistral-7B-Instruct-v0.2,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,411.04655,11085.12306,11125.86117,18.01157,19.9585,Batch,2.18.0,Tensor Parallel,24,8,16384,8192,8192,FP16,Matmult-BF16,bf16
CodeLlama-13b-hf,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,546.51472,115.81421,121.49906,15.87224,17.21263,Batch,2.18.0,Tensor Parallel,24,8,4096,128,3968,FP16,Matmult-BF16,bf16
CodeLlama-13b-hf,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,333.24073,7115.97776,7231.01234,22.26758,23.81206,Batch,2.18.0,Tensor Parallel,24,8,8192,4096,4096,FP16,Matmult-BF16,bf16
CodeLlama-13b-hf,:llama-sample:`Sample `,Transformers NeuronX,Inf2.48xlarge,Text Generation,178.79017,5136.61623,5192.58666,21.6732,22.73154,Batch,2.18.0,Tensor Parallel,24,4,16384,8192,8192,FP16,Matmult-BF16,bf16

================================================
FILE: about-neuron/benchmarks/inf2/throughput_data_encoder.csv
================================================
Model,Scripts,Framework,Inst. Type,Task,Throughput (inference/second),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Sequence Length,Model Data Type,Compilation Autocast Data Type,OS Type
albert-base-v2,:benchmark-pt:`Benchmark `,PyTorch 2.7,Inf2.xlarge,Raw Output (AutoModel),3147.09984049,5.0675869,5.27883291,Batch,2.25.0,Data Parallel,8,128,FP32,Matmult-BF16,U22
bert-base-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.9,Inf2.xlarge,Raw Output (AutoModel),2674.18956433,5.97381591,6.17100715,Batch,2.27.0,Data Parallel,8,128,FP32,Matmult-BF16,U22
bert-large-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Raw Output (AutoModel),950.0496231,8.41140747,8.84652853,Batch,2.21.0,Data Parallel,4,128,FP32,Matmult-BF16,U22
distilbert-base-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.9,Inf2.xlarge,Raw Output (AutoModel),5307.87660777,6.01053237,6.23083114,Batch,2.27.0,Data Parallel,16,128,FP32,Matmult-BF16,U22
google/electra-base-discriminator,:benchmark-pt:`Benchmark `,PyTorch 2.7,Inf2.xlarge,Raw Output (AutoModel),2889.75325068,11.02411747,11.97555304,Batch,2.25.0,Data Parallel,16,128,FP32,Matmult-BF16,U22
roberta-base,:benchmark-pt:`Benchmark `,PyTorch 2.7,Inf2.xlarge,Raw Output (AutoModel),2920.37954741,5.42390347,5.82957506,Batch,2.25.0,Data Parallel,8,128,FP32,Matmult-BF16,U22
roberta-large,:benchmark-pt:`Benchmark `,PyTorch 2.7,Inf2.xlarge,Raw Output (AutoModel),962.70185508,8.31007957,8.60977411,Batch,2.25.0,Data Parallel,4,128,FP32,Matmult-BF16,U22
xlm-roberta-base,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.48xlarge,Raw Output (AutoModelForMaskedLM),51.13695938,625.66077709,694.93403673,Batch,2.22.0,Data Parallel,16,128,FP32,Matmult-BF16,U22

================================================
FILE: about-neuron/benchmarks/inf2/throughput_data_encoder_decoder.csv
================================================
Model,Scripts,Framework,Inst. Type,Task,Throughput (tokens/second),Latency per Token P50 (ms),Latency per Token P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,DP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type
t5-3b,`Tutorial `_,NeuronX Distributed,Inf2.24xlarge,Text Generation,111.92,8.97,8.98,Batch,2.17.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16
google/flan-t5-xl,`Tutorial `_,NeuronX Distributed,Inf2.24xlarge,Text Generation,117.61,8.51,8.53,Batch,2.17.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/throughput_data_vision.csv
================================================
Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type
deepmind/multimodal-perceiver,16x224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Multimodal Autoencoding,0.83,1250,1271,Real Time,2.18.0,Data Parallel,1,FP32,None
deepmind/vision-perceiver-learned,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,99.6,18.6,18.7,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
deepmind/vision-perceiver-fourier,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,67.9,29.5,29.68,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
deepmind/vision-perceiver-conv,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,126.5,14.14,14.2,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
google/vit-base-patch16-224,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,1632.359,4.716,5.902,Batch,2.14.0,Data Parallel,2,FP32,Matmult-BF16
openai/clip-vit-base-patch32,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,5178.833,48.973,57.002,Batch,2.14.0,Data Parallel,64,FP32,Matmult-BF16
openai/clip-vit-large-patch14,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,200.997,78.331,92.452,Batch,2.14.0,Data Parallel,4,FP32,Matmult-BF16
resnet18,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,6635.04,4.80,4.88,Batch,2.14.0,Data Parallel,8,FP32,Matmult-BF16
resnet34,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,4848.72,6.56,6.66,Batch,2.14.0,Data Parallel,8,FP32,Matmult-BF16
resnet50,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,4269.12,7.49,7.55,Batch,2.14.0,Data Parallel,8,FP32,Matmult-BF16
resnet101,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,3066.24,83.38,83.56,Batch,2.14.0,Data Parallel,64,FP32,Matmult-BF16
resnet152,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,2323.20,110.06,110.21,Batch,2.14.0,Data Parallel,64,FP32,Matmult-BF16
Stable Diffusion 1.5,512x512,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.421,2369.6,2406.8,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion 2.1,512x512,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.549,1794.5,2103.7,Real Time,2.17.0,Data Parallel,1,"FP32, BF16",Matmult-BF16
Stable Diffusion 2.1,768x768,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.188,5306.7,5368.6,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion 2 Inpainting,936x624,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.15,6701.4,6737.4,Real Time,2.17.0,Data Parallel,1,"FP32, BF16",Matmult-BF16
Stable Diffusion XL Base,1024x1024,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Generation,0.073,13431.7,15739.0,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion XL Base & Refiner,1024x1024,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.8xlarge,Image Generation,0.078,12651.9,15053.9,Real Time,2.17.0,Data Parallel,1,FP32,Matmult-BF16
UNet,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Segmentation,866.96,18.37,18.86,Batch,2.14.0,Data Parallel,4,FP32,Matmult-BF16
vgg11,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,3955.20,64.15,64.24,Batch,2.14.0,Data Parallel,64,FP32,Matmult-BF16
vgg16,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,1964.16,16.27,16.35,Batch,2.14.0,Data Parallel,8,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/throughput_data_vision_cnn.csv
================================================
Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type
resnet18,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,6949.174,4.587,4.659,Batch,2.21.0,Data Parallel,8,FP32,Matmult-BF16
resnet34,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,5158.607,6.18,6.251,Batch,2.21.0,Data Parallel,8,FP32,Matmult-BF16
resnet50,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,4393.304,7.283,7.331,Batch,2.21.0,Data Parallel,8,FP32,Matmult-BF16
resnet101,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,3164.991,80.818,80.938,Batch,2.21.0,Data Parallel,64,FP32,Matmult-BF16
resnet152,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,2449.875,104.406,104.531,Batch,2.21.0,Data Parallel,64,FP32,Matmult-BF16
UNet,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Segmentation,1010.803,15.818,15.875,Batch,2.21.0,Data Parallel,4,FP32,Matmult-BF16
vgg11,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,4734.402,54.044,54.09,Batch,2.21.0,Data Parallel,64,FP32,Matmult-BF16
vgg16,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,2161.392,14.77,14.832,Batch,2.21.0,Data Parallel,8,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/throughput_data_vision_dit.csv
================================================
Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type
PixArt Alpha,256x256,:benchmark-pt:`Benchmark `,PyTorch 2.1,Inf2.xlarge,Image Generation,1.975,502.587,537.258,Real Time,2.20,Data Parallel,1,"FP32, BF16",Matmult-BF16
PixArt Alpha,512x512,:benchmark-pt:`Benchmark `,PyTorch 2.1,Inf2.xlarge,Image Generation,0.565,1769.756,1775.697,Real Time,2.20,Data Parallel,1,"FP32, BF16",Matmult-BF16
PixArt Sigma,256x256,:benchmark-pt:`Benchmark `,PyTorch 2.1,Inf2.xlarge,Image Generation,1.86,540.832,548.41,Real Time,2.20,Data Parallel,1,"FP32, BF16",Matmult-BF16
PixArt Sigma,512x512,:benchmark-pt:`Benchmark `,PyTorch 2.1,Inf2.xlarge,Image Generation,0.543,1841.882,1850.683,Real Time,2.20,Data Parallel,1,"FP32, BF16",Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/throughput_data_vision_sd.csv
================================================
Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type
Stable Diffusion 1.5,512x512,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Generation,0.494,2023.741,2031.705,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion 2.1,512x512,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Generation,0.596,1679.805,1685.442,Real Time,2.21.0,Data Parallel,1,"FP32, BF16",Matmult-BF16
Stable Diffusion 2.1,768x768,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Generation,0.187,5337.509,5357.361,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion 2 Inpainting,936x624,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Generation,0.133,7546.004,7550.984,Real Time,2.21.0,Data Parallel,1,"FP32, BF16",Matmult-BF16
Stable Diffusion XL Base,1024x1024,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Generation,0.083,12048.659,12102.431,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16
Stable Diffusion XL Base & Refiner,1024x1024,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.8xlarge,Image Generation,0.095,10546.45,10704.566,Real Time,2.21.0,Data Parallel,1,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/inf2/throughput_data_vision_transformers.csv
================================================
Model,Image Size,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Model Data Type,Compilation Autocast Data Type
deepmind/multimodal-perceiver,16x224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Multimodal Autoencoding,0.853,1170.045,1232.056,Real Time,2.21.0,Data Parallel,1,FP32,None
deepmind/vision-perceiver-learned,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,99.6,18.6,18.7,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
deepmind/vision-perceiver-fourier,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,67.9,29.5,29.68,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
deepmind/vision-perceiver-conv,224x224,:benchmark-pt:`Benchmark `,PyTorch 1.13.1,Inf2.xlarge,Image Classification,126.5,14.14,14.2,Real Time,2.18.0,Data Parallel,1,FP32,Matmult-BF16
google/vit-base-patch16-224,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,1955.406,4.087,4.125,Batch,2.21.0,Data Parallel,2,FP32,Matmult-BF16
openai/clip-vit-base-patch32,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,6509.83,135.806,136.003,Batch,2.21.0,Data Parallel,64,FP32,Matmult-BF16
openai/clip-vit-large-patch14,224x224,:benchmark-pt:`Benchmark `,PyTorch 2.5,Inf2.xlarge,Image Classification,285.938,113.117,115.940,Batch,2.21.0,Data Parallel,8,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/trn1/latency_data_decoder.csv
================================================
Model,Scripts,Framework,Inst. Type,Task,Output Token Throughput (tokens/sec),TTFT Latency P50 (ms),TTFT Latency P99 (ms),TPOT Latency P50 (ms),TPOT Latency P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type,Weight Storage Data Type
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,157.25202,17.09,21.62,7.03,7.16,Real Time,2.18.1,Tensor Parallel,32,1,8192,128,8064,FP16,Matmult-BF16,int8
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,140.50031,153.02,159.13,7.04,7.13,Real Time,2.18.1,Tensor Parallel,32,1,8192,4096,4096,FP16,Matmult-BF16,int8
Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,178.18923,14.75,22.94,5.86,6,Real Time,2.18.1,Tensor Parallel,32,1,4096,128,3968,FP16,Matmult-BF16,int8
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,37.70379,547,553.89,26.2,26.79,Real Time,2.18.1,Tensor Parallel,32,1,4096,2048,2048,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,40.63808,53.2,59.5,24.48,26.17,Real Time,2.18.1,Tensor Parallel,32,1,1152,128,1024,FP16,Matmult-BF16,bf16
Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,40.80995,52.53,52.79,26.48,24.22,Real Time,2.18.1,Tensor Parallel,32,1,256,128,128,FP16,Matmult-BF16,bf16
Llama-2-7b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,161.7081305,13.32402229,14.1210556,6.69956207,6.84595108,Real Time,2.18.0,Tensor Parallel,32,1,8192,128,8064,FP16,Matmult-BF16,bf16
Llama-2-13b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,60.43330245,864.1381264,865.9124374,9.84406471,10.14947891,Real Time,2.18.0,Tensor Parallel,32,1,8192,4096,4096,FP16,Matmult-BF16,bf16
Llama-2-13b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,31.3990051,2367.928505,2369.139671,13.40842247,15.76948166,Real Time,2.18.0,Tensor Parallel,32,1,16384,8192,8192,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,39.28574,53.91026,54.9469,25.18129,26.58272,Real Time,2.18.0,Tensor Parallel,32,1,256,128,128,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,39.17668,81.882,98.77896,25.26712,25.7585,Real Time,2.18.0,Tensor Parallel,32,1,512,256,256,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,39.16379,57.75213,64.75568,25.44856,26.1333,Real Time,2.18.0,Tensor Parallel,32,1,1152,128,1024,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,38.09518,232.47981,239.02893,26.03793,26.17574,Real Time,2.18.0,Tensor Parallel,32,1,2048,1024,1024,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,37.70947,236.78207,241.14895,26.62468,27.02999,Real Time,2.18.0,Tensor Parallel,32,1,3072,1024,2048,FP16,Matmult-BF16,bf16
Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,36.78021,690.95588,695.91761,26.85046,27.04263,Real Time,2.18.0,Tensor Parallel,32,1,4096,2048,2048,FP16,Matmult-BF16,bf16
Mistral-7B-Instruct-v0.2,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,49.55890938,1322.874308,1325.857162,9.89246368,10.18333435,Real Time,2.18.0,Tensor Parallel,32,1,16384,8192,8192,FP16,Matmult-BF16,bf16
CodeLlama-13b-hf,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,60.21552741,868.635416,870.9816933,9.86456871,10.24436951,Real Time,2.18.0,Tensor Parallel,32,1,8192,4096,4096,FP16,Matmult-BF16,bf16
CodeLlama-13b-hf,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,31.37781421,2372.928381,2375.921965,13.3998394,13.79013062,Real Time,2.18.0,Tensor Parallel,32,1,16384,8192,8192,FP16,Matmult-BF16,bf16

================================================
FILE: about-neuron/benchmarks/trn1/latency_data_encoder.csv
================================================
Model,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Sequence Length,Model Data Type,Compilation Autocast Data Type,OS Type
albert-base-v2,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),2321.97758889,0.85997581,0.9086132,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
bert-base-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.8,trn1.2xlarge,Raw Output (AutoModel),2085.45272427,0.94294548,1.02853775,Real Time,2.26.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
bert-large-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),747.48212826,2.66885757,2.73442268,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
distilbert-base-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),3672.38478861,0.54264069,0.58531761,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
google/electra-base-discriminator,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),2127.07474023,0.93317032,0.9958744,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
roberta-base,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),2094.37288172,0.95796585,1.00588799,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
roberta-large,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),747.58300171,2.66981125,2.73323059,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22
xlm-roberta-base,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.32xlarge,Raw Output (AutoModelForMaskedLM),46.89836990,42.62268543,44.11746978,Real Time,2.27.0,Data Parallel,1,128,FP32,Matmult-BF16,U22

================================================
FILE: about-neuron/benchmarks/trn1/latency_data_encoder_decoder.csv
================================================
Model,Scripts,Framework,Inst. Type,Task,Throughput (tokens/second),Latency per Token P50 (ms),Latency per Token P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,DP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type
t5-3b,`Tutorial `_,NeuronX Distributed,trn1.32xlarge,Text Generation,110.23,9.07,9.12,Real Time,2.18.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16
google/flan-t5-xl,`Tutorial `_,NeuronX Distributed,trn1.32xlarge,Text Generation,120.29,8.31,8.34,Real Time,2.18.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16

================================================
FILE: about-neuron/benchmarks/trn1/throughput_data_decoder.csv
================================================
Model,Scripts,Framework,Inst. Type,Task,Output Token Throughput (tokens/sec),TTFT Latency P50 (ms),TTFT Latency P99 (ms),TPOT Latency P50 (ms),TPOT Latency P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type,Weight Storage Data Type
Type,Task,Output Token Throughput (tokens/sec),TTFT Latency P50 (ms),TTFT Latency P99 (ms),TPOT Latency P50 (ms),TPOT Latency P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type,Weight Storage Data Type Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,933.50053,55.16,61.47,9.95,10.1,Batch,2.18.1,Tensor Parallel,32,8,8192,128,8064,FP16,Matmult-BF16,int8 Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,770.16291,1265.95,1292.94,10.04,10.33,Batch,2.18.1,Tensor Parallel,32,8,8192,4096,4096,FP16,Matmult-BF16,int8 Llama-3-8B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,1142.69582,49.05,52.79,7.65,7.94,Batch,2.18.1,Tensor Parallel,32,8,4096,128,3968,FP16,Matmult-BF16,int8 Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,120.3614,1661.12,1672.71,32.33,33.27,Batch,2.18.1,Tensor Parallel,32,4,4096,2048,2048,FP16,Matmult-BF16,bf16 Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,140.51039,129.86,132.03,28.38,29.11,Batch,2.18.1,Tensor Parallel,32,4,1152,128,1024,FP16,Matmult-BF16,bf16 Llama-3-70B,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,138.01357,130.37,130.48,28.08,28.53,Batch,2.18.1,Tensor Parallel,32,4,256,128,128,FP16,Matmult-BF16,bf16 Llama-2-7b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,917.2452652,66.4024353,70.63961029,10.09511948,10.46204567,Batch,2.18.0,Tensor Parallel,32,8,8192,128,8064,FP16,Matmult-BF16,bf16 Llama-2-13b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,371.7031,6668.70475,6689.8005,19.85741,21.0557,Batch,2.18.0,Tensor Parallel,32,8,8192,4096,4096,FP16,Matmult-BF16,bf16 Llama-2-13b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,184.28337,4628.44729,4635.24675,21.09194,22.3856,Batch,2.18.0,Tensor Parallel,32,4,16384,8192,8192,FP16,Matmult-BF16,bf16 Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,141.45357,156.84581,158.41317,26.72362,30.16973,Batch,2.18.0,Tensor Parallel,32,4,256,128,128,FP16,Matmult-BF16,bf16 Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,143.42503,270.15853,270.55573,26.9084,27.90999,Batch,2.18.0,Tensor Parallel,32,4,512,256,256,FP16,Matmult-BF16,bf16 Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,145.12799,156.68869,161.41367,27.21453,30.60174,Batch,2.18.0,Tensor Parallel,32,4,1152,128,1024,FP16,Matmult-BF16,bf16 Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,133.25056,1478.64008,1479.77638,28.55039,29.49882,Batch,2.18.0,Tensor Parallel,32,4,2048,1024,1024,FP16,Matmult-BF16,bf16 Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,129.27628,1478.84846,1482.93161,31.67439,32.01842,Batch,2.18.0,Tensor Parallel,32,4,3072,1024,2048,FP16,Matmult-BF16,bf16 Llama-2-70b,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,120.62953,2722.03422,2730.95036,31.78978,33.2315,Batch,2.18.0,Tensor Parallel,32,4,4096,2048,2048,FP16,Matmult-BF16,bf16 Mistral-7B-Instruct-v0.2,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,484.5773,8614.85291,8630.24068,15.43713,15.9421,Batch,2.18.0,Tensor 
Parallel,32,8,16384,8192,8192,FP16,Matmult-BF16,bf16 CodeLlama-13b-hf,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,370.97736,6625.1595,6628.26467,19.91653,20.94936,Batch,2.18.0,Tensor Parallel,32,8,8192,4096,4096,FP16,Matmult-BF16,bf16 CodeLlama-13b-hf,:llama-sample:`Sample `,Transformers NeuronX,trn1.32xlarge,Text Generation,184.17898,4626.17469,4630.66864,21.09528,22.16578,Batch,2.18.0,Tensor Parallel,32,4,16384,8192,8192,FP16,Matmult-BF16,bf16 ================================================ FILE: about-neuron/benchmarks/trn1/throughput_data_encoder.csv ================================================ Model,Scripts,Framework,Inst. Type,Task,Throughput (inference/sec),Latency P50 (ms),Latency P99 (ms),Application Type,Neuron Version,Run Mode,Batch Size,Sequence Length,Model Data Type,Compilation Autocast Data Type,OS Type albert-base-v2,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),3442.53392946,9.28854942,9.35173273,Batch,2.27.0,Data Parallel,16,128,FP32,Matmult-BF16,U22 bert-base-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),3421.56625089,9.34481621,9.41992044,Batch,2.27.0,Data Parallel,16,128,FP32,Matmult-BF16,U22 bert-large-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),1104.43610458,7.24101067,7.29799271,Batch,2.27.0,Data Parallel,4,128,FP32,Matmult-BF16,U22 distilbert-base-uncased,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),6369.44180331,5.00988960,5.09214401,Batch,2.28.0,Data Parallel,16,128,FP32,Matmult-BF16,U22 google/electra-base-discriminator,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),3425.55803570,9.32765007,9.45640087,Batch,2.28.0,Data Parallel,16,128,FP32,Matmult-BF16,U22 roberta-base,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),3378.10764201,9.46044921,9.53317165,Batch,2.28.0,Data Parallel,16,128,FP32,Matmult-BF16,U22 roberta-large,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.2xlarge,Raw Output (AutoModel),1123.90475943,14.23048973,14.30106163,Batch,2.27.0,Data Parallel,8,128,FP32,Matmult-BF16,U22 xlm-roberta-base,:benchmark-pt:`Benchmark `,PyTorch 2.9,trn1.32xlarge,Raw Output (AutoModelForMaskedLM),46.68898543,342.50581264,350.86465597,Batch,2.27.0,Data Parallel,8,128,FP32,Matmult-BF16,U22 ================================================ FILE: about-neuron/benchmarks/trn1/throughput_data_encoder_decoder.csv ================================================ Model,Scripts,Framework,Inst. Type,Task,Throughput (tokens/second),Latency per Token P50 (ms),Latency per Token P99 (ms),Application Type,Neuron Version,Run Mode,TP Degree,DP Degree,Batch Size,Sequence Length,Input Length,Output Length,Model Data Type,Compilation Autocast Data Type t5-3b,`Tutorial `_,NeuronX Distributed,trn1.32xlarge,Text Generation,116.29,8.58,8.66,Batch,2.17.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16 google/flan-t5-xl,`Tutorial `_,NeuronX Distributed,trn1.32xlarge,Text Generation,122.52,8.16,8.19,Batch,2.17.0,Tensor Parallel,8,1,1,128,128,84,FP32,Matmult-BF16 ================================================ FILE: about-neuron/benchmarks/trn1/training_data_decoder.csv ================================================ Model,Instance-Type,Training Data-Type,Nodes,Topology,Microbatch,Globalbatch, Optimizer, Sequence Length, Performance [seq/sec],Strong/Weak Scaling,Neuron Version,Neuron Tutorial/Example,Pytorch Neuron(torch-neuronx) Version, OS Type. 
Llama-3.1-8B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+FP32Optimizer,32,TP=32 DP=32 PP=1 ZeRO-1,1,1024,AdamW,8192,47.95,strong scaling,2.24.0,`NeuronX Distributed `_,2.7.0.2.8.6896,U22 Llama-3.1-70B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+FP32Optimizer,32,TP=32 DP=4 PP=8,1,1024,AdamW,8192,7.94,strong scaling,2.24.0,`NeuronX Distributed `_,2.7.0.2.8.6896,U22 ================================================ FILE: about-neuron/benchmarks/trn1/training_data_encoder.csv ================================================ Model,Instance-Type,Training Data-Type,Nodes,Topology,Microbatch,Globalbatch, Optimizer, Sequence Length, Performance [seq/sec],Strong/Weak Scaling,Neuron Version,Neuron Tutorial/Example,Pytorch Neuron(torch-neuronx) Version, OS Type. HuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,16,[32xNC(DP)] x 16Nodes(DP),16,1048576,Lamb,128,57407.9207,weak scaling,2.28.0,:ref:`hf-bert-pretraining-tutorial`,2.9.0.2.12.21983, U22 HuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,FP32,16,[32xNC(DP)] x 16Nodes(DP),8,1048576,Lamb,128,32362.6714,weak scaling,2.28.0,:ref:`hf-bert-pretraining-tutorial`,2.9.0.2.12.21983, U22 HuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],16,16384,AdamW,128,3826.6103,strong scaling,2.28.0,:ref:`hf-bert-pretraining-tutorial`,2.9.0.2.12.21983, U22 ================================================ FILE: about-neuron/benchmarks/trn1/training_data_vision_transformers.csv ================================================ Model,Instance-Type,Training Data-Type,Nodes,Topology,Microbatch,Globalbatch, Optimizer, Performance [seq/sec],Strong/Weak Scaling,Neuron Version,Neuron Tutorial/Example,Pytorch Neuron(torch-neuronx) Version, OS Type. HuggingFace ViT-Base fine-tuning,trn1.32xlarge/trn1n.32xlarge,BF16,1,[32xNC(DP)],64,2048,AdamW,6587.25,weak scaling,2.25.0,`ViT-Base Fine-tuning Example `_,2.7.0.2.9.0, U22 ================================================ FILE: about-neuron/benchmarks/trn1/trn1-inference-performance.rst ================================================ .. _trn1-inference-performance: Trn1/Trn1n Inference Performance ================================ .. important:: The benchmark scripts linked on this page are provided for historical reference only and are not tested with recent versions of the Neuron SDK. They have been moved to the `archive folder `_. .. contents:: Table of contents :local: *Last update: Feb 26th, 2026* .. _NLP: Encoder Models -------------- .. tab-set:: .. tab-item:: Throughput optimized .. df-table:: :header-rows: 1 df = pd.read_csv('throughput_data_encoder.csv') df_prices = pd.read_csv('trn1_instance_prices.csv') df = pd.merge(df,df_prices,on='Inst. Type') df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format) cols_to_show = ['Model','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size','Sequence Length', 'Model Data Type','Compilation Autocast Data Type','OS Type'] df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences']) df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True) int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)'] df[int_cols] = df[int_cols].round(2).astype('float',copy=True) .. 
tab-item:: Latency optimized .. df-table:: :header-rows: 1 df = pd.read_csv('latency_data_encoder.csv') df_prices = pd.read_csv('trn1_instance_prices.csv') df = pd.merge(df,df_prices,on='Inst. Type') df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (inference/sec)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format) cols_to_show = ['Model','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (inference/sec)', 'Latency P50 (ms)', 'Latency P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'Batch Size','Sequence Length', 'Model Data Type','Compilation Autocast Data Type','OS Type'] df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences']) df['Throughput (inference/sec)'] = df['Throughput (inference/sec)'].round(2).astype('float',copy=True) int_cols = ['Latency P50 (ms)', 'Latency P99 (ms)'] df[int_cols] = df[int_cols].round(2).astype('float',copy=True) Encoder-Decoder Models ---------------------- .. tab-set:: .. tab-item:: Throughput optimized .. df-table:: :header-rows: 1 df = pd.read_csv('throughput_data_encoder_decoder.csv') df_prices = pd.read_csv('trn1_instance_prices.csv') df = pd.merge(df,df_prices,on='Inst. Type') df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (tokens/second)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format) cols_to_show = ['Model','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (tokens/second)', 'Latency per Token P50 (ms)', 'Latency per Token P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'TP Degree', 'DP Degree', 'Batch Size', 'Sequence Length', 'Input Length', 'Output Length', 'Model Data Type','Compilation Autocast Data Type'] df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences']) df['Throughput (tokens/second)'] = df['Throughput (tokens/second)'].round(2).astype('float',copy=True) int_cols = ['Latency per Token P50 (ms)', 'Latency per Token P99 (ms)'] df[int_cols] = df[int_cols].round(2).astype('float',copy=True) .. note:: Only for Encoder-Decoder **Throughput (tokens/second)** counts both input and output tokens **Latency per Token** counts both input and output tokens Applicable to all models **Cost per 1M inferences** is calculated using RI-Effective hourly rate. **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference. .. tab-item:: Latency optimized .. df-table:: :header-rows: 1 df = pd.read_csv('latency_data_encoder_decoder.csv') df_prices = pd.read_csv('trn1_instance_prices.csv') df = pd.merge(df,df_prices,on='Inst. Type') df['Cost per 1M inferences'] = ((1.0e6 / df['Throughput (tokens/second)']) * (df['RI-Effective hourly rate'] / 3.6e3 )).map('${:,.3f}'.format) cols_to_show = ['Model','Scripts','Framework', 'Inst. Type', 'Task', 'Throughput (tokens/second)', 'Latency per Token P50 (ms)', 'Latency per Token P99 (ms)', 'Cost per 1M inferences', 'Application Type', 'Neuron Version', 'Run Mode', 'TP Degree', 'DP Degree', 'Batch Size', 'Sequence Length', 'Input Length', 'Output Length', 'Model Data Type','Compilation Autocast Data Type'] df = df[cols_to_show].sort_values(['Model', 'Cost per 1M inferences']) df['Throughput (tokens/second)'] = df['Throughput (tokens/second)'].round(2).astype('float',copy=True) int_cols = ['Latency per Token P50 (ms)', 'Latency per Token P99 (ms)'] df[int_cols] = df[int_cols].round(2).astype('float',copy=True) .. 
note:: Only for Encoder-Decoder **Throughput (tokens/second)** counts both input and output tokens **Latency per Token** counts both input and output tokens .. note:: **Cost per 1M inferences** is calculated using RI-Effective hourly rate. **Real Time** application refers to batch size 1 inference for minimal latency. **Batch** application refers to maximum throughput with minimum cost-per-inference. ================================================ FILE: about-neuron/benchmarks/trn1/trn1-training-performance.rst ================================================ .. _trn1-training-performance: Trn1/Trn1n Training Performance =============================== This section provides benchmark results for training various deep learning models on AWS Trn1 and Trn1n instances powered by AWS Trainium chips. The benchmarks cover a range of model architectures, including encoder models, decoder models, and vision transformers, demonstrating the performance capabilities of Trn1/Trn1n instances for different training workloads. **Last update: February 19th, 2026** .. contents:: Table of contents :local: .. _NLP: Encoder Models -------------- .. csv-table:: :file: training_data_encoder.csv :header-rows: 1 Decoder Models -------------- .. csv-table:: :file: training_data_decoder.csv :header-rows: 1 .. note:: **TP (Tensor Parallel), PP (Pipeline Parallel) and DP (Data Parallel) Topology** configuration refers to the degrees of 3D Parallelism (how the model and data are sharded across NeuronCores). TP and PP are specified in the run script, and DP is calculated by dividing the **world size** (number of nodes/instances * number of NeuronCores per instance) by the product of the TP and PP degrees. For example, take ``TP = 4``, ``PP = 4``, and 32 instances (trn1.32xlarge). The world size will be ``32 (num instances) * 32 (NeuronCores per instance) = 1024``. Now, ``DP degree = 1024 (world size) / (4 (TP) * 4 (PP)) = 64``. For more information on batch sizes, please refer to :ref:`neuron-batching` Vision Transformer Models -------------------------- .. csv-table:: :file: training_data_vision_transformers.csv :header-rows: 1 .. note:: Read more about strong vs. weak scaling in :ref:`neuron-training-faq` ================================================ FILE: about-neuron/benchmarks/trn1/trn1_instance_prices.csv ================================================ Inst. Type,RI-Effective hourly rate trn1.2xlarge,0.512 trn1.32xlarge,8.197 ================================================ FILE: about-neuron/benchmarks/trn1/trn1_trn1n_nlp_data.csv ================================================ Model,Instance-Type,Training Data-Type,Nodes,Topology,Microbatch,Globalbatch, Optimizer, Performance [seq/sec],MFU[%],ComputeCostPerToken(Tflops),Strong/Weak Scaling,Neuron Version,Neuron Tutorial/Example,Pytorch Neuron(torch-neuronx) Version, OS Type.
HuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,16,[32xNC(DP)] x 16Nodes(DP),16,1048576,Lamb,53069,25.83,,weak scaling,2.15.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.12.0, U20 HuggingFace BERT-Large Ph2 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,16,[32xNC(DP)] x 16Nodes(DP),2,524288,Lamb,7507,15.5,,weak scaling,2.15.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.12.0, U20 HuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16/AMP,16,[32xNC(DP)] x 16Nodes(DP),16,16384,AdamW,24518.47,,,strong scaling,2.14.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.11.0, U20 HuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,FP32,16,[32xNC(DP)] x 16Nodes(DP),8,1048576,Lamb,28432,13.83,,weak scaling,2.14.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.12.0, U20 HuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],16,16384,AdamW,3530,27.49,,strong scaling,2.15.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.12.0, U20 HuggingFace BERT-Large Ph1 pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],16,65536,Lamb,3733,29.07,,strong scaling,2.15.0,:ref:`hf-bert-pretraining-tutorial`,1.13.1.1.12.0, U20 GPT3-23B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,32,TP=8 DP=32 PP=4,1,1024,AdamW,100,29.65,289,strong scaling,2.15.0,`nemo-megatron `_,1.13.1.1.12.0, U20 GPT3-46B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,32,TP=8 DP=16 PP=8,1,1024,AdamW,47.2,27.7,578,strong scaling,2.15.0,`nemo-megatron `_,1.13.1.1.12.0, U20 GPT3-175B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,32,TP=32 DP=4 PP=8,1,1024,AdamW,12.7,33.14,2197,strong scaling,2.13.0,`nemo-megatron `_,1.13.1.1.10.0, U20 Llama2-7B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,32,TP=8 DP=4 PP=4,1,1024,AdamW,82,14.8,336,strong scaling,2.15.0,`nemo-megatron `_,1.13.1.1.12.0, U20 Llama2-13B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,32,TP=8 DP=4 PP=4,1,1024,AdamW,60,20.7,336,strong scaling,2.15.0,`nemo-megatron `_,1.13.1.1.12.0, U20 Llama2-7B pre-training,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+FP32Optimizer,16,TP=8 DP=64,1,1024,AdamW,81,30.8,,strong scaling,2.15.0,`neuronx-distributed `_,1.13.1.1.12.0, U20 HuggingFace ViT-Base fine-tuning,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],64,2048,AdamW,5232.78,,,weak scaling,2.17.0,`ViT-Base Fine-tuning Example `_,1.13.1.1.13.0, U20 HuggingFace CLIP-Base fine-tuning,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],80,2560,AdamW,5152.76,,,weak scaling,2.17.0,`CLIP-Base Fine-tuning `_,1.13.1.1.13.0, U20 HuggingFace Vision-Perceiver-Conv fine-tuning,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],4,128,AdamW,423.32,,,weak scaling,2.17.0,`Vision Perceiver Conv Fine-tuning `_,1.13.1.1.13.1, U20 HuggingFace Language-Perceiver fine-tuning,trn1.32xlarge/trn1n.32xlarge,Autocast:BF16+SR,1,[32xNC(DP)],20,640,AdamW,1407.02,,,weak scaling,2.17.0,`Language Perceiver Fine-tuning `_,1.13.1.1.13.1, U20 ================================================ FILE: about-neuron/beta-participation.rst ================================================ .. meta:: :description: Information about participating in the AWS Neuron SDK beta program. :date-modified: 12/19/2025 Participate in the AWS Neuron SDK Beta Program =============================================== AWS Neuron SDK users can participate in our beta program to get early access to new features and improvements.
By joining the beta program, you can provide valuable feedback that helps us enhance the AWS Neuron SDK for everyone. Currently, we are taking requests to join our Beta program for the new Neuron Kernel Interface and its associated features. If you are interested in participating, `fill out this online form `__ and we'll get back to you! Read more about the new NKI features `here `__. .. admonition:: Disclaimer Beta features are not recommended for production workloads. They may contain bugs or incomplete functionality. Use them at your own risk and provide feedback to help us improve. ================================================ FILE: about-neuron/calculator/neuron-calculator.rst ================================================ .. _neuron_calculator: Neuron Calculator ================= .. raw:: html

[Interactive calculator widget (raw HTML): "Number of NeuronCores needed for LLM Inference". Users enter model configurations; multiple values of each hyperparameter can be added by pressing Enter after each value in the text field.]
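To give a rough sense of the kind of estimate such a calculator produces, here is a minimal Python sketch that sizes an LLM by its weight and KV-cache memory and divides by a per-NeuronCore memory budget. This is an illustrative heuristic under stated assumptions, not the widget's actual formula: the function name, its defaults, and the 16 GiB-per-NeuronCore figure are all hypothetical.

.. code-block:: python

    # Illustrative sizing heuristic only; not the calculator's actual logic.
    import math

    def estimate_neuroncores(params_billion, n_layers, hidden_size,
                             batch_size=1, seq_len=2048,
                             bytes_per_value=2, gib_per_core=16):
        """Estimate NeuronCores needed to hold bf16 weights plus KV cache."""
        weights_gib = params_billion * 1e9 * bytes_per_value / 2**30
        # KV cache: K and V tensors per layer, each of shape
        # (batch_size, seq_len, hidden_size), stored at bytes_per_value.
        kv_gib = (2 * n_layers * batch_size * seq_len
                  * hidden_size * bytes_per_value) / 2**30
        return math.ceil((weights_gib + kv_gib) / gib_per_core)

    # Example: a Llama-2-7B-class config (32 layers, hidden size 4096)
    print(estimate_neuroncores(7, n_layers=32, hidden_size=4096))  # -> 1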
.. raw:: html ================================================ FILE: about-neuron/faq/contributing-faq.rst ================================================ .. _contribute-faq: Contributing Guidelines FAQs ============================ .. contents:: Table of contents :local: :depth: 1 Whether it's a bug report, new feature, correction, or additional documentation, we greatly value feedback and contributions from our community. Please read through this document before submitting any issues or pull requests to ensure we have all the necessary information to effectively respond to your bug report or contribution. How to report Bugs/Feature Requests ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We welcome you to use the GitHub issue tracker to report bugs or suggest features. When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: - A reproducible test case or series of steps - The version of our code being used - Any modifications you've made relevant to the bug - Anything unusual about your environment or deployment Contributing via Pull Requests ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 1. You are working against the latest source on the *master* branch. 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. To send us a pull request, please: 1. Fork the repository. 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 3. Ensure local tests pass. 4. Commit to your fork using clear commit messages. 5. Send us a pull request, answering any default questions in the pull request interface. 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. GitHub provides additional documentation on `forking a repository `__ and `creating a pull request `__. How to find contributions to work on ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Looking at the existing issues is a great way to find something to contribute to. As our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. What is the code of conduct ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This project has adopted the `Amazon Open Source Code of Conduct `__. For more information, see the `Code of Conduct FAQ `__ or contact opensource-codeofconduct@amazon.com with any additional questions or comments. How to report a security issue ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our `vulnerability reporting page `__. Please do **not** create a public GitHub issue. What is the licensing ~~~~~~~~~~~~~~~~~~~~~~~~ See the `link `_ and `link `_ files for our project's licensing. We will ask you to confirm the licensing of your contribution. We may ask you to sign a `Contributor License Agreement (CLA) `__ for larger changes.
================================================ FILE: about-neuron/faq/index.rst ================================================ .. _neuron_faq: Other Neuron FAQs ================= Frequently asked questions about the AWS Neuron SDK, covering general topics, inference, training, ONNX support, and contributing guidelines. .. note:: This content may not be up to date as of 2026, and often pertains to older or now-unsupported platforms and components. General FAQs ------------- .. grid:: 1 1 2 2 :gutter: 2 .. grid-item-card:: :link: neuron2-intro-faq :link-type: doc :class-card: sd-border-1 **Neuron 2.x Introduction FAQ** ^^^ Common questions about Neuron 2.x and Trn1 general availability .. grid-item-card:: :link: onnx-faq :link-type: doc :class-card: sd-border-1 **ONNX FAQ** ^^^ Using ONNX models with AWS Neuron .. grid-item-card:: :link: contributing-faq :link-type: doc :class-card: sd-border-1 **Contributing Guidelines FAQ** ^^^ How to report bugs, request features, and contribute to Neuron Inference FAQs --------------- .. grid:: 1 1 2 2 :gutter: 2 .. grid-item-card:: :link: inference/neuron-faq :link-type: doc :class-card: sd-border-1 **Inference with Neuron FAQ** ^^^ Common questions about running inference workloads on AWS Neuron .. grid-item-card:: :link: inference/trouble-shooting-faq :link-type: doc :class-card: sd-border-1 **Troubleshooting for Inf1 FAQ** ^^^ Debugging and troubleshooting inference issues on Inf1 instances Training FAQs ------------- .. grid:: 1 :gutter: 2 .. grid-item-card:: :link: training/neuron-training :link-type: doc :class-card: sd-border-1 **Training with Neuron FAQ** ^^^ Common questions about training models on Trainium instances .. toctree:: :maxdepth: 1 :hidden: Neuron 2.x Introduction FAQ ONNX FAQ Contributing Guidelines FAQ Inference with Neuron FAQ Troubleshooting for Inf1 FAQ Training with Neuron FAQ ================================================ FILE: about-neuron/faq/inference/neuron-faq.rst ================================================ .. _neuron-f1-faq: Inference with Neuron - FAQ --------------------------- .. contents:: Table of contents :local: :depth: 1 What ML model types and operators are supported by AWS Neuron? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AWS Neuron includes a compiler that converts your trained machine learning models to a binary object for execution. The Neuron compiler supports many commonly used machine learning operators used in computer vision, natural language processing, recommender engines, and more. A list of supported ML operators and supported inputs is in :ref:`neuron-supported-operators`. It's important to mention that good performance doesn't require all of the model operators to run on the chip. In many cases, some of the operators will continue to run on the instance CPUs, as in the case of embeddings or image pre-processing, and will still provide compelling end-to-end performance. We call this approach auto-partitioning, where the Neuron compiler optimizes the model execution based on operators that are most suitable to run on the CPU or the chip. For the latest model architecture support, please refer to the model architecture fit and performance pages. Why is a compiler needed, and how do I use it?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The Neuron compiler converts a model from a framework-level Neural Network graph, with operators like convolution and pooling, into a Neuron Device-specific instruction set, builds the schedule for execution of these instructions, and converts the model parameters into a format that the Neuron device can consume. The supported input formats include TensorFlow, PyTorch, and MXNet. The output from the compiler is a Neuron Executable File Format (NEFF) artifact. The NEFF contains a combination of binary code, the model parameters, and additional meta-data needed by the Neuron runtime and profiler. I am using an ML framework today – what will change for me to use this? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To use Inferentia within the Inf1 instances, the developer needs to perform one-time compilation of the pre-trained model to generate a NEFF, and use this as the inference model in a fleet of Inf1 instances. - :doc:`TensorFlow Neuron ` - :ref:`neuron-pytorch` - :ref:`neuron-mxnet` What is a NeuronCore Pipeline? How do I take advantage of it? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ A NeuronCore Pipeline is a unique technique to shard a specific Neural Network across multiple NeuronCores, to take advantage of the large on-chip cache instead of moving data in and out of external memory. The result is increased throughput and reduced latency, which is typically important for real-time inference applications. All Inf1 instances support it, and Inf1 instances with multiple Inferentia accelerators, such as inf1.6xlarge or inf1.24xlarge, support it across chips thanks to the fast chip-to-chip interconnect. Developers can choose to use NeuronCore Pipeline mode during the compile stage, with an opt-in flag. :ref:`neuron-cc` provides further details. NeuronCores, NeuronCore Groups and NeuronCore Pipelines: What do they do? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Each Inferentia chip has four compute engines called NeuronCores. A NeuronCore Group is a way to aggregate NeuronCores to increase hardware utilization and assign models with the right compute sizing for a specific application. If you want to run multiple models in parallel, you can assign different models to separate NeuronCore Groups. A model compiled to use multiple NeuronCores in a NeuronCore Pipeline can be assigned to a NeuronCore Group with enough NeuronCores to load it into. Finally, it is also possible for sets of Inferentia devices to be mapped to separate Neuron Runtimes. The :ref:`neuron-features-index` section has more information and examples. Can I use TensorFlow networks from tfhub.dev as-is? If not, what should I do? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Yes. Such models can be imported into TensorFlow, either through a standard model server, in which case serving appears as a simple command line utility, or via the Python-based TensorFlow environment. The primary additional step needed is to compile the model into the Inferentia NEFF format. ================================================ FILE: about-neuron/faq/inference/trouble-shooting-faq.rst ================================================ .. _trouble-shooting-inf1-faq: Troubleshooting for Inf1 - FAQ ============================== .. contents:: Table of contents :local: :depth: 1 Performance is not what I expect it to be, what's the next step?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Please check our performance optimization section for notes on performance tuning and on how to use pipelining and batching to improve performance. Do I need to worry about the size of my model and the size of Inferentia memory? What problems can I expect to have? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Errors like these will be logged and can be found as shown in :ref:`neuron_gatherinfo`. How can I debug / profile my inference request? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ See :ref:`neuron-plugin-tensorboard`. How to report Bugs/Feature Requests ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We welcome you to use the Neuron GitHub issue tracker to report bugs or suggest features. When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: - A reproducible test case or series of steps - The version of our code being used - Any modifications you've made relevant to the bug - Anything unusual about your environment or deployment ================================================ FILE: about-neuron/faq/neuron2-intro-faq.rst ================================================ .. _neuron2-intro-faq: Neuron 2.x Introduction at Trn1 GA - FAQ ---------------------------------------- .. contents:: Table of contents :local: :depth: 1 .. include:: /release-notes/templates/n2.x-trn1-ga-faq.txt ================================================ FILE: about-neuron/faq/onnx-faq.rst ================================================ .. _onnx-faq: ONNX FAQ --------- .. contents:: Table of contents :local: :depth: 1 Can I use ONNX models with Neuron? If not, what should I do? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AWS Neuron does not directly support compilation of models in the ONNX file format. The recommended way to compile a model that is in the ONNX file format is to first convert the model to PyTorch using a publicly available tool like `onnx2pytorch `_. Once the ONNX model is converted to PyTorch, it can then be compiled with the :func:`torch_neuron.trace` function to produce a model that can run on Neuron. ================================================ FILE: about-neuron/faq/roadmap-faq.rst ================================================ .. _neuron_roadmap_faq: Roadmap FAQ =========== .. contents:: Table of contents :local: :depth: 1 Why did you build this? ~~~~~~~~~~~~~~~~~~~~~~~ A: We know that our customers are making decisions and plans based on what we are developing, and we want to provide them with the right visibility into what we are working on, as well as the opportunity to provide direct feedback. What do the roadmap categories mean? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - **Roadmap Requests** - Requests we have received and are considering adding to the roadmap. This is a great phase to give us feedback and let us know if you need a feature as well. - **Working on it** - In progress; we might still be working through the implementation details or scoping things out. This is a great phase to give us feedback as to how you want to see something implemented. We’ll benefit from your specific use cases here. - **Completed** - Feature complete and supported by Neuron. Why are there no dates on your roadmap?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A: We are not providing exact target dates for releases because we prioritize operational excellence, security, and quality over hitting a specific date. If you have an urgent need for a feature, please contact us directly at aws-neuron-support@amazon.com. Is everything on the roadmap? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A: We are focusing on upgrades for existing features, as well as building new features. We will keep adding features and capabilities to this roadmap as time progresses. How can I provide feedback or ask for more information? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A: When in doubt, please create an issue or post a question on the `AWS Neuron support forum `__. How can I request a feature be added to the roadmap? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A: We encourage you to open an issue. All community-submitted issues will be reviewed by the roadmap maintainers. Can I "+1" existing issues? ~~~~~~~~~~~~~~~~~~~~~~~~~~~ A: We strongly encourage you to do so, as it helps us understand which issues will have the widest impact. You can navigate to the issue details page and add a reaction (thumbs up). Several types of reactions are supported (thumbs down "-1", confused, heart, watching, laugh, hooray, and thumbs up "+1"). ================================================ FILE: about-neuron/faq/training/neuron-training.rst ================================================ .. _neuron-training-faq: Training with Neuron - FAQ ========================== .. contents:: Table of contents :local: :depth: 2 Compute ------- How do I get started with training my model on Trn1? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Once you select your machine learning framework, you can get started here: :ref:`docs-quick-links` How do I set up EFA for multi-node training? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To set up the EFA that is needed for multi-node training, please see :ref:`setup-trn1-multi-node-execution` How do I know if I can train my models with Trainium? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We aim to support a broad set of models and distribution libraries. We continuously add more capabilities and enable new features via Neuron SDK releases, and we suggest you follow our public roadmap and join our Slack and email lists. How should I size Trainium NeuronCores vs GPUs? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For simplicity, you should consider each NeuronCore within your instances as an independent deep learning compute engine, the equivalent of a GPU. As a point of comparison, a trn1.32xlarge has 32 NeuronCores, and their max performance is 40% higher than that of P4d for BF16/FP16/FP8, 2.5X faster for TF32, and 5X faster for FP32. Each NeuronCore is independent and connected to the rest of the NeuronCores within the instance via NeuronLink, and across instances with EFA. Each NeuronCore also has full access to the accelerator memory in the instance, which helps scale large models across NeuronCores using various collective compute ops techniques. What are the time-to-train advantages of Trn1? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ While the answer is largely model dependent, training performance on Trn1 is fast thanks to multiple system-wide optimizations working in concert. Depending on the data type, you should expect between 1.4X and 5X higher throughput on Trn1 as compared to the latest GPU instances (P4d).
For distributed workloads, 800 Gbps EFA gives customers lower latency and 2x the throughput as compared to P4d (a Trn1n 1.6 Tbps option is coming soon). Each Trainium also has a dedicated collective compute (CC) engine, which enables running the CC ops in parallel to the NeuronCores compute. This enables another 10-15% acceleration of the overall workload. Finally, stochastic rounding enables running at half precision speeds (BF16) while maintaining accuracy at near full precision. This not only simplifies model development (no need for mixed precision), it also helps the loss function converge faster and reduces the memory footprint. What are some of the training performance results for Trn1? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ They are great! Please refer to the :ref:`benchmark` page for open-source model performance results. We encourage you to try it for your own models/applications. Can I use CUDA libraries with AWS Trainium? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AWS Trainium and Neuron plug into popular frameworks and automatically optimize model deployment on Neuron devices like Inferentia and Trainium. The Neuron SDK automatically optimizes for Trainium without using closed-source dependencies like NVIDIA CUDA and without requiring any application-level code changes to accelerate models. We believe this intentional approach allows developers freedom of choice with their code and models. If your applications have dependencies on CUDA (or other third-party closed-source artifacts), you will need to strip them out; from that point, the Neuron compiler will take the model as is and optimize it at the hardware level. Networking ---------- What’s important to know about the networking in Trn1? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Trn1 instances have the fastest EFA in AWS; clocked at 800 Gbps, they enable more collective communication than other training instances, which is important if your training job spans multiple servers. You should also expect lower latency, as we streamline the communication path between the dedicated collective communication engine on Trainium and the AWS Nitro EFA NICs. How does Trainium accelerate collective communication operations? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Trainium introduces a dedicated collective compute engine that runs in parallel to the compute cores (aka NeuronCores). This improves the convergence time of intermediate steps, as the communication happens in parallel to the compute. This capability, in addition to the faster and optimized EFA, results in better scalability and faster time to train, as compared to other training instances in AWS. What does Strong/Weak Scaling mean? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To enable strong scaling, we optimized Trainium to be efficient at small batch sizes. Compared to GPUs, Trn1 maintains high efficiency even for small batch sizes. This allows you to scale out to thousands of devices without increasing the global mini-batch size at the same rate, which in turn leads to faster end-to-end training convergence. In the weak scaling setup, we show the optimal throughput with a sufficiently large batch size per Trainium. The large batch size is set to leverage the high core utilization so that the overall end-to-end training will be fast. This setup also enables a large global batch size as it scales with the total number of nodes in the cluster. Usability --------- What has AWS done to improve usability of Trainium?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Stochastic rounding enables running at half precision speeds (BF16) while maintaining accuracy at near full precision. This of course helps the loss function converge faster and reduces the memory footprint, but equally important, it simplifies model development: you can write your model in FP32, and Neuron/Trainium will auto-cast the model to BF16 and execute it with SR enabled. There is no need to lose accuracy with pure BF16 runs and, more importantly, no need to experiment with mixed-precision strategies to find the optimal settings. Eager debug mode provides a convenient utility to step through the code and evaluate operator correctness as part of your model creation/debugging. For more details, please refer to the Neuron documentation. What other AWS services work with Trn1? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Trn1 via its Neuron SDK supports Amazon ECS, EKS, ParallelCluster, Batch, and Amazon SageMaker. Customers can also choose to run in a Neuron container within their self-managed container orchestration service (e.g., Kubernetes and Ray). What tools are available to develop models with Trn1? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When running training, evaluation, or inference workloads, you can use Neuron 2.x CLI tools such as neuron-ls and neuron-top to get insights into NeuronCore and NeuronDevice performance and memory utilization, topology, and host vCPU performance and memory utilization. In addition, the Neuron Plugin for TensorBoard provides a standard GUI that enables profiling and debugging of models. TensorBoard views include: - Model overview: provides a summary of the model and the utilization on the Host and NeuronDevice - Operators’ view: provides a breakdown of ML framework and HLO operators on both Host and NeuronDevice - Code trace view: shows a timeline of the model execution at the framework and HLO operators level - Hardware trace view: shows a timeline of the model execution at the level of hardware (Host, NeuronDevice, Data Transfer) - Topology view: shows the NeuronDevices topology within an instance How will compile time impact my workflow? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We understand compilation is a new step with Trainium, but as long as the overall time to train and cost to train are optimized, the impact of compilation on these two metrics is minimized. To further help reduce compilation time impact on usability, Neuron supports a persistent cache, where artifacts that have not changed since the last run can be reused, skipping compilation altogether. For developing and experimenting with new models, you can use the eager debug mode, which compiles (and caches) op-by-op, enabling quick evaluation without compiling large models. We are also working on a Neuron model analyzer (see the Neuron roadmap) that will recommend optimized hyperparameters, skipping full compilation per experiment. ================================================ FILE: about-neuron/faq.rst ================================================ .. _neuron_faq: .. meta:: :description: Frequently Asked Questions (FAQ) about the AWS Neuron SDK, including topics on Neuron 2.x, training, inference, runtime, compiler, containers, and ONNX support. :date-modified: 2025-10-03 Neuron FAQ ========== This topic provides links to frequently asked questions (FAQs) about the AWS Neuron SDK, organized by Neuron component. Neuron 2.x FAQ -------------- .. grid:: 1 :gutter: 2 ..
grid-item-card:: :link: neuron2-intro-faq :link-type: ref **Neuron 2.x Introduction FAQ** ^^^ Common questions about Neuron 2.x features and migration Training-specific FAQ --------------------- .. grid:: 1 :gutter: 2 .. grid-item-card:: :link: neuron-training-faq :link-type: ref **Neuron Training FAQ** ^^^ Frequently asked questions about training models on Neuron Inference-specific FAQ ---------------------- .. grid:: 1 1 2 2 :gutter: 2 .. grid-item-card:: :link: neuron-f1-faq :link-type: ref **Inference with Neuron FAQ** ^^^ Questions about Inf1 instance inference capabilities .. grid-item-card:: :link: trouble-shooting-inf1-faq :link-type: ref **Inf1 Troubleshooting FAQ** ^^^ Common ``Inf1`` instance issues and solutions .. grid-item-card:: :link: neuronperf_faq :link-type: ref **NeuronPerf FAQ** ^^^ Performance benchmarking tool questions Neuron Runtime FAQ ------------------ .. grid:: 1 :gutter: 2 .. grid-item-card:: :link: neuron-runtime-faq :link-type: ref **Neuron Runtime FAQ** ^^^ Runtime configuration and execution questions Neuron Compiler FAQ ------------------- .. grid:: 1 1 2 2 :gutter: 2 .. grid-item-card:: :link: neuronx_compiler_faq :link-type: ref **NeuronX Compiler FAQ** ^^^ Questions about the NeuronX compiler for Trn1/Inf2 .. grid-item-card:: :link: neuron_compiler_faq :link-type: ref **Neuron Compiler FAQ** ^^^ Questions about the Neuron compiler for Inf1 Neuron DLCs FAQ --------------- .. grid:: 1 :gutter: 2 .. grid-item-card:: :link: container-faq :link-type: ref **Neuron Containers FAQ** ^^^ Container deployment and configuration questions Support ------- .. grid:: 1 :gutter: 2 .. grid-item-card:: :link: contribute-faq :link-type: ref **Contribute FAQ** ^^^ Questions about contributing to the Neuron project ================================================ FILE: about-neuron/index.rst ================================================ .. _about-neuron: About the AWS Neuron SDK ======================== AWS Neuron is a software development kit (SDK) enabling high-performance deep learning acceleration using AWS Inferentia and Trainium, AWS's custom-designed machine learning accelerators. It enables you to develop, profile, and deploy high-performance machine learning workloads on AWS Inferentia and Trainium instances. The AWS Neuron SDK includes: * **Neuron Compiler** - Compiles high-level, framework-based models for optimal performance on Neuron devices * **Neuron Kernel Interface (NKI)** - Provides direct compiler access to Neuron device capabilities * **Neuron Runtime** - Executes compiled models on Neuron devices * **ML Framework integration** - Deep support for PyTorch and JAX * **Training and inference libraries** - Distributed training and inference libraries for large-scale models * **Deployment support** - Integration with AWS services like SageMaker, EC2, EKS, and ECS * **Developer tools** - Profiling, monitoring, and debugging utilities For a full list of AWS Neuron features, see :ref:`what-is-neuron`. .. admonition:: Join our Beta program Get early access to new Neuron features and tools! `Fill out this form and apply to join our Beta program `__. What is "NeuronX"? ------------------ "NeuronX" refers to the next-generation AWS Neuron SDK, which provides enhanced capabilities for both inference and training on AWS Inferentia and Trainium instances.
NeuronX includes: * Support for the latest versions of PyTorch and JAX * Advanced compiler optimizations for improved performance * Enhanced distributed training libraries for large-scale models * Improved profiling and debugging tools * Ongoing feature development and support for new instance types Catch up on the latest Neuron news ----------------------------------- .. grid:: 1 :gutter: 2 .. grid-item-card:: :link: /about-neuron/whats-new :link-type: doc :class-card: sd-border-1 **What's New in Neuron** ^^^ Read about the latest releases and features of the Neuron SDK Learn about AWS Neuron ---------------------- .. grid:: 1 :gutter: 2 .. grid-item-card:: :link: /about-neuron/what-is-neuron :link-type: doc :class-card: sd-border-1 **What is AWS Neuron?** ^^^ Short overview of the AWS Neuron SDK and its components .. grid:: 1 1 2 2 :gutter: 2 .. grid-item-card:: :link: /about-neuron/arch/index :link-type: doc :class-card: sd-border-1 **Neuron architecture** ^^^ Understand the Neuron hardware and software architecture .. grid-item-card:: :link: /about-neuron/arch/neuron-features/index :link-type: doc :class-card: sd-border-1 **Neuron features** ^^^ Overviews of model development features provided by Neuron .. grid-item-card:: :link: /frameworks/index :link-type: doc :class-card: sd-border-1 **Supported ML frameworks** ^^^ Neuron support for popular ML frameworks including PyTorch and JAX .. grid-item-card:: :link: /libraries/index :link-type: doc :class-card: sd-border-1 **NeuronX distributed (NxD) libraries** ^^^ NeuronX distributed libraries for training and inference .. grid-item-card:: :link: /nki/index :link-type: doc :class-card: sd-border-1 **Neuron Kernel Interface (NKI)** ^^^ NKI is a low-level interface for custom, bare-metal kernel development .. grid-item-card:: :link: /compiler/index :link-type: doc :class-card: sd-border-1 **Neuron Compiler** ^^^ The Neuron compiler optimizes models for Neuron hardware .. grid-item-card:: :link: /neuron-runtime/index :link-type: doc :class-card: sd-border-1 **Neuron Runtime** ^^^ Runtime for executing compiled models on Neuron devices .. grid-item-card:: :link: /tools/index :link-type: doc :class-card: sd-border-1 **Neuron developer tools** ^^^ Tools for profiling, debugging, and monitoring Neuron applications .. grid-item-card:: :link: /dlami/index :link-type: doc :class-card: sd-border-1 **AWS Neuron Deep Learning AMIs** ^^^ Deploy the Neuron SDK on EC2 instances with pre-installed Amazon Machine Images (AMIs) .. grid-item-card:: :link: /containers/index :link-type: doc :class-card: sd-border-1 **AWS Neuron Deep Learning Containers** ^^^ Deploy the Neuron SDK using pre-built Docker deep learning containers (DLCs) Resources --------- * :ref:`Setup Guide ` * :ref:`Release Notes ` * :ref:`Neuron FAQ ` * :doc:`Older Neuron FAQs ` Support ------- * :doc:`Neuron Open Source GitHub Repos ` * :ref:`AWS Neuron SDK maintenance policy ` .. _contact-us: Contact us ---------- For support, submit a request with AWS Neuron `GitHub issues `_ or visit the `Neuron AWS forums `_ for an answer. If you want to request a feature or report a critical issue, you can contact us directly at ``aws-neuron-support@amazon.com``. .. toctree:: :maxdepth: 1 :hidden: App Notes Ask Amazon AI helper tools Benchmarks Beta Participation Model Samples Neuron FAQ Neuron Features Open Source SDK Maintenance Policy Security Term Glossary Troubleshooting What is AWS Neuron?
Older Neuron FAQs ================================================ FILE: about-neuron/models/index.rst ================================================ .. _model_samples_tutorials: Model samples and tutorials =========================== .. toctree:: :maxdepth: 1 :hidden: Training on Trn1 Inference on Inf2/Trn1/Trn2 Inference on Inf1 This section gives you the consolidated list of code samples and tutorials published by AWS Neuron across documentation and various GitHub repositories. .. card:: Training on Trn1 :link: model_samples_training_trn1 :link-type: ref :class-body: sphinx-design-class-title-small .. card:: Inference on Inf2, Trn1 and Trn2 :link: model_samples_inference_inf2_trn1 :link-type: ref :class-body: sphinx-design-class-title-small .. card:: Inference on Inf1 :link: model_samples_inference_inf1 :link-type: ref :class-body: sphinx-design-class-title-small For links to individual GitHub sample repositories, see :ref:`neuron-github-samples`. ================================================ FILE: about-neuron/models/inference-inf1-samples.rst ================================================ .. _model_samples_inference_inf1: Inference Samples/Tutorials (Inf1) ================================== .. important:: The samples linked on this page have been archived and are provided for historical reference only. They are not tested with recent versions of the Neuron SDK. .. contents:: Table of contents :local: :depth: 1 .. _encoder_model_samples_inference_inf1: Encoders -------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - bert-base-cased-finetuned-mrpc - torch-neuron - * HuggingFace pretrained BERT tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * `BertBaseCased Inference on Inf1 instances `_ * Bert TorchServe tutorial :ref:`[html] ` * Bring your own HuggingFace pretrained BERT container to SageMaker Tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * - bert-base-uncased - torch-neuron - * NeuronCore Pipeline tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * - bert-large-uncased - torch-neuron - * `BertLargeUncased Inference on Inf1 instances `_ * - roberta-base - torch-neuron - * `Roberta-Base inference on Inf1 instances `_ * - distilbert-base-uncased-finetuned-sst-2-english - tensorflow-neuron - * Tensorflow 2.x - HuggingFace Pipelines distilBERT with Tensorflow2 Neuron :ref:`[html] ` :github:`[notebook] ` * - gluon bert - mxnet-neuron - * MXNet 1.8: Using data parallel mode tutorial :ref:`[html] ` :mxnet-neuron-src:`[notebook] ` .. _vision_transformer_model_samples_inference_inf1: Vision Transformers ------------------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - ssd - torch-neuron - * `Inference of SSD model on inf1 instances `_ * - TrOCR - torch-neuron - * `TrOCR inference on Inf1 instances `_ * - vgg - torch-neuron - * `VGG inference on Inf1 instances `_ * - google/vit-base-patch16-224 - torch-neuron - * `ViT model inference on Inf1 `_ .. _cnn_model_samples_inference_inf1: Convolutional Neural Networks (CNN) ----------------------------------- ..
list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - EfficientNet - torch-neuron - * `EfficientNet model inference on Inf1 instances `_ * - GFL (MMDetection) - torch-neuron - * `GFL (MMDetection) inference on Inf1 instances `_ * - HRNet - torch-neuron - * `HRNET - Pose Estimation `_ * - MarianMT - torch-neuron - * HuggingFace MarianMT tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * `Inference of Pre-trained MarianMT model on Inf1 `_ * - Detectron2 R-CNN - torch-neuron - * `R-CNN inference on Inf1 `_ * - resnet - torch-neuron - * `Inference of Pre-trained Resnet model (18,34,50,101,152) on Inf1 `_ * ResNet-50 tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * - resnet - tensorflow-neuron - * Tensorflow 2.x - Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving :ref:`[html] ` * - resnet - mxnet-neuron - * ResNet-50 tutorial :ref:`[html] ` :mxnet-neuron-src:`[notebook] ` * Getting started with Gluon tutorial :ref:`[html] ` :github:`[notebook] ` * NeuronCore Groups tutorial :ref:`[html] ` :mxnet-neuron-src:`[notebook] ` * - Resnext - torch-neuron - * `Inference of Resnext model on Inf1 `_ * - Yolov4 - torch-neuron - * PyTorch YOLOv4 tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * - Yolov5 - torch-neuron - * `Inference of Yolov5 on Inf1 `_ * - Yolov6 - torch-neuron - * `Inference of Yolov6 on Inf1 instances `_ * - Yolov7 - torch-neuron - * `Inference of Yolov7 model on Inf1 `_ * - Yolof - torch-neuron - * `Inference of Yolof model on Inf1 `_ * - fairseq - torch-neuron - * `Inference of fairseq model on Inf1 `_ * - unet - tensorflow-neuron - * `Unet - Tensorflow 2.x tutorial `_ .. _vision_model_samples_inference_inf1: Vision ------ .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - craft-pytorch - torch-neuron - * `CRAFT model inference on Inf1 `_ ================================================ FILE: about-neuron/models/inference-inf2-trn1-samples.rst ================================================ .. _model_samples_inference_inf2_trn1: Inference Samples/Tutorials (Inf2/Trn1/Trn2) ============================================ .. important:: Some samples linked on this page have been archived and are provided for historical reference only. They are not tested with recent versions of the Neuron SDK. For the latest inference tutorials, refer to :ref:`NxD Inference Tutorials `. .. contents:: Table of contents :local: :depth: 1 .. _encoder_model_samples_inference_inf2_trn1: Encoders -------- .. 
list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - bert-base-cased-finetuned-mrpc - torch-neuronx - * :ref:`BERT TorchServe tutorial ` * HuggingFace pretrained BERT tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * `LibTorch C++ Tutorial for HuggingFace Pretrained BERT `_ * `Compiling and Deploying HuggingFace Pretrained BERT on Inf2 on Amazon SageMaker `_ * - bert-base-cased-finetuned-mrpc - neuronx-distributed - * :ref:`tp_inference_tutorial` * - bert-base-uncased - torch-neuronx - * `HuggingFace Pretrained BERT Inference on Trn1 `_ * - distilbert-base-uncased - torch-neuronx - * `HuggingFace Pretrained DistilBERT Inference on Trn1 `_ * - roberta-base - tensorflow-neuronx - * HuggingFace Roberta-Base :ref:`[html]` :github:`[notebook] ` * - roberta-large - torch-neuronx - * `HuggingFace Pretrained RoBERTa Inference on Trn1 `_ .. _decoder_model_samples_inference_inf2_trn1: Decoders -------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - gpt2 - torch-neuronx - * `HuggingFace Pretrained GPT2 Feature Extraction on Trn1 `_ * - meta-llama/Llama-3.3-70B - neuronx-distributed-inference - * :ref:`nxdi-trn2-llama3.3-70b-tutorial` * :ref:`/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-dp-tutorial.ipynb` * :ref:`nxdi-sd-inference-tutorial` * - meta-llama/Llama-3.1-8b - transformers-neuronx - * `Run Hugging Face Llama 3.1 8B autoregressive sampling on Inf2 & Trn1 with 32k sequence length `_ * `Run Hugging Face Llama 3.1 8B autoregressive sampling on Inf2 & Trn1 with 128k sequence length `_ * `Run meta-llama/Meta-Llama-3.1-8B autoregressive sampling on Inf2 & Trn1 `_ * - meta-llama/Llama-3.1-70b - transformers-neuronx - * `Run Hugging Face Llama 3.1 70B autoregressive sampling on Trn1 with 64k sequence length `_ * `Run Hugging Face meta-llama/Meta-Llama-3.1-70B autoregressive sampling on Inf2 & Trn1 `_ * - meta-llama/Llama-3.1-70b-Instruct - transformers-neuronx - * `Run Hugging Face Llama-3.1-70B-Instruct + Llama-3.2-1B-Instruct Speculative Decoding on Trn1 with transformers-neuronx and vLLM `_ * `Run Hugging Face Llama-3.1-70B-Instruct EAGLE Speculative Decoding on Trn1 with transformers-neuronx and vLLM `_ * - meta-llama/Llama-3.1-405b - neuronx-distributed-inference - * :ref:`Tutorial for deploying Llama-3.1-405B on Trn2 ` * :ref:`nxdi-trn2-llama3.1-405b-speculative-tutorial` * - meta-llama/Llama-3.1-405b - transformers-neuronx - * `Run Hugging Face Llama 3.1 405B autoregressive sampling on Trn1/Trn1n with 16k sequence length `_ * - meta-llama/Llama-3-8b - transformers-neuronx - * `Run Hugging Face meta-llama/Llama-3-8b autoregressive sampling on Inf2 & Trn1 `_ * - meta-llama/Llama-3-70b - transformers-neuronx - * `Run Hugging Face meta-llama/Llama-3-70b autoregressive sampling on Inf2 & Trn1 `_ * - meta-llama/Llama-2-13b - transformers-neuronx - * `Run Hugging Face meta-llama/Llama-2-13b autoregressive sampling on Inf2 & Trn1 `_ * - meta-llama/Llama-2-70b - transformers-neuronx - * `Run Hugging Face meta-llama/Llama-2-70b autoregressive sampling on Inf2 & Trn1 `_ * `Run speculative sampling on Meta Llama models [Beta] `_ * - meta-llama/Llama-3.2-1B-Instruct - neuronx-distributed - * `Run meta-llama/Llama-3.2-1B-Instruct on Inf2 and Trn1 `_ * - meta-llama/codellama-13b - neuronx-distributed - * `Run meta-llama/codellama-13b-16k-sampling `_ * - 
mistralai/Mistral-7B-Instruct-v0.1 - transformers-neuronx - * :ref:`Run Mistral-7B-Instruct-v0.1 autoregressive sampling on Inf2 & Trn1 ` * - mistralai/Mistral-7B-Instruct-v0.2 - transformers-neuronx - * `Run Hugging Face mistralai/Mistral-7B-Instruct-v0.2 autoregressive sampling on Inf2 & Trn1 [Beta] `_ * - Mixtral-8x7B-v0.1 - transformers-neuronx - * `Run Hugging Face mistralai/Mixtral-8x7B-v0.1 autoregressive sampling on Inf2 & Trn1 `_ * - Mixtral-8x7B - neuronx-distributed - * `Mixtral inference with NeuronX Distributed on Inf2 & Trn1 `_ * - DBRX - neuronx-distributed - * `DBRX inference with NeuronX Distributed on Inf2 & Trn1 `_ * - codellama/CodeLlama-13b-hf - transformers-neuronx - * `Run Hugging Face codellama/CodeLlama-13b-hf autoregressive sampling on Inf2 & Trn1 `_ .. _encoder_decoder_model_samples_inference_inf2_trn1: Encoder-Decoders ---------------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - t5-large - * torch-neuronx * optimum-neuron - * T5 inference tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * - t5-3b - neuronx-distributed - * T5 inference tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * - google/flan-t5-xl - neuronx-distributed - * flan-t5-xl inference tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` .. _vision_transformer_model_samples_inference_inf2_trn1: Vision Transformers ------------------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - google/vit-base-patch16-224 - torch-neuronx - * `HuggingFace Pretrained ViT Inference on Trn1 `_ * - clip-vit-base-patch32 - torch-neuronx - * `HuggingFace Pretrained CLIP Base Inference on Inf2 `_ * - clip-vit-large-patch14 - torch-neuronx - * `HuggingFace Pretrained CLIP Large Inference on Inf2 `_ .. _cnn_model_samples_inference_inf2_trn1: Convolutional Neural Networks (CNN) ----------------------------------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - resnet50 - torch-neuronx - * `Torchvision Pretrained ResNet50 Inference on Trn1 / Inf2 `_ * Torchvision ResNet50 tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * - resnet50 - tensorflow-neuronx - * :ref:`tensorflow-servingx-neuronrt-visible-cores` * - unet - torch-neuronx - * `Pretrained UNet Inference on Trn1 / Inf2 `_ * - vgg - torch-neuronx - * `Torchvision Pretrained VGG Inference on Trn1 / Inf2 `_ .. _sd_model_samples_inference_inf2_trn1: Stable Diffusion ---------------- ..
list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - stable-diffusion-v1-5 - torch-neuronx - * `HuggingFace Stable Diffusion 1.5 (512x512) Inference on Trn1 / Inf2 `_ * - stable-diffusion-2-1-base - torch-neuronx - * `HuggingFace Stable Diffusion 2.1 (512x512) Inference on Trn1 / Inf2 `_ * - stable-diffusion-2-1 - torch-neuronx - * `HuggingFace Stable Diffusion 2.1 (768x768) Inference on Trn1 / Inf2 `_ * `Deploy & Run Stable Diffusion on SageMaker and Inferentia2 `_ * - stable-diffusion-xl-base-1.0 - torch-neuronx - * `HuggingFace Stable Diffusion XL 1.0 (1024x1024) Inference on Inf2 `_ * `HuggingFace Stable Diffusion XL 1.0 Base and Refiner (1024x1024) Inference on Inf2 `_ * - stable-diffusion-2-inpainting - torch-neuronx - * `stable-diffusion-2-inpainting model Inference on Trn1 / Inf2 `_ .. _diffusion_transformers_samples_inference_inf2_trn1: Diffusion Transformers ---------------------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - pixart-alpha - torch-neuronx - * `HuggingFace PixArt Alpha (256x256, 512x512 square resolution) Inference on Trn1 / Inf2 `_ * - pixart-sigma - torch-neuronx - * `HuggingFace PixArt Sigma (256x256, 512x512 square resolution) Inference on Trn1 / Inf2 `_ .. _audio_model_samples_inference_inf2_trn1: Audio ----- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - wav2vec2-conformer - torch-neuronx - * `Run HuggingFace Pretrained Wav2Vec2-Conformer with Rotary Position Embeddings Inference on Inf2 `_ * `Run HuggingFace Pretrained Wav2Vec2-Conformer with Relative Position Embeddings Inference on Inf2 & Trn1 `_ .. _multi_modal_model_samples_inference_inf2_trn1: Multi Modal ----------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - multimodal-perceiver - torch-neuronx - * `HuggingFace Multimodal Perceiver Inference on Trn1 / Inf2 `_ * - language-perceiver - torch-neuronx - * `HF Pretrained Perceiver Language Inference on Trn1 / Inf2 `_ * - vision-perceiver-conv - torch-neuronx - * `HF Pretrained Perceiver Image Classification Inference on Trn1 / Inf2 `_ ================================================ FILE: about-neuron/models/training-trn1-samples.rst ================================================ .. _model_samples_training_trn1: Training Samples/Tutorials (Trn1/Trn1n) ======================================= .. contents:: Table of contents :local: :depth: 1 .. _encoder_model_samples_training_trn1: Encoders -------- .. 
list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - bert-base-cased - torch-neuronx - * `Fine-tune a "bert-base-cased" PyTorch model for Text Classification `_ * `How to fine-tune a "bert base cased" PyTorch model with AWS Trainium (Trn1 instances) for Sentiment Analysis `_ * - bert-base-uncased - torch-neuronx - * `Fine-tune a "bert-base-uncased" PyTorch model `_ * `Fine-tuning BERT base model from HuggingFace on Amazon SageMaker `_ * - bert-large-cased - torch-neuronx - * `Fine-tune a "bert-large-cased" PyTorch model `_ * - bert-large-uncased - torch-neuronx - * :ref:`hf-bert-pretraining-tutorial` * `Launch Bert Large Phase 1 pretraining job on Parallel Cluster `_ * `Launch a Multi-Node PyTorch Neuron Training Job on Trainium Using TorchX and EKS `_ * :ref:`torch-hf-bert-finetune` * `Fine-tune a "bert-large-uncased" PyTorch model `_ * - roberta-base - torch-neuronx - * `Fine-tune a "roberta-base" PyTorch model `_ * - roberta-large - torch-neuronx - * `Fine-tune a "roberta-large" PyTorch model `_ * - xlm-roberta-base - torch-neuronx - * `Fine-tune a "xlm-roberta-base" PyTorch model `_ * - albert-base-v2 - torch-neuronx - * `Fine-tune an "albert-base-v2" PyTorch model `_ * - distilbert-base-uncased - torch-neuronx - * `Fine-tune a "distilbert-base-uncased" PyTorch model `_ * - camembert-base - torch-neuronx - * `Fine-tune a "camembert-base" PyTorch model `_ * - cl-tohoku/bert-base-japanese-whole-word-masking - torch-neuronx - * `Fine-tuning & Deployment Hugging Face BERT Japanese model `_ .. _decoder_model_samples_training_trn1: Decoders -------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - gpt-2 - torch-neuronx - * `How to run training jobs for "gpt2" PyTorch model with AWS Trainium `_ * :ref:`zero1-gpt2-pretraining-tutorial` * - gpt-3 - neuronx-nemo-megatron - * `Launch a GPT-3 23B pretraining job using neuronx-nemo-megatron `_ * `Launch a GPT-3 46B pretraining job using neuronx-nemo-megatron `_ * `Launch a GPT-3 175B pretraining job using neuronx-nemo-megatron `_ * - GPT-NEOX-20B - neuronx-distributed - * :ref:`gpt_neox_20b_tp_zero1_tutorial` * `Training GPT-NEOX 20B model using neuronx-distributed `_ * `Pre-train GPT Neox 20b on Wikicorpus dataset using Neuronx Distributed library `_ * - GPT-NEOX-6.9B - neuronx-distributed - * :ref:`gpt_neox_tp_zero1_tutorial` * `Training GPT-NEOX 6.9B model using neuronx-distributed `_ * `Pre-train GPT Neox 6.9b on Wikicorpus dataset using Neuronx Distributed library `_ * - meta-llama/Llama-3.1-70b - neuronx-distributed - * :ref:`llama2_tp_pp_tutorial` * - meta-llama/Llama-3.1-8b - neuronx-distributed - * :ref:`llama2_7b_tp_zero1_tutorial` * - meta-llama/Llama-3-70b - neuronx-distributed - * :ref:`llama2_tp_pp_tutorial` * - meta-llama/Llama-3-8b - nxd-training - * :ref:`hf_llama3_8B_pretraining` * :ref:`hf_llama3_8B_SFT` * - meta-llama/Llama-3-8b - neuronx-distributed - * :ref:`Training Llama3 8B Model with Tensor Parallelism and ZeRO-1 Optimizer ` * :ref:`Tutorial for Fine-tuning Llama3 8B with tensor parallelism and LoRA using Neuron PyTorch-Lightning with NeuronX Distributed ` * - meta-llama/Llama-2-7b - neuronx-distributed - * :ref:`llama2_7b_tp_zero1_tutorial` * `Training Llama2 7B Model with AWS Batch and Trainium `_ * :ref:`llama2_7b_tp_zero1_ptl_finetune_tutorial` * `Pre-train Llama2-7B on Wikicorpus dataset using Neuronx
Distributed library `_ * - meta-llama/Llama-2-13b - neuronx-distributed - * :ref:`llama2_tp_pp_tutorial` * - meta-llama/Llama-2-70b - neuronx-distributed - * :ref:`llama2_tp_pp_tutorial` * - codegen25-7b-mono - neuronx-distributed - * :ref:`codegen25_7b_tp_zero1_tutorial` * - meta-llama/Llama-2 - neuronx-nemo-megatron - * `Launch a Llama-2-7B pretraining job using neuronx-nemo-megatron `_ * `Launch a Llama-2-13B pretraining job using neuronx-nemo-megatron `_ * `Launch a Llama-2-70B pretraining job using neuronx-nemo-megatron `_ * - Mistral-7B - neuronx-nemo-megatron - * `Training Mistral-7B `_ .. _encoder_decoder_model_samples_training_trn1: Encoder-Decoders ---------------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - t5-small - * torch-neuronx * optimum-neuron - * :ref:`torch-hf-t5-finetune` * - facebook/bart-large - * torch-neuronx - * `How to fine-tune a "Bart-Large" PyTorch model with AWS Trainium (trn1 instances) `_ .. _vision_transformer_model_samples_training_trn1: Vision Transformers ------------------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - google/vit-base-patch16-224-in21k - torch-neuronx - * `Fine-tune a pretrained HuggingFace vision transformer PyTorch model `_ * - openai/clip-vit-base-patch32 - torch-neuronx - * `Fine-tune a pretrained HuggingFace CLIP-base PyTorch model with AWS Trainium `_ * - openai/clip-vit-large-patch14 - torch-neuronx - * `Fine-tune a pretrained HuggingFace CLIP-large PyTorch model with AWS Trainium `_ .. _sd_model_samples_training_trn1: Stable Diffusion ---------------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - stabilityai/stable-diffusion-2-1-base - torch-neuronx - * [Beta] `Train stabilityai/stable-diffusion-2-1-base with AWS Trainium (trn1 instances) `_ * - runwayml/stable-diffusion-v1-5 - torch-neuronx - * [Beta] `Train runwayml/stable-diffusion-v1-5 with AWS Trainium (trn1 instances) `_ .. _multi_modal_model_samples_training_trn1: Multi Modal ----------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - language-perceiver - torch-neuronx - * `How to fine-tune a "language perceiver" PyTorch model with AWS Trainium (trn1 instances) `_ * - vision-perceiver-conv - torch-neuronx - * `How to fine-tune a pretrained HuggingFace Vision Perceiver Conv `_ .. _cnn_model_samples_training_trn1: Convolutional Neural Networks (CNN) ----------------------------------- .. list-table:: :widths: 20 15 45 :header-rows: 1 :align: left :class: table-smaller-font-size * - Model - Frameworks/Libraries - Samples and Tutorials * - resnet50 - torch-neuronx - * `How to fine-tune a pretrained ResNet50 PyTorch model with AWS Trainium (trn1 instances) using NeuronSDK `_ ================================================ FILE: about-neuron/monitoring-tools.rst ================================================ .. _monitoring_tools: Monitoring Tools ================= ..
toctree:: :maxdepth: 1 Neuron-Monitor User Guide Neuron-Top User Guide Neuron-LS User Guide Neuron-Sysfs User Guide NCCOM-TEST User Guide What's New ================================================ FILE: about-neuron/news-and-blogs/CONTRIBUTING.md ================================================ # Contributing to AWS Neuron News and Blogs Thank you for your interest in sharing content about AWS Neuron, Trainium, and Inferentia! This page collects external articles, blog posts, tutorials, and news to help the community discover valuable content. ## How to Add Your Article ### Quick Steps 1. **Fork the repository** on GitHub 2. **Edit the data file**: `about-neuron/news-and-blogs/news-and-blogs.yaml` 3. **Add your article** following the format below 4. **Submit a pull request** with your changes ### Article Entry Format Add your article to the appropriate section in `news-and-blogs.yaml`: ```yaml - title: "Your Article Title" url: "https://example.com/your-article" description: "A brief 1-2 sentence description of your article content." author: "Your Name or Organization" author_url: "https://your-website.com" # Optional for featured articles date: "YYYY-MM-DD" # Publication date category: "blog" # Options: blog, news, tutorial, case-study, benchmark locale: "en-US" # Language/region code (e.g., en-US, ja-JP, zh-CN, de-DE, fr-FR) featured: false # Set to true only if approved by AWS Neuron team icon: "📝" # Optional emoji icon for featured articles ``` ### Sections - **`featured_articles`**: Highlighted content (requires AWS Neuron team approval) - **`all_articles`**: All community and official content ### Categories Choose the most appropriate category for your content: - **`blog`**: Technical blog posts and articles - **`news`**: News announcements and press releases - **`tutorial`**: Step-by-step guides and how-tos - **`case-study`**: Customer success stories and use cases - **`benchmark`**: Performance benchmarks and comparisons ### Locale Codes Specify the language and region of your article using standard locale codes: **Common Locales:** - `en-US` - English (United States) 🇺🇸 - `en-GB` - English (United Kingdom) 🇬🇧 - `ja-JP` - Japanese 🇯🇵 - `zh-CN` - Chinese (Simplified) 🇨🇳 - `zh-TW` - Chinese (Traditional) 🇹🇼 - `ko-KR` - Korean 🇰🇷 - `de-DE` - German 🇩🇪 - `fr-FR` - French 🇫🇷 - `es-ES` - Spanish (Spain) 🇪🇸 - `es-MX` - Spanish (Mexico) 🇲🇽 - `pt-BR` - Portuguese (Brazil) 🇧🇷 - `it-IT` - Italian 🇮🇹 - `nl-NL` - Dutch 🇳🇱 - `ru-RU` - Russian 🇷🇺 - `ar-SA` - Arabic 🇸🇦 - `hi-IN` - Hindi 🇮🇳 A flag emoji will be automatically displayed next to your article based on the locale. If your locale isn't in the list, a 🌐 globe icon will be shown. ### Example Entry ```yaml all_articles: - title: "Building Large Language Models on AWS Trainium" url: "https://example.com/llm-trainium-guide" description: "A comprehensive guide to training and deploying LLMs using AWS Trainium instances with practical code examples." author: "Jane Developer" date: "2026-01-15" category: "tutorial" locale: "en-US" featured: false ``` ### Guidelines 1. **Content must be relevant** to AWS Neuron, Trainium, or Inferentia 2. **Provide accurate information** - ensure URLs work and descriptions are clear 3. **Use proper formatting** - follow YAML syntax exactly 4. **One article per pull request** - makes review easier 5. **Include context** in your PR description about why this content is valuable ### Featured Articles To request your article be featured: 1. Add it to `all_articles` first with `featured: false` 2. 
In your pull request, explain why it should be featured 3. AWS Neuron team will review and may promote it to `featured_articles` Featured articles should be: - High-quality, in-depth content - Particularly valuable to the community - Recent (typically within the last 6 months) ### Review Process 1. Submit your pull request 2. AWS Neuron team will review within 5-7 business days 3. May request changes or clarifications 4. Once approved, your article will appear on the next documentation build ### Questions? - Open an issue in the repository - Contact your AWS Neuron support representative - Email: aws-neuron-support@amazon.com ## Content Guidelines ### What to Include ✅ Technical tutorials and guides ✅ Performance benchmarks and analysis ✅ Customer success stories ✅ Integration guides with other tools ✅ Best practices and optimization tips ✅ Conference talks and presentations ✅ Research papers using Neuron/Trainium/Inferentia ### What Not to Include ❌ Marketing content without technical substance ❌ Broken or paywalled links ❌ Content unrelated to AWS Neuron ecosystem ❌ Duplicate submissions ❌ Self-promotional content without value to community ## Technical Details This page uses: - **Sphinx** with `sphinxcontrib.datatemplates` extension - **YAML** for data storage - **Jinja2** templates for rendering - **sphinx-design** for grid layouts The system is fully static - no backend required. All content is rendered at build time. ## License By contributing, you agree that your contributions will be licensed under the same license as this project. See the repository LICENSE files for details. ================================================ FILE: about-neuron/news-and-blogs/JIRA-INTEGRATION-DESIGN.md ================================================ # Jira Integration Design for News & Blogs ## Overview This document describes a design for populating the `news-and-blogs.yaml` file from Jira tickets, allowing contributors to submit article links via Jira instead of direct pull requests. ## Design Goals 1. **Simple for contributors**: Submit a Jira ticket with article metadata 2. **Automated**: Minimal manual intervention to add articles to YAML 3. **Quality control**: Review process before articles appear on the site 4. **Compatible**: Works with existing Sphinx build process 5. **No backend required**: Leverages existing CI/CD infrastructure ## Architecture ### Option 1: GitHub Actions + Jira API (Recommended) ``` Jira Ticket Created → GitHub Action Triggered → Parse Ticket → Update YAML → Create PR ``` **Components:** 1. **Jira Ticket Template**: Custom issue type "News Article Submission" 2. **GitHub Action**: Runs on schedule (e.g., hourly) or webhook 3. **Python Script**: Fetches approved tickets, generates YAML entries 4. 
**Automated PR**: Creates pull request with new articles **Workflow:** ```yaml # .github/workflows/sync-jira-articles.yml name: Sync Jira Articles to YAML on: schedule: - cron: '0 */6 * * *' # Every 6 hours workflow_dispatch: # Manual trigger jobs: sync-articles: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Set up Python uses: actions/setup-python@v4 with: python-version: '3.9' - name: Install dependencies run: pip install jira pyyaml - name: Fetch and process Jira tickets env: JIRA_URL: ${{ secrets.JIRA_URL }} JIRA_USER: ${{ secrets.JIRA_USER }} JIRA_TOKEN: ${{ secrets.JIRA_TOKEN }} run: python scripts/sync_jira_articles.py - name: Create Pull Request uses: peter-evans/create-pull-request@v5 with: commit-message: 'Add articles from Jira' title: 'Add news articles from Jira submissions' body: 'Automated PR from Jira article submissions' branch: jira-articles-sync ``` ### Option 2: Jira Automation + Webhook ``` Jira Ticket Approved → Webhook to GitHub → GitHub Action → Update YAML → Create PR ``` **Advantages:** - Real-time updates when tickets are approved - No polling required - More efficient **Setup:** 1. Configure Jira Automation rule 2. Trigger on status change to "Approved" 3. Send webhook to GitHub repository dispatch endpoint ### Option 3: Manual Script (Simplest) ``` Developer runs script → Fetches approved tickets → Updates YAML → Commits changes ``` **Use case:** Lower volume, manual review preferred ## Jira Ticket Structure ### Custom Fields Required ``` Issue Type: News Article Submission Fields: - Article Title (text, required) - Article URL (URL, required) - Description (text area, required) - Author Name (text, required) - Author URL (URL, optional) - Publication Date (date, required) - Category (dropdown: blog|news|tutorial|case-study|benchmark) - Locale (dropdown: en-US|ja-JP|zh-CN|ko-KR|de-DE|fr-FR|es-ES|pt-BR) - Keywords (labels or multi-select) - Featured (checkbox) - Icon (text, optional, for featured articles) Status Workflow: - Submitted → Under Review → Approved → Published → Rejected ``` ### Example Jira Ticket ``` Title: Add Karakuri AWS Trainium Tutorial Fields: - Article Title: AWS Trainium: 50 Exercises - Article URL: https://zenn.dev/karakuri_blog/articles/5ccedeee1beb08 - Description: Learn how to build LLMs for Trainium accelerators... 
- Author Name: Karakuri - Author URL: https://about.karakuri.ai/ - Publication Date: 2026-02-19 - Category: tutorial - Locale: ja-JP - Keywords: trainium, llm, training, tutorial - Featured: Yes - Icon: 🚀 ``` ## Implementation Script ### `scripts/sync_jira_articles.py` ```python #!/usr/bin/env python3 """ Sync approved Jira article submissions to news-and-blogs.yaml """ import os import yaml from jira import JIRA from datetime import datetime # Configuration JIRA_URL = os.environ.get('JIRA_URL') JIRA_USER = os.environ.get('JIRA_USER') JIRA_TOKEN = os.environ.get('JIRA_TOKEN') YAML_FILE = 'about-neuron/news-and-blogs/news-and-blogs.yaml' # JQL to find approved, unpublished articles JQL_QUERY = 'project = NEURON AND issuetype = "News Article Submission" AND status = "Approved" AND labels != "published"' def connect_jira(): """Connect to Jira instance""" return JIRA(server=JIRA_URL, basic_auth=(JIRA_USER, JIRA_TOKEN)) def fetch_approved_articles(jira): """Fetch approved article submissions from Jira""" issues = jira.search_issues(JQL_QUERY, maxResults=100) articles = [] for issue in issues: article = { 'title': issue.fields.customfield_10001, # Article Title 'url': issue.fields.customfield_10002, # Article URL 'description': issue.fields.customfield_10003, # Description 'author': issue.fields.customfield_10004, # Author Name 'date': issue.fields.customfield_10006, # Publication Date 'category': issue.fields.customfield_10007.value, # Category 'locale': issue.fields.customfield_10008.value, # Locale 'keywords': [label for label in issue.fields.labels if label != 'published'], # Jira labels are plain strings 'featured': bool(issue.fields.customfield_10009), # Featured checkbox } # Optional fields if issue.fields.customfield_10005: # Author URL article['author_url'] = issue.fields.customfield_10005 if issue.fields.customfield_10010: # Icon (for featured) article['icon'] = issue.fields.customfield_10010 articles.append({ 'article': article, 'issue_key': issue.key }) return articles def load_yaml(): """Load existing YAML file""" with open(YAML_FILE, 'r', encoding='utf-8') as f: return yaml.safe_load(f) def article_exists(data, url): """Check if article URL already exists in YAML""" all_urls = [a['url'] for a in data.get('featured_articles', [])] all_urls.extend([a['url'] for a in data.get('all_articles', [])]) return url in all_urls def add_articles_to_yaml(data, new_articles): """Add new articles to appropriate sections""" added_keys = [] for item in new_articles: article = item['article'] # Skip if already exists if article_exists(data, article['url']): print(f"Skipping duplicate: {article['title']}") continue # Add to appropriate section if article.get('featured', False): data['featured_articles'].append(article) else: data['all_articles'].append(article) added_keys.append(item['issue_key']) print(f"Added: {article['title']} ({item['issue_key']})") return added_keys def save_yaml(data): """Save updated YAML file""" with open(YAML_FILE, 'w', encoding='utf-8') as f: yaml.dump(data, f, allow_unicode=True, sort_keys=False, default_flow_style=False) def mark_as_published(jira, issue_keys): """Add 'published' label to Jira tickets and transition to Published status""" for key in issue_keys: issue = jira.issue(key) # Add published label labels = issue.fields.labels if 'published' not in labels: labels.append('published') issue.update(fields={'labels': labels}) # Transition to Published status (adjust transition ID as needed) try: jira.transition_issue(issue, 'Published') except Exception as e: print(f"Could not transition
{key}: {e}") def main(): print("Connecting to Jira...") jira = connect_jira() print("Fetching approved articles...") new_articles = fetch_approved_articles(jira) if not new_articles: print("No new articles to add.") return print(f"Found {len(new_articles)} approved articles") print("Loading existing YAML...") data = load_yaml() print("Adding articles to YAML...") added_keys = add_articles_to_yaml(data, new_articles) if added_keys: print("Saving YAML...") save_yaml(data) print("Marking Jira tickets as published...") mark_as_published(jira, added_keys) print(f"Successfully added {len(added_keys)} articles!") else: print("No new articles added (all were duplicates)") if __name__ == '__main__': main() ``` ## Setup Instructions ### 1. Configure Jira 1. Create custom issue type "News Article Submission" 2. Add custom fields (see structure above) 3. Configure workflow: Submitted → Under Review → Approved → Published 4. Create Jira API token for automation user ### 2. Configure GitHub Secrets Add these secrets to your GitHub repository: ``` JIRA_URL: https://your-company.atlassian.net JIRA_USER: automation@your-company.com JIRA_TOKEN: ``` ### 3. Add GitHub Action Create `.github/workflows/sync-jira-articles.yml` with the workflow above. ### 4. Install Dependencies Add to `requirements.txt`: ``` jira==3.5.0 PyYAML==6.0 ``` ### 5. Test 1. Create a test Jira ticket 2. Approve it 3. Run workflow manually: Actions → Sync Jira Articles → Run workflow 4. Verify PR is created with new article ## Alternative: Simpler Webhook Approach If you want something lighter without Jira API polling: ### Jira Automation Rule ``` Trigger: Issue transitioned to "Approved" Condition: Issue type = "News Article Submission" Action: Send web request URL: https://api.github.com/repos/aws-neuron/aws-neuron-sdk/dispatches Method: POST Headers: Authorization: Bearer ${GITHUB_TOKEN} Accept: application/vnd.github.v3+json Body: { "event_type": "jira-article-approved", "client_payload": { "issue_key": "{{issue.key}}", "title": "{{issue.customfield_10001}}", "url": "{{issue.customfield_10002}}", "description": "{{issue.customfield_10003}}", "author": "{{issue.customfield_10004}}", "date": "{{issue.customfield_10006}}", "category": "{{issue.customfield_10007}}", "locale": "{{issue.customfield_10008}}" } } ``` Then GitHub Action receives webhook and processes directly without Jira API calls. ## Maintenance ### Regular Tasks 1. **Monitor failed syncs**: Check GitHub Action logs 2. **Review PRs**: Automated PRs should still be reviewed before merge 3. **Clean up Jira**: Archive old Published tickets 4. **Update mappings**: If custom field IDs change, update script ### Troubleshooting **Articles not syncing:** - Check Jira API credentials - Verify custom field IDs match - Check JQL query returns expected tickets **Duplicate articles:** - Script checks URL before adding - Manually remove duplicates from YAML if needed **Formatting issues:** - Validate YAML after sync: `python -m yaml about-neuron/news-and-blogs/news-and-blogs.yaml` - Check for special characters in descriptions ## Security Considerations 1. **API Tokens**: Store in GitHub Secrets, never commit 2. **Permissions**: Use dedicated Jira service account with minimal permissions 3. **Validation**: Sanitize all input from Jira before adding to YAML 4. 
**Review**: Always review automated PRs before merging ## Cost & Complexity | Approach | Setup Time | Maintenance | Cost | |----------|-----------|-------------|------| | GitHub Actions + Jira API | 4-6 hours | Low | Free (GitHub Actions) | | Webhook + GitHub Actions | 2-3 hours | Very Low | Free | | Manual Script | 1-2 hours | Medium | Free | ## Recommendation **For production use**: Start with **Option 3 (Manual Script)** to validate the workflow, then upgrade to **Option 1 (GitHub Actions)** once the process is proven and volume increases. **For high volume**: Use **Option 2 (Webhook)** for real-time updates. ## Future Enhancements 1. **Validation**: Add URL validation, duplicate detection in Jira 2. **Preview**: Generate preview of how article will appear 3. **Scheduling**: Support future publication dates 4. **Analytics**: Track article submissions and approval rates 5. **Notifications**: Notify submitters when articles are published 6. **Bulk import**: Support CSV upload for multiple articles ================================================ FILE: about-neuron/news-and-blogs/README.md ================================================ # AWS Neuron News and Blogs System This directory contains a dynamic, community-driven news and blogs page for AWS Neuron, Trainium, and Inferentia content. ## Overview The system allows external contributors to add links to relevant articles, blog posts, and news through a simple YAML data file, without requiring any backend infrastructure. ## Architecture ``` about-neuron/news-and-blogs/ ├── index.rst # Main page (uses datatemplate directives) ├── news-and-blogs.yaml # Data file with all article metadata ├── featured-articles.tmpl # Jinja2 template for featured section ├── all-articles.tmpl # Jinja2 template for all articles section ├── CONTRIBUTING.md # Contribution guidelines └── README.md # This file ``` ## How It Works 1. **Data Storage**: Article metadata is stored in `news-and-blogs.yaml` 2. **Templating**: Jinja2 templates (`*.tmpl`) define how articles are rendered 3. **Rendering**: Sphinx's `datatemplates` extension processes the YAML and templates at build time 4. **Output**: Static HTML with grid cards using `sphinx-design` ## Key Features - ✅ **No backend required** - fully static site generation - ✅ **Easy contributions** - edit a YAML file and submit a PR - ✅ **Version controlled** - all changes tracked in Git - ✅ **Automated rendering** - Sphinx handles everything at build time - ✅ **Responsive design** - uses sphinx-design grid system - ✅ **Maintainable** - clear separation of data, templates, and content ## Adding New Articles See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed instructions. Quick example: ```yaml all_articles: - title: "My Article Title" url: "https://example.com/article" description: "Brief description" author: "Author Name" date: "2026-01-15" category: "blog" featured: false ``` ## Modifying Templates Templates use Jinja2 syntax and have access to the YAML data structure. ### Featured Articles Template (`featured-articles.tmpl`) Renders articles from the `featured_articles` section with: - Large cards with borders - Icons and bold titles - Author attribution with links - Publication dates ### All Articles Template (`all-articles.tmpl`) Renders articles from the `all_articles` section with: - 2-column grid on desktop, 1-column on mobile - Simple card layout - Title and description ## Customization ### Adding New Fields 1. Add field to YAML entries: ```yaml - title: "Article" new_field: "value" ``` 2. 
Update template to use it: ```jinja {{ article.new_field }} ``` ### Changing Layout Edit the grid directive in templates: ```rst .. grid:: 1 1 2 3 # 1 col mobile, 1 tablet, 2 desktop, 3 wide :gutter: 2 ``` ### Adding Filters/Sorting You can add Jinja2 filters in templates: ```jinja {% for article in all_articles | sort(attribute='date', reverse=True) %} {# Sorted by date, newest first #} {% endfor %} ``` ## Dependencies Required Sphinx extensions (already in `conf.py`): - `sphinxcontrib.datatemplates` - YAML data processing - `sphinx_design` - Grid card layouts ## Testing Locally 1. Install dependencies: ```bash pip install -r requirements.txt ``` 2. Build documentation: ```bash sphinx-build -b html . _build/html ``` 3. View the page: ```bash open _build/html/about-neuron/news-and-blogs/index.html ``` ## Troubleshooting ### Template Not Found Error Ensure templates are in the same directory as `index.rst` or add the directory to `templates_path` in `conf.py`. ### YAML Parse Error Validate your YAML: ```bash python -c "import yaml; yaml.safe_load(open('news-and-blogs.yaml'))" ``` ### Articles Not Rendering Check that: 1. YAML file is in the same directory as `index.rst` 2. Template files exist and have correct names 3. YAML structure matches template expectations ## Future Enhancements Possible improvements: - Add category filtering/grouping - Add search functionality - Add RSS feed generation - Add automatic link checking - Add article metadata validation - Sort by date automatically - Add pagination for large lists ## Support For questions or issues: - Open a GitHub issue - Contact AWS Neuron support team - See main repository CONTRIBUTING.md ================================================ FILE: about-neuron/news-and-blogs/article-template.yaml ================================================ # Article Entry Template # # Copy this template and fill in your article details. # Then add it to the appropriate section in news-and-blogs.yaml # # For featured articles (requires AWS Neuron team approval): # Add to the 'featured_articles' section with featured: true and an icon # # For regular articles: # Add to the 'all_articles' section with featured: false # TEMPLATE - Copy everything below this line # ============================================ - title: "Your Article Title Here" url: "https://your-website.com/path-to-article" description: "A clear, concise description of your article in 1-2 sentences. Explain what readers will learn or discover." 
author: "Your Name or Organization Name" author_url: "https://your-website.com" # Optional: Your website or profile URL (for featured articles) date: "YYYY-MM-DD" # Publication date in YYYY-MM-DD format (e.g., 2026-01-27) category: "blog" # Choose ONE: blog, news, tutorial, case-study, benchmark locale: "en-US" # Language/region code (e.g., en-US, ja-JP, zh-CN, de-DE, fr-FR, es-ES, pt-BR, ko-KR) keywords: ["keyword1", "keyword2", "keyword3"] # List of 3-10 relevant keywords for filtering/search featured: false # Set to false unless approved by AWS Neuron team icon: "📝" # Optional: Single emoji for featured articles (e.g., 🚀 📊 🎯 💡 ⚡) # ============================================ # FIELD DESCRIPTIONS # ============================================ # # title (required): # - Clear, descriptive title of your article # - Keep under 100 characters # - Use title case # # url (required): # - Full HTTPS URL to your article # - Must be publicly accessible # - Should not require login or paywall # # description (required): # - Brief summary of article content # - 20-500 characters recommended # - Focus on what readers will learn # - Avoid marketing language # # author (required): # - Your name or organization # - Will be displayed as attribution # # author_url (optional): # - Link to your website or profile # - Only used in featured articles # - Must be valid HTTPS URL # # date (required): # - Publication date in YYYY-MM-DD format # - Use the date the article was published # - Not the date you're adding it here # # category (required): # - blog: Technical blog posts and articles # - news: News announcements and press releases # - tutorial: Step-by-step guides and how-tos # - case-study: Customer success stories and use cases # - benchmark: Performance benchmarks and comparisons # # locale (required): # - Language and region code in format: language-REGION # - Examples: en-US (English-US), ja-JP (Japanese), zh-CN (Chinese-Simplified) # - Common codes: en-US, en-GB, ja-JP, zh-CN, zh-TW, ko-KR, de-DE, fr-FR, # es-ES, es-MX, pt-BR, pt-PT, it-IT, nl-NL, ru-RU, ar-SA, hi-IN # - A flag emoji will be displayed based on the locale # - Unknown locales will display 🌐 globe icon # # keywords (required): # - List of 3-10 relevant keywords for filtering and search # - Use lowercase, hyphenated format (e.g., "machine-learning", "pytorch") # - Include technology names, topics, and key concepts # - Examples: ["trainium", "inference", "pytorch", "llm", "optimization"] # - Keywords help users find your article through filtering and search # # featured (required): # - true: Article appears in featured section (requires approval) # - false: Article appears in all articles section # - Most submissions should use false # # icon (optional): # - Single emoji character # - Only used for featured articles # - Examples: 🚀 📊 🎯 💡 ⚡ 🔥 ✨ 🌟 📈 🛠️ # # ============================================ # EXAMPLES # ============================================ # Example 1: Tutorial Article - title: "Getting Started with PyTorch on AWS Trainium" url: "https://example.com/pytorch-trainium-tutorial" description: "A comprehensive guide to training PyTorch models on AWS Trainium instances, including setup, optimization tips, and common pitfalls to avoid." 
author: "Jane Developer" date: "2026-01-15" category: "tutorial" locale: "en-US" keywords: ["pytorch", "trainium", "training", "tutorial", "getting-started"] featured: false # Example 2: Benchmark Article - title: "BERT Inference Performance: Inferentia2 vs GPU Comparison" url: "https://example.com/bert-benchmark" description: "Detailed performance comparison of BERT inference on AWS Inferentia2 vs leading GPU instances, including cost analysis and throughput metrics." author: "ML Performance Lab" date: "2026-01-20" category: "benchmark" locale: "en-US" keywords: ["inferentia", "bert", "benchmark", "performance", "gpu-comparison"] featured: false # Example 3: Case Study - title: "How Acme Corp Reduced ML Training Costs by 60% with Trainium" url: "https://example.com/acme-case-study" description: "Learn how Acme Corp migrated their large language model training to AWS Trainium and achieved significant cost savings while maintaining performance." author: "Acme Corp Engineering Team" date: "2026-01-25" category: "case-study" locale: "en-US" keywords: ["trainium", "cost-optimization", "llm", "case-study", "migration"] featured: false # Example 4: Featured Article (requires approval) - title: "Advanced Optimization Techniques for Neuron Compiler" url: "https://example.com/neuron-optimization" description: "Deep dive into advanced compiler optimization techniques for AWS Neuron, with practical examples and performance improvements." author: "AWS Neuron Team" author_url: "https://aws.amazon.com/machine-learning/neuron/" date: "2026-01-27" category: "blog" locale: "en-US" keywords: ["neuron", "compiler", "optimization", "performance", "advanced"] featured: true icon: "⚡" # Example 5: Japanese Article - title: "AWS Trainiumで大規模言語モデルを訓練する" url: "https://example.jp/trainium-llm-training" description: "AWS Trainiumを使用して大規模言語モデルを効率的に訓練する方法を詳しく解説します。" author: "日本のMLエンジニア" date: "2026-01-20" category: "tutorial" locale: "ja-JP" keywords: ["trainium", "llm", "training", "japanese", "tutorial"] featured: false ================================================ FILE: about-neuron/news-and-blogs/index.rst ================================================ .. meta:: :description: Links to external news and blog articles about AWS Neuron and Trainium/Inferentia ML accelerators. :date-modified: 02/26/2026 .. _neuron-news: AWS Neuron News and Blogs ========================= Stay up to date with the latest news, announcements, and technical blog posts about AWS Neuron, AWS Trainium, and AWS Inferentia. Discover customer success stories, performance benchmarks, best practices, and deep dives into machine learning acceleration on AWS. ---- Featured Articles ----------------- Read recent blogs and technical content about Neuron, Trainium, and Inferentia from AWS subject matter experts and our highly experienced customers. .. datatemplate:yaml:: news-and-blogs.yaml .. grid:: 1 :gutter: 2 {% for article in data.featured_articles %} {% if article.locale == 'en-US' %}{% set flag = '🇺🇸' %}{% set locale_name = 'English' %}{% elif article.locale == 'ja-JP' %}{% set flag = '🇯🇵' %}{% set locale_name = 'Japanese' %}{% elif article.locale == 'zh-CN' %}{% set flag = '🇨🇳' %}{% set locale_name = 'Chinese' %}{% elif article.locale == 'ko-KR' %}{% set flag = '🇰🇷' %}{% set locale_name = 'Korean' %}{% else %}{% set flag = '🌐' %}{% set locale_name = 'Unknown' %}{% endif %} .. 
grid-item-card:: :class-card: sd-border-2 :link: {{ article.url }} {{ article.icon }} **{{ article.title }}** ^^^ {{ article.description }} +++ **Published on**: {{ article.date }} | {{ flag }} ({{ locale_name }}) | Content by `{{ article.author }} <{{ article.author_url }}>`__ {% endfor %} .. note:: This page is regularly updated with new content. Bookmark it to stay informed about the latest developments in AWS Neuron, Trainium, and Inferentia. For the full list of featured articles and posts, go to the :ref:`News & Blogs ` section of this page. .. _all-articles: News & Blogs ------------- Explore the latest news, press releases, and industry coverage about AWS Neuron, Trainium, and Inferentia.
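The cards in this section are generated at build time from ``news-and-blogs.yaml``. Each entry in that file follows the format below (a minimal sketch with placeholder values; see the contribution guide in the repository for the full field reference):

.. code-block:: yaml

   all_articles:
     - title: "Your Article Title"
       url: "https://example.com/your-article"
       description: "One or two sentences on what readers will learn."
       author: "Your Name or Organization"
       date: "2026-01-15"
       category: "blog"    # blog | news | tutorial | case-study | benchmark
       locale: "en-US"
       keywords: ["trainium", "inference"]
       featured: false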
.. datatemplate:yaml:: news-and-blogs.yaml .. grid:: 1 1 2 2 :gutter: 2 :class-container: articles-grid news-blogs-grid {% for article in data.all_articles|sort(attribute='date', reverse=True) %} {% if article.locale == 'en-US' %}{% set flag = '🇺🇸' %}{% set locale_name = 'English' %}{% elif article.locale == 'ja-JP' %}{% set flag = '🇯🇵' %}{% set locale_name = 'Japanese' %}{% elif article.locale == 'zh-CN' %}{% set flag = '🇨🇳' %}{% set locale_name = 'Chinese' %}{% elif article.locale == 'ko-KR' %}{% set flag = '🇰🇷' %}{% set locale_name = 'Korean' %}{% else %}{% set flag = '🌐' %}{% set locale_name = 'Unknown' %}{% endif %} .. grid-item-card:: :link: {{ article.url }} :class-card: sd-border-1 article-card :class-body: article-locale-{{ article.locale }} **{{ article.title }}** ^^^ {{ article.description }} +++ **Published on**: {{ article.date }} | {{ flag }} ({{ locale_name }}) {% endfor %} .. important:: AWS and Neuron provide links to external articles and posts to help you discover them, but do not commission or own any content not created by AWS employees. This list is curated based on internal and customer recommendations. **Want to add your article?** Go to `https://github.com/aws-neuron/aws-neuron-sdk `_, edit ``about-neuron/news-and-blogs/news-and-blogs.yaml`` to add your submission, and submit a pull request. ================================================ FILE: about-neuron/news-and-blogs/news-and-blogs.yaml ================================================ # AWS Neuron News and Blogs Data File # # This file contains metadata for external articles, blog posts, and news about # AWS Neuron, Trainium, and Inferentia. # # To contribute a new article: # 1. Add a new entry to the appropriate section below # 2. Follow the existing format exactly # 3. Submit a pull request with your changes # # Entry format: # - title: "Article Title" # url: "https://example.com/article" # description: "Brief description of the article content" # author: "Author Name or Organization" # date: "YYYY-MM-DD" # category: "blog|news|tutorial|case-study|benchmark" # locale: "en-US" # Language/region code (e.g., en-US, ja-JP, zh-CN, de-DE, fr-FR, es-ES, pt-BR, ko-KR) # keywords: ["keyword1", "keyword2", "keyword3"] # List of relevant keywords for filtering # featured: true|false # Set to true for featured articles section featured_articles: - title: "AWS Trainium: 50 Exercises" url: "https://zenn.dev/karakuri_blog/articles/5ccedeee1beb08" description: "Learn how to build LLMs for Trainium accelerators with this rich 50-lesson guide from customer Karakuri." author: "Karakuri" author_url: "https://about.karakuri.ai/" date: "2026-02-19" category: "tutorial" locale: "en-US" keywords: ["trainium", "llm", "training", "tutorial", "japanese"] featured: true icon: "🚀" - title: "Cost-effective AI image generation with PixArt-Sigma inference on AWS Trainium and AWS Inferentia" url: "https://aws.amazon.com/blogs/machine-learning/cost-effective-ai-image-generation-with-pixart-sigma-inference-on-aws-trainium-and-aws-inferentia/" description: "Learn how to use AWS Trainium and Inferentia to deploy a PixArt-Sigma diffusion transformer model."
author: "AWS Neuron Team" author_url: "https://aws.amazon.com/machine-learning/neuron/" date: "2026-02-19" category: "blog" locale: "en-US" keywords: ["inferentia", "trainium", "inference", "diffusion", "image-generation"] featured: true icon: "📊" all_articles: # Japanese Articles - title: "AWS Neuron 関連記事まとめ" url: "https://zenn.dev/tosshi/articles/36f3615e26c323" description: "AWS Neuron エコシステムに関する自身が作成した一連の技術記事のインデックス" author: "littlemex" date: "2026-02-20" category: "blog" locale: "ja-JP" keywords: ["trainium", "neuron", "collective-communication", "architecture", "japanese"] featured: false - title: "【AWS re:Invent 2025 速報】AWS 自社設計 AIチップ AWS Trainium3 の全貌" url: "https://zenn.dev/aws_japan/articles/06808526d5c75f" description: "AWS re:Invent 2025で発表されたAWS Trainium3カスタムAIチップの完全な概要をお届けします。" author: "AWS Japan" date: "2025-12-06" category: "news" locale: "ja-JP" keywords: ["trainium3", "reinvent", "announcement", "ai-chip"] featured: false - title: "【AWS Trainium 50本ノック #0】はじめに" url: "https://zenn.dev/karakuri_blog/articles/77d93c40b27b60" description: "AWS Trainium 50本ノックシリーズの紹介 - 入門ガイド。" author: "Karakuri" date: "2025-11-18" category: "tutorial" locale: "ja-JP" keywords: ["trainium", "tutorial", "getting-started", "series"] featured: false - title: "「Syn Pro」開発レポート:AWS TrainiumとRFTによる高性能日本語LLMの実現" url: "https://zenn.dev/karakuri_blog/articles/b923acfc86083b" description: "AWS TrainiumとRFTを使用した高性能日本語LLMの構築に関する開発レポート。" author: "Karakuri" date: "2025-10-24" category: "case-study" locale: "ja-JP" keywords: ["trainium", "llm", "japanese", "rft", "case-study"] featured: false - title: "AWS Inferentia2 + Llama 3.2 にできること" url: "https://zenn.dev/exwzd/articles/20250930-inferentia-llama" description: "AWS Inferentia2とLlama 3.2モデルでできることを紹介します。" author: "exwzd" date: "2025-09-30" category: "blog" locale: "ja-JP" keywords: ["inferentia2", "llama", "capabilities", "inference"] featured: false - title: "AWS Inferentia2とvLLMでLlama 3.2の推論サーバーを構築する手順" url: "https://zenn.dev/exwzd/articles/20250827_inferentia_compile" description: "AWS Inferentia2とvLLMを使用してLlama 3.2推論サーバーを構築するステップバイステップガイド。" author: "exwzd" date: "2025-08-28" category: "tutorial" locale: "ja-JP" keywords: ["inferentia2", "vllm", "llama", "inference", "tutorial"] featured: false - title: "【開催報告】Neuron Community – Vol.2" url: "https://aws.amazon.com/jp/blogs/news/neuron-community-vol-2/" description: "Neuron Community Vol.2の開催報告。" author: "AWS Japan" date: "2025-07-24" category: "news" locale: "ja-JP" keywords: ["community", "event", "neuron", "japan"] featured: false - title: "KARAKURI VL - 日本語コンピュータユースに特化した視覚言語モデル" url: "https://zenn.dev/karakuri_blog/articles/28c73f2ada797a" description: "日本語コンピュータユースに特化したビジョン言語モデルKARAKURI VLの紹介。" author: "Karakuri" date: "2025-07-11" category: "blog" locale: "ja-JP" keywords: ["vision-language", "japanese", "multimodal", "karakuri"] featured: false - title: "LLM-jp Chatbot Arenaを試験運用しました" url: "https://llm-jp.nii.ac.jp/ja/blog/blog-836/" description: "LLM-jp Chatbot Arenaの試験運用に関するレポート。" author: "LLM-jp" date: "2025-05-12" category: "blog" locale: "ja-JP" keywords: ["llm", "chatbot", "arena", "japanese"] featured: false - title: "【開催報告】Neuron Community – Day One" url: "https://aws.amazon.com/jp/blogs/news/neuron-community-day-one/" description: "初回Neuron Community Dayの開催報告。" author: "AWS Japan" date: "2025-04-14" category: "news" locale: "ja-JP" keywords: ["community", "event", "neuron", "japan"] featured: false - title: "EKS Auto Mode でサクッと機械学習用インスタンスを利用してみる。 AWS 独自設計チップ搭載の Trainium と Inferentia 
を使ってみた!" url: "https://dev.classmethod.jp/articles/eks-auto-mode-gpu-aws-trainium-inferentia/" description: "EKS Auto Modeを使用してMLインスタンスを簡単に利用する方法。AWS TrainiumとInferentiaチップの活用ガイド。" author: "Classmethod" date: "2025-01-02" category: "tutorial" locale: "ja-JP" keywords: ["eks", "trainium", "inferentia", "kubernetes", "tutorial"] featured: false # Korean Articles - title: "Nota AI가 제안하는 AWS Inferentia에서 다양한 LLM 모델 양자화 최적화기법 사용하기" url: "https://aws.amazon.com/ko/blogs/tech/llm-model-quantization-techniques-for-aws-inferentia-by-nota-ai/" description: "Nota AI가 제안하는 AWS Inferentia에서 LLM 모델 양자화 최적화 기법." author: "Nota AI / AWS Korea" date: "2026-01-20" category: "blog" locale: "ko-KR" keywords: ["inferentia", "quantization", "llm", "optimization", "nota-ai"] featured: false - title: "Nota AI가 제안하는 Transformer 모델을 AWS Inferentia/Trainium에 손쉽게 배포하는 방법" url: "https://aws.amazon.com/ko/blogs/tech/tips-for-using-transformer-models-on-aws-inf-and-trn/" description: "Nota AI가 제안하는 AWS Inferentia/Trainium에서 Transformer 모델을 쉽게 배포하는 방법." author: "Nota AI / AWS Korea" date: "2025-04-09" category: "blog" locale: "ko-KR" keywords: ["transformer", "deployment", "inferentia", "trainium", "nota-ai"] featured: false - title: "콜드스타트 추천 문제를 AWS Trainium과 vLLM으로 해결하는 자동화 전략" url: "https://blog.a-cloud.co.kr/2025/07/25/%EC%BD%9C%EB%93%9C%EC%8A%A4%ED%83%80%ED%8A%B8-%EC%B6%94%EC%B2%9C-%EB%AC%B8%EC%A0%9C%EB%A5%BC-aws-trainium%EA%B3%BC-vllm%EC%9C%BC%EB%A1%9C-%ED%95%B4%EA%B2%B0%ED%95%98%EB%8A%94-%EC%9E%90%EB%8F%99/" description: "AWS Trainium과 vLLM을 사용하여 콜드 스타트 추천 문제를 해결하는 자동화 전략." author: "A-Cloud" date: "2025-07-25" category: "blog" locale: "ko-KR" keywords: ["trainium", "vllm", "cold-start", "recommendations", "automation"] featured: false - title: "DeepSeek-R1 모델 AWS 출시" url: "https://aws.amazon.com/ko/blogs/korea/deepseek-r1-models-now-available-on-aws/" description: "AWS에서 DeepSeek-R1 모델을 사용할 수 있게 되었습니다." author: "AWS Korea" date: "2025-02-05" category: "news" locale: "ko-KR" keywords: ["deepseek", "r1", "model", "launch", "aws"] featured: false # Chinese Articles - title: "使用亚马逊云科技自研芯片 Inferentia2 部署 DeepSeek R1 Distillation 模型(一)" url: "https://aws.amazon.com/cn/blogs/china/deploying-the-deepseek-r1-distillation-model-using-amazon-inferentia2/" description: "使用亚马逊云科技自研芯片 Inferentia2 部署 DeepSeek R1 Distillation 模型(第一部分)。" author: "AWS China" date: "2025-02-12" category: "tutorial" locale: "zh-CN" keywords: ["inferentia2", "deepseek", "r1", "deployment", "distillation"] featured: false - title: "使用亚马逊云科技自研芯片 Inferentia2 部署 DeepSeek R1 Distillation 模型(二)" url: "https://aws.amazon.com/cn/blogs/china/deploying-the-deepseek-r1-distillation-model-using-amazon-inferentia2-part-two/" description: "使用亚马逊云科技自研芯片 Inferentia2 部署 DeepSeek R1 Distillation 模型(第二部分)。" author: "AWS China" date: "2025-02-14" category: "tutorial" locale: "zh-CN" keywords: ["inferentia2", "deepseek", "r1", "deployment", "distillation"] featured: false - title: "Bytedance processes billions of daily videos using their multimodal video understanding models on AWS Inferentia2" url: "https://aws.amazon.com/blogs/machine-learning/bytedance-processes-billions-of-daily-videos-using-their-multimodal-video-understanding-models-on-aws-inferentia2/" description: "How Bytedance processes billions of daily videos using multimodal models on AWS Inferentia2." 
author: "AWS" date: "2025-02-26" category: "case-study" locale: "en-US" keywords: ["inferentia2", "bytedance", "video", "multimodal", "case-study"] featured: false - title: "基于 HAMi 实现亚马逊云科技 Trainium 与 Inferentia 核心级共享与策略性拓扑调度" url: "https://aws.amazon.com/cn/blogs/china/achieve-trainium-and-inferentia-core-level-sharing-and-strategic-topology-scheduling/" description: "基于 HAMi 实现亚马逊云科技 Trainium 与 Inferentia 核心级共享与策略性拓扑调度。" author: "AWS China" date: "2025-11-06" category: "blog" locale: "zh-CN" keywords: ["trainium", "inferentia", "hami", "scheduling", "topology"] featured: false # Red Hat / AWS Neuron Collaboration - title: "Red Hat to Deliver Enhanced AI Inference Across AWS" url: "https://www.redhat.com/en/about/press-releases/red-hat-deliver-enhanced-ai-inference-across-aws" description: "Red Hat and AWS expand collaboration to power enterprise-grade generative AI using Red Hat AI Inference Server on AWS Inferentia2 and Trainium3." author: "Red Hat" date: "2025-12-02" category: "news" locale: "en-US" keywords: ["red-hat", "inferentia2", "trainium3", "vllm", "openshift", "inference", "collaboration"] featured: false - title: "Run cost-effective AI workloads on OpenShift with AWS Neuron Operator" url: "https://developers.redhat.com/articles/2025/12/02/cost-effective-ai-workloads-openshift-aws-neuron-operator" description: "How to use the AWS Neuron Operator to run LLM inference with vLLM on AWS AI chips in Red Hat OpenShift." author: "Red Hat" date: "2025-12-02" category: "tutorial" locale: "en-US" keywords: ["red-hat", "openshift", "neuron-operator", "vllm", "inferentia", "trainium", "kubernetes"] featured: false - title: "AWS Neuron Operator for AI Chips on AWS — GitHub Releases" url: "https://github.com/awslabs/operator-for-ai-chips-on-aws/releases" description: "Open-source AWS Neuron Operator for Kubernetes and Red Hat OpenShift, enabling native support for AWS Inferentia and Trainium accelerators." author: "AWS" date: "2025-12-02" category: "news" locale: "en-US" keywords: ["neuron-operator", "kubernetes", "openshift", "open-source", "inferentia", "trainium"] featured: false - title: "Red Hat AI Inference Server — vLLM Neuron Container Image (RHEL 9)" url: "https://catalog.redhat.com/en/software/containers/rhaiis/vllm-neuron-rhel9/698c42b20b626d81c97abd7f" description: "Certified container image for the Red Hat AI Inference Server with vLLM optimized for AWS Inferentia and Trainium accelerators via the AWS Neuron SDK. Provides enterprise-grade, high-performance LLM inference serving on RHEL 9, enabling production deployment of generative AI models on AWS AI chips through Red Hat OpenShift or Podman." author: "Red Hat" date: "2025-12-02" category: "news" locale: "en-US" keywords: ["red-hat", "vllm", "neuron", "inferentia", "trainium", "container", "rhel9", "inference", "openshift"] featured: true ================================================ FILE: about-neuron/news-and-blogs/validate_articles.py ================================================ #!/usr/bin/env python3 """ Validation script for news-and-blogs.yaml This script validates the structure and content of article entries to ensure they meet the required format before submission. Usage: python validate_articles.py """ import sys from pathlib import Path from datetime import datetime import re try: import yaml except ImportError: print("Error: PyYAML is required. 
Install with: pip install pyyaml") sys.exit(1) VALID_CATEGORIES = {'blog', 'news', 'tutorial', 'case-study', 'benchmark'} REQUIRED_FIELDS = {'title', 'url', 'description', 'author', 'date', 'category', 'locale', 'keywords'} OPTIONAL_FIELDS = {'featured', 'author_url', 'icon'} ALL_FIELDS = REQUIRED_FIELDS | OPTIONAL_FIELDS # Valid locale codes VALID_LOCALES = { 'en-US', 'en-GB', 'en-CA', 'en-AU', 'en-NZ', 'en-IE', 'en-IN', 'en-SG', 'en-ZA', 'ja-JP', 'zh-CN', 'zh-TW', 'zh-HK', 'ko-KR', 'th-TH', 'vi-VN', 'id-ID', 'ms-MY', 'fil-PH', 'de-DE', 'fr-FR', 'es-ES', 'es-MX', 'es-AR', 'pt-BR', 'pt-PT', 'it-IT', 'nl-NL', 'pl-PL', 'ru-RU', 'tr-TR', 'sv-SE', 'da-DK', 'no-NO', 'fi-FI', 'cs-CZ', 'hu-HU', 'ro-RO', 'el-GR', 'uk-UA', 'ar-SA', 'ar-AE', 'ar-EG', 'he-IL', 'fa-IR', 'hi-IN', 'bn-BD', 'ur-PK', 'sw-KE' } def validate_url(url): """Validate URL format""" url_pattern = re.compile( r'^https?://' # http:// or https:// r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?|' # domain r'localhost|' # localhost r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # or IP r'(?::\d+)?' # optional port r'(?:/?|[/?]\S+)$', re.IGNORECASE) return url_pattern.match(url) is not None def validate_date(date_str): """Validate date format (YYYY-MM-DD)""" try: datetime.strptime(date_str, '%Y-%m-%d') return True except ValueError: return False def validate_article(article, index, section): """Validate a single article entry""" errors = [] warnings = [] # Check for required fields missing_fields = REQUIRED_FIELDS - set(article.keys()) if missing_fields: errors.append(f"Missing required fields: {', '.join(missing_fields)}") # Check for unknown fields unknown_fields = set(article.keys()) - ALL_FIELDS if unknown_fields: warnings.append(f"Unknown fields (will be ignored): {', '.join(unknown_fields)}") # Validate title if 'title' in article: if not article['title'] or not isinstance(article['title'], str): errors.append("Title must be a non-empty string") elif len(article['title']) > 200: warnings.append(f"Title is very long ({len(article['title'])} chars). Consider shortening.") # Validate URL if 'url' in article: if not validate_url(article['url']): errors.append(f"Invalid URL format: {article['url']}") # Validate description if 'description' in article: if not article['description'] or not isinstance(article['description'], str): errors.append("Description must be a non-empty string") elif len(article['description']) < 20: warnings.append("Description is very short. Consider adding more detail.") elif len(article['description']) > 500: warnings.append(f"Description is very long ({len(article['description'])} chars). Consider shortening.") # Validate author if 'author' in article: if not article['author'] or not isinstance(article['author'], str): errors.append("Author must be a non-empty string") # Validate author_url (optional) if 'author_url' in article: if article['author_url'] and not validate_url(article['author_url']): errors.append(f"Invalid author_url format: {article['author_url']}") # Validate date if 'date' in article: if not validate_date(str(article['date'])): errors.append(f"Invalid date format: {article['date']}. Use YYYY-MM-DD") else: article_date = datetime.strptime(str(article['date']), '%Y-%m-%d') if article_date > datetime.now(): warnings.append(f"Date is in the future: {article['date']}") # Validate category if 'category' in article: if article['category'] not in VALID_CATEGORIES: errors.append(f"Invalid category: {article['category']}. 
Must be one of: {', '.join(VALID_CATEGORIES)}") # Validate locale if 'locale' in article: if not isinstance(article['locale'], str): errors.append("Locale must be a string") elif article['locale'] not in VALID_LOCALES: warnings.append(f"Locale '{article['locale']}' not in standard list. Will display with 🌐 globe icon. Common locales: en-US, ja-JP, zh-CN, de-DE, fr-FR, es-ES, pt-BR, ko-KR") # Validate keywords if 'keywords' in article: if not isinstance(article['keywords'], list): errors.append("Keywords must be a list") elif len(article['keywords']) == 0: warnings.append("Keywords list is empty. Consider adding relevant keywords for better filtering") else: for i, keyword in enumerate(article['keywords']): if not isinstance(keyword, str): errors.append(f"Keyword at index {i} must be a string") elif len(keyword.strip()) == 0: warnings.append(f"Keyword at index {i} is empty or whitespace") if len(article['keywords']) > 10: warnings.append(f"Article has {len(article['keywords'])} keywords. Consider limiting to 5-10 most relevant keywords") # Validate featured if 'featured' in article: if not isinstance(article['featured'], bool): errors.append("Featured must be true or false (boolean)") if section == 'all_articles' and article['featured']: warnings.append("Article marked as featured but in all_articles section") # Validate icon (optional) if 'icon' in article: if not isinstance(article['icon'], str) or len(article['icon']) > 10: warnings.append("Icon should be a short string (emoji recommended)") return errors, warnings def main(): """Main validation function""" yaml_file = Path(__file__).parent / 'news-and-blogs.yaml' if not yaml_file.exists(): print(f"❌ Error: {yaml_file} not found") return 1 print(f"Validating {yaml_file}...\n") try: with open(yaml_file, 'r', encoding='utf-8') as f: data = yaml.safe_load(f) except yaml.YAMLError as e: print(f"❌ YAML Parse Error: {e}") return 1 if not isinstance(data, dict): print("❌ Error: YAML file must contain a dictionary") return 1 total_errors = 0 total_warnings = 0 # Validate featured_articles section if 'featured_articles' in data: print("📌 Validating featured_articles section...") if not isinstance(data['featured_articles'], list): print("❌ Error: featured_articles must be a list") total_errors += 1 else: for i, article in enumerate(data['featured_articles'], 1): errors, warnings = validate_article(article, i, 'featured_articles') if errors or warnings: print(f"\n Article #{i}: {article.get('title', 'NO TITLE')}") for error in errors: print(f" ❌ Error: {error}") total_errors += 1 for warning in warnings: print(f" ⚠️ Warning: {warning}") total_warnings += 1 print() # Validate all_articles section if 'all_articles' in data: print("📚 Validating all_articles section...") if not isinstance(data['all_articles'], list): print("❌ Error: all_articles must be a list") total_errors += 1 else: for i, article in enumerate(data['all_articles'], 1): errors, warnings = validate_article(article, i, 'all_articles') if errors or warnings: print(f"\n Article #{i}: {article.get('title', 'NO TITLE')}") for error in errors: print(f" ❌ Error: {error}") total_errors += 1 for warning in warnings: print(f" ⚠️ Warning: {warning}") total_warnings += 1 print() # Summary print("=" * 60) if total_errors == 0 and total_warnings == 0: print("✅ Validation passed! 
No errors or warnings found.") return 0 else: print(f"Validation complete:") if total_errors > 0: print(f" ❌ {total_errors} error(s) found - must be fixed") if total_warnings > 0: print(f" ⚠️ {total_warnings} warning(s) found - should be reviewed") if total_errors > 0: print("\n❌ Validation FAILED - please fix errors before submitting") return 1 else: print("\n✅ Validation PASSED - warnings are optional to fix") return 0 if __name__ == '__main__': sys.exit(main()) ================================================ FILE: about-neuron/oss/index.rst ================================================ .. meta:: :description: GitHub repositories for AWS Neuron open source components, libraries, and tools. :date-modified: 12/02/2025 Neuron Open Source Repositories and Contribution =================================================== AWS Neuron provides open source code and samples for some of its components, libraries, and tools under the Apache 2.0 license. The current public repositories open to contribution at this time are listed below. Neuron Open Source GitHub Repositories --------------------------------------- .. grid:: 1 :gutter: 3 .. grid-item-card:: :class-body: sphinx-design-class-title-small **TorchNeuron PyTorch Extension Open Source** ^^^ Source code for the Neuron Native PyTorch extension and the TorchNeuron library that implements it for AWS Trainium. * Neuron GitHub source repository: https://github.com/aws-neuron/torch-neuronx .. grid-item-card:: :class-body: sphinx-design-class-title-small **Neuron Kernel Library Open Source** ^^^ Source code and specifications for the pre-built kernels that ship with the NKI Library . * Neuron GitHub source repository: https://github.com/aws-neuron/nki-library .. grid-item-card:: :class-body: sphinx-design-class-title-small **vLLM for Neuron Open Source** ^^^ Source code for the vLLM integrations with Neuron, supporting AWS Trainium and Inferentia. * Neuron GitHub source repository: https://github.com/vllm-project/vllm-neuron * **Note**: Released under vLLM project license (`LICENSE `__). .. grid-item-card:: :class-body: sphinx-design-class-title-small **NKI Samples** ^^^ Full code examples that support NKI kernel development. * Neuron GitHub source repository: https://github.com/aws-neuron/nki-samples How to Contribute to Neuron Open Source ---------------------------------------- Contributions via pull requests are appreciated! Before sending us a pull request, please ensure that: 1. You are working against the latest source on the `main`` branch. 2. You check existing open and recently merged pull requests and GitHub Issues to make sure someone else hasn't addressed the problem already. 3. You open a GitHub Issue for the repo to discuss any significant work. To send us a pull request: 1. Fork the repository. 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 3. Ensure local tests pass. 4. Commit to your fork using clear commit messages. 5. Send us a pull request, answering any default questions in the pull request interface. 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. GitHub provides documentation on `forking a repository `_ and `creating a pull request `_. 
For the specific details on licenses and contributing to each OSS repo, review the ``CONTRIBUTING.md`` pages linked below: * Contribute to TorchNeuron: https://github.com/aws-neuron/torch-neuronx/blob/main/CONTRIBUTING.md * Contribute to the NKI Library: https://github.com/aws-neuron/nki-library/blob/main/CONTRIBUTING.md * Contribute to the NKI samples: https://github.com/aws-neuron/nki-samples/blob/main/CONTRIBUTING.md .. Re-add this when available: * Contribute to vLLM Neuron: https://github.com/vllm-project/vllm-neuron/blob/main/CONTRIBUTING.md ================================================ FILE: about-neuron/profiling-tools.rst ================================================ .. _profiling-tools: Profiling Tools ================ .. toctree:: :maxdepth: 1 Neuron Profiler User Guide Neuron Profiler 2.0 (Beta) User Guide What's New ================================================ FILE: about-neuron/quick-start/_specs/REFACTORING_NOTES.md ================================================ # Quick-Start Refactoring Notes ## Summary The quick-start documentation has been restructured with a modern, task-based information architecture. The new structure eliminates the need for .txt includes in the primary quickstart paths. ## New Structure (No .txt includes) ### Primary Quickstarts (Self-contained) - `index.rst` - Main landing page with decision tree - `training-quickstart.rst` - Complete training workflow (no includes) - `inference-quickstart.rst` - Complete inference workflow (no includes) These files follow the procedural-quickstart template and contain all content inline. No external includes required. ### Supporting Pages - `docs-quicklinks.rst` - Quick navigation links - `github-samples.rst` - GitHub repository links ## Legacy Structure (Uses .txt includes) ### Legacy Quick-Start Pages (Inf1 only) - `torch-neuron.rst` - Uses tab-inference-torch-neuronx.txt and tab-inference-torch-neuron.txt - `tensorflow-neuron.rst` - Uses tab-inference-tensorflow-neuronx.txt and tab-inference-tensorflow-neuron.rst - `mxnet-neuron.rst` - Uses tab-inference-mxnet-neuron.txt These legacy pages: - Target Inf1 instances (NeuronCore v1) - Use .txt includes that reference `/src/helperscripts/installationScripts/python_instructions.txt` - Are de-emphasized in the new navigation (under "Legacy" section) - Are preserved for backward compatibility and existing links ### .txt Include Files (Legacy only) All .txt files in this directory are used exclusively by the legacy quick-start pages: - `tab-inference-torch-neuronx*.txt` (various OS versions) - `tab-inference-torch-neuron*.txt` (various OS versions) - `tab-inference-tensorflow-neuronx*.txt` (various OS versions) - `tab-inference-tensorflow-neuron*.txt` (various OS versions) - `tab-inference-mxnet-neuron*.txt` (various OS versions) - `select-framework-note.txt` ## Design Decision **Why not refactor legacy files?** 1. They target deprecated Inf1 hardware 2. They're not prominently featured in new navigation 3. Refactoring would require updating installation script references 4. Risk of breaking existing external links 5. New users are directed to the new self-contained quickstarts **Why are new quickstarts self-contained?** 1. Easier to maintain (all content in one place) 2. Better for AI/LLM context retrieval 3. Follows modern docs-as-code best practices 4. Clearer for human readers (no jumping between files) 5.
Follows the procedural-quickstart template structure ## Migration Path For users currently using legacy quick-starts: - Inf1 users: Continue using legacy pages (torch-neuron.rst, etc.) - New projects: Use new quickstarts (training-quickstart.rst, inference-quickstart.rst) - Inf2/Trn1/Trn2/Trn3 users: Use new quickstarts ## Future Cleanup When Inf1 support is fully deprecated: 1. Archive legacy quick-start pages to `/archive/quick-start/` 2. Remove .txt include files 3. Update any remaining cross-references 4. Update neuron_tag.py to remove special handling ================================================ FILE: about-neuron/quick-start/docs-quicklinks.rst ================================================ .. _docs-quick-links: Neuron Quick Links ================== .. grid:: 2 :gutter: 2 .. grid-item-card:: Overview * :ref:`neuron-quickstart` * :ref:`amazon-q-dev` * :ref:`model_samples_tutorials` * :ref:`benchmark` * :ref:`neuron_release_notes` * :ref:`announcements-main` .. grid-item-card:: ML frameworks * :ref:`pytorch-neuronx-main` * :ref:`jax-neuron-main` * :ref:`tensorflow-neuron-main` * :doc:`MXNet Neuron (archived) ` .. grid-item-card:: ML libraries * :ref:`nxdt` * :ref:`NxD Inference ` * :ref:`neuronx-distributed-index` * :ref:`transformers_neuronx_readme` * :ref:`nemo-megatron-index` .. grid-item-card:: User Guides * :ref:`neuron_runtime` * :ref:`neuron_cc` * :ref:`Neuron Kernel Interface (NKI) (beta) ` * :ref:`Neuron Custom C++ Operators (beta) ` * :ref:`monitoring_tools` * :ref:`profiling-tools` * :ref:`setup-guide-index` * :ref:`neuron-dlami-overview` * :ref:`neuron_containers` * :ref:`neuron-devflows` .. grid-item-card:: Learn AWS Neuron * :ref:`neuron-architecture-index` * :ref:`neuron-features-index` * :ref:`neuron-appnotes-index` * :ref:`neuron_faq` * :ref:`general-troubleshooting` .. grid-item-card:: About AWS Neuron * :ref:`neuron_release_notes` ================================================ FILE: about-neuron/quick-start/github-samples.rst ================================================ .. _neuron-github-samples: Neuron GitHub Samples ===================== .. grid:: 2 .. dropdown:: Training Samples for ``Trn1`` :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in :open: * `PyTorch Neuron (torch-neuronx) samples for Trn1 `_ * `Nemo Megatron for Neuron for Trn1 `_ * `AWS Neuron samples for ParallelCluster `_ * `AWS Neuron samples for EKS `_ * `AWS Neuron samples for SageMaker `_ * `AWS Neuron samples for Batch `_ .. dropdown:: Inference Samples for ``Inf2 & Trn1`` :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in :open: * `PyTorch Neuron (torch-neuronx) samples for Inf2 & Trn1 `_ * `Transformers Neuron (transformers-neuronx) samples `_ * `AWS Neuron samples for SageMaker `_ .. dropdown:: Inference Samples for ``Inf1`` :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in :open: * `PyTorch Neuron (torch-neuron) samples for Inf1 `_ * `TensorFlow Neuron (tensorflow-neuron) samples for Inf1 `_ ================================================ FILE: about-neuron/quick-start/index.rst ================================================ .. 
meta:: :description: Get started quickly with AWS Neuron SDK for PyTorch, JAX, and TensorFlow on Inferentia and Trainium :keywords: neuron, quickstart, getting started, pytorch, jax, tensorflow, inferentia, trainium, training, inference :instance-types: inf2, trn1, trn2, trn3 :content-type: navigation-hub :date-modified: 2026-03-03 .. _neuron-quickstart: Get Started with AWS Neuron ============================ Get up and running with AWS Neuron SDK in minutes. These quickstarts guide you through your first training or inference workload on Inferentia and Trainium instances. .. note:: **First time using AWS Neuron?** These quickstarts assume you have: - An active AWS account with EC2 access - Basic familiarity with your chosen ML framework (PyTorch, JAX, or TensorFlow) - SSH access to launch and connect to EC2 instances For detailed installation instructions, see the :doc:`Setup Guide `. Choose Your Path ---------------- Select the quickstart that matches your use case: .. grid:: 1 1 2 2 :gutter: 3 .. grid-item-card:: 🚀 Training Quickstart :link: training-quickstart :link-type: ref :class-card: sd-border-2 Train your first model on Trainium - Launch a Trn1 instance - Run a PyTorch training script - Monitor training progress **Time**: ~15 minutes :bdg-primary:`Trn1` :bdg-primary:`Trn2` :bdg-primary:`Trn3` .. grid-item-card:: 🎯 Inference Quickstart :link: inference-quickstart :link-type: ref :class-card: sd-border-2 Run your first inference on Inferentia - Launch an Inf2 instance - Load a pre-compiled model - Run predictions **Time**: ~10 minutes :bdg-success:`Inf2` :bdg-success:`Trn1` Specialized Quickstarts ----------------------- .. grid:: 1 1 2 2 :gutter: 3 .. grid-item-card:: 💬 LLM Serving with vLLM :class-card: sd-border-1 Deploy large language models for production inference - :doc:`Online serving ` (OpenAI-compatible API) - :doc:`Offline batch inference ` **Time**: ~20 minutes :bdg-info:`Inf2` :bdg-info:`Trn1` .. grid-item-card:: 🤖 Amazon AI helper tools :link: amazon-q-dev :link-type: ref :class-card: sd-border-1 Use AI-powered code assistance for Neuron development - Get code suggestions - Debug Neuron applications - Optimize performance **Time**: ~5 minutes Framework-Specific Guides ------------------------- Need framework-specific setup instructions? .. grid:: 1 1 3 3 :gutter: 2 .. grid-item-card:: PyTorch :link: /setup/pytorch/index :link-type: doc :class-card: sd-border-1 :class-body: sphinx-design-class-title-small PyTorch 2.9+ setup .. grid-item-card:: JAX :link: /setup/jax/index :link-type: doc :class-card: sd-border-1 :class-body: sphinx-design-class-title-small JAX 0.7+ setup .. grid-item-card:: TensorFlow :link: /archive/tensorflow/index :link-type: doc :class-card: sd-border-1 :class-body: sphinx-design-class-title-small TensorFlow 2.x setup Additional Resources -------------------- - :doc:`/about-neuron/models/index` - Pre-tested model samples and tutorials - :doc:`/devflows/ec2-flows` - Detailed EC2 deployment workflows - :doc:`/containers/index` - Use Deep Learning Containers - :doc:`docs-quicklinks` - Quick links to all Neuron documentation - :doc:`github-samples` - GitHub sample repositories Legacy Quick-Start Pages (Inf1) -------------------------------- .. warning:: The following pages are for legacy Inf1 instances only. For new projects, use the quickstarts above for Inf2, Trn1, Trn2, or Trn3. - :doc:`torch-neuron` - PyTorch on Inf1 - :doc:`tensorflow-neuron` - TensorFlow on Inf1 - :doc:`mxnet-neuron` - MXNet on Inf1 .. 
toctree:: :hidden: :maxdepth: 1 training-quickstart inference-quickstart /libraries/nxd-inference/vllm/quickstart-vllm-online-serving /libraries/nxd-inference/vllm/quickstart-vllm-offline-serving /about-neuron/amazonq-getstarted docs-quicklinks github-samples torch-neuron tensorflow-neuron mxnet-neuron ================================================ FILE: about-neuron/quick-start/inference-quickstart.rst ================================================ .. meta:: :description: Run your first inference workload on AWS Inferentia with PyTorch and Neuron SDK :keywords: neuron, inference, quickstart, pytorch, inferentia, inf2, getting started :instance-types: inf2, trn1 :content-type: quickstart :date-modified: 2026-03-03 .. _inference-quickstart: Quickstart: Run Inference on Inferentia ======================================== This quickstart guides you through running your first PyTorch inference workload on AWS Inferentia. You'll launch an Inf2 instance, compile a model for Neuron, and run predictions. When you complete this quickstart, you'll understand the basic workflow for deploying models on Inferentia. **This quickstart is for**: ML engineers and developers deploying inference workloads **Time to complete**: ~10 minutes Prerequisites ------------- Before you begin, ensure you have: - An AWS account with EC2 launch permissions - AWS CLI configured with your credentials - SSH key pair for EC2 access - Basic familiarity with PyTorch - Terminal access (Linux, macOS, or WSL on Windows) Step 1: Launch an Inferentia instance -------------------------------------- In this step, you will launch an Inf2 instance using the AWS Deep Learning AMI. Launch an Inf2.xlarge instance with the latest Deep Learning AMI: .. code-block:: bash aws ec2 run-instances \ --image-id resolve:ssm:/aws/service/deep-learning-base-neuron/ubuntu-22-04/latest \ --instance-type inf2.xlarge \ --key-name YOUR_KEY_NAME \ --security-group-ids YOUR_SECURITY_GROUP \ --subnet-id YOUR_SUBNET_ID .. note:: Replace ``YOUR_KEY_NAME``, ``YOUR_SECURITY_GROUP``, and ``YOUR_SUBNET_ID`` with your values. Alternatively, launch the instance through the `EC2 Console `_. Connect to your instance via SSH: .. code-block:: bash ssh -i YOUR_KEY.pem ubuntu@YOUR_INSTANCE_IP Verify Neuron devices are available: .. code-block:: bash neuron-ls You should see output showing available NeuronCores: .. code-block:: text +--------+--------+--------+---------+ | NEURON | NEURON | NEURON | PCI | | DEVICE | CORES | MEMORY | BDF | +--------+--------+--------+---------+ | 0 | 2 | 32 GB | 00:1e.0 | +--------+--------+--------+---------+ Step 2: Set up your environment -------------------------------- In this step, you will create a Python virtual environment and install PyTorch with Neuron support. Create and activate a virtual environment: .. code-block:: bash python3 -m venv neuron_env source neuron_env/bin/activate Install PyTorch Neuron and dependencies: .. code-block:: bash pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com Verify the installation: .. code-block:: bash python -c "import torch; import torch_neuronx; print(f'PyTorch: {torch.__version__}')" You should see output confirming PyTorch is installed: .. code-block:: text PyTorch: 2.9.0+cpu Step 3: Compile a model for Neuron ----------------------------------- In this step, you will create a simple model and compile it for Neuron inference. Create a file named ``compile_model.py``: .. 
code-block:: python import torch import torch.nn as nn import torch_neuronx # Simple neural network class SimpleNet(nn.Module): def __init__(self): super().__init__() self.fc1 = nn.Linear(784, 128) self.fc2 = nn.Linear(128, 10) self.relu = nn.ReLU() def forward(self, x): x = self.relu(self.fc1(x)) return self.fc2(x) # Create model and set to eval mode model = SimpleNet() model.eval() # Create example input example_input = torch.randn(1, 784) # Trace and compile for Neuron print("Compiling model for Neuron...") neuron_model = torch_neuronx.trace(model, example_input) # Save compiled model neuron_model.save('simple_net_neuron.pt') print("Model compiled and saved to simple_net_neuron.pt") Run the compilation script: .. code-block:: bash python compile_model.py You should see compilation progress and success message: .. code-block:: text Compiling model for Neuron... INFO:Neuron:Compiling function _NeuronGraph$1 with neuronx-cc INFO:Neuron:Compilation successful Model compiled and saved to simple_net_neuron.pt .. note:: Model compilation happens once. The compiled model (``simple_net_neuron.pt``) can be reused for inference without recompiling. Step 4: Run inference ---------------------- In the final step, you will load the compiled model and run predictions. Create a file named ``run_inference.py``: .. code-block:: python import torch import torch_neuronx # Load compiled model print("Loading compiled model...") neuron_model = torch.jit.load('simple_net_neuron.pt') # Create sample input sample_input = torch.randn(1, 784) # Run inference print("Running inference...") with torch.no_grad(): output = neuron_model(sample_input) # Get prediction predicted_class = output.argmax(dim=1).item() print(f"Predicted class: {predicted_class}") print(f"Output logits: {output[0][:5].tolist()}") # Show first 5 logits # Run multiple inferences to measure throughput print("\nRunning 100 inferences...") import time start = time.time() with torch.no_grad(): for _ in range(100): output = neuron_model(sample_input) elapsed = time.time() - start throughput = 100 / elapsed print(f"Throughput: {throughput:.2f} inferences/second") print(f"Latency: {elapsed/100*1000:.2f} ms per inference") Run the inference script: .. code-block:: bash python run_inference.py You should see inference results: .. code-block:: text Loading compiled model... Running inference... Predicted class: 7 Output logits: [0.123, -0.456, 0.789, -0.234, 0.567] Running 100 inferences... Throughput: 245.67 inferences/second Latency: 4.07 ms per inference Monitor Neuron device utilization in another terminal: .. code-block:: bash neuron-top This shows real-time NeuronCore utilization and inference metrics. Confirmation ------------ Congratulations! You've successfully run inference on AWS Inferentia. You should have: - ✅ Launched an Inf2 instance with Neuron SDK - ✅ Installed PyTorch with Neuron support - ✅ Compiled a model for Neuron inference - ✅ Ran predictions and measured throughput - ✅ Monitored inference with Neuron tools If you encountered any issues, see the **Common issues** section below. Common issues ------------- **Issue**: ``ModuleNotFoundError: No module named 'torch_neuronx'`` **Solution**: Ensure you activated the virtual environment and installed packages: .. code-block:: bash source neuron_env/bin/activate pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com **Issue**: ``RuntimeError: No Neuron devices found`` **Solution**: Verify you're on an Inferentia instance and devices are visible: .. 
code-block:: bash neuron-ls If no devices appear, check the instance type and driver installation. **Issue**: Compilation takes a long time **Solution**: Model compilation is a one-time cost. For this simple model, compilation should take 1-2 minutes. Larger models take longer but only need to be compiled once. The compiled model can be saved and reused. **Issue**: Lower throughput than expected **Solution**: This quickstart uses a small model and batch size for demonstration. For production workloads: - Use larger batch sizes (e.g., 4, 8, 16) - Enable dynamic batching - Use multiple NeuronCores in parallel - See :doc:`/frameworks/torch/torch-neuronx/programming-guide/inference/index` for optimization techniques Clean up -------- To avoid ongoing charges, terminate your instance when finished: .. code-block:: bash # From your local machine aws ec2 terminate-instances --instance-ids YOUR_INSTANCE_ID Or use the EC2 Console to terminate the instance. Next steps ---------- Now that you've completed this quickstart, explore more advanced inference topics: - :doc:`/frameworks/torch/torch-neuronx/programming-guide/inference/index` - Comprehensive inference guide - :doc:`/libraries/nxd-inference/index` - Production inference with NeuronX Distributed - :doc:`/libraries/nxd-inference/vllm/quickstart-vllm-online-serving` - Deploy LLMs with vLLM - :doc:`/about-neuron/models/index` - Pre-tested model samples - :doc:`/tools/neuron-explorer/index` - Profile and optimize inference performance Further reading --------------- - :doc:`/setup/pytorch/index` - Detailed PyTorch installation options - :doc:`/devflows/ec2-flows` - EC2 deployment workflows - :doc:`/frameworks/torch/index` - Complete PyTorch Neuron documentation - :doc:`/compiler/index` - Understanding Neuron compilation ================================================ FILE: about-neuron/quick-start/mxnet-neuron.rst ================================================ .. _mxnet_quick_start: Get Started with Apache MXNet Neuron ===================================== This page provides links to help you get started quickly with :doc:`MXNet Neuron ` (supporting inference only). .. note:: The instructions below are for Ubuntu 20. If you are looking for complete setup instructions for different platforms, please :ref:`Check Here. ` .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /setup/install-templates/launch-instance.txt .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 5 :end-line: 6 .. include:: /includes/setup/tab-inference-mxnet-neuron.txt ================================================ FILE: about-neuron/quick-start/tab-inference-tensorflow-neuron.rst ================================================ .. dropdown:: Install TensorFlow Neuron (``tensorflow-neuron``) :class-title: drop-down-class-title-small :class-body: drop-down-class-body-small :animate: fade-in .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=compiler_framework ..
dropdown:: Get Started with Inference (``Inf1``) :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in :ref:`ResNet-50 ` .. card:: Visit TensorFlow Neuron section for more :class-body: sphinx-design-class-body-small :link: tensorflow-neuron-main :link-type: ref ================================================ FILE: about-neuron/quick-start/tensorflow-neuron.rst ================================================ .. _tensorflow_quick_start: Get Started with TensorFlow Neuron ================================== This page provides links to help you get started quickly with :ref:`tensorflow-neuron-main`. .. note:: The instructions below are for Ubuntu 20. If you are looking for complete setup instructions for different platforms, please :ref:`Check Here. ` .. _tensorflow_quick_start_inference: .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /setup/install-templates/launch-instance.txt .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 5 :end-line: 6 .. tab-set:: .. tab-item:: tensorflow-neuronx (``Trn1, Inf2``) .. include:: /includes/setup/tab-inference-tensorflow-neuronx.txt .. tab-item:: tensorflow-neuron (``Inf1``) .. include:: /includes/setup/tab-inference-tensorflow-neuron.rst ================================================ FILE: about-neuron/quick-start/torch-neuron-tab-training.rst ================================================ .. dropdown:: Launch Trn1 Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /setup/install-templates/launch-instance.txt .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. code:: bash # Configure Linux for Neuron repository updates sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <` .. dropdown:: Launch the Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /setup/install-templates/launch-instance.txt .. dropdown:: Install Drivers and Tools :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 5 :end-line: 6 .. tab-set:: .. tab-item:: torch-neuronx (``Trn1, Inf2``) .. include:: /includes/setup/tab-inference-torch-neuronx.txt .. tab-item:: torch-neuron (``Inf1``) .. include:: /includes/setup/tab-inference-torch-neuron.txt ================================================ FILE: about-neuron/quick-start/training-quickstart.rst ================================================ .. meta:: :description: Train your first model on AWS Trainium with PyTorch and Neuron SDK :keywords: neuron, training, quickstart, pytorch, trainium, trn1, getting started :instance-types: trn1, trn2, trn3 :content-type: quickstart :date-modified: 2026-03-03 .. _training-quickstart: Quickstart: Train a Model on Trainium ====================================== This quickstart guides you through training your first PyTorch model on AWS Trainium. You'll launch a Trn1 instance, install the Neuron SDK, and run a simple training script.
When you complete this quickstart, you'll understand the basic workflow for training models with Neuron. **This quickstart is for**: ML engineers and data scientists new to AWS Trainium **Time to complete**: ~15 minutes Prerequisites ------------- Before you begin, ensure you have: - An AWS account with EC2 launch permissions - AWS CLI configured with your credentials - SSH key pair for EC2 access - Basic familiarity with PyTorch - Terminal access (Linux, macOS, or WSL on Windows) Step 1: Launch a Trainium instance ----------------------------------- In this step, you will launch a Trn1 instance using the AWS Deep Learning AMI. First, launch a Trn1.2xlarge instance with the latest Deep Learning AMI: .. code-block:: bash aws ec2 run-instances \ --image-id resolve:ssm:/aws/service/deep-learning-base-neuron/ubuntu-22-04/latest \ --instance-type trn1.2xlarge \ --key-name YOUR_KEY_NAME \ --security-group-ids YOUR_SECURITY_GROUP \ --subnet-id YOUR_SUBNET_ID .. note:: Replace ``YOUR_KEY_NAME``, ``YOUR_SECURITY_GROUP``, and ``YOUR_SUBNET_ID`` with your values. Alternatively, launch the instance through the `EC2 Console `_. Once the instance is running, connect via SSH: .. code-block:: bash ssh -i YOUR_KEY.pem ubuntu@YOUR_INSTANCE_IP Verify Neuron devices are available: .. code-block:: bash neuron-ls You should see output showing available NeuronCores: .. code-block:: text +--------+--------+--------+---------+ | NEURON | NEURON | NEURON | PCI | | DEVICE | CORES | MEMORY | BDF | +--------+--------+--------+---------+ | 0 | 2 | 32 GB | 00:1e.0 | | 1 | 2 | 32 GB | 00:1f.0 | +--------+--------+--------+---------+ Step 2: Set up your environment -------------------------------- In this step, you will create a Python virtual environment and install PyTorch with Neuron support. Create and activate a virtual environment: .. code-block:: bash python3 -m venv neuron_env source neuron_env/bin/activate Install PyTorch Neuron and dependencies: .. code-block:: bash pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com Verify the installation: .. code-block:: bash python -c "import torch; import torch_neuronx; print(f'PyTorch: {torch.__version__}')" You should see output confirming PyTorch is installed: .. code-block:: text PyTorch: 2.9.0+cpu Step 3: Create a training script --------------------------------- In this step, you will create a simple PyTorch training script that uses Neuron acceleration. Create a file named ``train_simple.py``: .. 
code-block:: python import torch import torch.nn as nn import torch.optim as optim import torch_neuronx # Simple neural network class SimpleNet(nn.Module): def __init__(self): super().__init__() self.fc1 = nn.Linear(784, 128) self.fc2 = nn.Linear(128, 10) self.relu = nn.ReLU() def forward(self, x): x = self.relu(self.fc1(x)) return self.fc2(x) # Create model and move to Neuron device model = SimpleNet().to('neuron') criterion = nn.CrossEntropyLoss() optimizer = optim.SGD(model.parameters(), lr=0.01) # Generate dummy training data batch_size = 32 num_batches = 100 print("Starting training...") model.train() for batch_idx in range(num_batches): # Create dummy batch inputs = torch.randn(batch_size, 784).to('neuron') targets = torch.randint(0, 10, (batch_size,)).to('neuron') # Training step optimizer.zero_grad() outputs = model(inputs) loss = criterion(outputs, targets) loss.backward() optimizer.step() if batch_idx % 10 == 0: print(f"Batch {batch_idx}/{num_batches}, Loss: {loss.item():.4f}") print("Training complete!") This script creates a simple neural network, moves it to the Neuron device, and trains it on synthetic data. Step 4: Run training --------------------- In the final step, you will run the training script and monitor its progress. Execute the training script: .. code-block:: bash python train_simple.py You should see training progress output: .. code-block:: text Starting training... Batch 0/100, Loss: 2.3156 Batch 10/100, Loss: 2.2845 Batch 20/100, Loss: 2.2534 ... Training complete! Monitor Neuron device utilization in another terminal: .. code-block:: bash neuron-top This shows real-time NeuronCore utilization, memory usage, and other metrics. Confirmation ------------ Congratulations! You've successfully trained your first model on AWS Trainium. You should have: - ✅ Launched a Trn1 instance with Neuron SDK - ✅ Installed PyTorch with Neuron support - ✅ Created and ran a training script on Neuron devices - ✅ Monitored training with Neuron tools If you encountered any issues, see the **Common issues** section below. Common issues ------------- **Issue**: ``ModuleNotFoundError: No module named 'torch_neuronx'`` **Solution**: Ensure you activated the virtual environment and installed packages: .. code-block:: bash source neuron_env/bin/activate pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com **Issue**: ``RuntimeError: No Neuron devices found`` **Solution**: Verify you're on a Trainium instance and devices are visible: .. code-block:: bash neuron-ls If no devices appear, check instance type and driver installation. **Issue**: Training is slower than expected **Solution**: This quickstart uses a small model for demonstration. For production workloads: - Use larger batch sizes - Enable XLA compilation with ``torch.compile()`` - See :doc:`/frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-programming-guide` for optimization techniques Clean up -------- To avoid ongoing charges, terminate your instance when finished: .. code-block:: bash # From your local machine aws ec2 terminate-instances --instance-ids YOUR_INSTANCE_ID Or use the EC2 Console to terminate the instance. 
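One follow-up on the batch-size advice under Common issues above: only the leading dimension of the dummy tensors changes, while the model, loss, and optimizer stay the same. The sketch below reuses the ``SimpleNet`` definition and the ``'neuron'`` device string from ``train_simple.py``; the batch size of 256 is an illustrative assumption, and a workable value depends on your model size and available device memory.

.. code-block:: python

   import torch
   import torch.nn as nn
   import torch.optim as optim
   import torch_neuronx  # registers Neuron device support, as in train_simple.py

   # Same model as train_simple.py above
   class SimpleNet(nn.Module):
       def __init__(self):
           super().__init__()
           self.fc1 = nn.Linear(784, 128)
           self.fc2 = nn.Linear(128, 10)
           self.relu = nn.ReLU()

       def forward(self, x):
           return self.fc2(self.relu(self.fc1(x)))

   model = SimpleNet().to('neuron')
   criterion = nn.CrossEntropyLoss()
   optimizer = optim.SGD(model.parameters(), lr=0.01)

   batch_size = 256  # illustrative; the quickstart script uses 32

   for batch_idx in range(100):
       # Larger dummy batch: only the leading dimension differs from train_simple.py
       inputs = torch.randn(batch_size, 784).to('neuron')
       targets = torch.randint(0, 10, (batch_size,)).to('neuron')

       optimizer.zero_grad()
       loss = criterion(model(inputs), targets)
       loss.backward()
       optimizer.step()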
Next steps ---------- Now that you've completed this quickstart, explore more advanced training topics: - :doc:`/frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-programming-guide` - Comprehensive training guide - :doc:`/libraries/nxd-training/index` - Distributed training with NeuronX Distributed - :doc:`/about-neuron/models/index` - Pre-tested model samples - :doc:`/tools/neuron-explorer/index` - Profile and optimize training performance Further reading --------------- - :doc:`/setup/pytorch/index` - Detailed PyTorch installation options - :doc:`/devflows/ec2-flows` - EC2 deployment workflows - :doc:`/frameworks/torch/index` - Complete PyTorch Neuron documentation ================================================ FILE: about-neuron/quick-start/user-guide-quickstart.rst ================================================ .. _userguide-quickstart: User Guide Quick Start ====================== * :ref:`setup-guide-index` * :ref:`Neuron Containers ` * :ref:`neuron-devflows` ================================================ FILE: about-neuron/sdk-policy.rst ================================================ .. _sdk-maintenance-policy: .. _neuron-maintenance-policy: Neuron Software Maintenance Policy ================================== .. contents:: Table of Contents :local: :depth: 3 Overview -------- This document outlines the software maintenance policy for the AWS Neuron Software Development Kit (SDK), Neuron components (both extension and standalone), supported model classes, features, APIs, DLAMIs and DLCs, and dependency software. AWS Neuron is the SDK for Amazon EC2 `Inferentia `__ and Amazon EC2 `Trainium `__ based instances purpose-built for deep learning. Neuron integrates with popular Machine Learning (ML) frameworks like PyTorch, JAX, and TensorFlow and includes a compiler, runtime, driver, profiling tools, and libraries to support high-performance training of generative AI models on Trainium and Inferentia powered instances. This document addresses the Neuron Software life-cycle and Neuron SDK release versioning. .. _neuron-software-definitions: Neuron Software Definitions --------------------------- Neuron Software refers to the complete set of software elements provided by AWS Neuron, including: Neuron SDK ~~~~~~~~~~ The core software development kit that enables users to build, train, and deploy machine learning models on Inferentia and Trainium based instances. The Neuron SDK encompasses the entire set of components, features, APIs, and other elements that are bundled together and made available in a particular version of the Neuron SDK release. Neuron components ~~~~~~~~~~~~~~~~~ Neuron components refer to any packages or libraries within the Neuron SDK that offer specific functionality. These components are typically accessible through PIP, RPM, or Debian packages for easy installation and usage. There are two main categories of Neuron components: Neuron extension components and Neuron standalone components. Neuron extension components ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Neuron extension components are components that integrate Neuron support into open source machine learning frameworks, libraries, or tools, enhancing their functionality and extending their capabilities as necessary. When referring to Neuron extension components, we are also referring to the parts of the open source machine learning framework or library that are supported by Neuron.
The software life-cycle of the open source machine learning frameworks, libraries or tools that are extended by Neuron is managed and maintained by their respective communities or the vendors responsible for those specific components. Examples of Neuron extension components are: - **Third party ML Library**: Examples include Neuron Nemo Megatron. - **Third party ML Framework**: Examples include PyTorch NeuronX and TensorFlow Neuron. Neuron standalone components ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Neuron standalone components are self-contained components within the Neuron SDK. Examples of such components are Neuron Compiler, Neuron Tools, and Neuron Runtime. Neuron Model Classes ~~~~~~~~~~~~~~~~~~~~ A Neuron supported model class is tightly coupled with a specific Neuron extension component (e.g., PyTorch NeuronX) or Neuron library (e.g., NeuronX Distributed) and the workload type (e.g., Training or Inference). For example, a model can be supported at Beta level in PyTorch NeuronX for training and Stable level in PyTorch NeuronX for inference. Neuron features ~~~~~~~~~~~~~~~ A Neuron feature refers to any functionality or attribute that is part of the Neuron SDK, whether it belongs to the entire Neuron SDK or to one of its specific components. Neuron APIs ~~~~~~~~~~~ A Neuron API refers to any API, CLI, environment variable, or flag that belongs to the entire Neuron SDK or to one of the Neuron components. A Neuron API allows developers to interact with and leverage the capabilities of the Neuron SDK and its components. Examples include :ref:`Neuron Trace API ` and :ref:`Neuron Compiler flags `. Dependency software components ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ External software components or frameworks that the Neuron SDK and its components rely on for proper functioning and compatibility, such as language runtimes or operating systems. The software life-cycle of the dependency software components is managed and maintained by their respective communities or the vendors responsible for those specific dependency software components. The following terms are examples of underlying dependency software components: - **Operating System (OS)**: Examples include Ubuntu 22 and Amazon Linux 2023 - **Language Runtime**: Examples include Python 3.10 Neuron Deep Learning AMIs and Deep Learning Containers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :ref:`Neuron Deep Learning AMIs (DLAMIs) ` and :ref:`Neuron Deep Learning Containers (DLCs) ` are pre-configured Amazon Machine Images and Docker containers that come with the Neuron SDK and necessary dependencies pre-installed, providing a ready-to-use environment for machine learning development. .. _neuron-software-lifecycle: Neuron Software Life-cycle -------------------------- The typical life-cycle for Neuron software consists of several phases, though not all phases are applicable to every type of Neuron software.
The phases are as follows: - **Developer Preview or Beta** (these terms are used interchangeably in Neuron collaterals) - **Release Candidate (RC)** - **General Availability (GA) or Stable** (these terms are used interchangeably in Neuron collaterals) - **Maintenance** - **End-of-Support (EOS)** The following table outlines the details for each phase for Neuron software: +-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+ | | Description | Comments | +-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+ | Developer Preview (Beta) | In this phase, Neuron Software is not supported, should not be used in production environments, | | | | and is meant for early access and feedback purposes only. It is possible for future releases | | | | to introduce breaking changes. | | | | See :ref:`Neuron Software Classification ` for more information | | +-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+ | Release Candidate (RC) | Once AWS identifies a release to be a stable product, it may be marked as a Release Candidate (RC). | This phase applies only to Neuron SDK | | | This phase is usually short and during it AWS will provide updates for Neuron Software on an as-needed basis. | and Neuron components | +-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+ | General Availability (Stable) | During this phase, AWS releases :ref:`regular ` updates for the Neuron Software based | | | | on a predefined release cadence of the Neuron SDK or provides :ref:`maintenance updates `| | | | for Neuron Software on an as-needed basis. | | | | See :ref:`Neuron Software Classification ` for more information | | +-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+ | Maintenance | During the maintenance phase, AWS will provide :ref:`maintenance updates ` | This phase does not apply to Dependency Software | | | for Neuron Software on an as-needed basis. Any new PIP, RPM, and Debian packages for the Neuron | Components, Neuron DLCs, | | | Software, as well as updated versions of the Neuron DLAMIs and Neuron DLCs, will be released | Neuron DLAMIs, Neuron Features and APIs | | | only when deemed necessary by the AWS Neuron team. | | | | Users can expect updates to be less frequent compared to :ref:`regular ` | | | | as the focus will be on addressing critical issues and ensuring the stability of the software. | | | | | | | | Maintenance Announcement: AWS will make a public :ref:`announcement ` at least one month | | | | before the Neuron Software enters the Maintenance phase.
| | +-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+ | End of Support (EOS) | When Neuron Software reaches the end of its support lifecycle, it will no longer receive | | | | :ref:`regular ` updates and :ref:`maintenance updates ` | | | | (including security updates). While AWS will continue to provide access to all previously released | | | | PIP, RPM, and Debian packages for the Neuron Software, as well as earlier versions of the Neuron DLAMIs | | | | and Neuron DLCs, it's important to note that these older versions will not receive any updates or support. | | | | Customers can still use these resources at their own discretion, but it is highly recommended to upgrade | | | | to the latest available versions. | | | | | | | | End of Support Announcement: AWS will make a public :ref:`announcement ` at least one month | | | | before Neuron Software enters End of Support. | | +-------------------------------+----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------+ .. _neuron-regular-updates: Neuron Software Regular Updates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Regular updates for Neuron Software address the following areas: new features, feature improvements, performance enhancements, bug resolution, security vulnerability fixes, upgrades to Neuron dependency software components, and upgrades to Neuron extension components. To handle these regular updates, AWS will release a new version of the Neuron SDK, incrementing the minor version (the second digit in the version number) for a minor release or incrementing the major version (the first digit in the version number) for a major release when significant changes that break compatibility are introduced. It's important to note that any bug fixes or security issues in regular updates are not applied retroactively to previous versions of the Neuron SDK. To benefit from these updates, users must adopt the latest release. For more information, see: - :ref:`Neuron DLAMIs and DLCs Updates ` - :ref:`Neuron Extension Components Updates ` - :ref:`Neuron Software Versioning ` **Neuron SDK Installation and Update Instructions** To install and update to the latest Neuron packages, customers need to pin the major version of the Neuron package. For example, to install the latest Neuron tools package, call ``sudo apt-get install aws-neuronx-tools=2.*`` and to install the latest PyTorch Neuron package for Trn1, call ``pip install torch-neuronx==2.1.0.1.*``. This is done to future-proof instructions for new, backwards-incompatible major version releases. .. _neuron-maintenance-updates: Neuron Software Maintenance Updates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Maintenance updates for Neuron Software address three key areas: resolving bugs, fixing security vulnerabilities, and upgrading dependency software components. At AWS's discretion, additional critical features or performance enhancements may also be included. To handle these maintenance updates, AWS will release a new version of the Neuron SDK, incrementing the patch number (the last digit in the version number) to indicate a patch release. Major or minor releases may also contain maintenance updates. It's important to note that these maintenance updates are not applied retroactively to previous versions of the Neuron SDK.
To take advantage of these updates, users must adopt the latest patch release. For more information, see: - :ref:`Neuron DLAMIs and DLCs Updates ` - :ref:`Neuron Extension Components Updates ` - :ref:`Neuron Software Versioning ` .. _neuron-dlami-dlc-updates: Neuron DLAMIs and DLCs Updates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ AWS will address :ref:`regular ` updates, life-cycle changes, maintenance updates, and security issues related to any third-party software included in the Neuron DLAMI or DLCs by releasing new versions of the Neuron DLAMI or DLCs. However, updates won't be applied retroactively to older versions of the Neuron DLAMI or DLCs. Instead, users will need to use the new versions to get the latest updates. Generally, Neuron DLAMIs and Deep Learning Containers (DLCs) will support the latest LTS Linux distribution version (Ubuntu, Amazon Linux, and Rocky 9), with exceptions. Neuron Base DLAMIs (which come pre-installed with the Neuron driver, EFA, and Neuron tools) will support the two latest versions of LTS Linux Distributions. For more information, see: - :ref:`Neuron Extension Components Updates ` - :ref:`Neuron Software Versioning ` .. _neuron-extension-components-updates: Neuron Extension Components Updates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When a new version of an open source ML framework (e.g., PyTorch) is supported by a Neuron extension component (e.g., PyTorch NeuronX), the Neuron extension component for the latest supported ML framework version will become the default for installation. If users wish to use a Neuron extension component for an earlier supported ML framework version, they will need to explicitly specify the desired version during installation. After upgrading a Neuron extension component to support a newer version of an ML framework, AWS will continue to provide :ref:`regular updates ` for the Neuron extension component that supports the earlier ML framework version for a minimum of 6 months. After the 6-month period, the Neuron extension component for the earlier supported ML framework version may transition into maintenance mode. In maintenance mode, updates for the older Neuron extension component versions will be provided on an as-needed basis, focusing on critical bug fixes and security patches. For more information, see: :ref:`Neuron extension component versioning ` .. _neuron-communication: Communication methods ~~~~~~~~~~~~~~~~~~~~~ Neuron software classification and lifecycle announcements are communicated as follows: - Neuron SDK documentation under `Announcements `__ To see the list of available Neuron SDK versions and supported dependency software components versions: - Neuron SDK documentation under `Release Content `__ - Neuron SDK documentation under `What’s New `__ .. _neuron-software-versioning: Neuron Software Versioning -------------------------- Neuron SDK Documentation Versioning ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Neuron SDK documentation is versioned and maps to the corresponding Neuron SDK version. Users can switch to earlier versions of the Neuron SDK documentation by selecting the version from the dropdown in the bottom-left portion of the sidebar. Neuron SDK Versioning ~~~~~~~~~~~~~~~~~~~~~ The Neuron SDK release versions are in the form ``[A.B.C]``, where ``(A)`` represents the major version, ``(B)`` represents the minor version, and ``(C)`` represents the patch version. ..
_neuron-extension-components-versioning: Neuron Extension Components Versioning ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Neuron extension components versioning (like PyTorch NeuronX) is in the form ``[X.Y.Z].[A.B.C]``, where ``[X.Y.Z]`` represents the third party component’s major (``X``), minor (``Y``), and patch (``Z``) versions, and ``[A.B.C]`` represents the Neuron extension component's major (``A``), minor (``B``), and patch (``C``) versions. Neuron Standalone Component Versioning ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Neuron Component versioning (except for Neuron extension components like PyTorch NeuronX) is in the form ``[A.B.C.D]``, where ``A`` represents the major version, ``B`` represents the minor version, and ``C.D`` represents the patch version. .. _neuron-releases-types: Neuron Software Release Types ----------------------------- Major release ~~~~~~~~~~~~~~~~~ Increasing the major version indicates that the Neuron software underwent significant and substantial changes in an incompatible manner. Applications need to be updated in order for them to work with the newest SDK version. It is important to update major versions carefully and in accordance with the upgrade guidelines provided by AWS. After increasing the major version, the Neuron software may not maintain compatibility with previously supported versions of :ref:`Neuron Runtime `, :ref:`Neuron Compiler `, and :ref:`NEFF `. Minor release ~~~~~~~~~~~~~~~~~ Increasing the minor version indicates that the Neuron software added functionality in a backwards compatible manner. Patch release ~~~~~~~~~~~~~~~~~ Increasing the patch version indicates that the Neuron software added backward compatible bug or security fixes. A bug fix is defined as an internal change that fixes incorrect behavior. Pre-releases ~~~~~~~~~~~~~~~~ - **Developer Preview (Beta)**: During this phase, the Neuron software is not supported, should not be used in production environments, and is meant for early access and feedback purposes only. It is possible for future releases to introduce breaking changes. In the case of a Developer Preview (Beta) release, the minor version will include a lower case ``b`` along with a (Beta) tag. - **Release Candidate (RC)**: Once Neuron identifies a release to be a stable product, it may mark it as a Release Candidate. Release Candidates are ready for GA release unless significant bugs emerge, and will receive full AWS Neuron support. In the case of an RC release, the minor version will include a lower case ``rc`` along with a (RC) tag. .. _sdk-classification: Neuron Software Classification ------------------------------ This section explains the Neuron software classification for APIs, libraries, packages, features, and Neuron supported model classes mentioned in the Neuron documentation.
.. _neuron-releases-types:

Neuron Software Release Types
-----------------------------

Major release
~~~~~~~~~~~~~

Increasing the major version indicates that the Neuron software underwent significant and substantial changes in an incompatible manner. Applications must be updated to work with the newest SDK version. It is important to update major versions carefully and in accordance with the upgrade guidelines provided by AWS. After increasing the major version, the Neuron software may not maintain compatibility with previously supported versions of :ref:`Neuron Runtime `, :ref:`Neuron Compiler `, and :ref:`NEFF `.

Minor release
~~~~~~~~~~~~~

Increasing the minor version indicates that the Neuron software added functionality in a backwards-compatible manner.

Patch release
~~~~~~~~~~~~~

Increasing the patch version indicates that the Neuron software added backwards-compatible bug or security fixes. A bug fix is defined as an internal change that fixes incorrect behavior.

Pre-releases
~~~~~~~~~~~~

- **Developer Preview (Beta)**: During this phase, the Neuron software is not supported, should not be used in production environments, and is meant for early access and feedback purposes only. It is possible for future releases to introduce breaking changes. In the case of a Developer Preview (Beta) release, the minor version will include a lower case ``b`` along with a (Beta) tag.
- **Release Candidate (RC)**: Once Neuron identifies a release as a stable product, it may mark it as a Release Candidate. Release Candidates are ready for GA release unless significant bugs emerge, and they receive full AWS Neuron support. In the case of an RC release, the minor version will include a lower case ``rc`` along with an (RC) tag.

.. _sdk-classification:

Neuron Software Classification
------------------------------

This section explains the Neuron software classification for APIs, libraries, packages, features, and Neuron supported model classes mentioned in the Neuron documentation.

Neuron SDK and Neuron components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+-----------------+-----------------+------------------+-------------+
|                 | Testing         | Features         | Performance |
+=================+=================+==================+=============+
| Developer       | Basic           | Minimum Viable   |             |
| Preview (Beta)  |                 | Product (MVP) \* |             |
+-----------------+-----------------+------------------+-------------+
| Release         | Basic           | Minimum Viable   | Tested      |
| Candidate (RC)  |                 | Product (MVP) \* |             |
+-----------------+-----------------+------------------+-------------+
| GA (Stable)     | Standard        | Incremental      | Tested      |
|                 | Product Testing | additions or     |             |
|                 |                 | changes in new   |             |
|                 |                 | releases         |             |
+-----------------+-----------------+------------------+-------------+

\* A minimum viable product (MVP) for a Neuron component contains just enough features to be usable by early customers who can then provide feedback for future development. The MVP can differ per use case and depends on the specific package/library of interest. Please note that in many cases, an MVP can also represent an advanced level of features.

.. _neuron-apis-classification:

Neuron APIs
~~~~~~~~~~~

+----------------------+----------------------+----------------------+
|                      | API Contract         | API Backward         |
|                      |                      | Compatibility        |
+======================+======================+======================+
| Alpha                | Unstable and         | No                   |
|                      | undocumented         |                      |
+----------------------+----------------------+----------------------+
| Developer Preview    | Major changes may    | No                   |
| (Beta)               | happen               |                      |
+----------------------+----------------------+----------------------+
| GA (Stable)          | Incremental changes  | Yes \*               |
|                      | in new releases      |                      |
|                      | (without breaking    |                      |
|                      | the API contract)    |                      |
+----------------------+----------------------+----------------------+

\* In certain cases, when necessary, AWS may introduce API changes that break compatibility, with notice provided ahead of time.

.. _neuron-features-classification:

Neuron Features
~~~~~~~~~~~~~~~

+-----------------+-----------------+------------------------+-------------+
|                 | Testing         | Functionality          | Performance |
+=================+=================+========================+=============+
| Alpha           | No formal       | Partial functionality  | Not tested  |
|                 | testing done    | with a limited set of  | or          |
|                 |                 | core capabilities, far | evaluated   |
|                 |                 | from Minimum Viable    |             |
|                 |                 | Product (MVP) \*       |             |
+-----------------+-----------------+------------------------+-------------+
| Developer       | Basic           | Minimum Viable         |             |
| Preview (Beta)  |                 | Product (MVP) \*       |             |
+-----------------+-----------------+------------------------+-------------+
| GA (Stable)     | Standard        | Incremental additions  | Tested      |
|                 | Product Testing | or changes in new      |             |
|                 |                 | releases               |             |
+-----------------+-----------------+------------------------+-------------+

\* A minimum viable product (MVP) for a Neuron feature contains just enough functionality to be usable by early customers who can then provide feedback for future development. The MVP can differ per use case and depends on the specific feature of interest. Please note that in many cases, an MVP can also represent an advanced level of functionality.

.. _neuron-models-classification:

Neuron Supported Model Classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+----------------------+----------------------+----------------------+
|                      | Accuracy /           | Throughput / Latency |
|                      | Convergence          |                      |
+======================+======================+======================+
| Developer Preview    | Validated            | Tested               |
| (Beta)               |                      |                      |
+----------------------+----------------------+----------------------+
| GA (Stable)          | Validated            | Tested               |
+----------------------+----------------------+----------------------+

================================================
FILE: about-neuron/security.rst
================================================

.. meta::
   :description: Security disclosures and notification for the AWS Neuron SDK.
   :date-modified: 01/27/2026

.. _security:

Neuron Security Disclosures
===========================

If you think you've found a potential security issue, please do not post it in the Issues. Instead, please follow the instructions at https://aws.amazon.com/security/vulnerability-reporting/ or email AWS Security directly at `aws-security@amazon.com <mailto:aws-security@amazon.com>`__.

Important Security Information for Trainium Hardware
----------------------------------------------------

Trainium hardware is designed to optimize performance for machine learning workloads. To deliver high performance, applications with access to Trainium devices have unrestricted access to instance physical memory.

What this means for your deployment:

* Instance-level isolation is maintained: AWS EC2 ensures Trainium devices cannot access the physical memory of other EC2 instances.
* As a best practice to prevent unrestricted access to host physical memory by any user/application, we recommend implementing a permission model where:

  * A dedicated system group owns the device nodes
  * Only explicitly authorized users are added to this group
  * Device permissions prevent access by users outside the group

Customer responsibility: Ensure that only trusted applications have access to Trainium devices on Trainium instances. For more information, see `the AWS Shared Responsibility Model <https://aws.amazon.com/compliance/shared-responsibility-model/>`__.

Example Implementation Steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The steps below are an example you can follow to implement a security group using udev rules:

1. Create a dedicated security group (in this example, ``neuron``): ``sudo groupadd -r neuron``
2. Add authorized users to that security group: ``sudo usermod -aG neuron {username-to-add-here}``; repeat for each user.
3. Configure udev rules. Create a udev rule to automatically set correct ownership and permissions when Trainium (neuron) devices are detected. Create the file ``/etc/udev/rules.d/neuron-udev.rules`` with the following content:

   .. code-block:: shell

      # Neuron device access control
      # Only members of the 'neuron' group can access 'neuron' devices.
      SUBSYSTEM=="neuron*", KERNEL=="neuron*", GROUP="neuron", MODE="0660"

4. Apply the configuration: ``sudo udevadm control --reload`` followed by ``sudo udevadm trigger --subsystem-match=neuron``
5. Verify the configuration: ``ls -l /dev/neuron*``

   Expected output: ``crw-rw---- 1 root neuron 239, 0 Jan 9 15:58 /dev/neuron0``

================================================
FILE: about-neuron/troubleshooting.rst
================================================

.. _general-troubleshooting:

Troubleshooting Guide
=====================

.. contents:: Table of contents
   :local:
   :depth: 1

Training Only Troubleshooting
-----------------------------

* :ref:`PyTorch Neuron for Training `

Inference Only Troubleshooting
------------------------------

* :ref:`PyTorch Neuron for Inference `
* :ref:`NeuronPerf `
* :ref:`MXNet Neuron `

Runtime Troubleshooting
-----------------------

* :ref:`Neuron Runtime Troubleshooting on Inf1 and Trn1 `

Containers Troubleshooting
--------------------------

* :ref:`Containers `

Setup Troubleshooting
---------------------

* :ref:`neuron-setup-troubleshooting`

================================================
FILE: about-neuron/what-is-neuron.rst
================================================

.. _what-is-neuron:

.. meta::
   :description: AWS Neuron is a software development kit for high-performance machine learning on AWS Inferentia and Trainium, enabling developers to compile, optimize, and deploy deep learning models at scale.

What is AWS Neuron?
===================

AWS Neuron is the software stack for running deep learning and generative AI workloads on AWS Trainium and AWS Inferentia. Built on an open source foundation, Neuron enables developers to build, deploy, and explore natively with the PyTorch and JAX frameworks, and with ML libraries such as Hugging Face, vLLM, PyTorch Lightning, and others, without modifying their code. It includes a compiler, runtime, training and inference libraries, and developer tools for monitoring, profiling, and debugging.

Neuron supports your end-to-end machine learning (ML) development lifecycle, from building and deploying deep learning and AI models, to optimizing them for the highest performance and lowest cost, to getting deeper insights into model behavior. Neuron enables rapid experimentation, production-scale training of frontier models, low-level performance optimization through the Neuron Kernel Interface (NKI) for custom kernels, cost-optimized inference deployment for agentic AI and reinforcement learning workloads, and comprehensive profiling and debugging with Neuron Explorer. For more details, see the detailed documentation under :ref:`About the AWS Neuron SDK `.

Who is AWS Neuron for?
----------------------

* **ML engineers** can use Neuron's vLLM integration to migrate their models to Trainium for improved performance without code modifications.
* **Performance engineers** can use NKI and our Developer Tools to create new ML kernels and optimize existing ones.
* **ML researchers** can use their existing PyTorch experience and ecosystem tools to experiment freely on Trainium using our native PyTorch implementation, without having to learn new frameworks or APIs.

What is AWS Neuron used for?
----------------------------

**Research and Development**: Neuron provides native PyTorch execution on Trainium with full Eager mode compatibility. The stack supports standard distributed training patterns including FSDP, DDP, and DTensor for model sharding across devices and nodes. ``torch.compile`` integration enables graph optimization, while existing frameworks like TorchTitan and HuggingFace Transformers run without code modifications. JAX support includes XLA compilation targeting Inferentia and Trainium hardware.

**Production Inference**: Neuron implements vLLM V1 API compatibility on Trainium and Inferentia with optimizations for large-scale inference workloads. The runtime supports Expert Parallelism for MoE models, disaggregated inference architectures, and speculative decoding. Optimized kernels from the NKI Library provide hardware-specific implementations. Training workflows integrate with HuggingFace Optimum Neuron, PyTorch Lightning, and TorchTitan, with seamless deployment through standard vLLM interfaces.

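As a minimal sketch of what the vLLM V1 API compatibility above means in practice, the standard vLLM entry points work unchanged (the model name is illustrative, and this assumes a Trainium or Inferentia instance with the Neuron vLLM integration installed):

.. code-block:: python

   from vllm import LLM, SamplingParams

   # Standard vLLM APIs; no Neuron-specific code is required.
   llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # illustrative model
   params = SamplingParams(temperature=0.8, max_tokens=64)
   outputs = llm.generate(["What is AWS Trainium?"], params)
   print(outputs[0].outputs[0].text)
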
**Performance Engineering**: Neuron Kernel Interface (NKI) provides direct access to the Trainium instruction set architecture with APIs for memory management, execution scheduling, and low-level kernel development. The NKI Compiler, built on MLIR, offers full visibility into the compilation pipeline from high-level operations to hardware instructions. The NKI Library contains optimized kernel implementations with source code and performance benchmarks. Neuron Explorer enables comprehensive profiling from application code to hardware execution, supporting both single-node and distributed workload analysis with detailed performance metrics and optimization recommendations.

AWS Neuron Core Components
--------------------------

**vLLM**

Neuron enables production inference deployment with standard frameworks and APIs on Trainium and Inferentia. Use Neuron's vLLM integration with standard APIs to deliver high-performance model serving with optimized kernels from the NKI Library. It provides:

* **Standard vLLM APIs**: Full compatibility with vLLM V1 APIs, enabling customers to use familiar vLLM interfaces on Neuron hardware without code changes
* **Advanced Inference Features**: Support for Expert Parallelism for MoE models, disaggregated inference for flexible deployment architectures, and speculative decoding for improved latency
* **Optimized Performance**: Pre-optimized kernels from the NKI Library for peak performance across dense, MoE, and multimodal models
* **Open Source**: Source code released on GitHub under the vLLM project organization, enabling community contributions

**Native PyTorch**

Neuron provides native integration with PyTorch, enabling researchers and ML developers to run existing code unchanged on Trainium. Train models with familiar workflows and tools, from pre-training to post-training with reinforcement learning, while leveraging Trainium's performance and cost advantages for both experimentation and production-scale training. It provides:

* **Native Device Support**: Neuron registers as a native device type in PyTorch with standard device APIs like ``torch.tensor([1,2,3], device='neuron')`` and ``.to('neuron')``
* **Standard Distributed Training APIs**: Support for FSDP, DTensor, DDP, tensor parallelism, context parallelism, and distributed checkpointing
* **Eager Mode Execution**: Immediate operation execution for interactive development and debugging in notebook environments
* **torch.compile Integration**: Support for ``torch.compile`` for optimized performance
* **Open Source**: Released as an open source package on GitHub under Apache 2.0, enabling community contributions.

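For illustration, the following minimal sketch exercises the native device APIs listed above (it assumes a Trainium instance with the native PyTorch backend installed; the tensor values are arbitrary):

.. code-block:: python

   import torch

   # Neuron registers as a native PyTorch device type, so the standard
   # device APIs work unchanged.
   x = torch.tensor([1, 2, 3], device='neuron')  # allocate directly on Trainium
   y = torch.ones(3).to('neuron')                # move an existing tensor
   z = (x + y).cpu()                             # compute on device, copy back to host
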
**Neuron Kernel Interface (NKI)**

For performance engineers seeking maximum hardware efficiency, Neuron provides complete control through the Neuron Kernel Interface (NKI), with direct access to the NeuronISA (NISA) instruction set, memory allocation, and execution scheduling. Developers can create new operations not available in standard frameworks and optimize performance-critical code with custom kernels. It includes:

* The NKI Compiler, built on MLIR, which provides greater transparency into the kernel compilation process
* The NKI Library, which provides pre-built kernels you can use to optimize the performance of your models

**Neuron Tools**

Debug and profiling utilities including:

* Neuron Monitor for real-time performance monitoring
* Neuron Explorer, built on the Neuron Profiler (``neuron-profile``), for detailed performance analysis

Neuron Explorer provides:

* **Hierarchical Profiling**: Top-down visualization from framework layers through HLO operators to hardware instructions, enabling developers to understand execution at any level of the stack
* **Code Linking**: Direct navigation between PyTorch, JAX, and NKI source code and the performance timeline, with automatic annotations showing metrics for specific code lines
* **IDE Integration**: VSCode extension for profile visualization and analysis directly within the development environment
* **Device Profiling**: Unified interface for a comprehensive view of system-wide metrics and device-specific execution details

**Neuron Compiler**

Optimizes machine learning models for AWS Inferentia and Trainium chips, converting models from popular frameworks into efficient executable formats.

**Neuron Runtime**

Manages model execution on Neuron devices, handling memory allocation, scheduling, and inter-chip communication for maximum throughput.

**AWS DLAMIs and DLCs**

Orchestrate and deploy your models using AWS Deep Learning AMIs (DLAMIs) and Deep Learning Containers (DLCs). Neuron DLAMIs come pre-configured with the Neuron SDK, popular frameworks, and helpful libraries, allowing you to quickly begin training and running inference on AWS Inferentia. Or, quickly deploy models using pre-configured AWS Neuron Deep Learning Containers (Neuron DLCs) with optimized frameworks for AWS Trainium and Inferentia.

Supported Hardware
------------------

**AWS Inferentia**

Purpose-built for high-performance inference workloads:

* ``Inf1`` instances - First-generation Inferentia chips
* ``Inf2`` instances - Second-generation with improved performance and efficiency

**AWS Trainium**

Designed for distributed training of large models:

* ``Trn1`` instances - High-performance training acceleration
* ``Trn1n`` instances - Enhanced networking for large-scale distributed training
* ``Trn2`` instances - Next-generation Trainium with superior performance
* ``Trn2`` UltraServer - High-density Trainium servers for massive training workloads
* ``Trn3`` UltraServer - The next generation of Trainium servers for massive training workloads

How do I get more information?
------------------------------

* Review the comprehensive documentation and follow the tutorials on this site
* Check the Neuron GitHub repositories for code examples. GitHub repos include:

  * `Neuron SDK code samples `_
  * `Neuron NKI ML kernel samples `_
  * `Neuron container configurations `_
  * `Helm charts for Kubernetes deployment `_
  * `NeuronX Distributed Core library sources `_
  * `NeuronX Distributed Training library sources `_
  * `NeuronX Distributed Inference library sources `_
  * `Linux kernel driver sources `_
  * `Neuron workshop model samples `_

* Visit the `AWS Neuron support forum `_ for community assistance

================================================
FILE: about-neuron/whats-new.rst
================================================

.. _main_whats-new:

.. meta::
   :description: Blog posts for the latest features and updates for the AWS Neuron SDK
   :date-modified: 03/13/2026

What's New in the AWS Neuron SDK
================================

.. toctree::
   :hidden:
   :maxdepth: 1

   Release Notes

*Explore detailed posts about the latest releases, updates, and upcoming changes to the AWS Neuron SDK.*

.. grid:: 1
   :gutter: 2

   .. grid-item-card:: Neuron Release Notes
      :link: /release-notes/index
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      **Latest release**: 2.29.0 (04/09/2026)

----

.. _whats-new-2026-04-02-v2_29:

AWS Neuron SDK 2.29.0: NKI Exits Beta, CPU Simulator, and Expanded NKI Library
-------------------------------------------------------------------------------

**Posted on**: April 09, 2026

Today we are releasing AWS Neuron SDK 2.29.0. This release brings NKI 0.3.0 out of Beta into Stable, featuring the new NKI Standard Library and an experimental CPU Simulator for local kernel development without Trainium hardware. The NKI Library adds 7 new experimental kernels, including Conv1D, a Transformer TKG megakernel, and fused communication-compute primitives, along with improvements to existing attention, MLP, and MoE kernels. NxD Inference delivers performance gains for Qwen2 VL, Qwen3 VL, and Flux.1 models. Neuron Runtime introduces new APIs for collective stream management and network proxy tuning. Neuron Explorer has moved out of Beta and is now Stable, with full Device widget support in the System Trace Viewer and availability on the VS Code Extension Marketplace. The Neuron Driver adds support for new Trn3 Gen2 UltraServer configurations.

Neuron Kernel Interface (NKI)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

AWS Neuron SDK 2.29.0 introduces NKI 0.3.0, the latest update to the Neuron Kernel Interface. NKI 0.3.0 has moved out of Beta and is now Stable. It features the NKI Standard Library (``nki-stdlib``), which provides developer-visible code for all NKI APIs and native language objects (such as ``NkiTensor``). This release exposes newly available Trainium capabilities and features in the NKI API and introduces ``nki.language`` APIs.

**NKI CPU Simulator (Experimental)**: NKI 0.3.0 includes a CPU Simulator, which executes NKI kernels entirely on CPU and allows for a fast development cycle on inexpensive CPUs and compute instances to validate kernel correctness, using standard Python step-by-step debugging tools and instrumentation to print results for every line of kernel code. Activate it with ``NKI_SIMULATOR=1`` or use ``nki.simulate(kernel)``.

**New Language APIs (Experimental)**: Introduced ``nki.language`` high-level convenience wrappers including ``nl.load``, ``nl.store``, ``nl.copy``, ``nl.matmul``, ``nl.transpose``, and ``nl.softmax``.

**New ISA and Hardware Features**: Added the ability to set the DMA priority of DMA operations and collectives operations for Trn3 (NeuronCore-v4). A dedicated ``nki.isa.exponential`` instruction is optimized for vectorizing exponents (``exp``) with VectorE. Matmul accumulation control is added via the ``accumulate`` parameter on ``nc_matmul`` and ``nc_matmul_mx``. Variable-length all-to-all collectives are now available via ``nki.collectives.all_to_all_v``.

**Breaking Changes**: NKI 0.3.0 includes several API breaking changes that improve correctness and consistency. All kernels must be updated to NKI 0.3.0; mixing with Beta 2 kernels in the same model is not supported. For the full list of changes and migration examples, see the :doc:`NKI 0.3.0 Update Guide `.

For more details, see :ref:`nki-2-29-0-rn`.

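As a rough sketch of the simulator workflow described above (the kernel body and exact API signatures here are illustrative assumptions; consult the NKI documentation for the authoritative interface):

.. code-block:: python

   import nki
   import nki.language as nl

   @nki.jit
   def copy_kernel(a):
       # Illustrative kernel: load a tile from HBM and store it back out.
       out = nl.ndarray(a.shape, dtype=a.dtype, buffer=nl.shared_hbm)
       nl.store(out, nl.load(a))
       return out

   # Run the kernel on CPU instead of Trainium hardware; alternatively,
   # set NKI_SIMULATOR=1 in the environment to simulate an unmodified script.
   simulated_kernel = nki.simulate(copy_kernel)
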
NKI Library
^^^^^^^^^^^

**New Experimental Kernels (7 added)**: Conv1D provides 1D convolution with stride, padding, dilation, bias, activation fusion, and LNC sharding. Transformer TKG is a multi-layer transformer forward-pass megakernel for token generation. Fine-Grained All-Gather and FGCC (All-Gather + Matmul) enable ring-based communication with compute overlap on Trn2. SBUF-to-SBUF All-Gather provides two variants for small and large tensors. Top-K Reduce supports MoE output gathering with LNC sharding. Dynamic Elementwise Add handles runtime-variable M-dimension tiling. The ``find_nonzero_indices`` subkernel is promoted from experimental to core.

**Key Improvements to Existing Kernels**: Attention CTE increases the max batch size from 32 to 512 and the max sequence length from 36,864 to 131,072, with sequence packing support. Attention Block TKG adds fused QK-norm before RoPE and KVDP attention sharding. MLP adds BufferManager support and MXFP4/MXFP8 quantization paths. MoE TKG introduces a dynamic all-expert algorithm with ``block_size``. QKV adds flexible weight layout support. PyTorch reference implementations are added for 22 kernels.

**Breaking Changes**: Multiple kernel signatures have changed, with new parameters inserted mid-signature; callers using positional arguments must switch to keyword arguments. ``SbufManager`` is renamed to ``BufferManager``. MoE TKG replaces boolean sharding flags with the ``LNCShardingStrategy`` enum. For the full list of breaking changes, see :ref:`nki-lib-2-29-0-rn`.

For more details, see :ref:`nki-lib-2-29-0-rn`.

Inference Updates
^^^^^^^^^^^^^^^^^

**NxD Inference 0.9.17155**: Qwen2 VL gains vision data parallelism with a 7% QPS improvement for image-heavy workloads. Qwen3 VL adds text-model sequence parallelism with a 2.2x QPS throughput improvement. Flux.1 adds CFG parallelism with a 19% end-to-end latency improvement and a 23% instance throughput improvement.

**vLLM Neuron Plugin 0.5.0**: Updated alongside NxD Inference with model performance improvements.

**Hardware Support Change**: NxD Inference no longer supports Trn1/Inf2. Only Trn2 and newer hardware is supported. Pin to Neuron SDK 2.28 for Trn1/Inf2 support.

For more details, see :ref:`nxd-inference-2-29-0-rn`.

Runtime and Driver
^^^^^^^^^^^^^^^^^^

**Neuron Runtime Library 2.31**: The new ``nrt_cc_create_stream`` API creates a collective stream to be used by host-initiated collectives, replacing the previous environment-variable approach. The new ``nrt_get_attached_efa_bdf`` API returns the BDF string of the EFA device for optimal network interface selection. New environment variables ``NEURON_RT_ONE_THREAD_PER_CORE`` (up to a 2x improvement in collective communication latency) and ``NEURON_RT_RANKS_PER_NETWORK_PROXY`` provide fine-grained control over network proxy threading. RDMA support extends to Trn3. The Collectives XU gains profiling support, context caching with up to a 90% performance improvement, and removal of the 512 queue set instance limit. The async API version is bumped from 2.x to 3.0; applications using the async API must be recompiled.

**Neuron Driver 2.27**: Adds support for new Trn3 Gen2 UltraServer configurations: US3 (2-node), US4 (4-node), US16 (4-node), and US18 (4-node). Top-level DMA reset support is added during TPB reset on Trn3 and later platforms.

**Neuron Collectives 2.31**: EFA device processing is restructured to per-stream granularity for improved stability. Fixed incorrect interface selection in multi-UltraServer collectives and a crash on channel initialization failures.

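A minimal sketch of setting the ``NEURON_RT_*`` threading controls introduced above from Python, before the runtime is initialized (the values shown are illustrative, not recommendations):

.. code-block:: python

   import os

   # Dedicate one runtime thread per core (up to 2x lower collective latency).
   os.environ["NEURON_RT_ONE_THREAD_PER_CORE"] = "1"
   # Tune how many ranks each network proxy thread serves (illustrative value).
   os.environ["NEURON_RT_RANKS_PER_NETWORK_PROXY"] = "8"
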
For more details, see :ref:`runtime-2-29-0-rn`.

Neuron Explorer
^^^^^^^^^^^^^^^

Neuron Explorer has moved out of Beta and is now Stable. The System Trace Viewer now supports the full suite of Device widgets, enabling multi-device profile analysis across all linked Device Profiles within a single System Profile. The Summary Viewer includes system-level profile data for both system and device profiles. The new System Timeline HBM Usage view shows device HBM usage with a memory allocation breakdown by category. Box Selection Summary enables viewing aggregated device profile information for a selected region in the trace viewer. Neuron Explorer for VS Code is now available on the Visual Studio Code Extension Marketplace and Open VSX, enabling simpler installation and automatic updates.

For more details, see :ref:`dev-tools-2-29-0-rn`.

PyTorch Framework
^^^^^^^^^^^^^^^^^

PyTorch 2.7 and 2.8 have reached end of support starting with this release. Use PyTorch 2.9 on Ubuntu 24.04. Starting with PyTorch 2.10 support (planned for a future Neuron release), AWS Neuron will transition from PyTorch/XLA to native PyTorch support via TorchNeuron.

For more details, see :ref:`pytorch-2-29-0-rn`.

End of Support and Migration Notices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Effective this release:**

* PyTorch 2.7 and 2.8 have reached end of support. Pin to Neuron SDK 2.28 if required.
* NeuronX Distributed Training (NxDT) and NxD Core training APIs reach end of support; DLCs and DLAMI virtual environments are pinned to SDK 2.28.0.
* The ``neuron-profile analyze`` subcommand is no longer supported. Migrate to Neuron Explorer.
* The Ubuntu 22.04 Multi-Framework DLAMI is no longer published. Use Ubuntu 24.04.

**Hardware support:**

* NxD Inference no longer supports Trn1/Inf2. Pin to Neuron SDK 2.28 for continued support.

**NKI namespace migration:**

* Removal of the ``neuronxcc.nki.*`` namespace is postponed to a future release. Both the ``neuronxcc.nki.*`` and ``nki.*`` namespaces continue to work. Migration to ``nki.*`` is encouraged.

**Effective with PyTorch 2.10 support:**

* PyTorch/XLA will be replaced by TorchNeuron.

* Read the :doc:`Neuron 2.29.0 component release notes ` for specific Neuron component improvements and details.

----

.. _whats-new-2026-03-13-v2_28_1:

AWS Neuron SDK 2.28.1 Patch Available
--------------------------------------

**Posted on**: March 13, 2026

AWS Neuron provides a patch version, 2.28.1, to address a Neuron Driver compatibility issue with Linux kernel 6.18.

.. _whats-new-2026-02-26-v2_28:

AWS Neuron SDK 2.28.0: Enhanced Profiling, Vision Language Models, and Expanded NKI Capabilities
--------------------------------------------------------------------------------------------------

**Posted on**: February 26, 2026

Today we are releasing AWS Neuron SDK 2.28.0. This release enhances Neuron Explorer with system profiling, a Tensor Viewer, and a Database Viewer for comprehensive performance analysis. NxD Inference adds support for Qwen2/Qwen3 VL vision language models, Flux.1 inpainting capabilities, and Eagle3 speculative decoding. The NKI Library expands with 9 new kernels, including RoPE, MoE operations, and experimental kernels for attention and cross entropy. NKI (Beta 2) introduces LNC multi-core support with intra-LNC collectives and new APIs. Kubernetes users gain Neuron DRA Driver support for advanced resource allocation.

Developer Tools and Profiling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Neuron Explorer Enhancements** - Added system profiling support with drill-down navigation to device profiles.
The new Tensor Viewer helps identify memory bottlenecks by displaying tensor names, shapes, sizes, and memory usage. The Database Viewer provides an interactive interface for querying profiling data using SQL or natural language. Profile Manager now supports tag-based organization and search. A migration guide from Neuron Profiler/Profiler 2.0 is now available.

**nccom-test Improvements** - Enhanced data integrity checks use pseudo-random data patterns for better corruption detection. Added support for the ``alltoallv`` collective operation for benchmarking variable-sized all-to-all communication patterns.

For more details, see :ref:`dev-tools-2-28-0-rn`.

Inference Updates
^^^^^^^^^^^^^^^^^

**NxD Inference 0.8.16251** - Added support for vision language models, including Qwen2 VL (Qwen2-VL-7B-Instruct) and Qwen3 VL (Qwen3-VL-8B-Thinking), for processing text and image inputs (Beta). Pixtral model support is improved with batch size 32 and sequence length 10240 on Trn2 with vLLM V1. The Flux.1 model gains new functionality for in-paint, out-paint, canny edge detection, and depth-based image generation (Beta).

**vLLM Neuron Plugin 0.4.1** - Multi-LoRA serving enhancements enable streaming LoRA adapters via vLLM's ``load_adapter`` API with dynamic runtime loading. Users can now run the base model alone when multi-LoRA serving is enabled. Added Eagle3 speculative decoding support for Llama 3.1 8B. Updated to support vLLM v0.13.0 and PyTorch 2.9.

For more details, see :ref:`nxd-inference-2-28-0-rn`.

NKI Library
^^^^^^^^^^^

**9 New Kernels** - The NKI Library expands from 7 to 16 documented kernel APIs. New core kernels include RoPE (Rotary Position Embedding), Router Top-K (expert selection for MoE), MoE CTE (Context Encoding), MoE TKG (Token Generation), and Cumsum. New experimental kernels include Attention Block TKG (fused attention for token generation), Cross Entropy (forward and backward passes), Depthwise Conv1D, and Blockwise MM Backward (for MoE training).

**Enhanced Quantization Support** - Existing kernels receive FP8 and MX quantization support across the QKV, MLP, and Output Projection kernels. The QKV kernel adds fused FP8 KV cache quantization and a block-based KV cache layout. The MLP kernel adds gate/up projection clamping and fp16 support for TKG mode. The Attention CTE kernel adds strided Q slicing for context parallelism.

**Improved Utilities** - TensorView gains a ``rearrange`` method for dimension reordering and ``has_dynamic_access`` for runtime-dependent addressing checks. SbufManager provides hierarchical tree-formatted allocation logging with new query methods for SBUF utilization. New utilities include ``rmsnorm_mx_quantize_tkg``, ``interleave_copy``, ``LncSubscriptable``, and ``TreeLogger``.

For more details, see :ref:`nki-lib-2-28-0-rn`.

Neuron Kernel Interface (NKI)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**NKI Beta 2 (0.2.0)** - This release includes LNC multi-core support for LNC=2, enabling kernels to leverage multiple NeuronCores within a logical NeuronCore. The compiler now tracks ``shared_hbm`` tensors and canonicalizes LNC kernel outputs. Users can declare tensors private to a single NeuronCore using the ``private_hbm`` memory type.

**New nki.collectives Module** - Enables collective communication across multiple NeuronCores with operations including ``all_reduce``, ``all_gather``, ``reduce_scatter``, ``all_to_all``, ``collective_permute`` variants, and ``rank_id``.

**New APIs and Features** - New ``nki.isa`` APIs include ``nonzero_with_count`` for sparse computation and ``exponential`` for element-wise operations. The new ``float8_e4m3fn`` dtype supports FP8 workloads. Language features include ``no_reorder`` blocks for instruction ordering control, ``__call__`` special method support, a ``tensor.view`` method for reshaping, and shared constants as string arguments.

**API Improvements** - ``dma_transpose`` now supports indirect addressing, ``dma_copy`` adds the ``unique_indices`` parameter, and ``register_alloc`` accepts optional tensor arguments for pre-filling. The compiler no longer truncates diagnostic output.

For more details, see :ref:`nki-2-28-0-rn`.

Kubernetes Support
^^^^^^^^^^^^^^^^^^

**Neuron DRA Driver** - Introduced the Neuron Dynamic Resource Allocation (DRA) Driver, enabling advanced resource allocation using the Kubernetes DRA API for flexible and efficient Neuron device management. The DRA API provides topology-aware scheduling, atomic resource allocation, and per-workload configuration. Neuron Helm Charts now include DRA Driver support.

For more details, see :ref:`containers-2-28-0-rn`.

PyTorch Framework (torch-neuronx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Transition to Native PyTorch Support** - Starting with PyTorch 2.10 support (planned for a future Neuron release), AWS Neuron will transition from PyTorch/XLA to native PyTorch support via TorchNeuron. PyTorch 2.9 is the last version using PyTorch/XLA. Users will need to update their scripts when upgrading to PyTorch 2.10 or later. See :ref:`native-pytorch-trainium` for migration guidance.

For more details, see :ref:`pytorch-2-28-0-rn`.

* Read the :doc:`Neuron 2.28.1 component release notes ` for specific Neuron component improvements and details.

.. _whats-new-2025-12-19-v2_27:

AWS Neuron SDK 2.27.0: Trainium3 Support, Enhanced NKI, and Unified Profiling with Neuron Explorer
---------------------------------------------------------------------------------------------------

**Posted on**: December 19, 2025

Today we are releasing AWS Neuron SDK 2.27.0. This release adds support for Trainium3 (``Trn3``) instances. The enhanced NKI, with the new NKI Compiler, introduces the ``nki.*`` namespace with updated APIs and language constructs. The NKI Library provides pre-optimized kernels for common model operations, including attention, MLP, and normalization. Neuron Explorer delivers a unified profiling suite with AI-driven optimization recommendations. vLLM V1 integration is now available through the vLLM-Neuron Plugin. Deep Learning Containers and AMIs are updated with vLLM V1, PyTorch 2.9, JAX 0.7, Ubuntu 24.04, and Python 3.12.

In addition to this release, we are introducing new capabilities and features in private beta access (see the Private Beta Access section). We are also announcing our transition to PyTorch native support starting with PyTorch 2.10 in Neuron 2.28, plans to simplify NxDI in upcoming releases, and other important updates.

Neuron Kernel Interface (NKI)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**NKI Compiler** - The new ``nki.*`` namespace replaces the legacy ``neuronxcc.nki.*`` namespace. Top-level kernel functions now require the ``@nki.jit`` annotation. Neuron 2.27 supports both namespaces side by side; the legacy namespace will be removed in Neuron 2.28. A kernel migration guide is available in the documentation.

For more details, see :ref:`neuron-2-27-0-nki`.

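As a rough sketch of the namespace migration described above (import paths are from the release notes; kernel bodies themselves are unchanged):

.. code-block:: python

   # Before (legacy namespace, removed in a later release):
   #   from neuronxcc import nki
   #   import neuronxcc.nki.language as nl

   # After (Neuron 2.27 and later):
   import nki
   import nki.language as nl

   # Kernel bodies are unchanged, but every top-level kernel entry point
   # must now carry the @nki.jit annotation.
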
NKI Library
^^^^^^^^^^^

The NKI Library provides pre-optimized kernels: Attention CTE, Attention TKG, MLP, Output Projection CTE, Output Projection TKG, QKV, and RMSNorm-Quant. Kernels are accessible via the ``nkilib.*`` namespace in neuronx-cc or from the GitHub repository.

For more details, see :ref:`neuron-2-27-0-nkilib`.

Developer Tools
^^^^^^^^^^^^^^^

**Neuron Explorer** - A suite of tools designed to support ML engineers throughout their development journey on AWS Trainium. This release features improved performance and user experience for device profiling, with four core viewers to provide insights into model performance:

* **Hierarchy Viewer**: Visualizes model structure and component interactions
* **AI Recommendation Viewer**: Delivers AI-driven optimization recommendations
* **Source Code Viewer**: Links profiling data directly to source code
* **Summary Viewer**: Displays high-level performance metrics

Neuron Explorer is available through UI, CLI, and VSCode IDE integration. Existing NTFF files are compatible but require reprocessing for new features. New tutorials cover profiling NKI kernels, multi-node training jobs, and vLLM inference workloads. The ``nccom-test`` tool now includes fine-grained collective communication support.

For more details, see :ref:`neuron-2-27-0-tools`.

Inference Updates
^^^^^^^^^^^^^^^^^

**vLLM V1** - The vLLM-Neuron Plugin enables vLLM V1 integration for inference workloads. vLLM V0 support ends in Neuron 2.28.

**NxD Inference** - Model support expands with beta releases of Qwen3 MoE (Qwen3-235B-A22B) for multilingual text and Pixtral (Pixtral-Large-Instruct-2411) for image understanding. Both models use HuggingFace checkpoints and are supported on ``Trn2`` and ``Trn3`` instances.

For more details, see :ref:`neuron-2-27-0-nxd-inference`.

Neuron Graph Compiler
^^^^^^^^^^^^^^^^^^^^^

Default accuracy settings are now optimized for precision. The ``--auto-cast`` flag defaults to ``none`` (previously ``matmul``), and ``--enable-mixed-precision-accumulation`` is enabled by default. FP32 models may see performance impacts; restore the previous behavior with ``--auto-cast=matmul`` and ``--disable-mixed-precision-accumulation``. Python 3.10 or higher is now required.

For more details, see :ref:`neuron-2-27-0-compiler`.

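For example, the previous casting behavior can be restored by passing the flags above through a framework's compiler-arguments pass-through; a minimal sketch using ``torch_neuronx.trace`` (the model and inputs are placeholders):

.. code-block:: python

   import torch
   import torch_neuronx

   model = torch.nn.Linear(4, 4).eval()   # placeholder model
   example = torch.rand(1, 4)             # placeholder input

   # Restore the pre-2.27 default casting behavior.
   traced = torch_neuronx.trace(
       model,
       example,
       compiler_args=["--auto-cast=matmul", "--disable-mixed-precision-accumulation"],
   )
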
Runtime Improvements
^^^^^^^^^^^^^^^^^^^^

**Neuron Runtime Library 2.29** adds support for Trainium3 (``Trn3``) instances and delivers performance improvements for Collectives Engine overhead, NeuronCore branch overhead, NEFF program startup, and all-gather latency.

For more details, see :ref:`neuron-2-27-0-runtime`.

Deep Learning AMIs and Containers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Platform Updates** - All DLCs are updated to Ubuntu 24.04 and Python 3.12. DLAMIs add Ubuntu 24.04 support for base, single-framework, and multi-framework configurations.

**Framework Updates**:

* vLLM V1 single framework DLAMI and multi-framework virtual environments
* PyTorch 2.9 single framework DLAMIs and multi-framework virtual environments (Amazon Linux 2023, Ubuntu 22.04, Ubuntu 24.04)
* JAX 0.7 single framework DLAMI and multi-framework virtual environments

**New Container** - The ``pytorch-inference-vllm-neuronx`` 0.11.0 DLC provides a complete vLLM inference environment with PyTorch 2.8 and all dependencies.

For more details, see :ref:`neuron-2-27-0-dlami` and :ref:`neuron-2-27-0-dlc`.

End of Support and Migration Notices
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Effective this release:**

* :ref:`announcement-python-3-9-eol`
* :ref:`announcement-end-of-support-pytorch-2-6`
* :ref:`announce-no-support-tensorflow2-10`
* :ref:`announce-eos-inf1-virtual-environments`
* :ref:`announcement-end-of-support-parallel-model-trace`
* :ref:`announce-eos-tensorboard-tools`

**Effective Neuron 2.28:**

* :ref:`announcement-end-of-support-neuronxcc-nki`
* :ref:`announcement-nki-library-namespace-changes`
* :ref:`announcement-nki-library-kernel-migration`
* :ref:`announcement-end-of-support-vllm-v0`

**Effective with PyTorch 2.10 support:**

* :ref:`announce-transition-pytorch-trainium`
* :ref:`announcement-end-of-support-nxdt-nxd-core`

**Future Releases:**

* :ref:`announce-nxdi-changes`
* :ref:`announce-eos-dlami-ubuntu-22-04`
* :ref:`announce-eos-pytorch-profling-api`
* :ref:`announce-eos-neuron-profiler`

Detailed Release Notes
^^^^^^^^^^^^^^^^^^^^^^^

* Read the :doc:`Neuron 2.27.0 component release notes ` for specific Neuron component improvements and details.

----

.. _whats-new-2025-12-02-riv:

AWS Neuron Expands with Trainium3, Native PyTorch, Faster NKI, and Open Source at re:Invent 2025
------------------------------------------------------------------------------------------------

**Posted on**: December 02, 2025

.. image:: /images/NeuronStandalone_white_small.png
   :alt: AWS Neuron Logo
   :align: right
   :width: 120px

At re:Invent 2025, AWS Neuron introduces support for the `Trainium3 UltraServer `__ with expanded open source components and an enhanced developer experience. These updates enable standard frameworks to run unchanged on Trainium, removing barriers for researchers to experiment and innovate. For developers requiring deeper control, the enhanced Neuron Kernel Interface (NKI) provides direct access to hardware-level optimizations, enabling customers to scale AI workloads with improved performance.

**Expanded capabilities and enhancements include**:

* :doc:`Trainium3 UltraServer support `: Enabling customers to scale AI workloads with improved performance
* :doc:`Native PyTorch support `: Standard PyTorch runs unchanged on Trainium without platform-specific modifications
* :doc:`Enhanced Neuron Kernel Interface (NKI) ` with the open source :doc:`NKI Compiler `: Improved programming capabilities with direct access to Trainium hardware instructions and fine-grained optimization control, with a compiler built on MLIR
* :doc:`NKI Library `: Open source collection of optimized, ready-to-use kernels for common ML operations
* :doc:`Neuron Explorer `: Tools suite to support developers and performance engineers in their performance optimization journey from framework operations to hardware instructions
* :doc:`Neuron DRA for Kubernetes `: Kubernetes-native resource management eliminating custom scheduler extensions
* :doc:`Expanded open source components `: Open sourcing more components, including the NKI Compiler, Native PyTorch, the NKI Library, and more, released under Apache 2.0

AI development requires rapid experimentation, hardware optimization, and production-scale workloads. These updates enable researchers to experiment with novel architectures using familiar workflows, ML developers to build AI applications using standard frameworks, and performance engineers to optimize workloads using low-level hardware optimization.

.. admonition:: Looking to try out our Beta features?

   Submit your beta access request through `this form `__ and the Neuron Product team will get back to you.

Native PyTorch Support
^^^^^^^^^^^^^^^^^^^^^^

**Private Preview**

AWS Neuron now natively supports PyTorch through TorchNeuron, an open source native PyTorch backend for Trainium. TorchNeuron integrates with PyTorch through the PrivateUse1 device backend mechanism, registering Trainium as a native device alongside other backends and allowing researchers and ML developers to run their code without modifications. TorchNeuron provides eager mode execution for interactive development and debugging, native distributed APIs including FSDP and DTensor for distributed training, and ``torch.compile`` support for optimization. TorchNeuron enables compatibility with ecosystem tools like TorchTitan and HuggingFace Transformers with minimal code changes.

Use TorchNeuron to run your PyTorch research and training workloads on Trainium without platform-specific code changes.

**Learn more**: :doc:`documentation ` and the `TorchNeuron GitHub repository `__.

**Access**: Contact your AWS account team for access.

Enhanced NKI
^^^^^^^^^^^^

**Public Preview**

The enhanced Neuron Kernel Interface (NKI) provides developers with complete hardware control through advanced APIs for fine-grained scheduling and allocation. The enhanced NKI enables instruction-level programming, memory allocation control, and execution scheduling with direct access to the Trainium ISA. We are also releasing the NKI Compiler as open source under Apache 2.0, built on MLIR to enable transparency and collaboration with the broader compiler community. NKI integrates with PyTorch and JAX, enabling developers to use custom kernels within their training workflows.

Use the enhanced NKI to innovate and build optimized kernels on Trainium. Explore the NKI Compiler source code to inspect and contribute to the MLIR-based compilation pipeline.

.. note::

   The NKI Compiler source code is currently in **Private Preview**, while the NKI programming interface is in **Public Preview**.

**Learn more**: :doc:`NKI home page ` and :doc:`NKI Language Guide `.

NKI Library
^^^^^^^^^^^

**Public Preview**

The NKI Library provides an open source collection of optimized, ready-to-use kernels for common ML operations. The library includes kernels for dense transformer operations, MoE-specific operations, and attention mechanisms, all with complete source code, documentation, and benchmarks.

Use NKI Library kernels directly in your models to improve performance, or explore the implementations as a reference for performance optimization best practices on Trainium.

**Learn more**: `GitHub repository `__ and :doc:`API documentation `.

Neuron Explorer
^^^^^^^^^^^^^^^

**Public Preview**

Neuron Explorer is a tools suite that supports developers and performance engineers in their performance optimization journey. It provides capabilities to inspect and optimize code from framework operations down to hardware instructions with hierarchical profiling, source code linking, IDE integration, and AI-powered recommendations for optimization insights.

Use Neuron Explorer to understand and optimize your model performance on Trainium, from high-level framework operations to low-level hardware execution.

**Learn more**: :doc:`Neuron Explorer documentation `.

Kubernetes-Native Resource Management with Neuron DRA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Private Preview**

Neuron Dynamic Resource Allocation (DRA) provides Kubernetes-native resource management for Trainium, eliminating custom scheduler extensions.
DRA enables topology-aware scheduling using the default Kubernetes scheduler, atomic UltraServer allocation, and flexible per-workload configuration. Neuron DRA supports EKS, SageMaker HyperPod, and UltraServer configurations. The driver is open source, with container images in the AWS ECR Public Gallery.

Use Neuron DRA to simplify Kubernetes resource management for your Trainium workloads with native scheduling and topology-aware allocation.

**Learn more**: :doc:`Neuron DRA documentation `.

**Access**: Contact your AWS account team to participate in the Private Preview.

Resources and Additional Information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more information, visit the `AWS Trainium official page `__, the :doc:`AWS Neuron Documentation `, and :doc:`the AWS Neuron GitHub repositories `.

================================================
FILE: archive/helper-tools/index.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

Helper Tools
============

.. toctree::
   :maxdepth: 1

   Check Model 
   GatherInfo 

================================================
FILE: archive/helper-tools/tutorial-neuron-check-model.rst
================================================

.. _neuron_check_model:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

Neuron Check Model
^^^^^^^^^^^^^^^^^^

Overview
========

The Neuron Check Model tool provides users with basic information about a compiled or uncompiled model's operations without the use of TensorBoard-Neuron. For additional visibility into models, please see :ref:`neuron-plugin-tensorboard`.

The Neuron Check Model tool scans the user's uncompiled model and provides a table of the operations within the uncompiled model. By default, the table shows each operation type, the number of instances of that type within the model, and whether the type is supported in Neuron. If the ``--show_names`` option is specified, the table shows each operation by name and whether the type of that operation is supported in Neuron.

If the model is already compiled, the tool also provides the table of operations as for the uncompiled model. The table includes the Neuron subgraph type and the number of instances of that type, along with operations that have not been compiled to Neuron. Additionally, the tool displays a message showing the minimum number of NeuronCores required to run the model, followed by another table which shows the list of Neuron subgraphs by name and the number of pipelined NeuronCores used by each subgraph. More information about NeuronCore pipeline can be found in :ref:`neuroncore-pipeline`. If the ``--expand_subgraph`` option is specified, the operations within each subgraph are printed below the subgraph information.

The Neuron Check Model tool is currently available for TensorFlow and MXNet. To check a PyTorch model, please use the ``torch.neuron.analyze_model`` function as shown in the PyTorch-Neuron Getting Started tutorial :ref:`/src/examples/pytorch/resnet50.ipynb`.

TensorFlow-Neuron Check Model
=============================

The following example shows how to run the TensorFlow-Neuron Check Model tool with the TensorFlow ResNet50 tutorial.

1. Start with the TensorFlow ResNet50 tutorial at :ref:`/src/examples/tensorflow/tensorflow_resnet50/resnet50.ipynb` and do the first three steps of the tutorial.
   Please stay in the Python environment that you set up during the tutorial.

2. Install the needed tensorflow_hub package and download the tool:

   ::

      pip install tensorflow_hub
      wget https://raw.githubusercontent.com/aws/aws-neuron-sdk/master/src/neuron-gatherinfo/tf_neuron_check_model.py
      python tf_neuron_check_model.py -h

   ::

      usage: tf_neuron_check_model.py [-h] [--show_names] [--expand_subgraph] model_path

      positional arguments:
        model_path         a TensorFlow SavedModel directory (currently supporting
                           TensorFlow v1 SaveModel only).

      optional arguments:
        -h, --help         show this help message and exit
        --show_names       list operation by name instead of summarizing by type
                           (caution: this option will generate many lines of output
                           for a large model).
        --expand_subgraph  show subgraph operations.

3. After step 3 of the TensorFlow ResNet50 tutorial, you can check the uncompiled model to see Neuron supported operations (currently supporting TensorFlow v1 SavedModel only):

   ::

      $ python tf_neuron_check_model.py ws_resnet50/resnet50/

      * The following table shows the supported and unsupported operations within this uncompiled model.
      * Each line shows an operation type, the number of instances of that type within model,
      * and whether the type is supported in Neuron.
      * Some operation types are excluded from table because they are no-operations or training-related operations:
        ['Placeholder', 'PlaceholderWithDefault', 'NoOp', 'Const', 'Identity', 'IdentityN', 'VarHandleOp', 'VarIsInitializedOp', 'AssignVariableOp', 'ReadVariableOp', 'StringJoin', 'ShardedFilename', 'SaveV2', 'MergeV2Checkpoints', 'RestoreV2']

      Op Type             Num Instances   Neuron Supported ?
      -------             -------------   ------------------
      Pad                 2               Yes
      RandomUniform       54              Yes
      Sub                 54              Yes
      Mul                 54              Yes
      Add                 54              Yes
      Conv2D              53              Yes
      BiasAdd             54              Yes
      FusedBatchNormV3    53              Yes
      Relu                49              Yes
      MaxPool             1               Yes
      AddV2               16              Yes
      Fill                56              Yes
      Mean                1               Yes
      MatMul              1               Yes
      Softmax             1               Yes
      Pack                1               Yes

      * Total inference operations: 504
      * Total Neuron supported inference operations: 504
      * Percent of total inference operations supported by Neuron: 100.0

4. You can also check the compiled model to see the number of pipelined NeuronCores for each subgraph:

   ::

      $ python tf_neuron_check_model.py ws_resnet50/resnet50_neuron/

      * Found 1 Neuron subgraph(s) (NeuronOp(s)) in this compiled model.
      * Use this tool on the original uncompiled model to see Neuron supported operations.
      * The following table shows all operations, including Neuron subgraphs.
      * Each line shows an operation type, the number of instances of that type within model,
      * and whether the type is supported in Neuron.
      * Some operation types are excluded from table because they are no-operations or training-related operations:
        ['Placeholder', 'PlaceholderWithDefault', 'NoOp', 'Const', 'Identity', 'IdentityN', 'VarHandleOp', 'VarIsInitializedOp', 'AssignVariableOp', 'ReadVariableOp', 'StringJoin', 'ShardedFilename', 'SaveV2', 'MergeV2Checkpoints', 'RestoreV2']

      Op Type     Num Instances   Neuron Supported ?
      -------     -------------   ------------------
      NeuronOp    1               Yes

      * Please run this model on Inf1 instance with at least 1 NeuronCore(s).
      * The following list show each Neuron subgraph with number of pipelined NeuronCores used by subgraph
      * (and subgraph operations if --expand_subgraph is used):

      Subgraph Name                                                                  Num Pipelined NeuronCores
      -------------                                                                  -------------------------
      conv5_block3_3_bn/FusedBatchNormV3/ReadVariableOp/neuron_op_d6f098c01c780733   1

5. When showing subgraph information, you can use ``--expand_subgraph`` to show operation types in each subgraph:

   ::

      $ python tf_neuron_check_model.py ws_resnet50/resnet50_neuron/ --expand_subgraph

      (output truncated to show subgraph information only)

      Subgraph Name                                                                  Num Pipelined NeuronCores
      -------------                                                                  -------------------------
      conv5_block3_3_bn/FusedBatchNormV3/ReadVariableOp/neuron_op_d6f098c01c780733   1

        Op Type           Num Instances
        -------           -------------
        MatMul            1
        Relu              49
        Add               16
        FusedBatchNorm    53
        BiasAdd           54
        Conv2D            53
        Pad               2
        Mean              1
        MaxPool           1
        Softmax           1

6. Use ``--show_names`` to see full operation names (caution: this option will generate many lines of output for a large model):

   ::

      $ python tf_neuron_check_model.py ws_resnet50/resnet50_neuron/ --show_names

      * Found 1 Neuron subgraph(s) (NeuronOp(s)) in this compiled model.
      * Use this tool on the original uncompiled model to see Neuron supported operations.
      * The following table shows all operations, including Neuron subgraphs.
      * Each line shows an operation name and whether the type of that operation is supported in Neuron.
      * Some operation types are excluded from table because they are no-operations or training-related operations:
        ['Placeholder', 'PlaceholderWithDefault', 'NoOp', 'Const', 'Identity', 'IdentityN', 'VarHandleOp', 'VarIsInitializedOp', 'AssignVariableOp', 'ReadVariableOp', 'StringJoin', 'ShardedFilename', 'SaveV2', 'MergeV2Checkpoints', 'RestoreV2']

      Op Name                                                                        Op Type     Neuron Supported ?
      -------                                                                        -------     ------------------
      conv5_block3_3_bn/FusedBatchNormV3/ReadVariableOp/neuron_op_d6f098c01c780733   NeuronOp    Yes

      * Please run this model on Inf1 instance with at least 1 NeuronCore(s).
      * The following list show each Neuron subgraph with number of pipelined NeuronCores used by subgraph
      * (and subgraph operations if --expand_subgraph is used):

      Subgraph Name                                                                  Num Pipelined NeuronCores
      -------------                                                                  -------------------------
      conv5_block3_3_bn/FusedBatchNormV3/ReadVariableOp/neuron_op_d6f098c01c780733   1

MXNet-Neuron Check Model
========================

The following example shows how to run the MXNet-Neuron Check Model tool with the MXNet ResNet50 tutorial.

1. Start with the MXNet ResNet50 tutorial at :ref:`/src/examples/mxnet/resnet50/resnet50.ipynb` and do the first three steps of the tutorial. Please stay in the Python environment that you set up during the tutorial.

2. Download the tool:

   ::

      wget https://raw.githubusercontent.com/aws/aws-neuron-sdk/master/src/neuron-gatherinfo/mx_neuron_check_model.py
      python mx_neuron_check_model.py -h

   ::

      usage: mx_neuron_check_model.py [-h] [--show_names] [--expand_subgraph] model_path

      positional arguments:
        model_path         path prefix to MXNet model (the part before -symbol.json)

      optional arguments:
        -h, --help         show this help message and exit
        --show_names       list operation by name instead of summarizing by type
                           (caution: this option will generate many lines of output
                           for a large model).
        --expand_subgraph  show subgraph operations.

3. After step 3 of the MXNet ResNet50 tutorial, you can check the uncompiled model to see Neuron supported operations:

   ::

      $ python mx_neuron_check_model.py resnet-50

      * The following table shows the supported and unsupported operations within this uncompiled model.
      * Each line shows an operation type, the number of instances of that type within model,
      * and whether the type is supported in Neuron.
      * Some operation types are excluded from table because they are no-operations or training-related operations:
        ['null']

      Op Type           Num Instances   Neuron Supported ?
      -------           -------------   ------------------
      BatchNorm         51              Yes
      Convolution       53              Yes
      Activation        50              Yes
      Pooling           2               Yes
      elemwise_add      16              Yes
      Flatten           1               Yes
      FullyConnected    1               Yes
      SoftmaxOutput     1               No

      * Total inference operations: 175
      * Total Neuron supported inference operations: 174
      * Percent of total inference operations supported by Neuron: 99.4

4. You can also check the compiled model to see the number of pipelined NeuronCores for each subgraph:

   ::

      $ python mx_neuron_check_model.py resnet-50_compiled

      * Found 1 Neuron subgraph(s) (_neuron_subgraph_op(s)) in this compiled model.
      * Use this tool on the original uncompiled model to see Neuron supported operations.
      * The following table shows all operations, including Neuron subgraphs.
      * Each line shows an operation type, the number of instances of that type within model,
      * and whether the type is supported in Neuron.
      * Some operation types are excluded from table because they are no-operations or training-related operations:
        ['null']

      Op Type                Num Instances   Neuron Supported ?
      -------                -------------   ------------------
      _neuron_subgraph_op    1               Yes
      SoftmaxOutput          1               No

      * Please run this model on Inf1 instance with at least 1 NeuronCore(s).
      * The following list show each Neuron subgraph with number of pipelined NeuronCores used by subgraph
      * (and subgraph operations if --expand_subgraph is used):

      Subgraph Name           Num Pipelined NeuronCores
      -------------           -------------------------
      _neuron_subgraph_op0    1

5. When showing subgraph information, you can use ``--expand_subgraph`` to show operation types in each subgraph:

   ::

      $ python mx_neuron_check_model.py resnet-50_compiled --expand_subgraph

      (output truncated to show subgraph information only)

      Subgraph Name           Num Pipelined NeuronCores
      -------------           -------------------------
      _neuron_subgraph_op0    1

        Op Type           Num Instances
        -------           -------------
        BatchNorm         51
        Convolution       53
        Activation        50
        Pooling           2
        elemwise_add      16
        Flatten           1
        FullyConnected    1

6. Use ``--show_names`` to see full operation names (caution: this option will generate many lines of output for a large model):

   ::

      $ python mx_neuron_check_model.py resnet-50_compiled --show_names

      * Found 1 Neuron subgraph(s) (_neuron_subgraph_op(s)) in this compiled model.
      * Use this tool on the original uncompiled model to see Neuron supported operations.
      * The following table shows all operations, including Neuron subgraphs.
      * Each line shows an operation name and whether the type of that operation is supported in Neuron.
      * Some operation types are excluded from table because they are no-operations or training-related operations:
        ['null']

      Op Name                 Op Type                Neuron Supported ?
      -------                 -------                ------------------
      _neuron_subgraph_op0    _neuron_subgraph_op    Yes
      softmax                 SoftmaxOutput          No

      * Please run this model on Inf1 instance with at least 1 NeuronCore(s).
      * The following list show each Neuron subgraph with number of pipelined NeuronCores used by subgraph
      * (and subgraph operations if --expand_subgraph is used):

      Subgraph Name           Num Pipelined NeuronCores
      -------------           -------------------------
      _neuron_subgraph_op0    1

================================================
FILE: archive/helper-tools/tutorial-neuron-gatherinfo.rst
================================================

.. _neuron_gatherinfo:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
================================================
FILE: archive/helper-tools/tutorial-neuron-gatherinfo.rst
================================================
.. _neuron_gatherinfo:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

Using Neuron GatherInfo Tool to collect debug and support information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Overview
========

The Neuron GatherInfo tool ``neuron-gatherinfo.py`` helps automate the collection and packaging of information from Neuron SDK tools that is useful to both you and AWS for issue resolution. The tool gathers log files and other system information. When the information is being supplied to AWS, the tool redacts proprietary and confidential information.

The GatherInfo tool is supplied in source code form, available here: :github:`Neuron Gatherinfo `

The tool enables developers to gather compiler and inference/runtime logs. The most common usage is from within one of the supported ML frameworks that have been integrated with Neuron, and information can be captured from those compile/runtime environments using the frameworks.

Steps Overview:
~~~~~~~~~~~~~~~

1. Obtain a copy of neuron-gatherinfo.py from :github:`Neuron Gatherinfo `
2. Install it into a location in your $PATH, or into a location from which you can launch the script
3. Use it with compile and/or runtime environments

Neuron-CC information gathering
-------------------------------

Step 1: Re-run the compile steps for your workload with increased verbosity or debug levels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- For TensorFlow-Neuron, change the Python code as shown. Note that ``compiler-workdir`` is expected to be an empty directory to prevent files from other runs from interfering with the information gathering. The call to the compile function has to be augmented with the **verbose** and the **compiler_workdir** arguments. In addition, please capture the stdout messages into a file (for example, by redirecting stdout to a file):

  ::

     tfn.saved_model.compile(model_dir, compiled_model_dir,
                             compiler_args=['--verbose', '2', '--pipeline', 'compile', 'SaveTemps'],
                             compiler_workdir='./compiler-workdir')

- For Neuron Apache MXNet, add compiler arguments as shown below and run the compilation process from an empty workdir:

  ::

     import mxnet as mx
     import os
     from packaging import version

     mxnet_version = version.parse(mx.__version__)
     if mxnet_version >= version.parse("1.8"):
         import mx_neuron as neuron
     else:
         from mxnet.contrib import neuron

     ...

     os.environ['SUBGRAPH_INFO'] = '1'
     compile_args = {'--verbose': 2, '--pipeline': 'compile', 'flags': ['SaveTemps']}
     csym, cargs, cauxs = neuron.compile(sym, args, auxs, inputs=inputs, **compile_args)

.. _step-2-run-neuron-gatherinfopy-to-gather-information-to-share:

Step 2: Run neuron-gatherinfo.py to gather information to share
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output result will be a tar.gz file.

Neuron Runtime information gathering
------------------------------------

Step 1: Execute inference steps for your workload with increased verbosity or debug levels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the case of runtime information, the tool **neuron-dump.py** is used by **neuron-gatherinfo.py** to gather that information. Make sure that you have the Neuron tools package (aws-neuron-tools) installed.

.. _step-2-run-neuron-gatherinfopy-to-gather-information-to-share-1:

Step 2: Run neuron-gatherinfo.py to gather information to share
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output result will be a tar.gz file.
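Since the archive is meant to be shared, you may want to audit exactly what was collected before sending it. The following is a minimal sketch (not part of the GatherInfo tool), assuming the archive produced in Step 2 is named ``neuron-gatherinfo.tar.gz`` in the current directory:

.. code:: python

   import tarfile

   # List every member of the gathered archive so you can verify that
   # nothing you consider sensitive is included before sharing it.
   with tarfile.open('neuron-gatherinfo.tar.gz', 'r:gz') as archive:
       for member in archive.getmembers():
           print(f'{member.size:>10d}  {member.name}')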
Tool Usage Reference
====================

Run neuron-gatherinfo.py using the ``--help`` option:

::

   bash $ ~/bin/neuron-gatherinfo.py --help
   usage: neuron-gatherinfo.py [-h] [--additionalfileordir ADDFLDIR] [-c CCDIR]
                               [-i] [-f FILTERFILE] [-m] -o OUTDIR [-r RTDIR]
                               -s STDOUT [-v]

   Usage: /home/user/bin/neuron-gatherinfo.py [options] This program is used to
   gather information from this system for analysis and debugging

   optional arguments:
     -h, --help            show this help message and exit
     --additionalfileordir ADDFLDIR
                           Additional file or directory that the user wants to
                           provide in the archive. The user can sanitize this
                           file or directory before sharing
     -c CCDIR, --compileroutdir CCDIR
                           Location of the neuron-cc generated files
     -i, --include         By default, only the lines containing (grep) patterns
                           like 'nrtd|neuron|kernel:' from the syslog are copied.
                           Other lines are excluded. Using this option allows the
                           timestamp section of other lines to be included. The
                           rest of the contents of the line itself are elided.
                           Providing the timestamp section may provide time
                           continuity while viewing the copied syslog file
     -f FILTERFILE, --filter FILTERFILE
     -m, --modeldata       By using this option, the entire compiler work
                           directory's contents will be included (excluding the
                           .pb files, unless an additional option is used). This
                           would include model information, etc. The files that
                           are included, by default, are these:
                           graph_def.neuron-cc.log, all_metrics.csv,
                           hh-tr-operand-tensortensor.json
     -o OUTDIR, --out OUTDIR
                           The output directory where all the files and other
                           information will be stored. The output will be stored
                           as an archive as well as the actual directory where
                           all the contents are copied. This will allow a simple
                           audit of the files, if necessary. *** N O T E ***:
                           Make sure that this directory has enough space to
                           hold the files and resulting archive
     -r RTDIR, --runtimeoutdir RTDIR
                           Location of the neuron runtime generated files
     -s STDOUT, --stdout STDOUT
                           The file where the stdout of the compiler run was
                           saved
     -v, --verbose         Verbose mode displays commands executed and any
                           additional information which may be useful in
                           debugging the tool itself

Examples
========

Example 1: no ML model information gathered (default behavior)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this case, the tool will archive just the default information gathering:

::

   bash $ sudo ~/bin/neuron-gatherinfo.py -o compile-and-run-info-for-debugging-no-model-info -i --verbose -s stdout-from-compile_resnet50.out -c compiler-workdir

   Running cmd: lscpu and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-lscpu.txt
   Running cmd: lshw and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-lshw.txt
   Running cmd: lspci | grep -i Amazon and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-lspci.txt
   Running cmd: neuron-cc --version and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-neuron-cc.txt
   Running cmd: neuron-ls and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo/report-neuron-ls.txt

   ******
   Archive created at:
   /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo.tar.gz
   From directory:
   /home/user/tutorials-3/compile-and-run-info-for-debugging-no-model-info/neuron-gatherinfo
   ******

.. _example-2--model-ml-information-gathered-using-the-modeldata-option:

Example 2: model ML information gathered using the ``--modeldata`` option
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this case, the tool will archive the compiler work directory in addition to the default information gathering:

::

   bash $ sudo ~/bin/neuron-gatherinfo.py -o compile-and-run-info-for-debugging -i --verbose -s stdout-from-compile_resnet50.out -c compiler-workdir --modeldata

   Running cmd: lscpu and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-lscpu.txt
   Running cmd: lshw and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-lshw.txt
   Running cmd: lspci | grep -i Amazon and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-lspci.txt
   Running cmd: neuron-cc --version and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-neuron-cc.txt
   Running cmd: neuron-ls and capturing output in file: /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo/report-neuron-ls.txt

   ******
   Archive created at:
   /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo.tar.gz
   From directory:
   /home/user/tutorials-3/compile-and-run-info-for-debugging/neuron-gatherinfo
   ******

   **************************
   Based on your command line option, we're also packaging these files:
   graph_def.neuron-cc.log
   all_metrics.csv
   hh-tr-operand-tensortensor.json
   And this directory:
   /home/user/tutorials-3/compiler-workdir
   **************************

================================================
FILE: archive/index.rst
================================================
.. meta::
   :description: Archived AWS Neuron SDK documentation
   :keywords: AWS Neuron SDK, archived tutorials, legacy documentation
   :date-modified: 12-02-2025

=====================================
Archived AWS Neuron SDK documentation
=====================================

.. note::
   This page contains archived tutorials and other documentation for older versions of the AWS Neuron SDK. These pages are no longer actively maintained and may reference unsupported features or deprecated APIs. They are provided as-is and may not reflect the current state of the AWS Neuron SDK.

Overview
--------

The following content has been archived for reference purposes. For the latest documentation and guides, visit the `AWS Neuron SDK documentation `_.

Archived feature docs
---------------------

.. list-table::
   :header-rows: 1

   * - Feature
     - Last release supported
     - Date archived
   * - :doc:`tensorboard/getting-started-tensorboard-neuron-plugin`
     - Neuron 2.27.0
     - Archived on: 12/2/2025
   * - :doc:`neuronperf/index`
     - Neuron 2.27.0
     - Archived on: 12/2/2025
   * - :doc:`helper-tools/index`
     - Neuron 2.27.0
     - Archived on: 12/2/2025
   * - :doc:`transformers-neuronx/index`
     - Neuron 2.25.0
     - Archived on: 9/15/2025
   * - :doc:`MXNet Neuron Setup Guides `
     - Neuron 2.27.0
     - Archived on: 3/30/2026
   * - :doc:`mxnet-neuron/index`
     - Neuron 2.16.0
     - Archived on: 3/11/2026
   * - :doc:`tensorflow/index`
     - Neuron 2.22.0
     - Archived on: 3/11/2026
   * - :doc:`torch-neuron/index`
     - Neuron 2.22.0
     - Archived on: 3/11/2026

Archived tutorials
------------------
.. list-table::
   :header-rows: 1

   * - Tutorial
     - Last release supported
     - Date archived
   * - :doc:`tutorials/finetune_t5`
     - Neuron 2.24.0
     - Archived on: 7/31/2025
   * - :doc:`tutorials/ssd300_demo/ssd300_demo`
     - Neuron 2.24.0
     - Archived on: 7/31/2025
   * - :doc:`tutorials/megatron_gpt_pretraining`
     - Neuron 2.25.0
     - Archived on: 7/31/2025
   * - :doc:`tutorials/finetuning_llama2_7b_ptl`
     - Neuron 2.26.0
     - Archived on: 8/25/2025
   * - :doc:`tutorials/training_llama2_tp_pp_ptl`
     - Neuron 2.26.0
     - Archived on: 8/25/2025
   * - :doc:`tutorials/training_codegen25_7b`
     - Neuron 2.26.0
     - Archived on: 8/25/2025
   * - :doc:`tutorials/gpt3_neuronx_nemo_megatron_pretraining`
     - Neuron 2.26.0
     - Archived on: 8/25/2025
   * - :doc:`tutorials/multinode-training-model-profiling`
     - Neuron 2.29.0
     - Archived on: 3/30/2026

.. toctree::
   :maxdepth: 1
   :hidden:

   tutorials/finetune_t5
   tutorials/ssd300_demo/ssd300_demo
   tutorials/megatron_gpt_pretraining
   tutorials/training-gpt-neox-20b
   tutorials/finetuning_llama2_7b_ptl
   tutorials/training_llama2_tp_pp_ptl
   tutorials/training_codegen25_7b
   tutorials/multinode-training-model-profiling
   tutorials/training-gpt-neox
   tensorboard/getting-started-tensorboard-neuron-plugin
   neuronperf/index
   helper-tools/index
   transformers-neuronx/index
   mxnet-neuron/index
   tensorflow/index
   torch-neuron/index

Accessing Archived Content
--------------------------

Each tutorial listed above corresponds to a specific version or feature set of the Neuron SDK that has since been superseded. Use these resources for historical context or migration guidance.

.. warning::
   Archived tutorials may not be compatible with current Neuron SDK releases. Exercise caution when following instructions from these documents.

================================================
FILE: archive/mxnet-neuron/api-compilation-python-api.rst
================================================
.. _ref-mxnet-neuron-compilation-python-api:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Neuron Apache MXNet Compilation Python API
=======================================================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

The MXNet-Neuron compilation Python API provides a method to compile a model graph for execution on Inferentia.

Description
-----------

Within the graph or subgraph, the compile method selects and sends Neuron-supported operations to the Neuron compiler for compilation and saves the compiled artifacts in the graph. Uncompilable operations are kept as original operations for framework execution.

The compiled graph can be saved using the MXNet save_checkpoint and served using MXNet Model Serving. Please see :ref:`mxnet-neuron-model-serving` for more information about exporting to saved model and serving using MXNet Model Serving.

Options can be passed to the Neuron compiler via the compile function. For example, the “\ ``--neuroncore-pipeline-cores``\ ” option directs the Neuron compiler to compile each subgraph to fit in the specified number of NeuronCores. This number can be less than the total available NeuronCores on an Inf1 instance. See :ref:`neuron-compiler-cli-reference` for more information about compiler options.

For debugging compilation, set the SUBGRAPH_INFO=1 environment variable before calling the compilation script. The extracted subgraphs are preserved as hidden files in the run directory.
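As a concrete illustration, the debug setting is just an environment variable that must be in place before ``neuron.compile`` runs. A minimal sketch, following the same pattern used in the GatherInfo tutorial above (the ``sym``, ``args``, ``aux``, and ``img`` handles are assumed to come from your own model-loading code):

.. code:: python

   import os
   import mx_neuron as neuron  # MXNet 1.8; use mxnet.contrib.neuron on 1.5

   # Enable preservation of the extracted subgraphs as hidden files in
   # the run directory, for compilation debugging.
   os.environ['SUBGRAPH_INFO'] = '1'

   # Compile as usual; sym/args/aux/img come from your model-loading code.
   csym, cargs, cauxs = neuron.compile(sym, args, aux, inputs={'data': img})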
For more information, see :ref:`neuron_gatherinfo`.

**MXNet 1.5**
-------------

Method
------

.. code:: python

   from mxnet.contrib import neuron

   neuron.compile(sym, args, aux, inputs, **compile_args)

Arguments
---------

- **sym** - Symbol object loaded from symbol.json file
- **args** - args/params dictionary loaded from params file
- **aux** - aux/params dictionary loaded from params file
- **inputs** - a dictionary with key/value mappings for input name to input numpy arrays
- **kwargs** (optional) - a dictionary with key/value mappings for MXNet-Neuron compilation and Neuron Compiler options.

  - For example, to limit the number of NeuronCores per subgraph, use ``compile_args={'--neuroncore-pipeline-cores' : N}``, where N is an integer representing the maximum number of NeuronCores per subgraph.
  - Additional compiler flags can be passed using ``'flags' : []``, where the value is a comma-separated list of strings. See :ref:`neuron_gatherinfo` for an example of passing debug flags to the compiler.
  - Advanced option to exclude node names: ``compile_args={'excl_node_names' : []}``, where the value is a comma-separated list of node name strings.

Returns
-------

- **sym** - new partitioned symbol
- **args** - modified args/params
- **auxs** - modified aux/params

Example Usage: Compilation
--------------------------

The following is an example usage of the compilation, with default compilation arguments:

.. code:: python

   from mxnet.contrib import neuron
   ...
   neuron.compile(sym, args, aux, inputs={'data' : img})

**MXNet 1.8**
-------------

Method
------

.. code:: python

   import mx_neuron as neuron

   neuron.compile(obj, args=None, aux=None, inputs=None, **compile_args)

Arguments
---------

- **obj** - Symbol object loaded from symbol.json file, or gluon.HybridBlock object
- **args** (optional) - args/params dictionary loaded from params file. Only needed in the case of a Symbol object
- **aux** (optional) - aux/params dictionary loaded from params file. Only needed in the case of a Symbol object
- **inputs** - a dictionary with key/value mappings for input name to input numpy arrays.
- **kwargs** (optional) - a dictionary with key/value mappings for MXNet-Neuron compilation and Neuron Compiler options.

  - For example, to limit the number of NeuronCores per subgraph, use ``compile_args={'--neuroncore-pipeline-cores' : N}``, where N is an integer representing the maximum number of NeuronCores per subgraph.
  - Additional compiler flags can be passed using ``'flags' : []``, where the value is a comma-separated list of strings. See :ref:`neuron_gatherinfo` for an example of passing debug flags to the compiler.
  - Advanced option to exclude node names: ``compile_args={'excl_node_names' : []}``, where the value is a comma-separated list of node name strings.
  - **work_dir** - relative or absolute path for storing compiler artifacts (including params and jsons) generated during compilation when SUBGRAPH_INFO=1.

Returns
-------

- **(sym, args, auxs)** - for a Symbol object as input: sym, args and auxs are the new partitioned symbol, modified args/params and modified aux/params, respectively.
- **(obj)** - for a gluon.HybridBlock object as input: obj is the partitioned and optimized gluon.HybridBlock object for the Neuron backend.
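The options described in the Arguments list can be combined in a single ``compile_args`` dictionary, mirroring the pattern used in the GatherInfo tutorial above. A minimal sketch for MXNet 1.8, where the specific values (two NeuronCores per subgraph, the ``SaveTemps`` flag, and the ``fc1_output`` node name) are illustrative only:

.. code:: python

   import mx_neuron as neuron

   # Illustrative values: limit each subgraph to 2 pipelined NeuronCores,
   # raise compiler verbosity, pass an extra bare flag, and exclude one
   # (hypothetical) node by name.
   compile_args = {
       '--neuroncore-pipeline-cores': 2,
       '--verbose': 2,
       'flags': ['SaveTemps'],
       'excl_node_names': ['fc1_output'],
   }

   # sym/args/aux/img come from your model-loading code.
   csym, cargs, cauxs = neuron.compile(sym, args, aux,
                                       inputs={'data': img}, **compile_args)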
Example Usage: Compilation
--------------------------

The following is an example usage of the compilation, with default compilation arguments for a Symbol object:

.. code:: python

   import mx_neuron as neuron
   ...
   neuron.compile(sym, args, aux, inputs={'data' : img})

The following is an example usage of the compilation, with default compilation arguments for a gluon.HybridBlock object (only supported in MXNet-Neuron 1.8):

.. code:: python

   import mx_neuron as neuron
   ...
   neuron.compile(obj, inputs={'data' : img})

Example Usage: Extract Compilation Statistics
---------------------------------------------

To extract operation counts, insert the following code after the compile step (assume csym is the compiled MXNet symbol):

.. code:: python

   import json

   # Return list of nodes from MXNet symbol
   def sym_nodes(sym):
       return json.loads(sym.tojson())['nodes']

   # Return number of operations in node list
   def count_ops(graph_nodes):
       return len([x['op'] for x in graph_nodes if x['op'] != 'null'])

   # Return triplet of compile statistics:
   # - count of operations in symbol database
   # - number of Neuron subgraphs
   # - number of operations compiled to Neuron runtime
   def get_compile_stats(sym):
       cnt = count_ops(sym_nodes(sym))
       neuron_subgraph_cnt = 0
       neuron_compiled_cnt = 0
       for g in sym_nodes(sym):
           if g['op'] == '_neuron_subgraph_op':
               neuron_subgraph_cnt += 1
               for sg in g['subgraphs']:
                   neuron_compiled_cnt += count_ops(sg['nodes'])
       return (cnt, neuron_subgraph_cnt, neuron_compiled_cnt)

   original_cnt = count_ops(sym_nodes(sym))
   post_compile_cnt, neuron_subgraph_cnt, neuron_compiled_cnt = get_compile_stats(csym)

   print("INFO:mxnet: Number of operations in original model: ", original_cnt)
   print("INFO:mxnet: Number of operations in compiled model: ", post_compile_cnt)
   print("INFO:mxnet: Number of Neuron subgraphs in compiled model: ", neuron_subgraph_cnt)
   print("INFO:mxnet: Number of operations placed on Neuron runtime: ", neuron_compiled_cnt)

.. code:: bash

   INFO:mxnet: Number of operations in original model:  67
   INFO:mxnet: Number of operations in compiled model:  4
   INFO:mxnet: Number of Neuron subgraphs in compiled model:  2
   INFO:mxnet: Number of operations placed on Neuron runtime:  65

================================================
FILE: archive/mxnet-neuron/api-reference-guide.rst
================================================
.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

API Reference Guide (mxnet-neuron)
==================================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. toctree::
   :maxdepth: 1
   :hidden:

   /archive/mxnet-neuron/api-compilation-python-api

.. include:: /archive/mxnet-neuron/api-reference-guide.txt

================================================
FILE: archive/mxnet-neuron/api-reference-guide.txt
================================================
* :ref:`ref-mxnet-neuron-compilation-python-api`

================================================
FILE: archive/mxnet-neuron/developer-guide.rst
================================================
.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Developer Guide
===============

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. toctree::
   :maxdepth: 1
   :hidden:

   /about-neuron/appnotes/mxnet-neuron/flex-eg

..
include:: /archive/mxnet-neuron/developer-guide.txt ================================================ FILE: archive/mxnet-neuron/developer-guide.txt ================================================ * :ref:`flexeg` ================================================ FILE: archive/mxnet-neuron/ec2-then-ec2-devflow.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 .. warning:: This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. include:: /devflows/inference/ec2-then-ec2-devflow.rst ================================================ FILE: archive/mxnet-neuron/index.rst ================================================ Neuron Apache MXNet Release Notes ============================================== .. toctree:: :maxdepth: 1 /release-notes/archive/mxnet-neuron ================================================ FILE: archive/mxnet-neuron/inference-mxnet-neuron.rst ================================================ .. _inference-mxnet-neuron: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Inference (mxnet-neuron) (maintenance) ======================================= .. warning:: This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: Tutorials API Reference Guide Developer Guide Misc .. include:: inference-mxnet-neuron.txt ================================================ FILE: archive/mxnet-neuron/inference-mxnet-neuron.txt ================================================ .. card:: Setup (``mxnet-neuron``) :link: setup-mxnet-neuron :link-type: ref :class-body: sphinx-design-class-title-small .. dropdown:: Tutorials :class-title: sphinx-design-class-title-small :animate: fade-in .. include:: /archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron.txt .. dropdown:: API Reference Guide :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /archive/mxnet-neuron/api-reference-guide.txt .. dropdown:: Developer Guide :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /archive/mxnet-neuron/developer-guide.txt .. dropdown:: Misc :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /archive/mxnet-neuron/misc-mxnet-neuron.txt ================================================ FILE: archive/mxnet-neuron/misc-mxnet-neuron.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Misc (mxnet-neuron) =================== .. warning:: This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: /archive/mxnet-neuron/troubleshooting-guide What's New /release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-mxnet .. 
include:: /archive/mxnet-neuron/misc-mxnet-neuron.txt

================================================
FILE: archive/mxnet-neuron/misc-mxnet-neuron.txt
================================================
* :ref:`mxnet_troubleshooting_guide`
* :ref:`What's New `
* :ref:`neuron-cc-ops-mxnet`

================================================
FILE: archive/mxnet-neuron/mxnet-neuron-setup.rst
================================================
.. _mxnet-setup:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

MXNet Neuron Setup
==================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: mxnet-neuron-setup.txt

================================================
FILE: archive/mxnet-neuron/mxnet-neuron-setup.txt
================================================
.. card:: MXNet Neuron (``mxnet-neuron``) Setup for Inf1 Instances
   :link: setup-mxnet-neuron
   :link-type: ref
   :class-body: sphinx-design-class-title-small

================================================
FILE: archive/mxnet-neuron/neo-then-hosting-devflow.rst
================================================
.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /devflows/inference/neo-then-hosting-devflow.rst

================================================
FILE: archive/mxnet-neuron/setup/mxnet-install-prev-al2.rst
================================================
.. _mxnet-neuron-install-prev-al2:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install Previous MXNet Neuron Releases for Amazon Linux (``mxnet-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. toctree::
   :maxdepth: 1

This section will assist you in installing previous Neuron releases.

.. tab-set::

   .. tab-item:: Neuron 2.18.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami

   .. tab-item:: Neuron 2.17.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.17.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami

   .. tab-item:: Neuron 2.16.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.16.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami

================================================
FILE: archive/mxnet-neuron/setup/mxnet-install-prev-al2023.rst
================================================
..
_mxnet-neuron-install-prev-al2023: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install Previous MXNet Neuron Releases for Amazon Linux 2023 (``mxnet-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 This section will assist you in installing previous Neuron releases. .. tab-set:: .. tab-item:: Neuron 2.20.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.19.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.18.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/mxnet-neuron/setup/mxnet-install-prev-u20.rst ================================================ .. Install previous MXNet Neuron releases for Ubuntu 20.04 - archived Use the tabs below to install a specific previous Neuron SDK release. Select the Neuron version you need. .. tab-set:: .. tab-item:: Neuron 2.20.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.19.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.18.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/mxnet-neuron/setup/mxnet-install-prev-u22.rst ================================================ .. _mxnet-neuron-install-prev-u22: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install Previous MXNet Neuron Releases for Ubuntu 22 (``mxnet-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 This section will assist you in installing previous Neuron releases. .. tab-set:: .. 
tab-item:: Neuron 2.20.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.19.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.18.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/mxnet-neuron/setup/mxnet-install.rst ================================================ .. _install-neuron-mxnet: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install MXNet Neuron ===================== .. warning:: This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. include:: /setup/install-templates/inf1/note-setup-cntr.rst .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: MXNet 1.8.0 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: MXNet 1.5.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: MXNet 1.8.0 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: MXNet 1.5.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: MXNet 1.8.0 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: MXNet 1.5.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/mxnet-neuron/setup/mxnet-neuron-al2-base-dlami.rst ================================================ .. _setup-mxnet-neuron-al2-base-dlami: .. 
card:: Select a Different Framework or Platform for Setup
   :link: setup-guide-index
   :link-type: ref
   :class-body: sphinx-design-class-title-small

MXNet Neuron ("mxnet-neuron") Setup on Amazon Linux 2
=========================================================

.. contents:: Table of contents
   :local:
   :depth: 2

.. include:: /setup/install-templates/al2-python.rst

Get Started with Latest Release of MXNet Neuron (``mxnet-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section provides links to help you quickly start with a fresh installation of :ref:`install-neuron-mxnet`.

.. dropdown:: Launch the Instance
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.
   * To get more information about instance sizes and pricing see: `Inf1 web page `_
   * Check for the latest version of the `DLAMI Base AMI `_ and copy the AMI name that starts with "Deep Learning Base Neuron AMI (Amazon Linux 2) " from the "AMI Name:" section
   * Search for the copied AMI name in the AMI Search; you should see a matching AMI with that name in Community AMIs. Select the AMI and use it to launch the instance.
   * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance

.. dropdown:: Install Drivers and Tools
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools

.. include:: /includes/setup/tab-inference-mxnet-neuron-al2.txt

.. include:: /archive/mxnet-neuron/setup/mxnet-update-u20.rst

.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-al2.rst

================================================
FILE: archive/mxnet-neuron/setup/mxnet-neuron-al2.rst
================================================
.. _setup-mxnet-neuron-al2:

.. include:: /setup/install-templates/al2-python.rst

.. card:: Select a Different Framework or Platform for Setup
   :link: setup-guide-index
   :link-type: ref
   :class-body: sphinx-design-class-title-small

MXNet Neuron ("mxnet-neuron") Setup on Amazon Linux 2
======================================================

.. contents:: Table of contents
   :local:
   :depth: 2

.. include:: /setup/install-templates/al2-python.rst

Get Started with Latest Release of MXNet Neuron (``mxnet-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section provides links to help you quickly start with a fresh installation of :ref:`install-neuron-mxnet`.

.. dropdown:: Launch the Instance
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.
   * To get more information about instance sizes and pricing see: `Inf1 web page `_
   * Select Amazon Linux 2 AMI(HVM) - Kernel 5.10
   * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance
.. dropdown:: Install Drivers and Tools
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools

.. include:: /includes/setup/tab-inference-mxnet-neuron-al2.txt

.. include:: /archive/mxnet-neuron/setup/mxnet-update-u20.rst

.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-al2.rst

================================================
FILE: archive/mxnet-neuron/setup/mxnet-neuron-al2023.rst
================================================
.. _setup-mxnet-neuron-al2023:

.. card:: Select a Different Framework or Platform for Setup
   :link: setup-guide-index
   :link-type: ref
   :class-body: sphinx-design-class-title-small

MXNet Neuron ("mxnet-neuron") Setup on Amazon Linux 2023
=========================================================

.. contents:: Table of contents
   :local:
   :depth: 2

Get Started with Latest Release of MXNet Neuron (``mxnet-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section provides links to help you quickly start with a fresh installation of :ref:`install-neuron-mxnet`.

.. dropdown:: Launch the Instance
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.
   * To get more information about instance sizes and pricing see: `Inf1 web page `_
   * Select Amazon Linux 2023 AMI
   * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance

.. dropdown:: Install Drivers and Tools
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools

.. include:: /includes/setup/tab-inference-mxnet-neuron-al2023.txt

.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-al2023.rst

================================================
FILE: archive/mxnet-neuron/setup/mxnet-neuron-ubuntu20-base-dlami.rst
================================================
.. _setup-mxnet-neuron-u20-base-dlami:

.. card:: Select a Different Framework or Platform for Setup
   :link: setup-guide-index
   :link-type: ref
   :class-body: sphinx-design-class-title-small

MXNet Neuron ("mxnet-neuron") Setup on Ubuntu 20
================================================

.. contents:: Table of contents
   :local:
   :depth: 2

Get Started with Latest Release of MXNet Neuron (``mxnet-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section provides links to help you quickly start with a fresh installation of :ref:`install-neuron-mxnet`.

.. dropdown:: Launch the Instance
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console,
     please make sure to select the correct instance type.
   * To get more information about instance sizes and pricing see: `Inf1 web page `_
   * Check for the latest version of the `DLAMI Base AMI `_ and copy the AMI name that starts with "Deep Learning Base Neuron AMI (Ubuntu 20.04) " from the "AMI Name:" section
   * Search for the copied AMI name in the AMI Search; you should see a matching AMI with that name in Community AMIs. Select the AMI and use it to launch the instance.
   * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance

.. dropdown:: Install Drivers and Tools
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools

.. include:: /includes/setup/tab-inference-mxnet-neuron-u20.txt

.. include:: /archive/mxnet-neuron/setup/mxnet-update-u20.rst

.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-u20.rst

================================================
FILE: archive/mxnet-neuron/setup/mxnet-neuron-ubuntu20.rst
================================================
.. _setup-mxnet-neuron-u20:

.. card:: Select a Different Framework or Platform for Setup
   :link: setup-guide-index
   :link-type: ref
   :class-body: sphinx-design-class-title-small

MXNet Neuron ("mxnet-neuron") Setup on Ubuntu 20
=================================================

.. contents:: Table of contents
   :local:
   :depth: 2

Get Started with Latest Release of MXNet Neuron (``mxnet-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section provides links to help you quickly start with a fresh installation of :ref:`install-neuron-mxnet`.

.. dropdown:: Launch the Instance
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.
   * To get more information about instance sizes and pricing see: `Inf1 web page `_
   * Select Ubuntu Server 20 AMI
   * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance

.. dropdown:: Install Drivers and Tools
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools

.. include:: /includes/setup/tab-inference-mxnet-neuron-u20.txt

.. include:: /archive/mxnet-neuron/setup/mxnet-update-u20.rst

.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-u20.rst

================================================
FILE: archive/mxnet-neuron/setup/mxnet-neuron-ubuntu22.rst
================================================
.. _setup-mxnet-neuron-u22:

.. card:: Select a Different Framework or Platform for Setup
   :link: setup-guide-index
   :link-type: ref
   :class-body: sphinx-design-class-title-small

MXNet Neuron ("mxnet-neuron") Setup on Ubuntu 22
=================================================
.. contents:: Table of contents
   :local:
   :depth: 2

Get Started with Latest Release of MXNet Neuron (``mxnet-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This section provides links to help you quickly start with a fresh installation of :ref:`install-neuron-mxnet`.

.. dropdown:: Launch the Instance
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   * Please follow the instructions at `launch an Amazon EC2 Instance `_ to launch an instance. When choosing the instance type at the EC2 console, please make sure to select the correct instance type.
   * To get more information about instance sizes and pricing see: `Inf1 web page `_
   * Select Ubuntu Server 22 AMI
   * After launching the instance, follow the instructions in `Connect to your instance `_ to connect to the instance

.. dropdown:: Install Drivers and Tools
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami --category=driver_runtime_tools

.. include:: /includes/setup/tab-inference-mxnet-neuron-u22.txt

.. include:: /archive/mxnet-neuron/setup/mxnet-install-prev-u22.rst

================================================
FILE: archive/mxnet-neuron/setup/mxnet-update-u20.rst
================================================
.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. _mxnet-neuron-u20-update:

Update to latest MXNet Neuron (``mxnet-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

If you already have a previous Neuron release installed, this section provides links to help you update to the latest Neuron release.

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. include:: /setup/install-templates/inf1/note-setup-general.rst

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami

   .. tab-item:: MXNet 1.5.1

      .. include:: /setup/install-templates/inf1/note-setup-general.rst

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami

================================================
FILE: archive/mxnet-neuron/setup/mxnet-update.rst
================================================
.. _update-neuron-mxnet:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Update to latest MXNet Neuron
===============================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

..
contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: MXNet 1.8.0 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: MXNet 1.5.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: MXNet 1.8.0 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: MXNet 1.5.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. 
include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: MXNet 1.8.0 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: MXNet 1.5.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=mxnet --framework-version=1.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/mxnet-neuron/setup/prev-releases/neuron-1.14.2-mxnet-install.rst ================================================ .. _install-neuron-1.14.2-mxnet: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install MXNet Neuron (Neuron 1.14.2) ====================================== .. warning:: This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. tab-set:: .. tab-item:: MXNet 1.8.0 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=mxnet-1.5.1

================================================
FILE: archive/mxnet-neuron/setup/prev-releases/neuron-1.15.0-mxnet-install.rst
================================================

.. _install-neuron-1.15.0-mxnet:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install MXNet Neuron (Neuron 1.15.0)
======================================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK.
   It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0

         .. tab-item:: Amazon Linux AMI
            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=mxnet-1.5.1
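The archived pages that follow are identical except for the ``--neuron-version`` passed to the setup helper. To see exactly which rendered instructions changed between two adjacent releases, the helper's output can be diffed directly. A convenience sketch, assuming a shell with process substitution (such as bash):

.. code-block:: bash

   # Compare the rendered develop-mode instructions of two archived releases.
   render() {
       python3 src/helperscripts/neuronsetuphelper.py \
           --file src/helperscripts/neuron-releases-manifest.json \
           --install mxnet --mode=develop --ami=non-dlami --os=ubuntu \
           --neuron-version="$1"
   }
   diff <(render 1.15.0) <(render 1.15.1)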
================================================
FILE: archive/mxnet-neuron/setup/prev-releases/neuron-1.15.1-mxnet-install.rst
================================================

.. _install-neuron-1.15.1-mxnet:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install MXNet Neuron (Neuron 1.15.1)
======================================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK.
   It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=mxnet-1.5.1

================================================
FILE: archive/mxnet-neuron/setup/prev-releases/neuron-1.15.2-mxnet-install.rst
================================================

.. _install-neuron-1.15.2-mxnet:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install MXNet Neuron (Neuron 1.15.2)
======================================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK.
   It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=mxnet-1.5.1
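Every page in this directory is rendered from the same release manifest. To check which MXNet entries a given manifest actually defines, it can be inspected directly; this assumes only that the file is plain JSON, as its extension suggests:

.. code-block:: bash

   # Pretty-print the manifest and list the lines that mention mxnet.
   python3 -m json.tool src/helperscripts/neuron-releases-manifest.json | grep -n mxnet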
================================================
FILE: archive/mxnet-neuron/setup/prev-releases/neuron-1.16.3-mxnet-install.rst
================================================

.. _install-neuron-1.16.3-mxnet:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install MXNet Neuron
=====================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK.
   It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=mxnet-1.5.1

================================================
FILE: archive/mxnet-neuron/setup/prev-releases/neuron-1.17.2-mxnet-install.rst
================================================

.. _install-neuron-1.17.2-mxnet:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install MXNet Neuron
=====================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK.
   It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst
            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=mxnet-1.5.1
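Each archived page repeats the same three sections: develop, compile, and deploy. To regenerate all three instruction sets for a single release in one pass, a small loop over ``--mode`` is enough. A convenience sketch, not part of the documentation build:

.. code-block:: bash

   # Render develop, compile, and deploy instructions for one release.
   for mode in develop compile deploy; do
       echo "== ${mode} =="
       python3 src/helperscripts/neuronsetuphelper.py \
           --file src/helperscripts/neuron-releases-manifest.json \
           --install mxnet --mode="${mode}" --ami=non-dlami --os=ubuntu \
           --neuron-version=1.18.0
   done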
================================================
FILE: archive/mxnet-neuron/setup/prev-releases/neuron-1.18.0-mxnet-install.rst
================================================

.. _install-neuron-1.18.0-mxnet:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install MXNet Neuron
=====================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK.
   It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0

   .. tab-item:: MXNet 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: MXNet 1.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux AMI
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 .. tab-item:: MXNet 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=mxnet-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=mxnet-1.5.1 ================================================ FILE: archive/mxnet-neuron/setup/prev-releases/neuron-1.19.0-mxnet-install.rst ================================================ .. _install-neuron-1.19.0-mxnet: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install MXNet Neuron ===================== .. warning:: This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. include:: /setup/install-templates/inf1/note-setup-cntr.rst .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. tab-set:: .. tab-item:: MXNet 1.8.0 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 .. tab-item:: MXNet 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1 Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: MXNet 1.8.0 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 .. tab-item:: MXNet 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1 Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. tab-set:: .. tab-item:: MXNet 1.8.0 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 .. tab-item:: MXNet 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1

.. tab-item:: Ubuntu DLAMI

   .. include:: /setup/install-templates/inf1/note-setup-general.rst

   .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=mxnet-1.5.1

.. tab-item:: Amazon Linux DLAMI

   .. include:: /setup/install-templates/inf1/note-setup-general.rst

   .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install mxnet --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=mxnet-1.5.1

================================================
FILE: archive/mxnet-neuron/setup/setup-inference
================================================

Setup Guide for Inf1
====================

.. toctree::
   :maxdepth: 1

   Fresh install
   Update to latest release
   Install previous releases

================================================
FILE: archive/mxnet-neuron/troubleshooting-guide.rst
================================================

.. _mxnet_troubleshooting_guide:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Troubleshooting Guide for Neuron Apache MXNet
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of Contents
   :local:
   :depth: 2

Inference Runtime Error
=======================

Out-of-memory error when calling Symbol API bind() too many times
-----------------------------------------------------------------

.. important::

   ``NEURONCORE_GROUP_SIZES`` is no longer supported starting with the Neuron 1.19.0 release. If your application uses ``NEURONCORE_GROUP_SIZES``, see :ref:`neuron-migrating-apps-neuron-to-libnrt` and :ref:`eol-ncgs-env_2` for more details.

If you see an out-of-memory error when using the Symbol API's bind() function, ensure that bind() is called once for each desired model instance. For example, on inf1.xlarge, use the Symbol API to create 4 parallel instances of a model that was compiled to 1 NeuronCore (--neuroncore-pipeline-cores=1), each bound to a different mx.neuron(i) context, where i is the NeuronCore Group index ranging from 0 to 3. Then use 4 threads to feed the 4 instances in parallel. For example:
.. code:: python

   import os
   from concurrent import futures

   import mxnet as mx

   NUM_PARALLEL = 4
   os.environ['NEURONCORE_GROUP_SIZES'] = ','.join('1' for _ in range(NUM_PARALLEL))

   # recfile_base is the path to your RecordIO input file (defined elsewhere)
   data_iter = []
   for i in range(NUM_PARALLEL):
       data_iter.append(mx.io.ImageRecordIter(
           path_imgrec=recfile_base,
           data_shape=(3, 224, 224),
           batch_size=1,
           prefetch_buffer=1,
           num_parts=NUM_PARALLEL,
           part_index=i))

   sym, args, auxs = mx.model.load_checkpoint('resnet-50_compiled', 0)
   exec_list = []
   for i in range(NUM_PARALLEL):
       exe = sym.bind(ctx=mx.neuron(i), args=args, aux_states=auxs, grad_req='null')
       exec_list.append(exe)

   def single_thread_infer(i):
       for batch in data_iter[i]:
           img = batch.data[0]
           label = batch.label
           feed_dict = {'data': img}
           exe = exec_list[i]
           exe.copy_params_from(feed_dict)
           exe.forward()
           out = exe.outputs[0]

   future_list = []
   with futures.ThreadPoolExecutor(max_workers=NUM_PARALLEL) as executor:
       for i in range(NUM_PARALLEL):
           future_list.append(executor.submit(single_thread_infer, i))

Inference crashed with MXNetError: InferShapeKeyword argument name xyz not found
--------------------------------------------------------------------------------

If you see an MXNetError such as:

.. code:: bash

   mxnet.base.MXNetError: [11:55:39] src/c_api/c_api_symbolic.cc:508: InferShapeKeyword argument name xyz not found

it is followed by a list of "Candidate arguments". This list shows all the input argument names that the model knows about, and 'xyz' is not in the list. To fix this, remove the entry xyz from the feed dictionary.

Inference crashed at mx.nd.waitall() with MXNetError: Check failed: bin.dtype() == mshadow::kUint8
--------------------------------------------------------------------------------------------------

An MXNetError exception with 'Check failed: bin.dtype() == mshadow::kUint8' can occur when executing the Symbol API's forward function followed by mx.nd.waitall().

Inference crashed with NRTD error 1002
--------------------------------------

During inference, the user may encounter an error with details "[NRTD:infer_wait] error: 1002":

.. code:: bash

   mxnet.base.MXNetError: [11:26:56] src/operator/subgraph/neuron/./neuron_util.h:1175: Check failed: rsp_wait.status().code() == 0 || rsp_wait.status().code() == 1003: Failed Infer Wait with Neuron-RTD Error. Neuron-RTD Status Code: 1002, details: "[NRTD:infer_wait] error: 1002 "

Runtime errors are listed in the Neuron Runtime return codes documentation. In particular, 1002 means that some invalid input has been submitted to infer, e.g. some input tensors are missing or input tensor sizes are incorrect. Please examine /var/log/syslog to see more details on the error. For example, you may see:

.. code::

   Oct 30 19:13:39 ip-172-31-93-131 nrtd[1125]: [TDRV:io_queue_prepare_input_nonhugetlb] Unexpected input size, for data00, expected: 2097152, received: 33554432

This means that the input tensor size is larger than what the model was compiled for (i.e., the example input tensor shapes passed during compilation).
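To catch this class of error before submitting an inference, you can compare each input tensor against the shape the model was compiled with. A minimal sketch, reusing the (1, 3, 224, 224) compile-time shape used elsewhere in this guide; the variable names and the mis-sized input are illustrative:

.. code:: python

   import mxnet as mx

   # Shape the example input had at compilation time (illustrative value)
   compiled_shape = (1, 3, 224, 224)

   img = mx.nd.zeros((1, 3, 512, 512))  # a deliberately mis-sized input
   feed_dict = {'data': img}

   for name, tensor in feed_dict.items():
       if tuple(tensor.shape) != compiled_shape:
           raise ValueError(
               "Input '%s' has shape %s, but the model was compiled for %s"
               % (name, tuple(tensor.shape), compiled_shape))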
Multi-Model Server
==================

Failed to create NEURONCORE Group with GRPC Error. Status Error: 14, Error message: "Connect Failed"
----------------------------------------------------------------------------------------------------

NOTE: This error only applies to MXNet 1.5.

If the client is unable to start workers and you get a message that MMS is unable to create a NeuronCore Group, please check that Neuron RTD is running (neuron-rtd process).

.. code:: json

   {
     "code": 500,
     "type": "InternalServerException",
     "message": "Failed to start workers"
   }

.. code:: bash

   2019-10-23 19:56:23,187 [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [19:56:23] src/operator/subgraph/inferentia/./inferentia_util.h:218: Check failed: status.ok() Failed to create NeuronCore Group with GRPC Error. Status Error: 14, Error message: "Connect Failed"

Multiple MMS workers die with “Backend worker process die.” message
-------------------------------------------------------------------

.. important::

   ``NEURONCORE_GROUP_SIZES`` is no longer supported starting with the Neuron 1.19.0 release. If your application uses ``NEURONCORE_GROUP_SIZES``, see :ref:`neuron-migrating-apps-neuron-to-libnrt` and :ref:`eol-ncgs-env_2` for more details.

If you run inference with MMS and get multiple “Backend worker process die" messages, please ensure that the number of workers ("initial_workers") passed during model load is less than or equal to the number of NeuronCores available divided by the number of NeuronCores required by the model.

.. code:: bash

   com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Backend worker process die.
   com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
   com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/usr/local/lib/python3.6/site-packages/mxnet/symbol/symbol.py", line 1524, in simple_bind
   com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     ctypes.byref(exe_handle)))
   com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/usr/local/lib/python3.6/site-packages/mxnet/base.py", line 252, in check_call
   com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise MXNetError(py_str(_LIB.MXGetLastError()))
   com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mxnet.base.MXNetError: [00:26:32] src/operator/subgraph/neuron/./neuron_util.h:221: Check failed: 0 == create_eg_rsp.status().code() Failed to create NeuronCore Group with KRTD Error. KRTD Status Code: 4, details: ""

As indicated in :ref:`appnote-performance-tuning`, for greater flexibility users can use NEURONCORE_GROUP_SIZES to specify the groupings of NeuronCores into Neuron devices, each device consisting of one or more NeuronCores. Each worker would take a device. The total number of NeuronCores taken by all the workers should be less than or equal to the total number of NeuronCores visible to neuron-rtd. This situation should be considered at full load (MMS scales up to max_workers). Additionally, to properly assign a model to a Neuron device, the environment variable NEURONCORE_GROUP_SIZES must be specified within the model server class (i.e., mxnet_model_service.py in the example above). For example, add the following line within mxnet_model_service.py for a model compiled to 1 NeuronCore:

.. code:: python

   os.environ['NEURONCORE_GROUP_SIZES'] = '1'

More information about the max_worker limit setting can be found in the `MMS Management API Documentation`_. For example, to run up to 4 workers on inf1.xlarge, where 4 NeuronCores are available by default to Neuron-RTD, set max_workers to 4:

.. _MMS Management API Documentation: https://github.com/awslabs/multi-model-server/blob/master/docs/management_api.md#user-content-scale-workers

.. code:: bash

   curl -v -X PUT "http://localhost:8081/models/squeezenet_v1.1_compiled?min_worker=1&max_worker=4"
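The worker ceiling follows directly from the two core counts discussed above and can be sanity-checked up front. A minimal sketch, assuming an inf1.xlarge (4 NeuronCores visible to neuron-rtd) and a model compiled with --neuroncore-pipeline-cores=1; both values are illustrative:

.. code:: python

   # NeuronCores visible to neuron-rtd (4 by default on inf1.xlarge)
   total_neuroncores = 4

   # NeuronCores required per model copy (--neuroncore-pipeline-cores at compile time)
   cores_per_model = 1

   # Upper bound for both initial_workers and max_workers in MMS
   max_workers = total_neuroncores // cores_per_model
   print("Use at most %d workers for this model" % max_workers)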
MMS throws a "mxnet.base.MXNetError: array::at" error
-----------------------------------------------------

If you see “mxnet.base.MXNetError: array::at” when running MMS, please check that the NDArray/Gluon API is not used, as it is not supported in MXNet-Neuron 1.5. If you would like to use the NDArray or Gluon API, please upgrade to MXNet 1.8.

.. code:: bash

   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - array::at
   [INFO ] W-9000-squeezenet_v1.1_compiled com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 30
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/tmp/models/6606fa046f68a34df87f15362a7a2d9a49749878/model_handler.py", line 82, in handle
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     data = self.inference(data)
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/tmp/models/6606fa046f68a34df87f15362a7a2d9a49749878/mxnet_model_service.py", line 153, in inference
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     d.wait_to_read()
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/home/user/regression_venv_p3.6/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1819, in wait_to_read
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     check_call(_LIB.MXNDArrayWaitToRead(self.handle))
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -   File "/home/user/regression_venv_p3.6/lib/python3.6/site-packages/mxnet/base.py", line 253, in check_call
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle -     raise MXNetError(py_str(_LIB.MXGetLastError()))
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - mxnet.base.MXNetError: array::at
   [INFO ] W-9000-squeezenet_v1.1_compiled-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Invoking custom service failed.

MXNet Model Server is not able to clean up Neuron RTD states after model is unloaded
------------------------------------------------------------------------------------

NOTE: This issue is resolved in version 1.5.1.1.1.88.0, released 11/17/2020, and only applies to MXNet 1.5.

MXNet Model Server is not able to clean up Neuron RTD states after a model is unloaded (deleted) from the model server. Restarting the model server may then fail with a "Failed to create NEURONCORE_GROUP" error:

.. code:: bash

   mxnet.base.MXNetError: [00:26:59] src/operator/subgraph/neuron/./neuron_util.h:348: Check failed: 0 == create_eg_rsp.status().code(): Failed to create NEURONCORE_GROUP with Neuron-RTD Error. Neuron-RTD Status Code: 9, details: ""

The workaround is to run ``/opt/aws/neuron/bin/neuron-cli reset`` to clear Neuron RTD states after all models are unloaded and the server is shut down, before restarting the model server.

Pipeline mode is not able to execute inference requests in parallel
-------------------------------------------------------------------

If you see that multiple executors in a Neuron pipeline setup (one model compiled for more than one NeuronCore using the `--neuroncore-pipeline-cores` option during compilation) are not running in parallel, please set the following MXNet environment variable before inference so that MXNet can execute the CPU ops in parallel; otherwise execution is sequential and stalls the executors. Set ``MXNET_CPU_WORKER_NTHREADS`` to the value of ``__subgraph_opt_neuroncore__`` in the compiled model JSON to ensure that all the executors (threads) can run in parallel.
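That value can also be picked up programmatically. A sketch that scans the compiled symbol JSON for the ``__subgraph_opt_neuroncore__`` attribute and exports the variable before the model is loaded; the filename and the plain text-scan approach are illustrative assumptions, not part of the original guide:

.. code:: python

   import os
   import re

   # Scan the compiled symbol JSON for __subgraph_opt_neuroncore__.
   # A plain text scan avoids assumptions about where the attribute
   # sits in the node hierarchy.
   with open('resnet-50_compiled-symbol.json') as f:
       text = f.read()

   match = re.search(r'"__subgraph_opt_neuroncore__"\s*:\s*"?(\d+)', text)
   if match:
       # Must be set before MXNet starts executing the graph
       os.environ['MXNET_CPU_WORKER_NTHREADS'] = match.group(1)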
Features only in MXNet-Neuron 1.5
---------------------------------

- Shared memory for IFMaps transfer to the Neuron runtime (higher performance compared to GRPC mode)
- Neuron profiling using MXNet

Features only in MXNet-Neuron 1.8
---------------------------------

- Gluon API support
- Library mode Neuron runtime

================================================
FILE: archive/mxnet-neuron/tutorials/mxnet-tutorial-setup.rst
================================================

.. _mxnet-tutorial-setup:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

MXNet Tutorial Setup
====================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

#. Launch an Inf1.6xlarge instance:

   .. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst

#. Set up a development environment:

   * Enable or install MXNet-Neuron: :ref:`install-neuron-mxnet`.

#. Run the tutorial in a Jupyter notebook:

   * Follow the instructions at :ref:`Setup Jupyter notebook ` to:

     #. Start the Jupyter Notebook on the instance
     #. Run the Jupyter Notebook from your local browser

   * Connect to the instance from the terminal, clone the Neuron GitHub repository to the Inf1 instance, and then change the working directory to the tutorial directory:

     .. code::

        git clone https://github.com/aws/aws-neuron-sdk.git
        cd aws-neuron-sdk/src/examples/mxnet

   * Locate the tutorial notebook file (.ipynb file) under ``aws-neuron-sdk/src/examples/mxnet``
   * From your local browser, open the tutorial notebook from the menu and follow the instructions.

================================================
FILE: archive/mxnet-neuron/tutorials/tutorial-model-serving.rst
================================================

.. _mxnet-neuron-model-serving:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Tutorial: Neuron Apache MXNet Model Serving
=============================================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

This MXNet Neuron model serving example is adapted from the MXNet vision service example, which uses a pretrained SqueezeNet to perform image classification: https://github.com/awslabs/multi-model-server/tree/master/examples/mxnet_vision. Before starting this example, please ensure that the Neuron-optimized MXNet package mxnet-neuron is installed along with the Neuron compiler.

Warning
*******

If you are using MXNet-1.5, please note that MXNet-1.5 entered maintenance mode and requires Neuron Runtime 1.x; see :ref:`maintenance_mxnet_1_5`. To set up a development environment for MXNet-1.5, see the installation instructions at :ref:`mxnet-setup`.

If using a DLAMI, you can activate the environment aws_neuron_mxnet_p36 and skip the installation part in the first step below.

1. First, install a Java runtime and multi-model-server:

   .. code:: bash

      cd ~/
      # sudo dnf -y install -q jre  # for AL2023
      sudo apt-get install -y -q default-jre  # for Ubuntu
      pip install multi-model-server

   Download the example code:

   .. code:: bash

      git clone https://github.com/awslabs/multi-model-server
      cd ~/multi-model-server/examples/mxnet_vision
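Before moving on, it can be worth confirming that the Neuron-enabled MXNet stack imports cleanly in your environment. A small sketch that reuses the version check from the compile script in the next step:

.. code:: python

   from packaging import version
   import mxnet as mx

   print("MXNet version:", mx.__version__)
   if version.parse(mx.__version__) >= version.parse("1.8"):
       import mx_neuron as neuron  # MXNet 1.8 ships Neuron support as mx_neuron
   else:
       from mxnet.contrib import neuron  # MXNet 1.5 bundles it in contrib
   print("Neuron module loaded successfully")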
2. Compile the ResNet-50 model for the Inferentia target by saving the following Python script to compile_resnet50.py and running “\ ``python compile_resnet50.py``\ ”:

   .. code:: python

      from packaging import version
      import numpy as np
      import mxnet as mx

      mxnet_version = version.parse(mx.__version__)
      if mxnet_version >= version.parse("1.8"):
          import mx_neuron as neuron
      else:
          from mxnet.contrib import neuron

      path = 'http://data.mxnet.io/models/imagenet/'
      mx.test_utils.download(path + 'resnet/50-layers/resnet-50-0000.params')
      mx.test_utils.download(path + 'resnet/50-layers/resnet-50-symbol.json')
      mx.test_utils.download(path + 'synset.txt')

      nn_name = "resnet-50"

      # Load the model
      sym, args, auxs = mx.model.load_checkpoint(nn_name, 0)

      # Define compilation parameters: input shape and dtype
      inputs = {'data': mx.nd.zeros([1, 3, 224, 224], dtype='float32')}

      # Compile the graph to the Inferentia target
      csym, cargs, cauxs = neuron.compile(sym, args, auxs, inputs)

      # Save the compiled model
      mx.model.save_checkpoint(nn_name + "_compiled", 0, csym, cargs, cauxs)

3. Prepare a signature file ``signature.json`` to configure the input name and shape:

   .. code:: json

      {
        "inputs": [
          {
            "data_name": "data",
            "data_shape": [1, 3, 224, 224]
          }
        ]
      }

4. Prepare ``synset.txt``, which is a list of names for the ImageNet prediction classes:

   .. code:: bash

      curl -O https://s3.amazonaws.com/model-server/model_archive_1.0/examples/squeezenet_v1.1/synset.txt

5. Create a custom service class following the template in the model_service_template folder:

   .. code:: bash

      cp -r ../model_service_template/* .

   Edit ``mxnet_model_service.py`` to use the appropriate context. Make the following change:

   .. code:: python

      from packaging import version

      mxnet_version = version.parse(mx.__version__)
      if mxnet_version >= version.parse("1.8"):
          import mx_neuron as neuron
      self.mxnet_ctx = mx.neuron()

   Comment out the existing context setting:

   .. code:: python

      #self.mxnet_ctx = mx.cpu() if gpu_id is None else mx.gpu(gpu_id)

   Also, comment out the unnecessary data copy for model_input in ``mxnet_model_service.py``:

   .. code:: python

      #model_input = [item.as_in_context(self.mxnet_ctx) for item in model_input]

6. Package the model with model-archiver:

   .. code:: bash

      cd ~/multi-model-server/examples
      model-archiver --force --model-name resnet-50_compiled --model-path mxnet_vision --handler mxnet_vision_service:handle

7. Start the MXNet Model Server (MMS) and load the model using the RESTful API. Please ensure that Neuron RTD is running with default settings (see Neuron Runtime Getting Started):

   .. code:: bash

      cd ~/multi-model-server/
      multi-model-server --start --model-store examples  # Pipe to log file if you want to keep a log of MMS
      curl -v -X POST "http://localhost:8081/models?initial_workers=1&max_workers=1&synchronous=true&url=resnet-50_compiled.mar"
      sleep 10  # allow sufficient time to load model

   Each worker requires a NeuronCore group that can accommodate the compiled model. Additional workers can be added by increasing the max_workers configuration as long as there are enough NeuronCores available. Use ``neuron-top`` to see which models are loaded on specific NeuronCores.
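The same management call can also be issued from Python instead of curl, which is convenient when scripting the load step. A sketch using the third-party requests library against the standard MMS management endpoint on port 8081; the parameters mirror the curl command above:

.. code:: python

   import requests

   # Register the compiled model archive with one synchronous worker
   resp = requests.post(
       "http://localhost:8081/models",
       params={
           "url": "resnet-50_compiled.mar",
           "initial_workers": 1,
           "max_workers": 1,
           "synchronous": "true",
       },
   )
   print(resp.status_code, resp.text)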
8. Test inference using an example image:

   .. code:: bash

      curl -O https://raw.githubusercontent.com/awslabs/multi-model-server/master/docs/images/kitten_small.jpg
      curl -X POST http://127.0.0.1:8080/predictions/resnet-50_compiled -T kitten_small.jpg

   You will see the following output:

   .. code:: bash

      [
        {
          "probability": 0.6375716328620911,
          "class": "n02123045 tabby, tabby cat"
        },
        {
          "probability": 0.1692783385515213,
          "class": "n02123159 tiger cat"
        },
        {
          "probability": 0.12187337130308151,
          "class": "n02124075 Egyptian cat"
        },
        {
          "probability": 0.028840631246566772,
          "class": "n02127052 lynx, catamount"
        },
        {
          "probability": 0.019691042602062225,
          "class": "n02129604 tiger, Panthera tigris"
        }
      ]

9. To clean up after the test, issue a delete command via the RESTful API and stop the model server:

   .. code:: bash

      curl -X DELETE http://127.0.0.1:8081/models/resnet-50_compiled
      multi-model-server --stop

================================================
FILE: archive/mxnet-neuron/tutorials/tutorials-mxnet-computervision.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Computer Vision Tutorials (``mxnet-neuron``)
============================================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

* ResNet-50 tutorial :ref:`[html] ` :mxnet-neuron-src:`[notebook] `
* Model Serving tutorial :ref:`[html] `
* Getting started with Gluon tutorial :ref:`[html] ` :github:`[notebook] `

================================================
FILE: archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Tutorials (``mxnet-neuron``)
=============================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. toctree::
   :maxdepth: 1
   :hidden:

   Computer Vision Tutorials
   Natural Language Processing (NLP) Tutorials
   Utilizing Neuron Capabilities Tutorials

.. include:: /archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron.txt

================================================
FILE: archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron.txt
================================================

.. tab-set::

   .. tab-item:: Computer Vision Tutorials
      :name:

      * ResNet-50 tutorial :ref:`[html] ` :mxnet-neuron-src:`[notebook] `
      * Model Serving tutorial :ref:`[html] `
      * Getting started with Gluon tutorial :ref:`[html] ` :github:`[notebook] `

   .. tab-item:: Natural Language Processing (NLP) Tutorials
      :name:

      * MXNet 1.8: Using data parallel mode tutorial :ref:`[html] ` :mxnet-neuron-src:`[notebook] `

   .. tab-item:: Utilizing Neuron Capabilities Tutorials
      :name:

      * NeuronCore Groups tutorial :ref:`[html] ` :mxnet-neuron-src:`[notebook] `

.. note::

   To use Jupyter Notebook see:

   * :ref:`setup-jupyter-notebook-steps-troubleshooting`
   * :ref:`running-jupyter-notebook-as-script`

================================================
FILE: archive/mxnet-neuron/tutorials/tutorials-mxnet-nlp.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Natural Language Processing (NLP) Tutorials (``mxnet-neuron``)
==============================================================

.. warning::
   This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only.
For current framework support, see :doc:`/frameworks/index`. * MXNet 1.8: Using data parallel mode tutorial :ref:`[html] ` :mxnet-neuron-src:`[notebook] ` ================================================ FILE: archive/mxnet-neuron/tutorials/tutorials-mxnet-utilizing-neuron-capabilities.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Utilizing Neuron Capabilities Tutorials (``mxnet-neuron``) ========================================================== .. warning:: This document is archived. MXNet is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. * NeuronCore Groups tutorial :ref:`[html] ` :mxnet-neuron-src:`[notebook] ` ================================================ FILE: archive/neuronperf/index.rst ================================================ .. _neuronperf: .. meta:: :noindex: :nofollow: :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only. :date-modified: 12-02-2025 ================= NeuronPerf (Beta) ================= NeuronPerf is a lightweight Python library with a simple API that enables fast measurements of performance when running models using Neuron. .. _neuronperf_quickstart: NeuronPerf Quickstart --------------------- To install NeuronPerf in your Neuron environment, execute: .. code:: bash $ pip install neuronperf --extra-index-url=https://pip.repos.neuron.amazonaws.com Refer to the :ref:`neuronperf_examples` and :ref:`neuronperf_user_guide` to get started. .. _neuronperf_user_guide: NeuronPerf User Guide --------------------- .. toctree:: :maxdepth: 1 Overview Terminology Examples Benchmark Guide Evaluate Guide Compile Guide Model Index Guide NeuronPerf API Reference ------------------------ .. toctree:: :maxdepth: 1 API Framework Notes FAQ --- .. toctree:: :maxdepth: 1 FAQ Troubleshooting --------------- .. toctree:: :maxdepth: 1 Troubleshooting Release Notes ------------- .. toctree:: :maxdepth: 1 rn ================================================ FILE: archive/neuronperf/neuronperf_api.rst ================================================ .. _neuronperf_api: .. meta:: :noindex: :nofollow: :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only. :date-modified: 12-02-2025 NeuronPerf API ============== .. contents:: Table of Contents :local: :depth: 2 .. note:: Due to a bug in Sphinx, some of the type annotations may be incomplete. .. py:function:: compile(compile_fn, model, inputs, batch_sizes: Union[int, List[int]] = None, pipeline_sizes: Union[int, List[int]] = None, performance_levels: Union[str, List[int]] = None, models_dir: str = "models", filename: str = None, compiler_args: dict = None, verbosity: int = 1, *args, **kwargs) -> str: Compiles the provided model with each provided example input, pipeline size, and performance level. Any additional compiler_args passed will be forwarded to the compiler on every invocation. :param model: The model to compile. :param list inputs: A list of example inputs. :param batch_sizes: A list of batch sizes that correspond to the example inputs. :param pipeline_sizes: A list of pipeline sizes to use. See :ref:`neuroncore-pipeline`. :param performance_levels: A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). 
See :ref:`neuron-cc-training-mixed-precision`.

   :param str models_dir: The directory where compilation artifacts will be stored.
   :param str model_name: An optional model name tag to apply to compiled artifacts.
   :param str filename: The name of the model index to write out. If not provided, a name will be generated and returned.
   :param dict compiler_args: Additional compiler arguments to be forwarded with every compilation.
   :param int verbosity: 0 = error, 1 = info, 2 = debug
   :return: A model index filename. If a configuration fails to compile, it will not be included in the index and an error will be logged.
   :rtype: str

.. _neuronperf_api_benchmark:

.. py:function:: benchmark(load_fn: Callable[[str, int], Any], model_filename: str, inputs: Any, batch_sizes: Union[int, List[int]] = None, duration: float = BENCHMARK_SECS, n_models: Union[int, List[int]] = None, pipeline_sizes: Union[int, List[int]] = None, cast_modes: Union[str, List[str]] = None, workers_per_model: Union[int, None] = None, env_setup_fn: Callable[[int, Dict], None] = None, setup_fn: Callable[[int, Dict, Any], None] = None, preprocess_fn: Callable[[Any], Any] = None, postprocess_fn: Callable[[Any], Any] = None, dataset_loader_fn: Callable[[Any, int], Any] = None, verbosity: int = 1, multiprocess: bool = True, multiinterpreter: bool = False, return_timers: bool = False, device_type: str = "neuron") -> List[Dict]:

   Benchmarks the model index or individual model using the provided inputs. If a model index is provided, additional fields such as ``pipeline_sizes`` and ``performance_levels`` can be used to filter the models to benchmark. The default behavior is to benchmark all configurations in the model index.

   :param load_fn: A function that accepts a model filename and device id, and returns a loaded model. This is automatically passed through the subpackage calls (e.g. ``neuronperf.torch.benchmark``).
   :param str model_filename: A path to a model index from compile or a path to an individual model. For CPU benchmarking, a class should be passed that can be instantiated with a default constructor (e.g. ``MyModelClass``).
   :param list inputs: A list of example inputs. If the list contains tuples, they will be destructured on inference to support multiple arguments.
   :param batch_sizes: A list of ints indicating batch sizes that correspond to the inputs. Assumes 1 if not provided.
   :param float duration: The number of seconds to benchmark each model.
   :param n_models: The number of models to run in parallel. Default behavior runs 1 model and the max number of models possible, determined by a best effort from ``device_type``, instance size, or other environment state.
   :param pipeline_sizes: A list of pipeline sizes to use. See :ref:`neuroncore-pipeline`.
   :param performance_levels: A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See :ref:`neuron-cc-training-mixed-precision`.
   :param workers_per_model: The number of workers to use per model loaded. If ``None``, this is automatically selected.
   :param env_setup_fn: A custom environment setup function to run in each subprocess before model loading. It will receive the benchmarker id and config.
   :param setup_fn: A function that receives the benchmarker id, config, and model to perform last-minute configuration before inference.
   :param preprocess_fn: A custom preprocessing function to perform on each input before inference.
   :param postprocess_fn: A custom postprocessing function to perform on each input after inference.
:param bool multiprocess: When True, model loading is dispatched to forked subprocesses. Should be left alone unless debugging. :param bool multiinterpreter: When True, benchmarking is performed in a new python interpreter per model. All parameters must be serializable. Overrides multiprocess. :param bool return_timers: When True, the return of this function is a list of tuples ``(config, results)`` with detailed information. This can be converted to reports with ``get_reports(results)``. :param float stats_interval: Collection interval (in seconds) for metrics during benchmarking, such as CPU and memory usage. :param str device_type: This will be set automatically to one of the ``SUPPORTED_DEVICE_TYPES``. :param float cost_per_hour: The price of this device / hour. Used to estimate cost / 1 million infs in reports. :param str model_name: A friendly name for the model to use in reports. :param str model_class_name: Internal use. :param str model_class_file: Internal use. :param int verbosity: 0 = error, 1 = info, 2 = debug :return: A list of benchmarking results. :rtype: list[dict] .. py:function:: get_reports(results) Summarizes and combines the detailed results from ``neuronperf.benchmark``, when run with ``return_timers=True``. One report dictionary is produced per model configuration benchmarked. The list of reports can be fed directly to other reporting utilities, such as ``neuronperf.write_csv``. :param list[tuple] results: The list of results from ``neuronperf.benchmark``. :param list[int] batch_sizes: The batch sizes that correspond to the `inputs` provided to ``compile`` and ``benchmark``. Used to correct throughput values in the reports. :return: A list of dictionaries that summarize the results for each model configuration. :rtype: list[dict] .. py:function:: print_reports(reports, cols=SUMMARY_COLS, sort_by="throughput_peak", reverse=False) Print a report to the terminal. Example of default behavior: >>> neuronperf.print_reports(reports) throughput_avg latency_ms_p50 latency_ms_p99 n_models pipeline_size workers_per_model batch_size model_filename 329.667 6.073 6.109 1 1 2 1 models/model_b1_p1_83bh3hhs.pt :param reports: Results from `get_reports`. :param cols: The columns in the report to be displayed. :param sort_by: Sort the cols by the specified key. :param reverse: Sort order. .. py:function:: write_csv(reports: list[dict], filename: str = None, cols=REPORT_COLS) Write benchmarking reports to CSV file. :param list[dict] reports: Results from `neuronperf.get_reports`. :param str filename: Filename to write. If not provided, generated from model_name in report and current timestamp. :param list[str] cols: The columns in the report to be kept. :return: The filename written. :rtype: str .. py:function:: write_json(reports: list[dict], filename: str = None) Writes benchmarking reports to a JSON file. :param list[dict] reports: Results from `neuronperf.get_reports`. :param str filename: Filename to write. If not provided, generated from model_name in report and current timestamp. :return: The filename written. :rtype: str .. py:function:: model_index.append(*model_indexes: Union[str, dict]) -> dict: Appends the model indexes non-destructively into a new model index, without modifying any of the internal data. This is useful if you have benchmarked multiple related models and wish to combine their respective model indexes into a single index. Model name will be taken from the first index provided. Duplicate configs will be filtered. 
   :param model_indexes: Model indexes or paths to model indexes to combine.
   :return: A new dictionary representing the combined model index.
   :rtype: dict

.. py:function:: model_index.copy(old_index: Union[str, dict], new_index: str, new_dir: str) -> str:

   Copy an index to a new location. Will rename ``old_index`` to ``new_index`` and copy all model files into ``new_dir``, updating the index paths. This is useful for pulling individual models out of a pool. Returns the path to the new index.

.. py:function:: model_index.create(filename, input_idx=0, batch_size=1, pipeline_size=1, cast_mode=DEFAULT_CAST, compile_s=None)

   Create a new model index from a pre-compiled model.

   :param str filename: The path to the compiled model.
   :param int input_idx: The index in your inputs that this model should be run on.
   :param int batch_size: The batch size at compilation for this model.
   :param int pipeline_size: The pipeline size used at compilation for this model.
   :param str cast_mode: The casting option this model was compiled with.
   :param float compile_s: Seconds spent compiling.
   :return: A new dictionary representing a model index.
   :rtype: dict

.. py:function:: model_index.delete(filename: str):

   Deletes the model index and all associated models referenced by the index.

.. py:function:: model_index.filter(index: Union[str, dict], **kwargs) -> dict:

   Filters the provided model index on the provided criteria and returns a new index. Each kwarg is a standard (k, v) pair, where k is treated as a filter name and v may be one or more values used to filter model configs.

.. py:function:: model_index.load(filename) -> dict:

   Load a NeuronPerf model index from a file.

.. py:function:: model_index.move(old_index: str, new_index: str, new_dir: str) -> str:

   This is the same as ``copy`` followed by ``delete`` on the old index.

.. py:function:: model_index.save(model_index, filename: str = None, root_dir=None) -> str:

   Save a NeuronPerf model index to a file.

================================================
FILE: archive/neuronperf/neuronperf_benchmark_guide.rst
================================================

.. _neuronperf_benchmark_guide:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

==========================
NeuronPerf Benchmark Guide
==========================

The call to ``neuronperf[torch/tensorflow/mxnet/cpu].benchmark`` is used to measure your model's performance. It will choose reasonable defaults if none are provided, and will return reports that summarize the benchmarking results.
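For orientation, a minimal end-to-end invocation might look like the following. This is a sketch, assuming PyTorch, a model already compiled to a file named ``model_neuron_b1.pt``, and a single example input; both names are illustrative:

.. code:: python

   import torch
   import neuronperf as npf
   import neuronperf.torch

   # One example input; batch_sizes tells NeuronPerf the batch dimension
   inputs = [torch.zeros((1, 3, 224, 224))]

   reports = npf.torch.benchmark('model_neuron_b1.pt', inputs, batch_sizes=[1])
   npf.print_reports(reports)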
What is the default behavior of ``benchmark``?
----------------------------------------------

That will depend on how you provided your model and how your model was compiled. The two most common ways to provide your model are:

#. Provide the path to your compiled model
#. Provide the path to a model index from ``neuronperf.compile`` (a JSON file)

Data Parallel
~~~~~~~~~~~~~

Your model is benchmarked on the provided ``inputs`` in 4 different configurations:

#. A single model on 1 NeuronCore with one worker (min. latency)
#. A single model on 1 NeuronCore with two workers (max. throughput / NC)
#. ``MAX`` models on ``MAX`` NeuronCores with one worker (min. latency + max. instance usage)
#. ``MAX`` models on ``MAX`` NeuronCores with two workers (max. throughput + max. instance usage)

The value ``MAX`` is automatically determined by your instance size. If it can't be identified, those configurations will be skipped. The primary benefit of (3) and (4) is to verify that your model scales well at maximum instance usage.

.. note::

   If you provided the path to a model index from ``compile``:

   * Your input parameters to ``benchmark`` (``batch_sizes``, etc.) are treated as filters on the index
   * Each remaining model configuration is benchmarked as described in (1)

Pipeline
~~~~~~~~

Pipeline mode is active when using a Neuron device and ``pipeline_sizes > 1``. The same behavior as described in Data Parallel applies, except that only one worker configuration is executed: the optimal number of workers for your pipeline size, unless manually overridden.

Parameters
----------

Below are some useful and common parameters to tweak. Please see the :ref:`neuronperf_api` for full details.

* ``n_models`` controls how many models to load. The default behavior is ``n_models=[1, MAX]``.
* ``workers_per_model`` controls how many worker threads will be feeding inputs to each model. The default is automatically determined.
* ``pipeline_sizes`` tells the benchmarker how many cores are needed for your model so that each model instance can be loaded properly. Default is 1.
* ``duration`` controls how long to run each configuration.
* ``batch_sizes`` is used to inform the benchmarker of your input shape so that throughput can be computed correctly.

Almost all NeuronPerf behaviors are controllable via arguments found in the :ref:`neuronperf_api`. This guide attempts to provide some context and examples for those arguments.

Inputs
------

Models accept one or more inputs to operate on. Since NeuronPerf needs to support multiple inputs for multiple models, as well as multi-input models, there are some details that may need your attention. See the :ref:`neuronperf_framework_notes` for details.

Multi-input Models
~~~~~~~~~~~~~~~~~~

If your model accepts multiple inputs, you must provide them in a ``tuple``. For example, suppose you have a model like this:

.. code:: python

   class Model(torch.nn.Module):
       def forward(self, x, y, z):
           ...
           return output

In order for NeuronPerf to pass along your multiple inputs correctly, you should provide them as a ``tuple``:

.. code:: python

   inputs = (x, y, z)
   npf.torch.benchmark(model_filename, inputs, ...)

If you are compiling and/or benchmarking multiple models, you can pass different sized inputs as a list of tuples:

.. code:: python

   inputs = [(x1, y1, z1), (x2, y2, z2), ...]
   npf.torch.benchmark(model_filename, inputs, ...)

Preprocessing and Postprocessing
--------------------------------

Many models have additional preprocessing and postprocessing steps involved that may add non-negligible overhead to inference time. NeuronPerf supports these use cases through the use of custom functions.

Preprocessing
~~~~~~~~~~~~~

Recall that NeuronPerf expects (or wraps) each model input into a ``tuple``. These tuples will be unpacked before calling your model. Here is an example for a model with one input. The example multiplies the input by 5 before inference.

.. code:: python

   def preprocess_fn(x):
       return x * 5

   ...

   # Benchmark with custom preprocessing function
   reports = npf.torch.benchmark(
       filename,
       inputs,
       ...,
       preprocess_fn = preprocess_fn,
   )

Or if your model expects multiple inputs:

.. code:: python

   def preprocess_fn(x, y, z):
       return x / 255, y / 255, z / 255

   ...

   # Benchmark with custom preprocessing function
   reports = npf.torch.benchmark(
       filename,
       inputs,
       ...,
       preprocess_fn = preprocess_fn,
   )
Postprocessing
~~~~~~~~~~~~~~

Postprocessing is almost identical to preprocessing, except that your function will receive whatever the output of your model is, exactly as returned, without modification. There are no type guarantees.

.. code:: python

   def postprocess_fn(x):
       return x.argmax()

   ...

   # Benchmark with custom postprocessing function
   reports = npf.torch.benchmark(
       filename,
       inputs,
       ...,
       postprocess_fn = postprocess_fn,
   )

Minimal Latency
---------------

Suppose you are interested in the minimal latency achievable with your model. In this case, there is no need for more than one worker to execute at a time. We can manually specify the number of workers to use; see :ref:`neuronperf_worker_threads` below, and the sketch that follows.
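Putting the two knobs together, a minimal-latency run pins both the number of model copies and the number of workers to 1. A sketch; the model filename and inputs are illustrative:

.. code:: python

   # One model copy, one worker thread: the minimal-latency configuration
   reports = npf.torch.benchmark(
       'model_neuron_b1.pt',
       inputs,
       n_models=1,
       workers_per_model=1,
   )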
.. _neuronperf_worker_threads:

Worker Threads
--------------

The argument ``workers_per_model`` controls the number of worker threads that are trying to prepare and load examples onto a single NeuronCore at a time. Therefore, a value of 1 corresponds to 1 thread / model. If ``n_models=16``, then there would be 16 worker threads, one per model. This number is selected based upon whether you are using DataParallel (i.e. ``pipeline_sizes == 1``) or Pipeline Mode (``pipeline_sizes != 1``).

By default, NeuronPerf will try multiple combinations of model copies and workers. You may be interested in controlling this manually.

.. code:: python

   reports = npf.torch.benchmark('model_neuron_b1.pt', ..., workers_per_model=1)

You may also pass a list, as with other parameters:

.. code:: python

   workers_per_model = [1, 2]  # Same as the default for data parallel
   reports = npf.torch.benchmark('model_neuron_b1.pt', ..., workers_per_model=workers_per_model)

With the default number of :ref:`neuronperf_model_copies`, a call to ``print_reports`` might look like this:

.. code:: bash

   throughput_avg latency_ms_p50 latency_ms_p99 n_models pipeline_size workers_per_model batch_size model_filename
   307.25         3.251          3.277          1        1             1                 1          models/a5cff386-89ca-4bbf-9087-d0e624c3c604.pt
   2746.0         5.641          6.82           16       1             1                 1          models/a5cff386-89ca-4bbf-9087-d0e624c3c604.pt
   329.5          6.053          6.108          1        1             2                 1          models/a5cff386-89ca-4bbf-9087-d0e624c3c604.pt
   2809.0         10.246         12.52          16       1             2                 1          models/a5cff386-89ca-4bbf-9087-d0e624c3c604.pt

.. _neuronperf_model_copies:

Model Copies
------------

By default, NeuronPerf will benchmark two settings for ``n_models``:

1. A single copy
2. The maximum number of copies for your instance size

You can override this behavior by passing ``n_models`` to ``benchmark``, as shown below:

.. code:: python

   reports = npf.torch.benchmark('model_neuron_b1.pt', ..., n_models=6)

or

.. code:: python

   n_models = list(range(1, 10))
   reports = npf.torch.benchmark('model_neuron_b1.pt', ..., n_models=n_models)

.. _neuronperf_pipeline_mode:

Pipeline Mode
-------------

By default, NeuronPerf will assume you intend to use DataParallel, with two exceptions:

* You compiled your model using NeuronPerf for pipeline mode
* You constructed a model index that uses pipeline mode

You can also manually tell NeuronPerf that your model was compiled for pipeline mode. It is similar to how other arguments are passed.

.. code:: python

   reports = npf.torch.benchmark('model_neuron_b1.pt', ..., pipeline_sizes=2)

If you are passing multiple models in an index, then you should pass a list for ``pipeline_sizes``.

.. code:: python

   reports = npf.torch.benchmark('model_index.json', ..., pipeline_sizes=[1, 2, 3])

Duration
--------

NeuronPerf will benchmark each configuration specified for 60 seconds by default. You can control the duration by passing ``duration`` (in seconds).

.. code:: python

   reports = npf.torch.benchmark('model_index.json', ..., duration=10)

.. warning::

   If you make the duration too short, it may expire before all models are loaded and have had time to execute.

Custom Datasets (Beta)
----------------------

Currently, only PyTorch supports custom datasets, and the interface is subject to change. If you provide a custom dataset, it will be fully executed on each loaded model copy. So if you provide ``n_models=2``, your dataset will be run through twice in parallel.

To use this API, call ``benchmark`` passing a ``torch.utils.data.Dataset`` to ``inputs``. You can easily create your own ``Dataset`` by implementing the interface, or use one of the available datasets. For example:

.. code:: python

   import torchvision
   from torchvision.transforms import ToTensor

   dataset = torchvision.datasets.FashionMNIST(
       root="data", train=False, download=True, transform=ToTensor()
   )
   reports = npf.torch.benchmark('model_index.json', inputs=dataset, batch_sizes=[8],
                                 preprocess_fn=lambda x: x[0], loop_dataset=False)

.. note::

   The ``preprocess_fn`` is required here to extract the image input from the ``(image, label)`` tuple generated by the dataloader. If the dataset is too short to fill the benchmark duration, set ``loop_dataset=True`` to rerun the dataset until the duration elapses.

Results
-------

Viewing and Saving
~~~~~~~~~~~~~~~~~~

There are currently three ways to view results.

- ``neuronperf.print_reports(...)`` - Dump abbreviated results in your terminal
- ``neuronperf.write_csv(...)`` - Store metrics of interest as CSV
- ``neuronperf.write_json(...)`` - Store everything as JSON

See the :ref:`neuronperf_api` for full details.
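A typical post-benchmark flow chains these together. A sketch with illustrative filenames; ``write_csv`` and ``write_json`` both return the filename they wrote:

.. code:: python

   reports = npf.torch.benchmark('model_index.json', inputs)

   npf.print_reports(reports)           # quick summary in the terminal
   csv_file = npf.write_csv(reports)    # metrics of interest as CSV
   json_file = npf.write_json(reports)  # everything as JSON
   print("Saved:", csv_file, json_file)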
Results
-------

Viewing and Saving
~~~~~~~~~~~~~~~~~~

There are currently three ways to view results:

- ``neuronperf.print_reports(...)`` - Print abbreviated results to your terminal
- ``neuronperf.write_csv(...)`` - Store metrics of interest as CSV
- ``neuronperf.write_json(...)`` - Store everything as JSON

See the :ref:`neuronperf_api` for full details.

Full Timing Results
~~~~~~~~~~~~~~~~~~~

NeuronPerf automatically combines and summarizes the detailed timing information collected during benchmarking. If you wish to receive everything back yourself, you can use:

.. code:: python

    results = npf.torch.benchmark('model_index.json', ..., return_timers=True)

If you later wish to produce reports the same way that NeuronPerf does internally, you can call:

.. code:: python

    reports = npf.get_reports(results)
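If you prefer to post-process results yourself, each report can be treated as a dictionary of metrics. This is a minimal sketch; it assumes keys matching the columns that ``print_reports`` displays (the archived ``opt_benchmark.py`` sample later in this archive reads fields such as ``report["latency_ms_avg"]`` in the same way).

.. code:: python

    reports = npf.get_reports(results)

    # Keep only configurations that meet a latency budget, then rank by throughput.
    # The keys used here mirror the columns printed by npf.print_reports.
    fast_enough = [r for r in reports if r["latency_ms_p99"] < 10.0]
    for report in sorted(fast_enough, key=lambda r: r["throughput_avg"], reverse=True):
        print(report["n_models"], report["workers_per_model"], report["throughput_avg"])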
Verbosity
---------

Verbosity is an integer, currently one of ``{0, 1, 2}``, where:

* 0 = SILENT
* 1 = INFO (default)
* 2 = VERBOSE / DEBUG

Example:

.. code:: python

    reports = npf.torch.benchmark(..., n_models=1, duration=5, verbosity=2)

.. code:: bash

    DEBUG:neuronperf.benchmarking - Cast mode was not specified, assuming default.
    INFO:neuronperf.benchmarking - Benchmarking 'resnet50.json', ~5 seconds remaining.
    DEBUG:neuronperf.benchmarking - Running model config: {'model_filename': 'models/model_b1_p1_83bh3hhs.pt', 'device_type': 'neuron', 'input_idx': 0, 'batch_size': 1, 'n_models': 1, 'workers_per_model': 2, 'pipeline_size': 1, 'cast_mode': None, 'multiprocess': True, 'multiinterpreter': False, 'start_dts': '20211111-062818', 'duration': '5'}
    DEBUG:neuronperf.benchmarking - Benchmarker 0 started.
    DEBUG:neuronperf.benchmarking - Benchmarker 0, Worker 0 started.
    DEBUG:neuronperf.benchmarking - Benchmarker 0, Worker 1 started.
    DEBUG:neuronperf.benchmarking - Benchmarker 0, Worker 0 finished after 738 inferences.
    DEBUG:neuronperf.benchmarking - Benchmarker 0, Worker 1 finished after 738 inferences.
    DEBUG:neuronperf.benchmarking - Benchmarker 0 finished.

    throughput_avg latency_ms_p50 latency_ms_p99 n_models pipeline_size workers_per_model batch_size model_filename
    329.667        6.073          6.109          1        1             2                 1          models/model_b1_p1_83bh3hhs.pt

Internal Process Model
----------------------

For each model loaded (see :ref:`neuronperf_model_copies`), a process is spawned. Each process may use multiple threads (see :ref:`neuronperf_worker_threads`), which continuously load examples to keep the hardware busy.

NeuronPerf spawns processes slightly differently between frameworks. For PyTorch and Apache MXNet, processes are forked. For TensorFlow/Keras, a fresh interpreter is launched, and benchmarkers are serialized and run as a script. If you suspect you are having trouble due to the way processes are managed, you have two mechanisms of control:

.. code:: python

    reports = npf.torch.benchmark(..., multiprocess=False)

The default is ``True``; ``False`` disables multiprocessing and runs everything inside a single parent process. This may not work for all frameworks beyond the first model configuration, because process teardown is used to safely deallocate models from the hardware. Benchmarking this way is not recommended.

.. code:: python

    reports = npf.torch.benchmark(..., multiinterpreter=True)

This flag controls whether a fresh interpreter is used instead of forking. It defaults to ``False``, except with TensorFlow/Keras.

.. _npf-cpu-gpu:

Benchmark on CPU or GPU
-----------------------

When benchmarking on CPU or GPU, the API is slightly different. With CPU or GPU, there is no compiled model to benchmark, so instead we need to directly pass a reference to the model class that will be instantiated.

.. note::

    GPU benchmarking is currently only available for PyTorch.

CPU:

.. code:: python

    cpu_reports = npf.cpu.benchmark(YourModelClass, ...)

GPU:

.. code:: python

    gpu_reports = npf.torch.benchmark(YourModelClass, ..., device_type="gpu")

Your model class will be instantiated in a subprocess, so there are some things to keep in mind:

* Your model class must be defined at the top level inside a Python module (i.e. don't place your model class definition inside a function or other nested scope)
* If your model class has special Python module dependencies, consider importing them inside your class ``__init__``
* If your model class expects constructor arguments, wrap your class so that it has no constructor arguments

Example of a wrapped model class for CPU/GPU benchmarking:

.. code:: python

    class ModelWrapper(torch.nn.Module):
        def __init__(self):
            super().__init__()
            from transformers import AutoModelForSequenceClassification

            model_name = "bert-base-cased"
            self.bert = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False)
            self.add_module(model_name, self.bert)

        def forward(self, *inputs):
            return self.bert(*inputs)

    reports = npf.torch.benchmark(ModelWrapper, inputs, device_type="gpu")
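For completeness, here is a sketch of constructing ``inputs`` for the wrapped BERT model above. It mirrors the ``get_batch`` helpers in the archived benchmark scripts later in this archive; the sequence length and sentences are illustrative, and since a ``tuple`` input is destructured to ``model(*input)``, the tuple below maps onto ``ModelWrapper.forward``.

.. code:: python

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    encoded = tokenizer.encode_plus(
        "The company HuggingFace is based in New York City",
        max_length=128,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )

    # One complete example: (input_ids, attention_mask), destructured by NeuronPerf.
    inputs = (encoded["input_ids"], encoded["attention_mask"])
    reports = npf.torch.benchmark(ModelWrapper, [inputs], device_type="gpu")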
================================================
FILE: archive/neuronperf/neuronperf_compile_guide.rst
================================================

.. _neuronperf_compile_guide:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

========================
NeuronPerf Compile Guide
========================

If you wish to compile multiple configurations at once, NeuronPerf provides a simplified and uniform API across frameworks. The output is a model index that tracks the artifacts produced and can be passed directly to the :ref:`benchmark <neuronperf_benchmark_guide>` routine for a streamlined end-to-end process. This may be useful if you wish to test multiple configurations of your model on Neuron hardware.

You can manually specify the model index filename by passing ``filename``, or let NeuronPerf generate one and return it for you. Compiled artifacts will be placed in a local ``models`` directory.

How does ``compile`` know which instance type to compile for?
--------------------------------------------------------------

NeuronPerf assumes that the instance type you are currently on is also the compile target. However, you may compile on a non-Neuron instance, or choose to target a different instance type. In that case, you can pass ``compiler_target`` to the ``compile`` call. For example:

.. code:: python

    import neuronperf as npf
    import neuronperf.torch

    npf.torch.compile(model, inputs)                          # compile for current instance type
    npf.torch.compile(model, inputs, compiler_target="inf2")  # compile for inf2

Compiling multiple variants
---------------------------

If you provide multiple pipeline sizes, batch sizes, and/or cast modes, NeuronPerf will compile all of them.

.. code:: python

    # Select a few batch sizes and pipeline configurations to test
    batch_sizes = [1, 5, 10]
    pipeline_sizes = [1, 2, 4]

    # Construct example inputs
    example_inputs = [torch.zeros([batch_size, 3, 224, 224], dtype=torch.float16) for batch_size in batch_sizes]

    # Compile all configurations
    index = npf.torch.compile(
        model,
        example_inputs,
        batch_sizes=batch_sizes,
        pipeline_sizes=pipeline_sizes,
    )

If you wish to benchmark specific subsets of configurations, you can compile those configurations independently and later combine the results into a single index, as shown below.

.. code:: python

    # Compile with pipeline size 1 and vary the batch dimension
    batch_index = npf.torch.compile(
        model,
        example_inputs,
        batch_sizes=batch_sizes,
        pipeline_sizes=1,
    )

    # Compile with batch size 1 and vary the pipeline dimension
    pipeline_index = npf.torch.compile(
        model,
        example_inputs[0],
        batch_sizes=1,
        pipeline_sizes=pipeline_sizes,
    )

    index = npf.model_index.append(batch_index, pipeline_index)
    npf.model_index.save(index, 'model_index.json')

The ``compile`` function supports ``batch_sizes``, ``pipeline_sizes``, ``cast_modes``, and custom ``compiler_args``. If there is an error during compilation for a requested configuration, it will be logged and compilation will continue without terminating. (This is to support long-running compile jobs with many configurations.)
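To close the loop, the saved index can be handed straight to ``benchmark``. This minimal sketch reuses ``example_inputs`` and the index filename from the example above; as described in the model index guide, the inputs must occupy the same positions as at compile time.

.. code:: python

    # All configurations recorded in the index are benchmarked in one call.
    reports = npf.torch.benchmark('model_index.json', example_inputs)
    npf.print_reports(reports)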
================================================
FILE: archive/neuronperf/neuronperf_evaluate_guide.rst
================================================

.. _neuronperf_evaluate_guide:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

=========================
NeuronPerf Evaluate Guide
=========================

NeuronPerf has a new API for evaluating model accuracy on Neuron hardware. This API is currently only available for PyTorch.

You can access the API through the standard ``benchmark()`` call by passing an additional kwarg, ``eval_metrics``. For example:

.. code:: python

    reports = npf.torch.benchmark(
        model_index_or_path,
        dataset,
        n_models=1,
        workers_per_model=2,
        duration=0,
        eval_metrics=['accuracy', 'precision'],
    )

In this example, we fix ``n_models`` and ``workers_per_model`` because replicating the same model will not impact accuracy. We also set ``duration=0`` to allow benchmarking to run untimed through all dataset examples.

Because this call can be tedious to type, a convenience function is provided:

.. code:: python

    reports = npf.torch.evaluate(model_index_or_path, dataset, metrics=['accuracy', 'precision'])

.. note::

    ``eval_metrics`` becomes ``metrics`` when using ``evaluate``.

The ``dataset`` can be any iterable object that produces ``tuple(*INPUTS, TARGET)``. If ``TARGET`` does not appear in the last column of your dataset, you can customize this by passing ``eval_target_col``. For example:

.. code:: python

    reports = npf.torch.evaluate(model_index_or_path, dataset, metrics='accuracy', eval_target_col=1)

You can list the currently available metrics:

.. code:: python

    >>> npf.list_metrics()
    Name                      Description
    Accuracy                  (TP + TN) / (TP + TN + FP + FN)
    TruePositiveRate          TP / (TP + FN)
    Sensitivity               Alias for TruePositiveRate
    Recall                    Alias for TruePositiveRate
    Hit Rate                  Alias for TruePositiveRate
    TrueNegativeRate          TN / (TN + FP)
    Specificity               Alias for TrueNegativeRate
    Selectivity               Alias for TrueNegativeRate
    PositivePredictiveValue   TP / (TP + FP)
    Precision                 Alias for PositivePredictiveValue
    NegativePredictiveValue   TN / (TN + FN)
    FalseNegativeRate         FN / (FN + TP)
    FalsePositiveRate         FP / (FP + TN)
    FalseDiscoveryRate        FP / (FP + TP)
    FalseOmissionRate         FN / (FN + TN)
    PositiveLikelihoodRatio   TPR / FPR
    NegativeLikelihoodRatio   FNR / TNR
    PrevalenceThreshold       sqrt(FPR) / (sqrt(FPR) + sqrt(TPR))
    ThreatScore               TP / (TP + FN + FP)
    F1Score                   2TP / (2TP + FN + FP)
    MeanAbsoluteError         sum(|y - x|) / n
    MeanSquaredError          sum((y - x)^2) / n

New metrics may appear in the list after importing a submodule. For example, ``import neuronperf.torch`` will register a new ``topk`` metric.

Custom Metrics
--------------

Simple Variants
===============

If you wish to register a metric that is a slight tweak of an existing metric with different ``init`` args, you can use ``register_metric_from_existing()``:

.. code:: python

    npf.register_metric_from_existing("topk", "topk_3", k=3)

This example registers a new metric ``topk_3`` from the existing metric ``topk``, passing ``k=3`` at ``init`` time.

New Metrics
===========

You can register your own metrics using ``register_metric()``. Your metrics must extend ``BaseEvalMetric``:

.. code:: python

    class BaseEvalMetric(ABC):
        """
        Abstract base class BaseEvalMetric from which other metrics inherit.
        """

        @abstractmethod
        def process_record(self, output: Any = None, target: Any = None) -> None:
            """Process an individual record and return the result."""
            pass

        @staticmethod
        def aggregate(metrics: Iterable["BaseEvalMetric"]) -> Any:
            """Combine a sequence of metrics into a single result."""
            raise NotImplementedError

For example:

.. code:: python

    import neuronperf as npf

    class MyCustomMetric(npf.BaseEvalMetric):
        def __init__(self):
            super().__init__()
            self.passing = 0
            self.processed = 0

        def process_record(self, outputs, target):
            self.processed += 1
            if outputs == target:
                self.passing += 1

        @staticmethod
        def aggregate(metrics):
            passing = 0
            processed = 0
            for metric in metrics:
                passing += metric.passing
                processed += metric.processed
            return passing / processed if processed else 0

    npf.register_metric("MyCustomMetric", MyCustomMetric)
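Once registered, the metric should be usable by name alongside the built-ins; this is a sketch under the assumption that the name passed to ``register_metric`` is accepted in the ``metrics`` list, just as the string names above are.

.. code:: python

    # Evaluate using the custom metric alongside a built-in one.
    reports = npf.torch.evaluate(
        model_index_or_path,
        dataset,
        metrics=['accuracy', 'MyCustomMetric'],
    )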
================================================
FILE: archive/neuronperf/neuronperf_examples.rst
================================================

.. _neuronperf_examples:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

NeuronPerf Examples
===================

This page walks through several examples of using NeuronPerf, starting with the simplest case: benchmarking a model that has already been compiled. We will also see how to use NeuronPerf to perform a hyperparameter search and to manage both the artifacts produced and our results.

Benchmark a Compiled Model
--------------------------

This example assumes you have already compiled your model for Neuron and saved it to disk. You will need to adapt the batch size, input shape, and filename for your model.

.. code:: python

    import torch  # or tensorflow, mxnet

    import neuronperf as npf
    import neuronperf.torch  # or tensorflow, mxnet

    # Construct dummy inputs
    batch_sizes = 1
    input_shape = (batch_sizes, 3, 224, 224)
    inputs = torch.ones(input_shape)  # or numpy array for TF, MX

    # Benchmark and save results
    reports = npf.torch.benchmark("your_model_file.pt", inputs, batch_sizes)
    npf.print_reports(reports)
    npf.write_json(reports)

.. code:: bash

    INFO:neuronperf.benchmarking - Benchmarking 'your_model_file.pt', ~8.0 minutes remaining.
    throughput_avg latency_ms_p50 latency_ms_p99 n_models pipeline_size workers_per_model batch_size model_filename
    296766.5       0.003          0.003          1        1             1                 1          your_model_file.pt
    3616109.75     0.005          0.008          24       1             1                 1          your_model_file.pt
    56801.0        0.035          0.04           1        1             2                 1          your_model_file.pt
    3094419.4      0.005          0.051          24       1             2                 1          your_model_file.pt

Let's suppose you only wish to test two specific configurations: 1 model with 1 worker thread, and 1 model with 2 worker threads, benchmarking each for 15 seconds. The call to ``benchmark`` becomes:

.. code:: python

    reports = npf.torch.benchmark(filename, inputs, batch_sizes, n_models=1, workers_per_model=[1, 2], duration=15)

You can also add a custom model name to reports:

.. code:: python

    reports = npf.torch.benchmark(..., model_name="MyFancyModel")

See the :ref:`neuronperf_benchmark_guide` for further details.

Benchmark a Model from Source
-----------------------------

In this example, we define, compile, and benchmark a simple (dummy) model using PyTorch. The script traces the model with a batch size of 1 and an input shape of (3, 224, 224), saves it as ``model_neuron_b1.pt``, and then benchmarks it.

.. literalinclude:: test_simple_pt.py
   :language: python
   :caption: :download:`test_simple_pt.py <test_simple_pt.py>`
   :linenos:

.. code:: bash

    (aws_neuron_pytorch_p36) ubuntu@ip-172-31-11-122:~/tmp$ python test_simple_pt.py
    INFO:neuronperf.benchmarking - Benchmarking 'model_neuron_b1.pt', ~8.0 minutes remaining.
    throughput_avg latency_ms_p50 latency_ms_p99 n_models pipeline_size workers_per_model batch_size model_filename
    296766.5       0.003          0.003          1        1             1                 1          model_neuron_b1.pt
    3616109.75     0.005          0.008          24       1             1                 1          model_neuron_b1.pt
    56801.0        0.035          0.04           1        1             2                 1          model_neuron_b1.pt
    3094419.4      0.005          0.051          24       1             2                 1          model_neuron_b1.pt

Compile and Benchmark a Model
-----------------------------

Here is an end-to-end example of compiling and benchmarking a ResNet-50 model from ``torchvision``.

.. literalinclude:: test_resnet50_pt.py
   :language: python
   :caption: :download:`test_resnet50_pt.py <test_resnet50_pt.py>`
   :linenos:

Benchmark on CPU or GPU
-----------------------

When benchmarking on CPU or GPU, the API is slightly different. With CPU or GPU, there is no compiled model to benchmark, so instead we need to directly pass a reference to the model class that will be instantiated.

.. note::

    GPU benchmarking is currently only available for PyTorch.

CPU:

.. code:: python

    cpu_reports = npf.cpu.benchmark(YourModelClass, ...)

GPU:

.. code:: python

    gpu_reports = npf.torch.benchmark(YourModelClass, ..., device_type="gpu")

Please refer to :ref:`npf-cpu-gpu` for details and an example of providing your model class.
================================================
FILE: archive/neuronperf/neuronperf_faq.rst
================================================

.. _neuronperf_faq:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

NeuronPerf FAQ
==============

.. contents:: Table of contents
   :local:
   :depth: 1

When should I use NeuronPerf?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When you want to measure the highest achievable performance for your model with Neuron.

When should I **not** use NeuronPerf?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When measuring end-to-end performance that includes your network serving stack. Instead, you should compare your e2e numbers to those obtained by NeuronPerf to optimize your serving overhead.

Which frameworks does NeuronPerf support?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

See :ref:`neuronperf_framework_notes`.

Which Neuron instance types does NeuronPerf support?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch and TensorFlow support all instance types. MXNet support is limited to inf1.

What is the secret to obtaining the best numbers?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There is no secret sauce. NeuronPerf follows best practices.

What are the "best practices" that NeuronPerf uses?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- These vary slightly by framework and by how your model was compiled
- For a model compiled for a single NeuronCore (DataParallel):

  - To maximize throughput, for ``N`` models, use ``2 * N`` worker threads
  - To minimize latency, use 1 worker thread per model

- Use a new Python process for each model to avoid GIL contention
- Ensure you benchmark long enough for your numbers to stabilize
- Ignore outliers at the start and end of inference benchmarking

The sketch after this list shows what the first two practices look like as ``benchmark`` arguments.
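A minimal sketch of the throughput and latency recipes above, assuming a model compiled for a single NeuronCore, ``N`` copies to load, and that ``inputs`` is already defined (the filename is illustrative):

.. code:: python

    import neuronperf as npf
    import neuronperf.torch

    N = 4  # number of model copies to load

    # Throughput recipe: 2 worker threads per model keeps every copy saturated.
    throughput_reports = npf.torch.benchmark(
        'model_neuron_b1.pt', inputs, n_models=N, workers_per_model=2)

    # Latency recipe: 1 worker thread per model avoids queueing delay.
    latency_reports = npf.torch.benchmark(
        'model_neuron_b1.pt', inputs, n_models=N, workers_per_model=1)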
================================================
FILE: archive/neuronperf/neuronperf_framework_notes.rst
================================================

.. _neuronperf_framework_notes:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

==========================
NeuronPerf Framework Notes
==========================

PyTorch
=======

* Requires: ``torch-neuron`` or ``torch-neuronx``

  - Versions: 1.7.x, 1.8.x, 1.9.x, 1.10.x, 1.11.x, 1.12.x, 1.13.x

* Input to ``compile``: ``torch.nn.Module``
* Model inputs: ``Any``

TensorFlow 1.x
==============

* Requires: ``tensorflow-neuron``

  - Versions: All

* Input to ``compile``: Path to uncompiled model dir from ``saved_model.simple_save``
* Model inputs: Tensors must be provided as ``numpy.ndarray``

.. note::

    Although TensorFlow *tensors* must be ``ndarray``, this doesn't stop you from wrapping them inside of data structures that traverse process boundaries safely. For example, you can still pass an input ``dict`` such as ``{'input_0': np.zeros((2, 1))}``.

TensorFlow 2.x
==============

* Requires: ``tensorflow-neuron`` or ``tensorflow-neuronx``

  - Versions: All

* Input to ``compile``: ``tf.keras.Model``
* Model inputs: Tensors must be provided as ``numpy.ndarray``

.. note::

    Although TensorFlow *tensors* must be ``ndarray``, this doesn't stop you from wrapping them inside of data structures that traverse process boundaries safely. For example, you can still pass an input ``dict`` such as ``{'input_0': np.zeros((2, 1))}``.

Apache MXNet
============

* Requires: ``mxnet-neuron``

  - Versions: 1.5, 1.8

* Input to ``compile``: ``tuple(sym, args, aux)``
* Model inputs: Tensors must be provided as ``mxnet.ndarray`` or ``numpy.ndarray``

================================================
FILE: archive/neuronperf/neuronperf_install.rst
================================================

.. _neuronperf_install:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

NeuronPerf Install
==================

Activate your Neuron environment, and execute:

.. code:: bash

    $ pip install neuronperf --extra-index-url=https://pip.repos.neuron.amazonaws.com
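To sanity-check the installation, you can import the package from the same environment. This assumes the package exposes a ``__version__`` attribute (the archived ``setup.py`` later in this archive writes one into ``src/neuronperf/__version__.py``, so this is a reasonable but unverified expectation):

.. code:: bash

    $ python -c "import neuronperf; print(neuronperf.__version__)"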
================================================
FILE: archive/neuronperf/neuronperf_model_index_guide.rst
================================================

.. _neuronperf_model_index_guide:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

============================
NeuronPerf Model Index Guide
============================

A **model index** is a JSON file that tracks information about one or more compiled models. You can generate one using ``compile``, by using the API described here, or by creating it manually in a text editor.

After a call to ``compile``, you may notice that you now have a ``models`` directory. You will also spot a new file named something like ``model_83b3raj2.json`` in your local directory, if you didn't provide a ``filename`` yourself.

A model index is not intended to be opaque; you should feel free to open, inspect, and modify it yourself. It contains some information about the artifacts that were compiled. Individual models referenced by the index can be handed to ``benchmark`` directly along with an example input, or you may pass the entire index as in the basic example above. Here is an example index:

.. code:: bash

    python3 -m json.tool model_index.json

.. code:: json

    {
        "version": "0.0.0.0+0bc220a",
        "model_configs": [
            {
                "filename": "models/model_b1_p1_38793jda.pt",
                "input_idx": 0,
                "batch_size": 1,
                "pipeline_size": 1,
                "compile_s": 5.32
            }
        ]
    }

An index is useful for keeping track of your compiled artifacts and their parameters. The advantages of using ``neuronperf.[torch/tensorflow/mxnet].compile`` are clearer when we wish to compile multiple variants of our model and benchmark all of them at the same time. All of the model artifacts and the index can be destroyed using ``model_index.delete('model_index.json')``.

Benchmarking
============

When benchmarking with an index, there are some important details to keep in mind. If you originally built the index using a set of inputs, the model index has associated the ``inputs`` with the compiled models by their positional index. For example:

.. code:: python

    batch_sizes = [1, 2]
    inputs = [torch.zeros((b, 100)) for b in batch_sizes]

Here, ``inputs[0]`` corresponds to batch size 1. Therefore, the model index will contain a reference to input 0 for that model. When you call ``benchmark``, you must pass inputs with the same shapes in the same positions as at compile time.

.. note::

    It's only necessary that there is an input with the correct shape at ``inputs[input_index]``. The example data itself is not important.

Working with Indexes
--------------------

The API details below describe utilities for working with indexes. An ``index`` can be either a loaded index (JSON) or the path to an index (it will be loaded automatically).

Creating
========

.. code:: python

    index = neuronperf.model_index.create('/path/to/model', batch_size=1)
    filename = neuronperf.model_index.save(index)

Once you have an index, you can pass its path directly to ``benchmark``. You can also pass a custom filename instead:

.. code:: python

    index = neuronperf.model_index.create('/path/to/model', batch_size=1)
    neuronperf.model_index.save(index, 'my_index.json')

Appending
=========

If **multiple models use the same inputs**, you can append them together. For example, if you have the same batch size with multiple pipeline sizes, the inputs are the same, but the model changes.

.. code:: python

    pipeline_sizes = [1, 2, 3, 4]
    indexes = [neuronperf.model_index.create(f'/path/to/model_p{p}', pipeline_size=p, batch_size=5) for p in pipeline_sizes]
    index = neuronperf.model_index.append(*indexes)
    neuronperf.model_index.save(index, 'my_index.json')

Filtering
=========

You can construct a new model index that is filtered by some parameter. For example, to get a new index with only batch sizes [1, 2], you could do:

.. code:: python

    new_index = neuronperf.model_index.filter(index, batch_sizes=[1, 2])

You can also benchmark a subset of a model index by passing only the subset parameters of interest, but remember to provide the correct number of inputs for the index (even if some are not used). For example, if you have an index with models at ``batch_sizes = [1, 2, 3]`` but only wish to benchmark batch size 2:

.. code:: python

    batch_sizes = [1, 2, 3]
    inputs = [torch.zeros((b, 100)) for b in batch_sizes]
    reports = neuronperf.torch.benchmark('model_index.json', inputs, batch_sizes=2)

Copying
=======

You can copy an index to a new location with ``neuronperf.model_index.copy(index, new_index_name, new_index_dir)``. This is mostly useful in combination with ``filter``/``append``; see the sketch after this guide.

Deleting
========

If you wish to keep your compiled models, just delete the model index file yourself. If you want to delete your model index and all associated artifacts, use:

.. code:: python

    neuronperf.model_index.delete('my_index.json')
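As an illustration of combining these utilities, here is a hypothetical workflow that carves a small-batch index out of a larger one and copies it, under a fresh name, into its own directory. The filenames and directory are illustrative; the calls follow the ``filter`` and ``copy`` signatures described above.

.. code:: python

    import neuronperf as npf

    # Keep only the small-batch configurations ...
    small = npf.model_index.filter('model_index.json', batch_sizes=[1, 2])

    # ... and copy the filtered index, plus its artifacts' references,
    # into a separate location for targeted benchmarking.
    npf.model_index.copy(small, 'small_batch_index.json', 'small_batch_models')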
================================================
FILE: archive/neuronperf/neuronperf_overview.rst
================================================

.. _neuronperf_overview:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

===================
NeuronPerf Overview
===================

NeuronPerf is a lightweight Python library that helps you easily benchmark your models on Neuron hardware. NeuronPerf supports Neuron releases for PyTorch, TensorFlow, and MXNet. It is used internally by the Neuron team to generate performance benchmarking numbers.

When interacting with NeuronPerf, you will typically import the base package along with one of the submodule wrappers, for example:

.. code:: python

    import neuronperf
    import neuronperf.torch

You may then benchmark and/or compile one or more models with NeuronPerf. For example:

.. code:: python

    reports = neuronperf.torch.benchmark(model, inputs, ...)

The ``compile`` and ``benchmark`` methods must be accessed through one of the supported framework submodules.

Benchmarking
============

All NeuronPerf ``benchmark`` calls require a minimum of two arguments:

1. A filename
2. Inputs

The filename may refer to:

1. A Neuron-compiled model (e.g. ``my_model.pt``)
2. A :ref:`Model Index <neuronperf_model_index_guide>`

A Model Index is useful for benchmarking more than one model in a single session.

Compiling
=========

NeuronPerf also provides a standard interface to all Neuron frameworks through the ``compile`` API.

.. code:: python

    model_index = neuronperf.torch.compile(model, inputs, ...)

This is completely optional. You may instead use the standard compilation guides for supported frameworks.

Next Steps
==========

Take a look at the simple :ref:`neuronperf_examples`, :ref:`neuronperf_benchmark_guide`, :ref:`neuronperf_compile_guide`, and :ref:`neuronperf_api`.

================================================
FILE: archive/neuronperf/neuronperf_terminology.rst
================================================

.. _neuronperf_terminology:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

NeuronPerf Terminology
======================

* Inputs

  - An individual input or ``list`` of inputs
  - Example: ``inputs = [(torch.ones((batch_size, 5))) for batch_size in batch_sizes]``
  - Each input is associated with the ``batch_sizes`` specified, in the same order
  - Each input is fed individually to a corresponding model
  - If an input is provided as a ``tuple``, it will be destructured to ``model(*input)`` to support multiple args
  - See :ref:`neuronperf_framework_notes` for framework-specific requirements

* Latency

  - Time to execute a single ``model(input)``
  - Typically measured in milliseconds

* Model

  - Your data model; varies by framework. See :ref:`neuronperf_framework_notes`
  - Models may be wrapped by submodules (``torch``, ``tensorflow``, ``mxnet``) as callables

* Model Index

  - A JSON file that tracks compiled model artifacts

* Model Input

  - A ``tuple`` of inputs passed to a model, i.e. a single complete example
  - Example: ``input = (torch.ones((5, 3, 224, 224)),)``

* Throughput

  - Inferences / second

================================================
FILE: archive/neuronperf/neuronperf_troubleshooting.rst
================================================

.. _neuronperf_troubleshooting:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

NeuronPerf Troubleshooting
==========================

.. contents:: Table of contents
   :local:
   :depth: 2

Compilation issues
^^^^^^^^^^^^^^^^^^

Model fails to compile
~~~~~~~~~~~~~~~~~~~~~~

Please `file a bug <https://github.com/aws/aws-neuron-sdk/issues>`_ with as much information as possible.

Benchmarking Issues
^^^^^^^^^^^^^^^^^^^

Benchmarking terminates early with errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Scroll up and read the output. The most likely causes are:

  - invalid input shapes, or
  - not enough memory to load the requested number of model copies on the device. Try passing ``n_models=1`` to ``benchmark`` again to test for memory issues.

Other Issues or Feature Requests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Please file a bug on `GitHub <https://github.com/aws/aws-neuron-sdk/issues>`_.

================================================
FILE: archive/neuronperf/rn.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

What's New
==========

.. toctree::
   :maxdepth: 1

   /release-notes/components/dev-tools

================================================
FILE: archive/neuronperf/setup.cfg
================================================

[aliases]
# Define this so we don't resolve to the wrong setuptools 'test' entrypoint when
# invoking brazil-build test.
test = brazil_test

================================================
FILE: archive/neuronperf/setup.py
================================================

import collections
import os
import subprocess

from setuptools import find_packages, setup

# Read __version__.py
version_py = os.path.join("src", "neuronperf", "__version__.py")
with open(version_py, "rt") as fp:
    lines = fp.readlines()

meta = collections.OrderedDict()
for line in lines:
    key, value = line.split("=")
    meta[key.strip()] = value.strip()[1:-1]

# Extract fields for packaging
TITLE = meta["__title__"]
AUTHOR = meta["__author__"]
DESCRIPTION = meta["__description__"]
VERSION = os.getenv("BRAZIL_PACKAGE_VERSION", "0.0.0.0")
LICENSE = meta["__license__"]

# Compute release version and write back meta info for consistency.
GIT_SHA = os.environ.get("BRAZIL_PACKAGE_CHANGE_ID")
if GIT_SHA:
    GIT_SHA = GIT_SHA.strip()[:9]
else:
    # This is probably a local build. Try to attach something meaningful.
try: GIT_SHA = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode().strip() except: GIT_SHA = "0" * 9 VERSION = "{}+{}".format(VERSION.strip(), GIT_SHA) meta["__version__"] = VERSION with open(version_py, "wt") as fp: for k, v in meta.items(): fp.write('{} = "{}"\n'.format(k, v)) setup( name=TITLE, version=VERSION, description=DESCRIPTION, author=AUTHOR, license=LICENSE, classifiers=[ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Topic :: Scientific/Engineering :: Artificial Intelligence", "License :: Other/Proprietary License", "Programming Language :: Python :: 3.6", ], keywords="aws neuron", packages=find_packages(where="src", exclude=("test",)), install_requires=["dill==0.3.4", "numpy", "psutil==5.9.0"], python_requires=">=3.6", package_dir={"": "src"}, data_files=[], package_data={"": ["py.typed"]}, ) ================================================ FILE: archive/neuronperf/test_resnet50_pt.py ================================================ import torch import torch_neuron import neuronperf as npf import neuronperf.torch from torchvision import models # Load a pretrained ResNet50 model model = models.resnet50(pretrained=True) # Select a few batch sizes to test filename = 'resnet50.json' batch_sizes = [5, 6, 7] # Construct example inputs inputs = [torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32) for batch_size in batch_sizes] # Compile npf.torch.compile( model, inputs, batch_sizes=batch_sizes, filename=filename, ) # Benchmark reports = npf.torch.benchmark(filename, inputs) # View and save results npf.print_reports(reports) npf.write_csv(reports, 'resnet50_results.csv') npf.write_json(reports, 'resnet50_results.json') ================================================ FILE: archive/neuronperf/test_simple_pt.py ================================================ import torch import torch.neuron import neuronperf as npf import neuronperf.torch # Define a simple model class Model(torch.nn.Module): def forward(self, x): x = x * 3 return x + 1 # Instantiate model = Model() model.eval() # Define some inputs batch_sizes = [1] inputs = [torch.ones((batch_size, 3, 224, 224)) for batch_size in batch_sizes] # Compile for Neuron model_neuron = torch.neuron.trace(model, inputs) model_neuron.save("model_neuron_b1.pt") # Benchmark reports = npf.torch.benchmark("model_neuron_b1.pt", inputs, batch_sizes) # View and save results npf.print_reports(reports) npf.write_csv(reports, "model_neuron_b1.csv") ================================================ FILE: archive/src/benchmark/pytorch/bert-base-cased_benchmark.py ================================================ import torch import torch.neuron import neuronperf import neuronperf.torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # Add to these lists or change as needed model_names = ["bert-base-cased"] sequence_lengths = [128] batch_sizes = [6] pipeline_sizes = [1] def get_batch(tokenizer, sequence_length, batch_size): sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" paraphrase = tokenizer.encode_plus( sequence_0, sequence_1, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="pt", ) inputs = ( torch.cat([paraphrase["input_ids"]] * batch_size, 0), torch.cat([paraphrase["attention_mask"]] * batch_size, 0), ) return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = 
AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Benchmark print("Benchmarking {}".format(filename)) reports = neuronperf.torch.benchmark(filename, inputs) # View and save results print("======== {} ========".format(filename)) neuronperf.print_reports(reports) neuronperf.write_csv(reports) neuronperf.write_json(reports) ================================================ FILE: archive/src/benchmark/pytorch/bert-base-cased_compile.py ================================================ import torch import torch.neuron import neuronperf import neuronperf.torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # Add to these lists or change as needed model_names = ["bert-base-cased"] sequence_lengths = [128] batch_sizes = [6] pipeline_sizes = [1] def get_batch(tokenizer, sequence_length, batch_size): sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" paraphrase = tokenizer.encode_plus( sequence_0, sequence_1, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="pt", ) inputs = ( torch.cat([paraphrase["input_ids"]] * batch_size, 0), torch.cat([paraphrase["attention_mask"]] * batch_size, 0), ) return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Compile print("Compiling {}".format(filename)) neuronperf.torch.compile( model, inputs, batch_sizes=batch_sizes, pipeline_sizes=pipeline_sizes, filename=filename, model_name=model_name, ) ================================================ FILE: archive/src/benchmark/pytorch/bert-base-uncased_benchmark.py ================================================ import torch import torch.neuron import neuronperf import neuronperf.torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # Add to these lists or change as needed model_names = ["bert-base-uncased"] sequence_lengths = [128] batch_sizes = [6] pipeline_sizes = [1] def get_batch(tokenizer, sequence_length, batch_size): sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" paraphrase = tokenizer.encode_plus( sequence_0, sequence_1, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="pt", ) inputs = ( torch.cat([paraphrase["input_ids"]] * batch_size, 0), torch.cat([paraphrase["attention_mask"]] * batch_size, 0), ) return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Benchmark print("Benchmarking {}".format(filename)) reports = neuronperf.torch.benchmark(filename, inputs) # View and save results print("======== {} ========".format(filename)) 
neuronperf.print_reports(reports) neuronperf.write_csv(reports) neuronperf.write_json(reports) ================================================ FILE: archive/src/benchmark/pytorch/bert-base-uncased_compile.py ================================================ import torch import torch.neuron import neuronperf import neuronperf.torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # Add to these lists or change as needed model_names = ["bert-base-uncased"] sequence_lengths = [128] batch_sizes = [6] pipeline_sizes = [1] def get_batch(tokenizer, sequence_length, batch_size): sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" paraphrase = tokenizer.encode_plus( sequence_0, sequence_1, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="pt", ) inputs = ( torch.cat([paraphrase["input_ids"]] * batch_size, 0), torch.cat([paraphrase["attention_mask"]] * batch_size, 0), ) return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Compile print("Compiling {}".format(filename)) neuronperf.torch.compile( model, inputs, batch_sizes=batch_sizes, pipeline_sizes=pipeline_sizes, filename=filename, model_name=model_name, ) ================================================ FILE: archive/src/benchmark/pytorch/distilbert-base-uncased-finetuned-sst-2-english_benchmark.py ================================================ import torch import torch.neuron import neuronperf import neuronperf.torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # Add to these lists or change as needed model_names = ["distilbert-base-uncased-finetuned-sst-2-english"] sequence_lengths = [128] batch_sizes = [6] pipeline_sizes = [1] def get_batch(tokenizer, sequence_length, batch_size): sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" paraphrase = tokenizer.encode_plus( sequence_0, sequence_1, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="pt", ) inputs = ( torch.cat([paraphrase["input_ids"]] * batch_size, 0), torch.cat([paraphrase["attention_mask"]] * batch_size, 0), ) return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Benchmark print("Benchmarking {}".format(filename)) reports = neuronperf.torch.benchmark(filename, inputs) # View and save results print("======== {} ========".format(filename)) neuronperf.print_reports(reports) neuronperf.write_csv(reports) neuronperf.write_json(reports) ================================================ FILE: archive/src/benchmark/pytorch/distilbert-base-uncased-finetuned-sst-2-english_compile.py ================================================ import torch import torch.neuron import neuronperf import neuronperf.torch from transformers import 
AutoTokenizer, AutoModelForSequenceClassification # Add to these lists or change as needed model_names = ["distilbert-base-uncased-finetuned-sst-2-english"] sequence_lengths = [128] batch_sizes = [6] pipeline_sizes = [1] def get_batch(tokenizer, sequence_length, batch_size): sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" paraphrase = tokenizer.encode_plus( sequence_0, sequence_1, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="pt", ) inputs = ( torch.cat([paraphrase["input_ids"]] * batch_size, 0), torch.cat([paraphrase["attention_mask"]] * batch_size, 0), ) return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Compile print("Compiling {}".format(filename)) neuronperf.torch.compile( model, inputs, batch_sizes=batch_sizes, pipeline_sizes=pipeline_sizes, filename=filename, model_name=model_name, ) ================================================ FILE: archive/src/benchmark/pytorch/distilbert-base-uncased_benchmark.py ================================================ import torch import torch.neuron import neuronperf import neuronperf.torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # Add to these lists or change as needed model_names = ["distilbert-base-uncased"] sequence_lengths = [128] batch_sizes = [9] pipeline_sizes = [1] def get_batch(tokenizer, sequence_length, batch_size): sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" paraphrase = tokenizer.encode_plus( sequence_0, sequence_1, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="pt", ) inputs = ( torch.cat([paraphrase["input_ids"]] * batch_size, 0), torch.cat([paraphrase["attention_mask"]] * batch_size, 0), ) return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Benchmark print("Benchmarking {}".format(filename)) reports = neuronperf.torch.benchmark(filename, inputs) # View and save results print("======== {} ========".format(filename)) neuronperf.print_reports(reports) neuronperf.write_csv(reports) neuronperf.write_json(reports) ================================================ FILE: archive/src/benchmark/pytorch/distilbert-base-uncased_compile.py ================================================ import torch import torch.neuron import neuronperf import neuronperf.torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # Add to these lists or change as needed model_names = ["distilbert-base-uncased"] sequence_lengths = [128] batch_sizes = [9] pipeline_sizes = [1] def get_batch(tokenizer, sequence_length, batch_size): sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" paraphrase = 
tokenizer.encode_plus( sequence_0, sequence_1, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="pt", ) inputs = ( torch.cat([paraphrase["input_ids"]] * batch_size, 0), torch.cat([paraphrase["attention_mask"]] * batch_size, 0), ) return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Compile print("Compiling {}".format(filename)) neuronperf.torch.compile( model, inputs, batch_sizes=batch_sizes, pipeline_sizes=pipeline_sizes, filename=filename, model_name=model_name, ) ================================================ FILE: archive/src/benchmark/pytorch/distilroberta-base_benchmark.py ================================================ import torch import torch.neuron import neuronperf import neuronperf.torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # Add to these lists or change as needed model_names = ["distilroberta-base"] sequence_lengths = [128] batch_sizes = [6] pipeline_sizes = [1] def get_batch(tokenizer, sequence_length, batch_size): sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" paraphrase = tokenizer.encode_plus( sequence_0, sequence_1, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="pt", ) inputs = ( torch.cat([paraphrase["input_ids"]] * batch_size, 0), torch.cat([paraphrase["attention_mask"]] * batch_size, 0), ) return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Benchmark print("Benchmarking {}".format(filename)) reports = neuronperf.torch.benchmark(filename, inputs) # View and save results print("======== {} ========".format(filename)) neuronperf.print_reports(reports) neuronperf.write_csv(reports) neuronperf.write_json(reports) ================================================ FILE: archive/src/benchmark/pytorch/distilroberta-base_compile.py ================================================ import torch import torch.neuron import neuronperf import neuronperf.torch from transformers import AutoTokenizer, AutoModelForSequenceClassification # Add to these lists or change as needed model_names = ["distilroberta-base"] sequence_lengths = [128] batch_sizes = [6] pipeline_sizes = [1] def get_batch(tokenizer, sequence_length, batch_size): sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" paraphrase = tokenizer.encode_plus( sequence_0, sequence_1, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="pt", ) inputs = ( torch.cat([paraphrase["input_ids"]] * batch_size, 0), torch.cat([paraphrase["attention_mask"]] * batch_size, 0), ) return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = 
AutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Compile print("Compiling {}".format(filename)) neuronperf.torch.compile( model, inputs, batch_sizes=batch_sizes, pipeline_sizes=pipeline_sizes, filename=filename, model_name=model_name, ) ================================================ FILE: archive/src/benchmark/pytorch/hf-google-vit_benchmark.py ================================================ import torch import neuronperf import neuronperf.torch import torch_neuronx from PIL import Image import requests from transformers import ViTImageProcessor, ViTForImageClassification def benchmark(batch_size): feature_extractor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224') model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224', torchscript=True) model.eval() url = "http://images.cocodataset.org/val2017/000000039769.jpg" image = Image.open(requests.get(url, stream=True).raw) inputs = feature_extractor(images=image, return_tensors="pt") inputs = inputs['pixel_values'].repeat([batch_size, 1, 1, 1]) example = (inputs,) traced = torch_neuronx.trace(model, example, compiler_args="--model-type=transformer") filename = 'model.pt' torch.jit.save(traced, filename) reports = neuronperf.torch.benchmark(filename, [example], batch_sizes=[batch_size]) # View and save results print("======== {} ========".format(filename)) neuronperf.print_reports(reports) neuronperf.write_csv(reports) neuronperf.write_json(reports) if __name__ == '__main__': # Use batch_size = 1 for best latency, batch_size = 2 for best throughput benchmark(batch_size=2) ================================================ FILE: archive/src/benchmark/pytorch/hf-openai-clip_benchmark.py ================================================ import torch import neuronperf import neuronperf.torch import torch_neuronx import os from torchvision.datasets import CIFAR100 from transformers import CLIPProcessor, CLIPModel def benchmark(model_name, batch_size): # Build the model, preprocessor, and dataset cifar100 = CIFAR100(root=os.path.expanduser("~/.cache"), download=True, train=False) processor = CLIPProcessor.from_pretrained(model_name) model = CLIPModel.from_pretrained(model_name, return_dict=False) # Prepare a sample input image = cifar100[0][0] text = [] for c in cifar100.classes: text.append(f'a photo of a {c}') inputs = processor(text=text, images=image, return_tensors="pt", padding=True) image = inputs['pixel_values'] # (b, c, h, w) image = image.repeat(batch_size, 1, 1, 1) inputs = (inputs['input_ids'], image) # Trace the model model.eval() traced = torch_neuronx.trace(model, inputs, compiler_args='--enable-saturate-infinity') filename = 'model.pt' torch.jit.save(traced, filename) reports = neuronperf.torch.benchmark(filename, [inputs], batch_sizes=[batch_size]) # View and save results print("======== {} ========".format(filename)) neuronperf.print_reports(reports) neuronperf.write_csv(reports) neuronperf.write_json(reports) if __name__ == '__main__': # Recommended batch sizes for throughput # openai/clip-vit-base-patch32: 64 # openai/clip-vit-large-patch14: 4 model_name = 'openai/clip-vit-base-patch32' batch_size = 64 benchmark(model_name, batch_size) ================================================ FILE: archive/src/benchmark/pytorch/hf_pretrained_wav2vec2_conformer_relpos_benchmark.py 
================================================ import torch import torch_neuronx from datasets import load_dataset from transformers import Wav2Vec2Processor, Wav2Vec2ConformerForCTC import neuronperf as npf import neuronperf.torch BATCH_SIZE = 1 def benchmark(): processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-conformer-rel-pos-large-960h-ft") model = Wav2Vec2ConformerForCTC.from_pretrained("facebook/wav2vec2-conformer-rel-pos-large-960h-ft") model.eval() # take the first entry in the dataset as our input ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation", trust_remote_code=True) inputs = processor(ds[0]["audio"]["array"], return_tensors="pt", padding="longest", sampling_rate=16_000).input_values inputs = inputs.repeat([BATCH_SIZE, 1]) example = (inputs,) traced = torch_neuronx.trace(model, example, compiler_args='--model-type=transformer') filename = 'model.pt' torch.jit.save(traced, filename) model_neuron = torch.jit.load(filename) output = model_neuron(inputs) print(f"output is {output}") reports = neuronperf.torch.benchmark(filename, [example], multiprocess=False, batch_sizes=[BATCH_SIZE]) # View and save results print("======== {} ========".format(filename)) neuronperf.print_reports(reports) neuronperf.write_csv(reports) neuronperf.write_json(reports) if __name__ == '__main__': benchmark() ================================================ FILE: archive/src/benchmark/pytorch/hf_pretrained_wav2vec2_conformer_rope_benchmark.py ================================================ import torch import torch_neuronx from datasets import load_dataset from transformers import Wav2Vec2Processor, Wav2Vec2ConformerForCTC import neuronperf as npf import neuronperf.torch BATCH_SIZE = 1 def benchmark(): processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-conformer-rope-large-960h-ft") model = Wav2Vec2ConformerForCTC.from_pretrained("facebook/wav2vec2-conformer-rope-large-960h-ft") model.eval() # take the first entry in the dataset as our input ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation", trust_remote_code=True) inputs = processor(ds[0]["audio"]["array"], return_tensors="pt", padding="longest", sampling_rate=16_000).input_values inputs = inputs.repeat([BATCH_SIZE, 1]) example = (inputs,) traced = torch_neuronx.trace(model, example, compiler_args='--model-type=transformer') filename = 'model.pt' torch.jit.save(traced, filename) model_neuron = torch.jit.load(filename) output = model_neuron(inputs) print(f"output is {output}") reports = neuronperf.torch.benchmark(filename, [example], multiprocess=False, batch_sizes=[BATCH_SIZE]) # View and save results print("======== {} ========".format(filename)) neuronperf.print_reports(reports) neuronperf.write_csv(reports) neuronperf.write_json(reports) if __name__ == '__main__': benchmark() ================================================ FILE: archive/src/benchmark/pytorch/inf2_benchmark.py ================================================ # primary Script used for inf2 Benchmarking import torch import neuronperf import neuronperf.torch import torch_neuronx from transformers import ( AutoModel, AutoModelForSequenceClassification # Any other model class respective to the model we want to infer on ) class GPT2Neuron(torch.nn.Module): def __init__(self, model) -> None: super().__init__() self.model = model def forward(self, input_ids, attention_mask): return self.model(input_ids=input_ids, attention_mask=attention_mask, use_cache=False) def 
benchmark(model_name, batch_size, sequence_length):
    model = AutoModel.from_pretrained(model_name, torchscript=True)
    if 'gpt2' in model_name:
        model = GPT2Neuron(model)
    model.eval()

    example = (
        torch.zeros(batch_size, sequence_length, dtype=torch.int),  # input_ids
        torch.zeros(batch_size, sequence_length, dtype=torch.int),  # attention_mask
    )
    traced = torch_neuronx.trace(model, example)
    filename = 'model.pt'
    torch.jit.save(traced, filename)

    reports = neuronperf.torch.benchmark(filename, [example])

    # View and save results
    print("======== {} ========".format(filename))
    neuronperf.print_reports(reports)
    neuronperf.write_csv(reports)
    neuronperf.write_json(reports)


if __name__ == '__main__':
    # benchmark(model_name, batch_size, sequence_length)
    # Below are a few examples; uncomment one to run:
    # benchmark('bert-base-cased', 16, 128)
    # benchmark('bert-base-uncased', 4, 128)
    # benchmark('gpt2', 16, 256)
    pass  # required: an indented statement must follow the `if`; the calls above are examples

================================================
FILE: archive/src/benchmark/pytorch/opt_benchmark.py
================================================

import os

import neuronperf as npf
import torch
from transformers import AutoTokenizer

"""
Run the sample at this link to get the split model state_dict (opt-13b-split):
https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/facebook-opt-13b-sampling.ipynb

Make sure transformers is installed

Change the variables below for opt30b or opt66b models
"""

BATCH_SIZE = 2
TP_DEGREE = 2
SEQ_LEN = 2048
TOKENIZER = AutoTokenizer.from_pretrained("facebook/opt-13b")
MODEL_DIR = "./opt-13b-split"


class Wrapper(torch.nn.Module):
    def __init__(self, filename):
        super().__init__()
        from transformers_neuronx.opt.model import OPTForSampling

        self.neuron_model = OPTForSampling.from_pretrained(
            filename, batch_size=BATCH_SIZE, tp_degree=TP_DEGREE, amp="f16"
        )
        self.neuron_model.to_neuron()

    def forward(self, *inputs):
        return self.neuron_model.sample(torch.concat(inputs), sequence_length=SEQ_LEN)


# Custom load to let our Wrapper class handle things
def load_fn(filename, **kwargs):
    return Wrapper(filename)


# NeuronPerf can't see tp_degree at the moment, so just expose all cores
def env_setup_fn(*_):
    del os.environ["NEURON_RT_VISIBLE_CORES"]


def preprocess_fn(inputs):
    return [TOKENIZER.encode(text, return_tensors="pt") for text in inputs]


def postprocess_fn(outputs):
    return [TOKENIZER.decode(seq) for seq in outputs]


def benchmark():
    inputs = ["Hello, I'm a language model,"] * BATCH_SIZE
    reports = npf.benchmark(
        load_fn,
        MODEL_DIR,
        [inputs],  # treat batch as 1 input and let Wrapper handle batching
        batch_sizes=1,  # ^
        n_models=1,  # only load 1 copy of model
        max_infers=5,
        max_duration=0,  # sampling can take a while, so let's not timeout
        workers_per_model=1,  # no bottleneck on model inputs, so 1 is fine
        env_setup_fn=env_setup_fn,
        preprocess_fn=preprocess_fn,
        postprocess_fn=postprocess_fn,
    )

    # grab the only report (we only benchmarked 1 config)
    report = reports[0]

    # let's update throughput to be tokens / second and add a new record
    new_tokens = sum(SEQ_LEN - len(TOKENIZER.encode(i)) for i in inputs)
    tokens_per_s = round(new_tokens / (report["latency_ms_avg"] / 1000), 2)
    report["throughput_avg"] = report["tokens_per_s"] = tokens_per_s

    # display and save results
    npf.print_report(report)
    print(f"Results saved to: {npf.write_json(report)}")


if __name__ == "__main__":
    benchmark()

================================================
FILE: archive/src/benchmark/pytorch/perceiver-multimodal_benchmark.py
================================================

import base64
import
os import ssl import re from urllib import request import time import random from tqdm import tqdm import numpy as np import math from typing import Optional, Tuple, Union from transformers import PerceiverForMultimodalAutoencoding from transformers.modeling_outputs import BaseModelOutputWithCrossAttentions from transformers.models.perceiver.modeling_perceiver import PerceiverBasicDecoder, PerceiverClassifierOutput from transformers.models.perceiver.modeling_perceiver import restructure import torch import torch.nn as nn import torch_neuronx # We cannot use any of the pre-existing benchmarking utilities to benchmark E2E pipeline models. # All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a # traced Torchscript. def benchmark(n_runs, test_name, model, model_inputs): if not isinstance(model_inputs, tuple): model_inputs = (model_inputs,) warmup_run = model(*model_inputs) latency_collector = LatencyCollector() for _ in range(n_runs): latency_collector.pre_hook() res = model(*model_inputs) latency_collector.hook() p0_latency_ms = latency_collector.percentile(0) * 1000 p50_latency_ms = latency_collector.percentile(50) * 1000 p90_latency_ms = latency_collector.percentile(90) * 1000 p95_latency_ms = latency_collector.percentile(95) * 1000 p99_latency_ms = latency_collector.percentile(99) * 1000 p100_latency_ms = latency_collector.percentile(100) * 1000 report_dict = dict() report_dict["Latency P0"] = f'{p0_latency_ms:.1f}' report_dict["Latency P50"]=f'{p50_latency_ms:.1f}' report_dict["Latency P90"]=f'{p90_latency_ms:.1f}' report_dict["Latency P95"]=f'{p95_latency_ms:.1f}' report_dict["Latency P99"]=f'{p99_latency_ms:.1f}' report_dict["Latency P100"]=f'{p100_latency_ms:.1f}' report = f'RESULT FOR {test_name}:' for key, value in report_dict.items(): report += f' {key}={value}' print(report) class LatencyCollector: def __init__(self): self.start = None self.latency_list = [] def pre_hook(self, *args): self.start = time.time() def hook(self, *args): self.latency_list.append(time.time() - self.start) def percentile(self, percent): latency_list = self.latency_list pos_float = len(latency_list) * percent / 100 max_pos = len(latency_list) - 1 pos_floor = min(math.floor(pos_float), max_pos) pos_ceil = min(math.ceil(pos_float), max_pos) latency_list = sorted(latency_list) return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor] class MultimodalPerceiverWrapper(nn.Module): def __init__(self, perceiver_model, nchunks, image_chunk_size, audio_chunk_size): super().__init__() self.perceiver_model = perceiver_model self.nchunks = nchunks self.image_chunk_size = image_chunk_size self.audio_chunk_size = audio_chunk_size def forward(self, inputs: torch.FloatTensor, neuron_decoder, attention_mask: Optional[torch.FloatTensor] = None, head_mask: Optional[torch.FloatTensor] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None): output_attentions = output_attentions if output_attentions is not None else self.perceiver_model.config.output_attentions output_hidden_states = ( output_hidden_states if output_hidden_states is not None else self.perceiver_model.config.output_hidden_states ) return_dict = return_dict if return_dict is not None else self.perceiver_model.config.use_return_dict if self.perceiver_model.input_preprocessor is not None: inputs, modality_sizes, inputs_without_pos = self.perceiver_model.input_preprocessor(inputs) else: modality_sizes = None 
            inputs_without_pos = None

        if inputs.size()[-1] != self.perceiver_model.config.d_model:
            raise ValueError(
                f"Last dimension of the inputs: {inputs.size()[-1]} doesn't correspond to config.d_model:"
                f" {self.perceiver_model.config.d_model}. Make sure to set config.d_model appropriately."
            )

        batch_size, seq_length, _ = inputs.size()
        device = inputs.device

        # If no attention mask is provided, make them all ones
        if attention_mask is None:
            attention_mask = torch.ones((batch_size, seq_length), device=device)
        # Make the attention mask broadcastable to [batch_size, num_heads, seq_length, seq_length]
        extended_attention_mask = self.perceiver_model.invert_attention_mask(attention_mask)

        head_mask = self.perceiver_model.get_head_mask(
            head_mask,
            self.perceiver_model.config.num_blocks * self.perceiver_model.config.num_self_attends_per_block)

        embedding_output = self.perceiver_model.embeddings(batch_size=batch_size)

        encoder_outputs = self.perceiver_model.encoder(
            embedding_output,
            attention_mask=None,
            head_mask=head_mask,
            inputs=inputs,
            inputs_mask=extended_attention_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        sequence_output = encoder_outputs[0]

        logits = None
        reconstruction = {}
        for chunk_idx in tqdm(range(self.nchunks)):
            subsampled_output_points = {
                'image': torch.arange(
                    self.image_chunk_size * chunk_idx,
                    self.image_chunk_size * (chunk_idx + 1)).to(device),
                'audio': torch.arange(
                    self.audio_chunk_size * chunk_idx,
                    self.audio_chunk_size * (chunk_idx + 1)).to(device),
                'label': None,
            }
            logits = neuron_decoder(sequence_output, extended_attention_mask, inputs, modality_sizes,
                                    inputs_without_pos, subsampled_points=subsampled_output_points)
            reconstruction['label'] = logits['label']
            if 'image' not in reconstruction:
                reconstruction['image'] = logits['image']
                reconstruction['audio'] = logits['audio']
            else:
                reconstruction['image'] = torch.cat([reconstruction['image'], logits['image']], dim=1)
                reconstruction['audio'] = torch.cat([reconstruction['audio'], logits['audio']], dim=1)
            del logits
        return reconstruction


def custom_model_forward(
        self,
        nchunks,
        image_chunk_size,
        audio_chunk_size,
        neuron_decoder,
        inputs: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
) -> Union[Tuple, PerceiverClassifierOutput]:
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    perceiver_wrapper = MultimodalPerceiverWrapper(self.perceiver, nchunks, image_chunk_size, audio_chunk_size)
    outputs = perceiver_wrapper(
        inputs,
        neuron_decoder,
        attention_mask=attention_mask,
        head_mask=head_mask,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    return outputs


def custom_decoder_query(self, inputs, modality_sizes=None, inputs_without_pos=None, subsampled_points=None):
    if self.position_encoding_type == "none":  # Queries come from elsewhere
        raise ValueError("You cannot construct decoder queries when position_encoding_type is set to none")
    if subsampled_points is not None:
        # subsampled_points are the indices if the inputs would be flattened
        # however, the inputs aren't flattened, that's why we use unravel_index
        # to get the indices for the unflattened array
        # unravel_index returns a tuple (x_idx, y_idx, ...)
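        # (for example, flat index 5 in shape (2, 3) unravels to (1, 2))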
        # stack to get the [n, d] tensor of coordinates
        def unravel_indices(indices, shape):
            coord = []
            for dim in reversed(shape):
                coord.append(indices % dim)
                indices = indices // dim
            coord = torch.stack(coord[::-1], dim=-1)
            return coord

        pos = unravel_indices(subsampled_points, self.output_index_dims)
        batch_size = inputs.shape[0]
        # Map these coordinates to [-1, 1]
        pos = -1 + 2 * pos / torch.tensor(self.output_index_dims)[None, :]
        pos = torch.broadcast_to(pos[None], [batch_size, pos.shape[0], pos.shape[1]])
        # Construct the position encoding.
        if self.position_encoding_type == "trainable":
            pos_emb = self.output_position_encodings(batch_size)
        elif self.position_encoding_type == "fourier":
            pos_emb = self.output_position_encodings(
                self.output_index_dims, batch_size=batch_size, device=inputs.device, dtype=inputs.dtype, pos=pos
            )

        # Optionally project them to a target dimension.
        pos_emb = self.positions_projection(pos_emb)
        pos_emb = torch.reshape(pos_emb, [pos_emb.shape[0], -1, pos_emb.shape[-1]])
    else:
        batch_size = inputs.shape[0]
        index_dims = inputs.shape[2:]
        # Construct the position encoding.
        if self.position_encoding_type == "trainable":
            pos_emb = self.output_position_encodings(batch_size)
        elif self.position_encoding_type == "fourier":
            pos_emb = self.output_position_encodings(
                index_dims, batch_size, device=inputs.device, dtype=inputs.dtype
            )

        # Optionally project them to a target dimension.
        pos_emb = self.positions_projection(pos_emb)

    if self.concat_preprocessed_input:
        if inputs_without_pos is None:
            raise ValueError("Value is required for inputs_without_pos if concat_preprocessed_input is True")
        pos_emb = torch.cat([inputs_without_pos, pos_emb], dim=-1)

    return pos_emb


# Define wrapper for tracing encoder
class EncoderWrapper(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, embedding_output, inputs, extended_attention_mask):
        output = self.encoder(embedding_output, inputs=inputs, inputs_mask=extended_attention_mask)
        return output


class NeuronEncoder(nn.Module):
    def __init__(self, encoder_wrapper):
        super().__init__()
        self.encoder_wrapper = encoder_wrapper

    def forward(self,
                hidden_states: torch.Tensor,
                attention_mask: Optional[torch.FloatTensor] = None,
                head_mask: Optional[torch.FloatTensor] = None,
                inputs: Optional[torch.FloatTensor] = None,
                inputs_mask: Optional[torch.FloatTensor] = None,
                output_attentions: Optional[bool] = False,
                output_hidden_states: Optional[bool] = False,
                return_dict: Optional[bool] = True):
        last_hidden_states = self.encoder_wrapper(hidden_states, inputs, inputs_mask)['last_hidden_state']
        return BaseModelOutputWithCrossAttentions(last_hidden_state=last_hidden_states)


# Define wrapper for tracing decoder
class DecoderWrapper(nn.Module):
    def __init__(self, decoder, decoder_query_audio, decoder_query_image, decoder_query_label, output_postprocessor):
        super().__init__()
        self.decoder = decoder
        self.decoder_query_audio = decoder_query_audio
        self.decoder_query_image = decoder_query_image
        self.decoder_query_label = decoder_query_label
        self.output_postprocessor = output_postprocessor
        self.num_query_channels = decoder.num_query_channels

    def forward(self, z, query_mask,
                audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding,
                image_input, image_input_without_pos, image_subsampled_point, image_padding,
                label_input, label_input_without_pos, label_padding):
        audio_query = self.decoder_query_audio(inputs=audio_input,
                                               inputs_without_pos=audio_input_without_pos,
                                               subsampled_points=audio_subsampled_point)
        image_query = self.decoder_query_image(inputs=image_input,
                                               inputs_without_pos=image_input_without_pos,
                                               subsampled_points=image_subsampled_point)
        label_query = self.decoder_query_label(inputs=label_input, inputs_without_pos=label_input_without_pos)

        def embed(x, pos):
            x = torch.reshape(x, [x.shape[0], np.prod(x.shape[1:-1]), x.shape[-1]])
            pos = torch.broadcast_to(pos, [x.shape[0], x.shape[1], self.num_query_channels - x.shape[2]])
            return torch.cat([x, pos], dim=2)

        audio_padded = embed(audio_query, audio_padding)
        image_padded = embed(image_query, image_padding)
        label_padded = embed(label_query, label_padding)

        decoder_query = torch.cat([audio_padded, image_padded, label_padded], dim=1)
        logits = self.decoder(decoder_query, z, query_mask).logits

        output_modality_sizes = {"audio": audio_subsampled_point.shape[0],
                                 "image": image_subsampled_point.shape[0],
                                 "label": 1}
        logits = self.output_postprocessor(logits, modality_sizes=output_modality_sizes)
        return logits


class NeuronDecoder(nn.Module):
    def __init__(self, decoder_wrapper):
        super().__init__()
        self.decoder_wrapper = decoder_wrapper
        self.modalities = decoder_wrapper.decoder.modalities
        self.padding = decoder_wrapper.decoder.padding

    def forward(self, z, query_mask, inputs, modality_sizes, inputs_without_pos=None,
                subsampled_points=None, output_attentions=False):
        # Partition the flat inputs among the different modalities
        inputs = restructure(modality_sizes, inputs)
        assert(subsampled_points is not None)
        assert(inputs_without_pos is not None)

        for modality, decoder in self.modalities.items():
            if modality == "audio":
                audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding = (
                    inputs[modality], inputs_without_pos[modality],
                    subsampled_points[modality].to(torch.float32), self.padding[modality])
            elif modality == "image":
                image_input, image_input_without_pos, image_subsampled_point, image_padding = (
                    inputs[modality], inputs_without_pos[modality],
                    subsampled_points[modality].to(torch.float32), self.padding[modality])
            else:
                # label doesn't have subsampled point
                label_input, label_input_without_pos, label_padding = (
                    inputs[modality], inputs_without_pos[modality], self.padding[modality])

        assert(audio_input_without_pos is not None)
        assert(audio_subsampled_point is not None)
        assert(image_input_without_pos is not None)
        assert(image_subsampled_point is not None)
        assert(label_input_without_pos is not None)

        output = self.decoder_wrapper(z, query_mask,
                                      audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding,
                                      image_input, image_input_without_pos, image_subsampled_point, image_padding,
                                      label_input, label_input_without_pos, label_padding)
        return output


# -- Load compiled models --
model = PerceiverForMultimodalAutoencoding.from_pretrained("deepmind/multimodal-perceiver", low_cpu_mem_usage=True)
PerceiverForMultimodalAutoencoding.forward = custom_model_forward
PerceiverBasicDecoder.decoder_query = custom_decoder_query

COMPILER_WORKDIR_ROOT = "perceiver_multimodal_compile_dir"
COMPILER_WORKDIR_DECODER = os.path.join(COMPILER_WORKDIR_ROOT, "decoder")
COMPILER_WORKDIR_ENCODER = os.path.join(COMPILER_WORKDIR_ROOT, "encoder")

# load saved encoder from disk
encoder_fname = os.path.join(COMPILER_WORKDIR_ENCODER, 'model.pt')
neuron_encoder = NeuronEncoder(EncoderWrapper(model.perceiver.encoder))
neuron_encoder.encoder_wrapper = torch.jit.load(encoder_fname)
model.perceiver.encoder = neuron_encoder

# load saved decoder from disk
decoder_fname = os.path.join(COMPILER_WORKDIR_DECODER, 'model.pt')
neuron_decoder = NeuronDecoder(DecoderWrapper(model.perceiver.decoder,
                                              model.perceiver.decoder.modalities['audio'].decoder_query,
                                              model.perceiver.decoder.modalities['image'].decoder_query,
                                              model.perceiver.decoder.modalities['label'].decoder_query,
                                              model.perceiver.output_postprocessor))
neuron_decoder.decoder_wrapper = torch.jit.load(decoder_fname)


# Inference function
def autoencode_video(images, audio, nchunks, image_chunk_size, audio_chunk_size):
    input_image = torch.from_numpy(np.moveaxis(images, -1, 2)).to(torch.float32)
    input_audio = torch.from_numpy(audio).to(torch.float32)
    input_label = torch.zeros((images.shape[0], 700))
    inputs = {'image': input_image, 'audio': input_audio, 'label': input_label}

    reconstruction = {}
    with torch.no_grad():
        reconstruction = model(nchunks, image_chunk_size, audio_chunk_size, neuron_decoder, inputs=inputs)
        # reshape image and audio modalities back to original shape
        reconstruction['image'] = torch.reshape(reconstruction['image'], images.shape)
        reconstruction['audio'] = torch.reshape(reconstruction['audio'], audio.shape)
    return reconstruction


# Generate random image for benchmarking
AUDIO_SAMPLES_PER_PATCH = 16
image = np.random.random(size=(1, 16, 224, 224, 3))
audio = np.random.random(size=(1, 30720, 1))
nchunks = 128
image_chunk_size = np.prod(image.shape[1:-1]) // nchunks
audio_chunk_size = audio.shape[1] // AUDIO_SAMPLES_PER_PATCH // nchunks

n_runs = 20
model_inputs = (image, audio, nchunks, image_chunk_size, audio_chunk_size)
benchmark(n_runs, "perceiver-multimodal", autoencode_video, model_inputs)


================================================
FILE: archive/src/benchmark/pytorch/perceiver-multimodal_compile.py
================================================
import base64
import os
import ssl
import re
from urllib import request
import time
import random
from tqdm import tqdm
import numpy as np
from typing import Optional, Tuple, Union

from transformers import PerceiverForMultimodalAutoencoding
from transformers.modeling_outputs import BaseModelOutputWithCrossAttentions
from transformers.models.perceiver.modeling_perceiver import PerceiverBasicDecoder, PerceiverClassifierOutput
from transformers.models.perceiver.modeling_perceiver import restructure

import torch
import torch.nn as nn
import torch_neuronx


class MultimodalPerceiverWrapper(nn.Module):
    def __init__(self, perceiver_model, nchunks, image_chunk_size, audio_chunk_size):
        super().__init__()
        self.perceiver_model = perceiver_model
        self.nchunks = nchunks
        self.image_chunk_size = image_chunk_size
        self.audio_chunk_size = audio_chunk_size

    def forward(self,
                inputs: torch.FloatTensor,
                neuron_decoder,
                attention_mask: Optional[torch.FloatTensor] = None,
                head_mask: Optional[torch.FloatTensor] = None,
                output_attentions: Optional[bool] = None,
                output_hidden_states: Optional[bool] = None,
                return_dict: Optional[bool] = None):
        output_attentions = output_attentions if output_attentions is not None else self.perceiver_model.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.perceiver_model.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.perceiver_model.config.use_return_dict

        if self.perceiver_model.input_preprocessor is not None:
            inputs, modality_sizes, inputs_without_pos = self.perceiver_model.input_preprocessor(inputs)
        else:
            modality_sizes = None
            inputs_without_pos = None

        if inputs.size()[-1] != self.perceiver_model.config.d_model:
            raise ValueError(
                f"Last dimension of the inputs: {inputs.size()[-1]} doesn't correspond to config.d_model:"
                f" {self.perceiver_model.config.d_model}. Make sure to set config.d_model appropriately."
            )

        batch_size, seq_length, _ = inputs.size()
        device = inputs.device

        # If no attention mask is provided, make them all ones
        if attention_mask is None:
            attention_mask = torch.ones((batch_size, seq_length), device=device)
        # Make the attention mask broadcastable to [batch_size, num_heads, seq_length, seq_length]
        extended_attention_mask = self.perceiver_model.invert_attention_mask(attention_mask)

        head_mask = self.perceiver_model.get_head_mask(
            head_mask,
            self.perceiver_model.config.num_blocks * self.perceiver_model.config.num_self_attends_per_block)

        embedding_output = self.perceiver_model.embeddings(batch_size=batch_size)

        encoder_outputs = self.perceiver_model.encoder(
            embedding_output,
            attention_mask=None,
            head_mask=head_mask,
            inputs=inputs,
            inputs_mask=extended_attention_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        sequence_output = encoder_outputs[0]

        logits = None
        reconstruction = {}
        for chunk_idx in tqdm(range(self.nchunks)):
            subsampled_output_points = {
                'image': torch.arange(
                    self.image_chunk_size * chunk_idx,
                    self.image_chunk_size * (chunk_idx + 1)).to(device),
                'audio': torch.arange(
                    self.audio_chunk_size * chunk_idx,
                    self.audio_chunk_size * (chunk_idx + 1)).to(device),
                'label': None,
            }
            logits = neuron_decoder(sequence_output, extended_attention_mask, inputs, modality_sizes,
                                    inputs_without_pos, subsampled_points=subsampled_output_points)
            reconstruction['label'] = logits['label']
            if 'image' not in reconstruction:
                reconstruction['image'] = logits['image']
                reconstruction['audio'] = logits['audio']
            else:
                reconstruction['image'] = torch.cat([reconstruction['image'], logits['image']], dim=1)
                reconstruction['audio'] = torch.cat([reconstruction['audio'], logits['audio']], dim=1)
            del logits
        return reconstruction


def custom_model_forward(
        self,
        nchunks,
        image_chunk_size,
        audio_chunk_size,
        neuron_decoder,
        inputs: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
) -> Union[Tuple, PerceiverClassifierOutput]:
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    perceiver_wrapper = MultimodalPerceiverWrapper(self.perceiver, nchunks, image_chunk_size, audio_chunk_size)
    outputs = perceiver_wrapper(
        inputs,
        neuron_decoder,
        attention_mask=attention_mask,
        head_mask=head_mask,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )
    return outputs


def custom_decoder_query(self, inputs, modality_sizes=None, inputs_without_pos=None, subsampled_points=None):
    if self.position_encoding_type == "none":  # Queries come from elsewhere
        raise ValueError("You cannot construct decoder queries when position_encoding_type is set to none")
    if subsampled_points is not None:
        # subsampled_points are the indices if the inputs would be flattened
        # however, the inputs aren't flattened, that's why we use unravel_index
        # to get the indices for the unflattened array
        # unravel_index returns a tuple (x_idx, y_idx, ...)
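        # (for example, flat index 5 in shape (2, 3) unravels to (1, 2))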
        # stack to get the [n, d] tensor of coordinates
        def unravel_indices(indices, shape):
            coord = []
            for dim in reversed(shape):
                coord.append(indices % dim)
                indices = indices // dim
            coord = torch.stack(coord[::-1], dim=-1)
            return coord

        pos = unravel_indices(subsampled_points, self.output_index_dims)
        batch_size = inputs.shape[0]
        # Map these coordinates to [-1, 1]
        pos = -1 + 2 * pos / torch.tensor(self.output_index_dims)[None, :]
        pos = torch.broadcast_to(pos[None], [batch_size, pos.shape[0], pos.shape[1]])
        # Construct the position encoding.
        if self.position_encoding_type == "trainable":
            pos_emb = self.output_position_encodings(batch_size)
        elif self.position_encoding_type == "fourier":
            pos_emb = self.output_position_encodings(
                self.output_index_dims, batch_size=batch_size, device=inputs.device, dtype=inputs.dtype, pos=pos
            )

        # Optionally project them to a target dimension.
        pos_emb = self.positions_projection(pos_emb)
        pos_emb = torch.reshape(pos_emb, [pos_emb.shape[0], -1, pos_emb.shape[-1]])
    else:
        batch_size = inputs.shape[0]
        index_dims = inputs.shape[2:]
        # Construct the position encoding.
        if self.position_encoding_type == "trainable":
            pos_emb = self.output_position_encodings(batch_size)
        elif self.position_encoding_type == "fourier":
            pos_emb = self.output_position_encodings(
                index_dims, batch_size, device=inputs.device, dtype=inputs.dtype
            )

        # Optionally project them to a target dimension.
        pos_emb = self.positions_projection(pos_emb)

    if self.concat_preprocessed_input:
        if inputs_without_pos is None:
            raise ValueError("Value is required for inputs_without_pos if concat_preprocessed_input is True")
        pos_emb = torch.cat([inputs_without_pos, pos_emb], dim=-1)

    return pos_emb


# Define wrapper for tracing encoder
class EncoderWrapper(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, embedding_output, inputs, extended_attention_mask):
        output = self.encoder(embedding_output, inputs=inputs, inputs_mask=extended_attention_mask)
        return output


class NeuronEncoder(nn.Module):
    def __init__(self, encoder_wrapper):
        super().__init__()
        self.encoder_wrapper = encoder_wrapper

    def forward(self,
                hidden_states: torch.Tensor,
                attention_mask: Optional[torch.FloatTensor] = None,
                head_mask: Optional[torch.FloatTensor] = None,
                inputs: Optional[torch.FloatTensor] = None,
                inputs_mask: Optional[torch.FloatTensor] = None,
                output_attentions: Optional[bool] = False,
                output_hidden_states: Optional[bool] = False,
                return_dict: Optional[bool] = True):
        last_hidden_states = self.encoder_wrapper(hidden_states, inputs, inputs_mask)['last_hidden_state']
        return BaseModelOutputWithCrossAttentions(last_hidden_state=last_hidden_states)


# Define wrapper for tracing decoder
class DecoderWrapper(nn.Module):
    def __init__(self, decoder, decoder_query_audio, decoder_query_image, decoder_query_label, output_postprocessor):
        super().__init__()
        self.decoder = decoder
        self.decoder_query_audio = decoder_query_audio
        self.decoder_query_image = decoder_query_image
        self.decoder_query_label = decoder_query_label
        self.output_postprocessor = output_postprocessor
        self.num_query_channels = decoder.num_query_channels

    def forward(self, z, query_mask,
                audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding,
                image_input, image_input_without_pos, image_subsampled_point, image_padding,
                label_input, label_input_without_pos, label_padding):
        audio_query = self.decoder_query_audio(inputs=audio_input,
                                               inputs_without_pos=audio_input_without_pos,
                                               subsampled_points=audio_subsampled_point)
        image_query = self.decoder_query_image(inputs=image_input,
                                               inputs_without_pos=image_input_without_pos,
                                               subsampled_points=image_subsampled_point)
        label_query = self.decoder_query_label(inputs=label_input, inputs_without_pos=label_input_without_pos)

        def embed(x, pos):
            x = torch.reshape(x, [x.shape[0], np.prod(x.shape[1:-1]), x.shape[-1]])
            pos = torch.broadcast_to(pos, [x.shape[0], x.shape[1], self.num_query_channels - x.shape[2]])
            return torch.cat([x, pos], dim=2)

        audio_padded = embed(audio_query, audio_padding)
        image_padded = embed(image_query, image_padding)
        label_padded = embed(label_query, label_padding)

        decoder_query = torch.cat([audio_padded, image_padded, label_padded], dim=1)
        logits = self.decoder(decoder_query, z, query_mask).logits

        output_modality_sizes = {"audio": audio_subsampled_point.shape[0],
                                 "image": image_subsampled_point.shape[0],
                                 "label": 1}
        logits = self.output_postprocessor(logits, modality_sizes=output_modality_sizes)
        return logits


class NeuronDecoder(nn.Module):
    def __init__(self, decoder_wrapper):
        super().__init__()
        self.decoder_wrapper = decoder_wrapper
        self.modalities = decoder_wrapper.decoder.modalities
        self.padding = decoder_wrapper.decoder.padding

    def forward(self, z, query_mask, inputs, modality_sizes, inputs_without_pos=None,
                subsampled_points=None, output_attentions=False):
        # Partition the flat inputs among the different modalities
        inputs = restructure(modality_sizes, inputs)
        assert(subsampled_points is not None)
        assert(inputs_without_pos is not None)

        for modality, decoder in self.modalities.items():
            if modality == "audio":
                audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding = (
                    inputs[modality], inputs_without_pos[modality],
                    subsampled_points[modality].to(torch.float32), self.padding[modality])
            elif modality == "image":
                image_input, image_input_without_pos, image_subsampled_point, image_padding = (
                    inputs[modality], inputs_without_pos[modality],
                    subsampled_points[modality].to(torch.float32), self.padding[modality])
            else:
                # label doesn't have subsampled point
                label_input, label_input_without_pos, label_padding = (
                    inputs[modality], inputs_without_pos[modality], self.padding[modality])

        assert(audio_input_without_pos is not None)
        assert(audio_subsampled_point is not None)
        assert(image_input_without_pos is not None)
        assert(image_subsampled_point is not None)
        assert(label_input_without_pos is not None)

        output = self.decoder_wrapper(z, query_mask,
                                      audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding,
                                      image_input, image_input_without_pos, image_subsampled_point, image_padding,
                                      label_input, label_input_without_pos, label_padding)
        return output


model = PerceiverForMultimodalAutoencoding.from_pretrained("deepmind/multimodal-perceiver", low_cpu_mem_usage=True)

COMPILER_WORKDIR_ROOT = "perceiver_multimodal_compile_dir"

PerceiverForMultimodalAutoencoding.forward = custom_model_forward
PerceiverBasicDecoder.decoder_query = custom_decoder_query

# --- Compile Encoder ---
# Define sample inputs for tracing encoder
embedding_output = torch.randn(1, 784, 512)
sample_inputs = torch.randn(1, 52097, 704)
extended_attention_mask = torch.zeros(1, 1, 1, 52097)

# Wrap and trace the encoder, save the traced encoder
COMPILER_WORKDIR_ENCODER = os.path.join(COMPILER_WORKDIR_ROOT, "encoder")
neuron_encoder = NeuronEncoder(EncoderWrapper(model.perceiver.encoder))

# You might see a warning from trace about unused input - these are safe to ignore.
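# A sanity check one could run once tracing completes (a sketch only, assuming
# the traced wrapper keeps the dict-style output that NeuronEncoder indexes
# below; not part of the original flow):
#
#   cpu_ref = EncoderWrapper(model.perceiver.encoder)(
#       embedding_output, sample_inputs, extended_attention_mask)['last_hidden_state']
#   neuron_out = neuron_encoder.encoder_wrapper(
#       embedding_output, sample_inputs, extended_attention_mask)['last_hidden_state']
#   print("max abs diff:", torch.max(torch.abs(cpu_ref - neuron_out)))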
print("Compiling Encoder...") neuron_encoder.encoder_wrapper = torch_neuronx.trace( neuron_encoder.encoder_wrapper, (embedding_output, sample_inputs, extended_attention_mask), compiler_workdir=COMPILER_WORKDIR_ENCODER, compiler_args=[f"--temp-dir={COMPILER_WORKDIR_ENCODER}", "--auto-cast=none"] # --auto-cast=none is needed to avoid numerical error. ) # Save compiled encoder encoder_fname = os.path.join(COMPILER_WORKDIR_ENCODER, 'model.pt') torch.jit.save(neuron_encoder.encoder_wrapper, encoder_fname) # --- Compile Decoder --- # Define sample inputs for tracing decoder z = torch.randn(1, 784, 512) query_mask = torch.zeros(1, 1, 1, 52097) audio_input = torch.randn(1, 1920, 704) audio_input_without_pos = torch.randn(1, 1920, 16) audio_subsampled_point = torch.arange(0, 15, dtype=torch.float32) # 15 = 1920/128 audio_padding = torch.randn(1, 641) image_input = torch.randn(1, 50176, 704) image_input_without_pos = torch.randn(1, 50176, 48) image_subsampled_point = torch.arange(0, 6272, dtype=torch.float32) # 6272 = 224*224*16/128 image_padding = torch.randn(1, 831) label_input = torch.randn(1, 1, 704) label_input_without_pos = torch.randn(1, 1, 700) label_padding = torch.randn(1, 2) # Wrap and trace the decoder, save the traced decoder COMPILER_WORKDIR_DECODER = os.path.join(COMPILER_WORKDIR_ROOT, "decoder") neuron_decoder = NeuronDecoder(DecoderWrapper(model.perceiver.decoder, model.perceiver.decoder.modalities['audio'].decoder_query, \ model.perceiver.decoder.modalities['image'].decoder_query, model.perceiver.decoder.modalities['label'].decoder_query, \ model.perceiver.output_postprocessor)) # You might see a warning from trace about unused input - these are safe to ignore. print("Compiling decoder...") neuron_decoder.decoder_wrapper = torch_neuronx.trace( neuron_decoder.decoder_wrapper, (z, query_mask, audio_input, audio_input_without_pos, audio_subsampled_point, audio_padding, image_input, image_input_without_pos, image_subsampled_point, image_padding, label_input, label_input_without_pos, label_padding), compiler_workdir=COMPILER_WORKDIR_DECODER, compiler_args=[f"--temp-dir={COMPILER_WORKDIR_DECODER}", "--auto-cast=none"] # --auto-cast=none is needed to avoid numerical error. 
)

# Save compiled decoder
decoder_fname = os.path.join(COMPILER_WORKDIR_DECODER, 'model.pt')
torch.jit.save(neuron_decoder.decoder_wrapper, decoder_fname)

print("Done")


================================================
FILE: archive/src/benchmark/pytorch/perceiver-vision_benchmark.py
================================================
import torch

import neuronperf as npf
import neuronperf.torch

# Add to these lists or change as needed
models_list = [
    ("PerceiverForImageClassificationLearned", "deepmind/vision-perceiver-learned"),
    ("PerceiverForImageClassificationFourier", "deepmind/vision-perceiver-fourier"),
    ("PerceiverForImageClassificationConvProcessing", "deepmind/vision-perceiver-conv"),
]
batch_sizes = [1]
n_models = [1, 2]
workers_per_model = [1, 2]  # optimized for latency or throughput


def get_batch(batch_size):
    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)


if __name__ == "__main__":
    for class_name, pretrained_name in models_list:
        model_name = pretrained_name.split("/")[1]
        inputs = [get_batch(batch_size) for batch_size in batch_sizes]
        filename = f"{model_name}.json"

        # Benchmark
        print("Benchmarking {}".format(filename))
        reports = npf.torch.benchmark(filename, inputs, n_models=n_models, workers_per_model=workers_per_model)

        # View and save results
        print("======== {} ========".format(filename))
        npf.print_reports(reports)
        npf.write_csv(reports)
        npf.write_json(reports)


================================================
FILE: archive/src/benchmark/pytorch/perceiver-vision_compile.py
================================================
import torch
import transformers  # ==4.32.0

import neuronperf as npf
import neuronperf.torch

# Add to these lists or change as needed
models_list = [
    ("PerceiverForImageClassificationLearned", "deepmind/vision-perceiver-learned"),
    ("PerceiverForImageClassificationFourier", "deepmind/vision-perceiver-fourier"),
    ("PerceiverForImageClassificationConvProcessing", "deepmind/vision-perceiver-conv"),
]
batch_sizes = [1]
pipeline_sizes = [1]


def get_batch(batch_size):
    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)


if __name__ == "__main__":
    for class_name, pretrained_name in models_list:
        model_name = pretrained_name.split("/")[1]
        model = getattr(transformers, class_name).from_pretrained(pretrained_name)
        inputs = [get_batch(batch_size) for batch_size in batch_sizes]
        filename = f"{model_name}.json"

        # Compile
        print("Compiling {}".format(filename))
        npf.torch.compile(
            model,
            inputs,
            batch_sizes=batch_sizes,
            pipeline_sizes=pipeline_sizes,
            filename=filename,
            model_name=model_name,
        )


================================================
FILE: archive/src/benchmark/pytorch/pixart_alpha_benchmark.py
================================================
import os

os.environ["NEURON_FUSE_SOFTMAX"] = "1"
os.environ["NEURON_CUSTOM_SILU"] = "1"

import copy
import diffusers
import math
import numpy as npy
import time
import torch
import torch_neuronx
import torch.nn as nn
import torch.nn.functional as F
from diffusers import PixArtAlphaPipeline
from diffusers import Transformer2DModel
from IPython.display import clear_output
from matplotlib import image as mpimg
from matplotlib import pyplot as plt
from torch import nn
from transformers.models.t5.modeling_t5 import T5EncoderModel

# Define datatype
DTYPE = torch.bfloat16


# Specialized benchmarking class for PixArt models.
# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E PixArt performance,
# because the top-level PixArt pipeline cannot be serialized into a single Torchscript object.
# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a
# traced Torchscript.
def benchmark(n_runs, test_name, model, model_inputs):
    if not isinstance(model_inputs, tuple):
        model_inputs = (model_inputs,)

    warmup_run = model(*model_inputs)

    latency_collector = LatencyCollector()
    # can't use register_forward_pre_hook or register_forward_hook because PixArt pipeline is not a torch.nn.Module
    for _ in range(n_runs):
        latency_collector.pre_hook()
        res = model(*model_inputs)
        latency_collector.hook()

    p0_latency_ms = latency_collector.percentile(0) * 1000
    p50_latency_ms = latency_collector.percentile(50) * 1000
    p90_latency_ms = latency_collector.percentile(90) * 1000
    p95_latency_ms = latency_collector.percentile(95) * 1000
    p99_latency_ms = latency_collector.percentile(99) * 1000
    p100_latency_ms = latency_collector.percentile(100) * 1000

    report_dict = dict()
    report_dict["Latency P0"] = f'{p0_latency_ms:.1f}'
    report_dict["Latency P50"] = f'{p50_latency_ms:.1f}'
    report_dict["Latency P90"] = f'{p90_latency_ms:.1f}'
    report_dict["Latency P95"] = f'{p95_latency_ms:.1f}'
    report_dict["Latency P99"] = f'{p99_latency_ms:.1f}'
    report_dict["Latency P100"] = f'{p100_latency_ms:.1f}'

    report = f'RESULT FOR {test_name}:'
    for key, value in report_dict.items():
        report += f' {key}={value}'
    print(report)


class LatencyCollector:
    def __init__(self):
        self.start = None
        self.latency_list = []

    def pre_hook(self, *args):
        self.start = time.time()

    def hook(self, *args):
        self.latency_list.append(time.time() - self.start)

    def percentile(self, percent):
        latency_list = self.latency_list
        pos_float = len(latency_list) * percent / 100
        max_pos = len(latency_list) - 1
        pos_floor = min(math.floor(pos_float), max_pos)
        pos_ceil = min(math.ceil(pos_float), max_pos)
        latency_list = sorted(latency_list)
        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]


class InferenceTextEncoderWrapper(nn.Module):
    def __init__(self, dtype, t: T5EncoderModel, seqlen: int):
        super().__init__()
        self.dtype = dtype
        self.device = t.device
        self.t = t

    def forward(self, text_input_ids, attention_mask=None):
        return [self.t(text_input_ids, attention_mask)['last_hidden_state'].to(self.dtype)]


class InferenceTransformerWrapper(nn.Module):
    def __init__(self, transformer: Transformer2DModel):
        super().__init__()
        self.transformer = transformer
        self.config = transformer.config
        self.dtype = transformer.dtype
        self.device = transformer.device

    def forward(self, hidden_states, encoder_hidden_states=None, timestep=None,
                encoder_attention_mask=None, added_cond_kwargs=None, return_dict=False):
        output = self.transformer(
            hidden_states, encoder_hidden_states, timestep, encoder_attention_mask)
        return output


class SimpleWrapper(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        output = self.model(x)
        return output


# --- Load all compiled models and benchmark pipeline ---
def get_pipe(resolution, dtype):
    if resolution == 256:
        transformer: Transformer2DModel = Transformer2DModel.from_pretrained(
            "PixArt-alpha/PixArt-XL-2-256x256", subfolder="transformer", torch_dtype=dtype)
        return PixArtAlphaPipeline.from_pretrained(
            "PixArt-alpha/PixArt-XL-2-512x512", transformer=transformer, torch_dtype=dtype)
    elif resolution == 512:
        return PixArtAlphaPipeline.from_pretrained(
"PixArt-alpha/PixArt-XL-2-512x512", torch_dtype=dtype) else: raise Exception(f"Unsupport resolution {resolution} for pixart alpha") COMPILER_WORKDIR_ROOT = 'pixart_alpha_compile_dir' text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt') decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt') transformer_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'transformer/model.pt') post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt') # Select the desired resolution () resolution = 256 # resolution = 512 pipe = get_pipe(resolution, DTYPE) seqlen = 120 _neuronTextEncoder = InferenceTextEncoderWrapper(DTYPE, pipe.text_encoder, seqlen) _neuronTextEncoder.t = torch.jit.load(text_encoder_filename) pipe.text_encoder = _neuronTextEncoder assert pipe._execution_device is not None device_ids = [0, 1] _neuronTransformer = InferenceTransformerWrapper(pipe.transformer) _neuronTransformer.transformer = torch_neuronx.DataParallel(torch.jit.load(transformer_filename), device_ids, set_dynamic_batching=False) pipe.transformer = _neuronTransformer pipe.vae.decoder = SimpleWrapper(torch.jit.load(decoder_filename)) pipe.vae.post_quant_conv = SimpleWrapper(torch.jit.load(post_quant_conv_filename)) prompt = "a photo of an astronaut riding a horse on mars" n_runs = 20 benchmark(n_runs, "pixart_alpha", pipe, prompt) ================================================ FILE: archive/src/benchmark/pytorch/pixart_sigma_benchmark.py ================================================ import os os.environ["NEURON_FUSE_SOFTMAX"] = "1" os.environ["NEURON_CUSTOM_SILU"] = "1" import copy import diffusers import math import numpy as npy import time import torch import torch_neuronx import torch.nn as nn import torch.nn.functional as F from diffusers import PixArtSigmaPipeline from IPython.display import clear_output from matplotlib import image as mpimg from matplotlib import pyplot as plt from torch import nn import torch from torch import nn from transformers.models.t5.modeling_t5 import T5EncoderModel from diffusers import Transformer2DModel # Define datatype DTYPE = torch.bfloat16 # Specialized benchmarking class for PixArt models. # We cannot use any of the pre-existing benchmarking utilities to benchmark E2E PixArt performance, # because the top-level PixArt pipeline cannot be serialized into a single Torchscript object. # All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a # traced Torchscript. 
def benchmark(n_runs, test_name, model, model_inputs):
    if not isinstance(model_inputs, tuple):
        model_inputs = (model_inputs,)

    warmup_run = model(*model_inputs)

    latency_collector = LatencyCollector()
    # can't use register_forward_pre_hook or register_forward_hook because PixArt pipeline is not a torch.nn.Module
    for _ in range(n_runs):
        latency_collector.pre_hook()
        res = model(*model_inputs)
        latency_collector.hook()

    p0_latency_ms = latency_collector.percentile(0) * 1000
    p50_latency_ms = latency_collector.percentile(50) * 1000
    p90_latency_ms = latency_collector.percentile(90) * 1000
    p95_latency_ms = latency_collector.percentile(95) * 1000
    p99_latency_ms = latency_collector.percentile(99) * 1000
    p100_latency_ms = latency_collector.percentile(100) * 1000

    report_dict = dict()
    report_dict["Latency P0"] = f'{p0_latency_ms:.1f}'
    report_dict["Latency P50"] = f'{p50_latency_ms:.1f}'
    report_dict["Latency P90"] = f'{p90_latency_ms:.1f}'
    report_dict["Latency P95"] = f'{p95_latency_ms:.1f}'
    report_dict["Latency P99"] = f'{p99_latency_ms:.1f}'
    report_dict["Latency P100"] = f'{p100_latency_ms:.1f}'

    report = f'RESULT FOR {test_name}:'
    for key, value in report_dict.items():
        report += f' {key}={value}'
    print(report)


class LatencyCollector:
    def __init__(self):
        self.start = None
        self.latency_list = []

    def pre_hook(self, *args):
        self.start = time.time()

    def hook(self, *args):
        self.latency_list.append(time.time() - self.start)

    def percentile(self, percent):
        latency_list = self.latency_list
        pos_float = len(latency_list) * percent / 100
        max_pos = len(latency_list) - 1
        pos_floor = min(math.floor(pos_float), max_pos)
        pos_ceil = min(math.ceil(pos_float), max_pos)
        latency_list = sorted(latency_list)
        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]


class InferenceTextEncoderWrapper(nn.Module):
    def __init__(self, dtype, t: T5EncoderModel, seqlen: int):
        super().__init__()
        self.dtype = dtype
        self.device = t.device
        self.t = t

    def forward(self, text_input_ids, attention_mask=None):
        return [self.t(text_input_ids, attention_mask)['last_hidden_state'].to(self.dtype)]


class InferenceTransformerWrapper(nn.Module):
    def __init__(self, transformer: Transformer2DModel):
        super().__init__()
        self.transformer = transformer
        self.config = transformer.config
        self.dtype = transformer.dtype
        self.device = transformer.device

    def forward(self, hidden_states, encoder_hidden_states=None, timestep=None,
                encoder_attention_mask=None, added_cond_kwargs=None, return_dict=False):
        output = self.transformer(
            hidden_states, encoder_hidden_states, timestep, encoder_attention_mask)
        return output


class SimpleWrapper(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        output = self.model(x)
        return output


# --- Load all compiled models and benchmark pipeline ---
def get_pipe(resolution, dtype):
    if resolution == 256:
        transformer = Transformer2DModel.from_pretrained(
            "PixArt-alpha/PixArt-Sigma-XL-2-256x256",
            subfolder='transformer',
            torch_dtype=dtype,
        )
        return PixArtSigmaPipeline.from_pretrained(
            "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
            transformer=transformer,
            torch_dtype=dtype,
        )
    elif resolution == 512:
        transformer = Transformer2DModel.from_pretrained(
            "PixArt-alpha/PixArt-Sigma-XL-2-512-MS",
            subfolder='transformer',
            torch_dtype=dtype,
        )
        return PixArtSigmaPipeline.from_pretrained(
            "PixArt-alpha/pixart_sigma_sdxlvae_T5_diffusers",
            transformer=transformer,
            torch_dtype=dtype,
        )
    else:
        raise Exception(f"Unsupported resolution {resolution} for PixArt Sigma")


COMPILER_WORKDIR_ROOT = 'pixart_sigma_compile_dir'
text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
transformer_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'transformer/model.pt')
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')

# Select the desired resolution (256 or 512)
resolution = 256
# resolution = 512
pipe = get_pipe(resolution, DTYPE)

seqlen = 300
_neuronTextEncoder = InferenceTextEncoderWrapper(DTYPE, pipe.text_encoder, seqlen)
_neuronTextEncoder.t = torch.jit.load(text_encoder_filename)
pipe.text_encoder = _neuronTextEncoder

assert pipe._execution_device is not None

device_ids = [0, 1]
_neuronTransformer = InferenceTransformerWrapper(pipe.transformer)
_neuronTransformer.transformer = torch_neuronx.DataParallel(
    torch.jit.load(transformer_filename), device_ids, set_dynamic_batching=False)
pipe.transformer = _neuronTransformer

pipe.vae.decoder = SimpleWrapper(torch.jit.load(decoder_filename))
pipe.vae.post_quant_conv = SimpleWrapper(torch.jit.load(post_quant_conv_filename))

prompt = "a photo of an astronaut riding a horse on mars"
n_runs = 20
benchmark(n_runs, "pixart_sigma", pipe, prompt)


================================================
FILE: archive/src/benchmark/pytorch/resnet50_benchmark.py
================================================
import torch
import torch.neuron

import neuronperf as npf
import neuronperf.torch

# Add to these lists or change as needed
model_name = "resnet50"
batch_sizes = [1, 6]


def get_batch(batch_size):
    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)


if __name__ == "__main__":
    inputs = [get_batch(batch_size) for batch_size in batch_sizes]
    filename = f"{model_name}.json"

    # Benchmark
    print("Benchmarking {}".format(filename))
    reports = npf.torch.benchmark(filename, inputs)

    # View and save results
    print("======== {} ========".format(filename))
    npf.print_reports(reports)
    npf.write_csv(reports)
    npf.write_json(reports)


================================================
FILE: archive/src/benchmark/pytorch/resnet50_compile.py
================================================
import torch
import torch.neuron
import torchvision

import neuronperf as npf
import neuronperf.torch

# Add to these lists or change as needed
model_name = "resnet50"
batch_sizes = [1, 6]
pipeline_sizes = [1]


def get_batch(batch_size):
    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)


if __name__ == "__main__":
    model = torchvision.models.resnet50(pretrained=True)
    inputs = [get_batch(batch_size) for batch_size in batch_sizes]
    filename = f"{model_name}.json"

    # Compile
    print("Compiling {}".format(filename))
    npf.torch.compile(
        model,
        inputs,
        batch_sizes=batch_sizes,
        pipeline_sizes=pipeline_sizes,
        filename=filename,
        model_name=model_name,
    )


================================================
FILE: archive/src/benchmark/pytorch/resnet_benchmark.py
================================================
import torch

import neuronperf as npf
import neuronperf.torch

# Add to these lists or change as needed
model_names = ["resnet18", "resnet34", "resnet50", "resnet101", "resnet152"]
batch_sizes = [1, 8, 64]
n_models = [1, 2]
workers_per_model = [1, 2]  # optimized for latency or throughput


def get_batch(batch_size):
    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)


if __name__ == "__main__":
    for model_name in model_names:
        inputs = [get_batch(batch_size) for batch_size in batch_sizes]
        filename = f"{model_name}.json"

        # Benchmark
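        # (n_models and workers_per_model are lists, so each combination is
        # benchmarked as its own configuration and surfaces as a separate
        # entry in `reports`, hence the plural print_reports/write_csv below.)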
print("Benchmarking {}".format(filename)) reports = npf.torch.benchmark(filename, inputs, n_models=n_models, workers_per_model=workers_per_model) # View and save results print("======== {} ========".format(filename)) npf.print_reports(reports) npf.write_csv(reports) npf.write_json(reports) ================================================ FILE: archive/src/benchmark/pytorch/resnet_compile.py ================================================ import torch import torchvision import neuronperf as npf import neuronperf.torch # Add to these lists or change as needed model_names = ["resnet18", "resnet34", "resnet50", "resnet101", "resnet152"] batch_sizes = [1, 8, 64] pipeline_sizes = [1] def get_batch(batch_size): return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32) if __name__ == "__main__": for model_name in model_names: model = getattr(torchvision.models, model_name)(pretrained=True) inputs = [get_batch(batch_size) for batch_size in batch_sizes] filename = f"{model_name}.json" # Compile print("Compiling {}".format(filename)) npf.torch.compile( model, inputs, batch_sizes=batch_sizes, pipeline_sizes=pipeline_sizes, filename=filename, model_name=model_name, ) ================================================ FILE: archive/src/benchmark/pytorch/sd2_512_benchmark.py ================================================ import os os.environ["NEURON_FUSE_SOFTMAX"] = "1" import torch import torch.nn as nn import torch_neuronx from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler from diffusers.models.unet_2d_condition import UNet2DConditionOutput import time import math # Define datatype DTYPE = torch.bfloat16 # Specialized benchmarking class for stable diffusion. # We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance, # because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object. # All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a # traced Torchscript. 
def benchmark(n_runs, test_name, model, model_inputs):
    if not isinstance(model_inputs, tuple):
        model_inputs = (model_inputs,)

    warmup_run = model(*model_inputs)

    latency_collector = LatencyCollector()
    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module
    for _ in range(n_runs):
        latency_collector.pre_hook()
        res = model(*model_inputs)
        latency_collector.hook()

    p0_latency_ms = latency_collector.percentile(0) * 1000
    p50_latency_ms = latency_collector.percentile(50) * 1000
    p90_latency_ms = latency_collector.percentile(90) * 1000
    p95_latency_ms = latency_collector.percentile(95) * 1000
    p99_latency_ms = latency_collector.percentile(99) * 1000
    p100_latency_ms = latency_collector.percentile(100) * 1000

    report_dict = dict()
    report_dict["Latency P0"] = f'{p0_latency_ms:.1f}'
    report_dict["Latency P50"] = f'{p50_latency_ms:.1f}'
    report_dict["Latency P90"] = f'{p90_latency_ms:.1f}'
    report_dict["Latency P95"] = f'{p95_latency_ms:.1f}'
    report_dict["Latency P99"] = f'{p99_latency_ms:.1f}'
    report_dict["Latency P100"] = f'{p100_latency_ms:.1f}'

    report = f'RESULT FOR {test_name}:'
    for key, value in report_dict.items():
        report += f' {key}={value}'
    print(report)


class LatencyCollector:
    def __init__(self):
        self.start = None
        self.latency_list = []

    def pre_hook(self, *args):
        self.start = time.time()

    def hook(self, *args):
        self.latency_list.append(time.time() - self.start)

    def percentile(self, percent):
        latency_list = self.latency_list
        pos_float = len(latency_list) * percent / 100
        max_pos = len(latency_list) - 1
        pos_floor = min(math.floor(pos_float), max_pos)
        pos_ceil = min(math.ceil(pos_float), max_pos)
        latency_list = sorted(latency_list)
        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]


class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)
        return out_tuple


class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None, return_dict=False):
        sample = self.unetwrap(sample, timestep.to(dtype=DTYPE).expand((sample.shape[0],)), encoder_hidden_states)[0]
        return UNet2DConditionOutput(sample=sample)


class NeuronTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.neuron_text_encoder = text_encoder
        self.config = text_encoder.config
        self.dtype = text_encoder.dtype
        self.device = text_encoder.device

    def forward(self, emb, attention_mask=None):
        return [self.neuron_text_encoder(emb)['last_hidden_state']]


def decode_latents(self, latents):
    latents = latents.to(torch.float)
    latents = 1 / self.vae.config.scaling_factor * latents
    image = self.vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.cpu().permute(0, 2, 3, 1).float().numpy()
    return image


StableDiffusionPipeline.decode_latents = decode_latents

# --- Load all compiled models and benchmark pipeline ---
COMPILER_WORKDIR_ROOT = 'sd2_compile_dir_512'
model_id = "stabilityai/stable-diffusion-2-1-base"
text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Load the compiled UNet onto two neuron cores.
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))
device_ids = [0, 1]
pipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)


class NeuronTypeConversionWrapper(nn.Module):
    def __init__(self, network):
        super().__init__()
        self.network = network

    def forward(self, x):
        return self.network(x.float())


# Load other compiled models onto a single neuron core.
pipe.text_encoder = NeuronTextEncoder(pipe.text_encoder)
pipe.text_encoder.neuron_text_encoder = torch.jit.load(text_encoder_filename)
pipe.vae.decoder = NeuronTypeConversionWrapper(torch.jit.load(decoder_filename))
pipe.vae.post_quant_conv = NeuronTypeConversionWrapper(torch.jit.load(post_quant_conv_filename))

prompt = "a photo of an astronaut riding a horse on mars"
n_runs = 20
benchmark(n_runs, "stable_diffusion_512", pipe, prompt)


================================================
FILE: archive/src/benchmark/pytorch/sd2_512_compile.py
================================================
import os

os.environ["NEURON_FUSE_SOFTMAX"] = "1"

import torch
import torch.nn as nn
import torch_neuronx
import copy
from diffusers import StableDiffusionPipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput

# Compatibility for diffusers<0.18.0
from packaging import version
import diffusers

diffusers_version = version.parse(diffusers.__version__)
use_new_diffusers = diffusers_version >= version.parse('0.18.0')
if use_new_diffusers:
    from diffusers.models.attention_processor import Attention
else:
    from diffusers.models.cross_attention import CrossAttention

# Define datatype
DTYPE = torch.bfloat16


# Have to do this double wrapper trick to compile the unet, because
# of the special UNet2DConditionOutput output type.
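# The same pattern in miniature (hypothetical names, illustration only):
# UNetWrap flattens the diffusers return type into a trace-friendly tuple, and
# NeuronUNet restores UNet2DConditionOutput around the traced module so the
# pipeline keeps seeing the interface it expects:
#
#   class TupleWrap(nn.Module):
#       def __init__(self, m):
#           super().__init__()
#           self.m = m
#       def forward(self, x):
#           return (self.m(x).sample,)  # plain tuple out, safe to trace
#
#   class TypedUnwrap(nn.Module):
#       def __init__(self, traced):
#           super().__init__()
#           self.traced = traced
#       def forward(self, x):
#           return UNet2DConditionOutput(sample=self.traced(x)[0])  # type restored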
class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)
        return out_tuple


class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        sample = self.unetwrap(sample, timestep.to(dtype=DTYPE).expand((sample.shape[0],)), encoder_hidden_states)[0]
        return UNet2DConditionOutput(sample=sample)


class NeuronTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.neuron_text_encoder = text_encoder
        self.config = text_encoder.config
        self.dtype = text_encoder.dtype
        self.device = text_encoder.device

    def forward(self, emb, attention_mask=None):
        return [self.neuron_text_encoder(emb)['last_hidden_state']]


# Optimized attention
def get_attention_scores(self, query, key, attn_mask):
    dtype = query.dtype

    if self.upcast_attention:
        query = query.float()
        key = key.float()

    # Check for square matmuls
    if query.size() == key.size():
        attention_scores = custom_badbmm(
            key,
            query.transpose(-1, -2)
        )

        if self.upcast_softmax:
            attention_scores = attention_scores.float()

        attention_probs = attention_scores.softmax(dim=1).permute(0, 2, 1)
        attention_probs = attention_probs.to(dtype)
    else:
        attention_scores = custom_badbmm(
            query,
            key.transpose(-1, -2)
        )

        if self.upcast_softmax:
            attention_scores = attention_scores.float()

        attention_probs = attention_scores.softmax(dim=-1)
        attention_probs = attention_probs.to(dtype)

    return attention_probs


# In the original badbmm the bias is all zeros, so only apply scale
def custom_badbmm(a, b):
    bmm = torch.bmm(a, b)
    scaled = bmm * 0.125
    return scaled


# For saving compiler artifacts
COMPILER_WORKDIR_ROOT = 'sd2_compile_dir_512'

# Model ID for SD version pipeline
model_id = "stabilityai/stable-diffusion-2-1-base"

# --- Compile UNet and save ---
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)

# Replace original cross-attention module with custom cross-attention module for better performance
if use_new_diffusers:
    Attention.get_attention_scores = get_attention_scores
else:
    CrossAttention.get_attention_scores = get_attention_scores

# Apply double wrapper to deal with custom return type
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))

# Only keep the model being compiled in RAM to minimize memory pressure
unet = copy.deepcopy(pipe.unet.unetwrap)
del pipe

# Compile unet
sample_1b = torch.randn([1, 4, 64, 64], dtype=DTYPE)
timestep_1b = torch.tensor(999, dtype=DTYPE).expand((1,))
encoder_hidden_states_1b = torch.randn([1, 77, 1024], dtype=DTYPE)
example_inputs = sample_1b, timestep_1b, encoder_hidden_states_1b

unet_neuron = torch_neuronx.trace(
    unet,
    example_inputs,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),
    compiler_args=["--model-type=unet-inference", "--enable-fast-loading-neuron-binaries"]
)

# Enable asynchronous and lazy loading to speed up model load
torch_neuronx.async_load(unet_neuron)
torch_neuronx.lazy_load(unet_neuron)

# save compiled unet
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
torch.jit.save(unet_neuron, unet_filename)

# delete unused objects
del unet
del unet_neuron

# --- Compile CLIP text encoder and save ---
# Only keep the model being compiled
# delete unused objects
del unet
del unet_neuron

# --- Compile CLIP text encoder and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)
text_encoder = copy.deepcopy(pipe.text_encoder)
del pipe

# Apply the wrapper to deal with custom return type
text_encoder = NeuronTextEncoder(text_encoder)

# Compile text encoder
# This is used for indexing a lookup table in torch.nn.Embedding,
# so using random numbers may give errors (out of range).
emb = torch.tensor([[49406, 18376, 525, 7496, 49407] + [0] * 72])  # padded to the 77-token context

text_encoder_neuron = torch_neuronx.trace(
    text_encoder.neuron_text_encoder,
    emb,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),
    compiler_args=["--enable-fast-loading-neuron-binaries"]
)

# Enable asynchronous loading to speed up model load
torch_neuronx.async_load(text_encoder_neuron)

# Save the compiled text encoder
text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
torch.jit.save(text_encoder_neuron, text_encoder_filename)

# delete unused objects
del text_encoder
del text_encoder_neuron

# --- Compile VAE decoder and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
decoder = copy.deepcopy(pipe.vae.decoder)
del pipe

# Compile vae decoder
decoder_in = torch.randn([1, 4, 64, 64], dtype=torch.float32)
decoder_neuron = torch_neuronx.trace(
    decoder,
    decoder_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),
    compiler_args=["--enable-fast-loading-neuron-binaries"]
)

# Enable asynchronous loading to speed up model load
torch_neuronx.async_load(decoder_neuron)

# Save the compiled vae decoder
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
torch.jit.save(decoder_neuron, decoder_filename)

# delete unused objects
del decoder
del decoder_neuron

# --- Compile VAE post_quant_conv and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
post_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)
del pipe

# Compile vae post_quant_conv
post_quant_conv_in = torch.randn([1, 4, 64, 64], dtype=torch.float32)
post_quant_conv_neuron = torch_neuronx.trace(
    post_quant_conv,
    post_quant_conv_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),
)

# Enable asynchronous loading to speed up model load
torch_neuronx.async_load(post_quant_conv_neuron)

# Save the compiled vae post_quant_conv
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')
torch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)

# delete unused objects
del post_quant_conv
del post_quant_conv_neuron
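# Editor's sketch (illustrative addition, not part of the archived script): a quick
# check that all four artifacts written by this script deserialize before moving on
# to the companion benchmark script.
import os
import torch
for name in ('unet', 'text_encoder', 'vae_decoder', 'vae_post_quant_conv'):
    path = os.path.join('sd2_compile_dir_512', name, 'model.pt')
    torch.jit.load(path)  # raises if the artifact is missing or corrupt
    print('loaded', path)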
================================================
FILE: archive/src/benchmark/pytorch/sd2_768_benchmark.py
================================================
import os
os.environ["NEURON_FUSE_SOFTMAX"] = "1"

import torch
import torch.nn as nn
import torch_neuronx

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.models.unet_2d_condition import UNet2DConditionOutput

import time
import math

# Define datatype
DTYPE = torch.float32

# Specialized benchmarking class for stable diffusion.
# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,
# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.
# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a
# traced Torchscript.
def benchmark(n_runs, test_name, model, model_inputs):
    if not isinstance(model_inputs, tuple):
        model_inputs = (model_inputs,)

    warmup_run = model(*model_inputs)

    latency_collector = LatencyCollector()
    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module
    for _ in range(n_runs):
        latency_collector.pre_hook()
        res = model(*model_inputs)
        latency_collector.hook()

    p0_latency_ms = latency_collector.percentile(0) * 1000
    p50_latency_ms = latency_collector.percentile(50) * 1000
    p90_latency_ms = latency_collector.percentile(90) * 1000
    p95_latency_ms = latency_collector.percentile(95) * 1000
    p99_latency_ms = latency_collector.percentile(99) * 1000
    p100_latency_ms = latency_collector.percentile(100) * 1000

    report_dict = dict()
    report_dict["Latency P0"] = f'{p0_latency_ms:.1f}'
    report_dict["Latency P50"] = f'{p50_latency_ms:.1f}'
    report_dict["Latency P90"] = f'{p90_latency_ms:.1f}'
    report_dict["Latency P95"] = f'{p95_latency_ms:.1f}'
    report_dict["Latency P99"] = f'{p99_latency_ms:.1f}'
    report_dict["Latency P100"] = f'{p100_latency_ms:.1f}'

    report = f'RESULT FOR {test_name}:'
    for key, value in report_dict.items():
        report += f' {key}={value}'
    print(report)

class LatencyCollector:
    def __init__(self):
        self.start = None
        self.latency_list = []

    def pre_hook(self, *args):
        self.start = time.time()

    def hook(self, *args):
        self.latency_list.append(time.time() - self.start)

    def percentile(self, percent):
        latency_list = self.latency_list
        pos_float = len(latency_list) * percent / 100
        max_pos = len(latency_list) - 1
        pos_floor = min(math.floor(pos_float), max_pos)
        pos_ceil = min(math.ceil(pos_float), max_pos)
        latency_list = sorted(latency_list)
        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]

class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)
        return out_tuple

class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None, return_dict=False):
        sample = self.unetwrap(sample, timestep.to(dtype=DTYPE).expand((sample.shape[0],)), encoder_hidden_states)[0]
        return UNet2DConditionOutput(sample=sample)

class NeuronTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.neuron_text_encoder = text_encoder
        self.config = text_encoder.config
        self.dtype = text_encoder.dtype
        self.device = text_encoder.device

    def forward(self, emb, attention_mask=None):
        return [self.neuron_text_encoder(emb)['last_hidden_state']]
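# Editor's sketch (illustrative addition): benchmark() accepts any callable, so the
# harness can be sanity-checked without Neuron hardware using a toy "model". The
# helper name _toy_model is hypothetical.
def _toy_model(x):
    time.sleep(0.01)
    return x
benchmark(5, "toy", _toy_model, (42,))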
# --- Load all compiled models and run pipeline ---
COMPILER_WORKDIR_ROOT = 'sd2_compile_dir_768'
model_id = "stabilityai/stable-diffusion-2-1"
text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Load the compiled UNet onto two neuron cores.
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))
device_ids = [0, 1]
pipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)

# Load other compiled models onto a single neuron core.
pipe.text_encoder = NeuronTextEncoder(pipe.text_encoder)
pipe.text_encoder.neuron_text_encoder = torch.jit.load(text_encoder_filename)
pipe.vae.decoder = torch.jit.load(decoder_filename)
pipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)

prompt = "a photo of an astronaut riding a horse on mars"
n_runs = 20
benchmark(n_runs, "stable_diffusion_768", pipe, prompt)

================================================
FILE: archive/src/benchmark/pytorch/sd2_768_compile.py
================================================
import os
os.environ["NEURON_FUSE_SOFTMAX"] = "1"

import torch
import torch.nn as nn
import torch_neuronx
import copy

from diffusers import StableDiffusionPipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput

# Compatibility for diffusers<0.18.0
from packaging import version
import diffusers
diffusers_version = version.parse(diffusers.__version__)
use_new_diffusers = diffusers_version >= version.parse('0.18.0')
if use_new_diffusers:
    from diffusers.models.attention_processor import Attention
else:
    from diffusers.models.cross_attention import CrossAttention

# Define datatype
DTYPE = torch.float32

class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)
        return out_tuple

class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        sample = self.unetwrap(sample, timestep.to(dtype=DTYPE).expand((sample.shape[0],)), encoder_hidden_states)[0]
        return UNet2DConditionOutput(sample=sample)

class NeuronTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.neuron_text_encoder = text_encoder
        self.config = text_encoder.config
        self.dtype = text_encoder.dtype
        self.device = text_encoder.device

    def forward(self, emb, attention_mask=None):
        return [self.neuron_text_encoder(emb)['last_hidden_state']]

# Optimized attention
def get_attention_scores(self, query, key, attn_mask):
    dtype = query.dtype

    if self.upcast_attention:
        query = query.float()
        key = key.float()

    # Check for square matmuls
    if query.size() == key.size():
        attention_scores = custom_badbmm(key, query.transpose(-1, -2))
        if self.upcast_softmax:
            attention_scores = attention_scores.float()
        attention_probs = attention_scores.softmax(dim=1).permute(0, 2, 1)
        attention_probs = attention_probs.to(dtype)
    else:
        attention_scores = custom_badbmm(query, key.transpose(-1, -2))
        if self.upcast_softmax:
            attention_scores = attention_scores.float()
        attention_probs = attention_scores.softmax(dim=-1)
        attention_probs = attention_probs.to(dtype)

    return attention_probs

# In the original baddbmm the bias is all zeros, so only apply scale
def custom_badbmm(a, b):
    bmm = torch.bmm(a, b)
    scaled = bmm * 0.125
    return scaled
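# Editor's sketch (illustrative addition): custom_badbmm is equivalent to
# torch.baddbmm with a zero bias, beta=0 and alpha=0.125, which is what the
# comment above alludes to.
_a = torch.randn(2, 4, 8)
_b = torch.randn(2, 8, 4)
_zero_bias = torch.zeros(2, 4, 4)
assert torch.allclose(custom_badbmm(_a, _b),
                      torch.baddbmm(_zero_bias, _a, _b, beta=0.0, alpha=0.125),
                      atol=1e-6)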
# For saving compiler artifacts
COMPILER_WORKDIR_ROOT = 'sd2_compile_dir_768'

# Model ID for SD version pipeline
model_id = "stabilityai/stable-diffusion-2-1"

# --- Compile UNet and save ---
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)

# Replace original cross-attention module with custom cross-attention module for better performance
if use_new_diffusers:
    Attention.get_attention_scores = get_attention_scores
else:
    CrossAttention.get_attention_scores = get_attention_scores

# Apply double wrapper to deal with custom return type
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))

# Only keep the model being compiled in RAM to minimize memory pressure
unet = copy.deepcopy(pipe.unet.unetwrap)
del pipe

# Compile unet
sample_1b = torch.randn([1, 4, 96, 96], dtype=DTYPE)
timestep_1b = torch.tensor(999, dtype=DTYPE).expand((1,))
encoder_hidden_states_1b = torch.randn([1, 77, 1024], dtype=DTYPE)
example_inputs = sample_1b, timestep_1b, encoder_hidden_states_1b

unet_neuron = torch_neuronx.trace(
    unet,
    example_inputs,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),
    compiler_args=["--model-type=unet-inference", "--enable-fast-loading-neuron-binaries"]
)

# Enable asynchronous and lazy loading to speed up model load
torch_neuronx.async_load(unet_neuron)
torch_neuronx.lazy_load(unet_neuron)

# save compiled unet
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
torch.jit.save(unet_neuron, unet_filename)

# delete unused objects
del unet
del unet_neuron

# --- Compile CLIP text encoder and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)
text_encoder = copy.deepcopy(pipe.text_encoder)
del pipe

# Apply the wrapper to deal with custom return type
text_encoder = NeuronTextEncoder(text_encoder)

# Compile text encoder
# This is used for indexing a lookup table in torch.nn.Embedding,
# so using random numbers may give errors (out of range).
emb = torch.tensor([[49406, 18376, 525, 7496, 49407] + [0] * 72])  # padded to the 77-token context

text_encoder_neuron = torch_neuronx.trace(
    text_encoder.neuron_text_encoder,
    emb,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),
    compiler_args=["--enable-fast-loading-neuron-binaries"]
)

# Enable asynchronous loading to speed up model load
torch_neuronx.async_load(text_encoder_neuron)

# Save the compiled text encoder
text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
torch.jit.save(text_encoder_neuron, text_encoder_filename)

# delete unused objects
del text_encoder
del text_encoder_neuron

# --- Compile VAE decoder and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)
decoder = copy.deepcopy(pipe.vae.decoder)
del pipe

# Compile vae decoder
decoder_in = torch.randn([1, 4, 96, 96], dtype=DTYPE)
decoder_neuron = torch_neuronx.trace(
    decoder,
    decoder_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),
    compiler_args=["--enable-fast-loading-neuron-binaries"]
)

# Enable asynchronous loading to speed up model load
torch_neuronx.async_load(decoder_neuron)

# Save the compiled vae decoder
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
torch.jit.save(decoder_neuron, decoder_filename)

# delete unused objects
del decoder
del decoder_neuron

# --- Compile VAE post_quant_conv and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)
post_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)
del pipe

# Compile vae post_quant_conv
post_quant_conv_in = torch.randn([1, 4, 96, 96], dtype=DTYPE)
post_quant_conv_neuron = torch_neuronx.trace(
    post_quant_conv,
    post_quant_conv_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),
)

# Enable asynchronous loading to speed up model load
torch_neuronx.async_load(post_quant_conv_neuron)

# Save the compiled vae post_quant_conv
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')
torch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)

# delete unused objects
del post_quant_conv
del post_quant_conv_neuron
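# Editor's note (illustrative): the 96x96 latent shapes used throughout this script
# follow from the 768px output resolution and the VAE's 8x spatial downscaling
# (768 // 8 == 96); the 512px scripts above use 64x64 latents for the same reason.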
================================================
FILE: archive/src/benchmark/pytorch/sd2_inpainting_benchmark.py
================================================
import torch
import torch.nn as nn
import torch_neuronx
import os

from diffusers import StableDiffusionInpaintPipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput
from diffusers.models.attention_processor import Attention

import argparse
import copy

torch.manual_seed(0)

def parse_arguments():
    parser = argparse.ArgumentParser()
    parser.add_argument('--prompt', type=str, default='Face of a yellow cat, high resolution, sitting on a park bench',
                        help="user input for text to image use case")
    parser.add_argument('--target_dir', type=str, default='./sd21_inpainting_512_neuron',
                        help="directory to save neuron compiled model")
    args = parser.parse_args()
    return args

# Have to do this double wrapper trick to compile the unet, because
# of the special UNet2DConditionOutput output type.
class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)
        return out_tuple

class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        sample = self.unetwrap(sample, timestep.bfloat16().expand((sample.shape[0],)), encoder_hidden_states)[0]
        return UNet2DConditionOutput(sample=sample)

class NeuronTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.neuron_text_encoder = text_encoder
        self.config = text_encoder.config
        self.dtype = text_encoder.dtype
        self.device = text_encoder.device

    def forward(self, emb, attention_mask=None):
        return [self.neuron_text_encoder(emb)['last_hidden_state']]

# Optimized attention
def get_attention_scores(self, query, key, attn_mask):
    dtype = query.dtype

    if self.upcast_attention:
        query = query.float()
        key = key.float()

    # Check for square matmuls
    if query.size() == key.size():
        attention_scores = custom_badbmm(key, query.transpose(-1, -2))
        if self.upcast_softmax:
            attention_scores = attention_scores.float()
        attention_probs = torch.nn.functional.softmax(attention_scores, dim=1).permute(0, 2, 1)
        attention_probs = attention_probs.to(dtype)
    else:
        attention_scores = custom_badbmm(query, key.transpose(-1, -2))
        if self.upcast_softmax:
            attention_scores = attention_scores.float()
        attention_probs = torch.nn.functional.softmax(attention_scores, dim=-1)
        attention_probs = attention_probs.to(dtype)

    return attention_probs

def custom_badbmm(a, b):
    bmm = torch.bmm(a, b)
    scaled = bmm * 0.125
    return scaled

inputs = parse_arguments()
print(inputs.target_dir)

# For saving compiler artifacts
COMPILER_WORKDIR_ROOT = inputs.target_dir

def trace_vae_encoder(model_id, height, width):
    # Only keep the model being compiled in RAM to minimize memory pressure
    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
    vae_encoder = copy.deepcopy(pipe.vae.encoder)
    del pipe

    sample_input = torch.randn([1, 3, height, width])
    vae_encoder_neuron = torch_neuronx.trace(
        vae_encoder,
        sample_input,
        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_encoder'),
    )

    # Save the compiled vae encoder
    vae_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_encoder/model.pt')
    torch.jit.save(vae_encoder_neuron, vae_encoder_filename)

    # delete unused objects
    del vae_encoder
    del vae_encoder_neuron

def trace_unet(model_id, height, width):
    # --- Compile UNet and save ---
    DTYPE = torch.bfloat16
    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=DTYPE)

    # Replace original cross-attention module with custom cross-attention module for better performance
    Attention.get_attention_scores = get_attention_scores

    # Apply double wrapper to deal with custom return type
    pipe.unet = NeuronUNet(UNetWrap(pipe.unet))

    # Only keep the model being compiled in RAM to minimize memory pressure
    unet = copy.deepcopy(pipe.unet.unetwrap)
    del pipe

    sample_1b = torch.randn([1, 9, height, width], dtype=DTYPE)
    timestep_1b = torch.tensor(999, dtype=DTYPE).expand((1,))
    encoder_hidden_states_1b = torch.randn([1, 77, 1024], dtype=DTYPE)
    example_inputs = sample_1b, timestep_1b, encoder_hidden_states_1b
    unet_neuron = torch_neuronx.trace(
        unet,
        example_inputs,
        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),
        compiler_args=["--model-type=unet-inference", "--verbose=info"],
    )

    # save compiled unet
    unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
    torch.jit.save(unet_neuron, unet_filename)

    # delete unused objects
    del unet
    del unet_neuron

def main():
    model_id = "stabilityai/stable-diffusion-2-inpainting"
    height = 624
    width = 936

    trace_unet(model_id, height // 8, width // 8)
    trace_vae_encoder(model_id, height, width)

    # Only keep the model being compiled in RAM to minimize memory pressure
    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
    text_encoder = copy.deepcopy(pipe.text_encoder)
    del pipe

    # Apply the wrapper to deal with custom return type
    text_encoder = NeuronTextEncoder(text_encoder)

    # Compile text encoder
    # This is used for indexing a lookup table in torch.nn.Embedding,
    # so using random numbers may give errors (out of range).
    emb = torch.tensor([[49406, 18376, 525, 7496, 49407] + [0] * 72])  # padded to the 77-token context
    text_encoder_neuron = torch_neuronx.trace(
        text_encoder.neuron_text_encoder,
        emb,
        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),
    )

    # Save the compiled text encoder
    text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
    torch.jit.save(text_encoder_neuron, text_encoder_filename)

    # delete unused objects
    del text_encoder
    del text_encoder_neuron

    # --- Compile VAE decoder and save ---

    # Only keep the model being compiled in RAM to minimize memory pressure
    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
    decoder = copy.deepcopy(pipe.vae.decoder)
    del pipe

    # Compile vae decoder
    decoder_in = torch.randn([1, 4, height // 8, width // 8])
    decoder_neuron = torch_neuronx.trace(
        decoder,
        decoder_in,
        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),
        compiler_args=["--verbose", "info"]
    )

    # Save the compiled vae decoder
    decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
    torch.jit.save(decoder_neuron, decoder_filename)

    # delete unused objects
    del decoder
    del decoder_neuron

    # --- Compile VAE post_quant_conv and save ---

    # Only keep the model being compiled in RAM to minimize memory pressure
    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
    post_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)
    del pipe

    # Compile vae post_quant_conv
    post_quant_conv_in = torch.randn([1, 4, height // 8, width // 8])
    post_quant_conv_neuron = torch_neuronx.trace(
        post_quant_conv,
        post_quant_conv_in,
        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),
        compiler_args=["--verbose", "info"]
    )

    # Save the compiled vae post_quant_conv
    post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')
    torch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)

    # delete unused objects
    del post_quant_conv
    del post_quant_conv_neuron

if __name__ == "__main__":
    main()
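# Editor's sketch (illustrative addition, not part of the archive): the two
# inpainting scripts are intended to run in order; this compile script writes
# artifacts under --target_dir and the inference script below reads them back.
# The file names here assume the archive layout.
import subprocess
subprocess.run(["python", "sd2_inpainting_benchmark.py", "--target_dir", "./sd21_inpainting_512_neuron"], check=True)
subprocess.run(["python", "sd2_inpainting_inference.py", "--target_dir", "./sd21_inpainting_512_neuron"], check=True)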
================================================
FILE: archive/src/benchmark/pytorch/sd2_inpainting_inference.py
================================================
import torch
import torch.nn as nn
import torch_neuronx
import os
import time

from diffusers import StableDiffusionInpaintPipeline, DPMSolverMultistepScheduler
from diffusers.models.unet_2d_condition import UNet2DConditionOutput
from diffusers.models.attention_processor import Attention

import threading
import argparse
import sys
import copy
import PIL
import math

torch.manual_seed(0)

def parse_arguments():
    parser = argparse.ArgumentParser()
    parser.add_argument('--prompt', type=str, default='Face of a yellow cat, high resolution, sitting on a park bench',
                        help="user input for text to image use case")
    parser.add_argument('--target_dir', type=str, default='./sd21_inpainting_512_neuron',
                        help="directory to save neuron compiled model")
    args = parser.parse_args()
    return args

# Specialized benchmarking class for stable diffusion.
# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,
# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.
# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a
# traced Torchscript.
def benchmark(n_runs, test_name, model, model_inputs):
    if not isinstance(model_inputs, tuple):
        model_inputs = (model_inputs,)

    warmup_run = model(*model_inputs)

    latency_collector = LatencyCollector()
    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module
    for _ in range(n_runs):
        latency_collector.pre_hook()
        res = model(*model_inputs)
        latency_collector.hook()

    p0_latency_ms = latency_collector.percentile(0) * 1000
    p50_latency_ms = latency_collector.percentile(50) * 1000
    p90_latency_ms = latency_collector.percentile(90) * 1000
    p95_latency_ms = latency_collector.percentile(95) * 1000
    p99_latency_ms = latency_collector.percentile(99) * 1000
    p100_latency_ms = latency_collector.percentile(100) * 1000

    report_dict = dict()
    report_dict["Latency P0"] = f'{p0_latency_ms:.1f}'
    report_dict["Latency P50"] = f'{p50_latency_ms:.1f}'
    report_dict["Latency P90"] = f'{p90_latency_ms:.1f}'
    report_dict["Latency P95"] = f'{p95_latency_ms:.1f}'
    report_dict["Latency P99"] = f'{p99_latency_ms:.1f}'
    report_dict["Latency P100"] = f'{p100_latency_ms:.1f}'

    report = f'RESULT FOR {test_name}:'
    for key, value in report_dict.items():
        report += f' {key}={value}'
    print(report)

class LatencyCollector:
    def __init__(self):
        self.start = None
        self.latency_list = []

    def pre_hook(self, *args):
        self.start = time.time()

    def hook(self, *args):
        self.latency_list.append(time.time() - self.start)

    def percentile(self, percent):
        latency_list = self.latency_list
        pos_float = len(latency_list) * percent / 100
        max_pos = len(latency_list) - 1
        pos_floor = min(math.floor(pos_float), max_pos)
        pos_ceil = min(math.ceil(pos_float), max_pos)
        latency_list = sorted(latency_list)
        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]
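# Editor's note (illustrative worked example): percentile() uses a nearest-rank rule.
# For latencies [10, 20, 30, 40] ms, percentile(50) computes pos_float = 2.0, so it
# returns the element at index 2 of the sorted list, i.e. 30 ms, while percentile(100)
# clamps the index to the last position and returns the maximum, 40 ms.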
DTYPE = torch.bfloat16

# Have to do this double wrapper trick to compile the unet, because
# of the special UNet2DConditionOutput output type.
class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)
        return out_tuple

class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, timestep_cond=None, added_cond_kwargs=None, cross_attention_kwargs=None, return_dict=False):
        sample = self.unetwrap(sample.to(dtype=DTYPE), timestep.to(dtype=DTYPE).expand((sample.shape[0],)), encoder_hidden_states.to(dtype=DTYPE))[0]
        return UNet2DConditionOutput(sample=sample)

class NeuronTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.neuron_text_encoder = text_encoder
        self.config = text_encoder.config
        self.dtype = text_encoder.dtype
        self.device = text_encoder.device

    def forward(self, emb, attention_mask=None):
        return [self.neuron_text_encoder(emb)['last_hidden_state']]

# Optimized attention
def get_attention_scores(self, query, key, attn_mask):
    dtype = query.dtype

    if self.upcast_attention:
        query = query.float()
        key = key.float()

    # Check for square matmuls
    if query.size() == key.size():
        attention_scores = custom_badbmm(key, query.transpose(-1, -2))
        if self.upcast_softmax:
            attention_scores = attention_scores.float()
        attention_probs = torch.nn.functional.softmax(attention_scores, dim=1).permute(0, 2, 1)
        attention_probs = attention_probs.to(dtype)
    else:
        attention_scores = custom_badbmm(query, key.transpose(-1, -2))
        if self.upcast_softmax:
            attention_scores = attention_scores.float()
        attention_probs = torch.nn.functional.softmax(attention_scores, dim=-1)
        attention_probs = attention_probs.to(dtype)

    return attention_probs

def custom_badbmm(a, b):
    bmm = torch.bmm(a, b)
    scaled = bmm * 0.125
    return scaled

def main():
    inputs = parse_arguments()
    print(inputs.target_dir)

    # For saving compiler artifacts
    COMPILER_WORKDIR_ROOT = inputs.target_dir
    model_id = "stabilityai/stable-diffusion-2-inpainting"

    pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float32)

    text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
    unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
    vae_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_encoder/model.pt')
    decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
    post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')

    # Load the compiled UNet onto two neuron cores.
    pipe.unet = NeuronUNet(UNetWrap(pipe.unet))
    device_ids = [0, 1]
    pipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)
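    # Editor's note (illustrative): DataParallel replicates the traced UNet across
    # NeuronCores 0 and 1, splitting inputs on the batch dimension; with
    # classifier-free guidance the UNet sees batch size 2 per step, so each core
    # handles one sample of the batch in parallel.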
    # Load other compiled models onto a single neuron core.
    pipe.text_encoder = NeuronTextEncoder(pipe.text_encoder)
    pipe.text_encoder.neuron_text_encoder = torch.jit.load(text_encoder_filename)
    pipe.vae.encoder = torch.jit.load(vae_encoder_filename)
    pipe.vae.decoder = torch.jit.load(decoder_filename)
    pipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)

    height = 624
    width = 936
    base_image = PIL.Image.open('sd2_inpainting_photo.png')
    mask = PIL.Image.open('sd2_inpainting_mask.png')

    image = pipe(prompt=inputs.prompt, image=base_image, mask_image=mask, height=height, width=width).images[0]
    image.save("sd2_inpainting_output.png")

    n_runs = 10
    benchmark(n_runs, "stable_diffusion_inpainting", pipe, (inputs.prompt, base_image, mask, None, height, width))

if __name__ == "__main__":
    main()
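# Editor's sketch (illustrative addition, not part of the archived script): the
# inference script expects sd2_inpainting_photo.png and sd2_inpainting_mask.png in
# the working directory. For a dry run, same-size placeholders can be created first.
import PIL.Image
PIL.Image.new("RGB", (936, 624), "gray").save("sd2_inpainting_photo.png")
PIL.Image.new("L", (936, 624), 255).save("sd2_inpainting_mask.png")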
================================================
FILE: archive/src/benchmark/pytorch/sd_15_512_benchmark.py
================================================
import os
os.environ["NEURON_FUSE_SOFTMAX"] = "1"

import copy
import time
import torch
import torch.nn as nn
import torch_neuronx

from diffusers import StableDiffusionPipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput

import math

# Specialized benchmarking class for stable diffusion.
# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,
# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.
# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a
# traced Torchscript.
def benchmark(n_runs, test_name, model, model_inputs):
    if not isinstance(model_inputs, tuple):
        model_inputs = (model_inputs,)

    warmup_run = model(*model_inputs)

    latency_collector = LatencyCollector()
    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module
    for _ in range(n_runs):
        latency_collector.pre_hook()
        res = model(*model_inputs)
        latency_collector.hook()

    p0_latency_ms = latency_collector.percentile(0) * 1000
    p50_latency_ms = latency_collector.percentile(50) * 1000
    p90_latency_ms = latency_collector.percentile(90) * 1000
    p95_latency_ms = latency_collector.percentile(95) * 1000
    p99_latency_ms = latency_collector.percentile(99) * 1000
    p100_latency_ms = latency_collector.percentile(100) * 1000

    report_dict = dict()
    report_dict["Latency P0"] = f'{p0_latency_ms:.1f}'
    report_dict["Latency P50"] = f'{p50_latency_ms:.1f}'
    report_dict["Latency P90"] = f'{p90_latency_ms:.1f}'
    report_dict["Latency P95"] = f'{p95_latency_ms:.1f}'
    report_dict["Latency P99"] = f'{p99_latency_ms:.1f}'
    report_dict["Latency P100"] = f'{p100_latency_ms:.1f}'

    report = f'RESULT FOR {test_name}:'
    for key, value in report_dict.items():
        report += f' {key}={value}'
    print(report)

class LatencyCollector:
    def __init__(self):
        self.start = None
        self.latency_list = []

    def pre_hook(self, *args):
        self.start = time.time()

    def hook(self, *args):
        self.latency_list.append(time.time() - self.start)

    def percentile(self, percent):
        latency_list = self.latency_list
        pos_float = len(latency_list) * percent / 100
        max_pos = len(latency_list) - 1
        pos_floor = min(math.floor(pos_float), max_pos)
        pos_ceil = min(math.ceil(pos_float), max_pos)
        latency_list = sorted(latency_list)
        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]

class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)
        return out_tuple

class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None, return_dict=False):
        sample = self.unetwrap(sample, timestep.float().expand((sample.shape[0],)), encoder_hidden_states)[0]
        return UNet2DConditionOutput(sample=sample)

class NeuronTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.neuron_text_encoder = text_encoder
        self.config = text_encoder.config
        self.dtype = torch.float32
        self.device = text_encoder.device

    def forward(self, emb, attention_mask=None):
        return [self.neuron_text_encoder(emb)['last_hidden_state']]

class NeuronSafetyModelWrap(nn.Module):
    def __init__(self, safety_model):
        super().__init__()
        self.safety_model = safety_model

    def forward(self, clip_inputs):
        return list(self.safety_model(clip_inputs).values())
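# Editor's note (illustrative): NeuronSafetyModelWrap appears to adapt the traced
# safety checker, whose TorchScript forward returns a dict of tensors, back to the
# positional list of outputs the pipeline's safety checker expects to unpack.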
# For saving compiler artifacts
COMPILER_WORKDIR_ROOT = 'sd_1_5_fp32_512_compile_workdir'

# Model ID for SD version pipeline
model_id = "runwayml/stable-diffusion-v1-5"

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)

text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')
safety_model_neuron_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'safety_model/model.pt')

# Load the compiled UNet onto two neuron cores.
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))
device_ids = [0, 1]
pipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)

# Load other compiled models onto a single neuron core.
pipe.text_encoder = NeuronTextEncoder(pipe.text_encoder)
pipe.text_encoder.neuron_text_encoder = torch.jit.load(text_encoder_filename)
pipe.vae.decoder = torch.jit.load(decoder_filename)
pipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)
pipe.safety_checker.vision_model = NeuronSafetyModelWrap(torch.jit.load(safety_model_neuron_filename))

prompt = "a photo of an astronaut riding a horse on mars"
n_runs = 20
benchmark(n_runs, "stable_diffusion_15_512", pipe, prompt)

================================================
FILE: archive/src/benchmark/pytorch/sd_15_512_compile.py
================================================
import os
os.environ["NEURON_FUSE_SOFTMAX"] = "1"

import copy
import time
import torch
import torch.nn as nn
import torch_neuronx

from diffusers import StableDiffusionPipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput

# Compatibility for diffusers<0.18.0
from packaging import version
import diffusers
diffusers_version = version.parse(diffusers.__version__)
use_new_diffusers = diffusers_version >= version.parse('0.18.0')
if use_new_diffusers:
    from diffusers.models.attention_processor import Attention
else:
    from diffusers.models.cross_attention import CrossAttention

def get_attention_scores(self, query, key, attn_mask):
    dtype = query.dtype

    if self.upcast_attention:
        query = query.float()
        key = key.float()

    if query.size() == key.size():
        attention_scores = cust_badbmm(key, query.transpose(-1, -2), self.scale)
        if self.upcast_softmax:
            attention_scores = attention_scores.float()
        attention_probs = torch.nn.functional.softmax(attention_scores, dim=1).permute(0, 2, 1)
        attention_probs = attention_probs.to(dtype)
    else:
        attention_scores = cust_badbmm(query, key.transpose(-1, -2), self.scale)
        if self.upcast_softmax:
            attention_scores = attention_scores.float()
        attention_probs = torch.nn.functional.softmax(attention_scores, dim=-1)
        attention_probs = attention_probs.to(dtype)

    return attention_probs

def cust_badbmm(a, b, scale):
    bmm = torch.bmm(a, b)
    scaled = bmm * scale
    return scaled

class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states, return_dict=False)
        return out_tuple

class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, cross_attention_kwargs=None, return_dict=False):
        sample = self.unetwrap(sample, timestep.float().expand((sample.shape[0],)), encoder_hidden_states)[0]
        return UNet2DConditionOutput(sample=sample)

class NeuronTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.neuron_text_encoder = text_encoder
        self.config = text_encoder.config
        self.dtype = torch.float32
        self.device = text_encoder.device

    def forward(self, emb, attention_mask=None):
        return [self.neuron_text_encoder(emb)['last_hidden_state']]

class NeuronSafetyModelWrap(nn.Module):
    def __init__(self, safety_model):
        super().__init__()
        self.safety_model = safety_model

    def forward(self, clip_inputs):
        return list(self.safety_model(clip_inputs).values())
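# Editor's note (illustrative): unlike the SD2 scripts above, cust_badbmm takes the
# attention module's own scale rather than a hard-coded factor; for 64-dimensional
# attention heads that scale works out to the 0.125 used earlier.
import math
assert 1 / math.sqrt(64) == 0.125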
# For saving compiler artifacts
COMPILER_WORKDIR_ROOT = 'sd_1_5_fp32_512_compile_workdir'

# Model ID for SD version pipeline
model_id = "runwayml/stable-diffusion-v1-5"

# --- Compile CLIP text encoder and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
text_encoder = copy.deepcopy(pipe.text_encoder)
del pipe

# Apply the wrapper to deal with custom return type
text_encoder = NeuronTextEncoder(text_encoder)

# Compile text encoder
# This is used for indexing a lookup table in torch.nn.Embedding,
# so using random numbers may give errors (out of range).
emb = torch.tensor([[49406, 18376, 525, 7496, 49407] + [0] * 72])  # padded to the 77-token context

with torch.no_grad():
    start_time = time.time()
    text_encoder_neuron = torch_neuronx.trace(
        text_encoder.neuron_text_encoder,
        emb,
        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),
        compiler_args=["--enable-fast-loading-neuron-binaries"]
    )
    text_encoder_neuron_compile_time = time.time() - start_time
print('text_encoder_neuron_compile_time:', text_encoder_neuron_compile_time)

# Save the compiled text encoder
text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
torch_neuronx.async_load(text_encoder_neuron)
torch.jit.save(text_encoder_neuron, text_encoder_filename)

# delete unused objects
del text_encoder
del text_encoder_neuron
del emb

# --- Compile VAE decoder and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
decoder = copy.deepcopy(pipe.vae.decoder)
del pipe

# Compile vae decoder
decoder_in = torch.randn([1, 4, 64, 64])
with torch.no_grad():
    start_time = time.time()
    decoder_neuron = torch_neuronx.trace(
        decoder,
        decoder_in,
        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),
        compiler_args=["--enable-fast-loading-neuron-binaries"]
    )
    vae_decoder_compile_time = time.time() - start_time
print('vae_decoder_compile_time:', vae_decoder_compile_time)

# Save the compiled vae decoder
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
torch_neuronx.async_load(decoder_neuron)
torch.jit.save(decoder_neuron, decoder_filename)

# delete unused objects
del decoder
del decoder_in
del decoder_neuron

# --- Compile UNet and save ---
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)

# Replace original cross-attention module with custom cross-attention module for better performance
if use_new_diffusers:
    Attention.get_attention_scores = get_attention_scores
else:
    CrossAttention.get_attention_scores = get_attention_scores

# Apply double wrapper to deal with custom return type
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))

# Only keep the model being compiled in RAM to minimize memory pressure
unet = copy.deepcopy(pipe.unet.unetwrap)
del pipe

# Compile unet - FP32
sample_1b = torch.randn([1, 4, 64, 64])
timestep_1b = torch.tensor(999).float().expand((1,))
encoder_hidden_states_1b = torch.randn([1, 77, 768])
example_inputs = sample_1b, timestep_1b, encoder_hidden_states_1b

with torch.no_grad():
    start_time = time.time()
    unet_neuron = torch_neuronx.trace(
        unet,
        example_inputs,
        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),
        compiler_args=["--model-type=unet-inference", "--enable-fast-loading-neuron-binaries"]
    )
    unet_compile_time = time.time() - start_time
print('unet_compile_time:', unet_compile_time)
# Enable asynchronous and lazy loading to speed up model load
torch_neuronx.async_load(unet_neuron)
torch_neuronx.lazy_load(unet_neuron)

# save compiled unet
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
torch.jit.save(unet_neuron, unet_filename)

# delete unused objects
del unet
del unet_neuron
del sample_1b
del timestep_1b
del encoder_hidden_states_1b

# --- Compile VAE post_quant_conv and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
post_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)
del pipe

# Compile vae post_quant_conv
post_quant_conv_in = torch.randn([1, 4, 64, 64])
with torch.no_grad():
    start_time = time.time()
    post_quant_conv_neuron = torch_neuronx.trace(
        post_quant_conv,
        post_quant_conv_in,
        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),
        compiler_args=["--enable-fast-loading-neuron-binaries"]
    )
    vae_post_quant_conv_compile_time = time.time() - start_time
print('vae_post_quant_conv_compile_time:', vae_post_quant_conv_compile_time)

# Save the compiled vae post_quant_conv
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')
torch_neuronx.async_load(post_quant_conv_neuron)
torch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)

# delete unused objects
del post_quant_conv

# --- Compile safety checker and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
safety_model = copy.deepcopy(pipe.safety_checker.vision_model)
del pipe

clip_input = torch.randn([1, 3, 224, 224])
with torch.no_grad():
    start_time = time.time()
    safety_model = torch_neuronx.trace(
        safety_model,
        clip_input,
        compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'safety_model'),
        compiler_args=["--enable-fast-loading-neuron-binaries"]
    )
    safety_model_compile_time = time.time() - start_time
print('safety_model_compile_time:', safety_model_compile_time)

# Save the compiled safety checker
safety_model_neuron_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'safety_model/model.pt')
torch_neuronx.async_load(safety_model)
torch.jit.save(safety_model, safety_model_neuron_filename)

# delete unused objects
del safety_model

print('Total compile time:', text_encoder_neuron_compile_time + vae_decoder_compile_time
      + unet_compile_time + vae_post_quant_conv_compile_time + safety_model_compile_time)
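# Editor's sketch (hypothetical helper, not in the archived script): the repeated
# time-the-trace pattern in this file could be factored into a single function.
import time
import torch_neuronx

def timed_trace(name, model, example_inputs, **kwargs):
    start = time.time()
    traced = torch_neuronx.trace(model, example_inputs, **kwargs)
    print(f'{name}_compile_time:', time.time() - start)
    return traced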
================================================
FILE: archive/src/benchmark/pytorch/sd_4x_upscaler_benchmark.py
================================================
import os
import time
import requests
import copy
import math

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_neuronx
import numpy as np
from PIL import Image
from io import BytesIO

import diffusers
from diffusers import StableDiffusionUpscalePipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput

class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, class_labels, cross_attention_kwargs=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states, class_labels, return_dict=False)
        return out_tuple

class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, class_labels, cross_attention_kwargs=None, return_dict=False):
        sample = self.unetwrap(
            sample,
            timestep.float().expand((sample.shape[0],)),
            encoder_hidden_states,
            class_labels,
        )[0]
        return UNet2DConditionOutput(sample=sample)

class NeuronTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.neuron_text_encoder = text_encoder
        self.config = text_encoder.config
        self.dtype = text_encoder.dtype
        self.device = text_encoder.device

    def forward(self, emb, attention_mask=None):
        return [self.neuron_text_encoder(emb)["last_hidden_state"]]

# Specialized benchmarking class for stable diffusion.
# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,
# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.
# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a
# traced Torchscript.
def benchmark(n_runs, test_name, model, model_inputs):
    if not isinstance(model_inputs, tuple):
        model_inputs = (model_inputs,)

    warmup_run = model(*model_inputs)

    latency_collector = LatencyCollector()
    # can't use register_forward_pre_hook or register_forward_hook because StableDiffusionPipeline is not a torch.nn.Module
    for _ in range(n_runs):
        latency_collector.pre_hook()
        res = model(*model_inputs)
        latency_collector.hook()

    p0_latency_ms = latency_collector.percentile(0) * 1000
    p50_latency_ms = latency_collector.percentile(50) * 1000
    p90_latency_ms = latency_collector.percentile(90) * 1000
    p95_latency_ms = latency_collector.percentile(95) * 1000
    p99_latency_ms = latency_collector.percentile(99) * 1000
    p100_latency_ms = latency_collector.percentile(100) * 1000

    report_dict = dict()
    report_dict["Latency P0"] = f'{p0_latency_ms:.1f}'
    report_dict["Latency P50"] = f'{p50_latency_ms:.1f}'
    report_dict["Latency P90"] = f'{p90_latency_ms:.1f}'
    report_dict["Latency P95"] = f'{p95_latency_ms:.1f}'
    report_dict["Latency P99"] = f'{p99_latency_ms:.1f}'
    report_dict["Latency P100"] = f'{p100_latency_ms:.1f}'

    report = f'RESULT FOR {test_name}:'
    for key, value in report_dict.items():
        report += f' {key}={value}'
    print(report)

class LatencyCollector:
    def __init__(self):
        self.start = None
        self.latency_list = []

    def pre_hook(self, *args):
        self.start = time.time()

    def hook(self, *args):
        self.latency_list.append(time.time() - self.start)

    def percentile(self, percent):
        latency_list = self.latency_list
        pos_float = len(latency_list) * percent / 100
        max_pos = len(latency_list) - 1
        pos_floor = min(math.floor(pos_float), max_pos)
        pos_ceil = min(math.ceil(pos_float), max_pos)
        latency_list = sorted(latency_list)
        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]

# --- Load all compiled models ---
COMPILER_WORKDIR_ROOT = 'stable_diffusion_upscaler_fp32'
model_id = "stabilityai/stable-diffusion-x4-upscaler"
text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')

pipe = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float32)
# Load the compiled UNet onto two neuron cores.
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))
device_ids = [0, 1]
pipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)

# Load other compiled models onto a single neuron core.
pipe.text_encoder = NeuronTextEncoder(pipe.text_encoder)
pipe.text_encoder.neuron_text_encoder = torch.jit.load(text_encoder_filename)
pipe.vae.decoder = torch.jit.load(decoder_filename)
pipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)

# Run pipeline
prompt = ["a white cat"]
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128))

upscaled_image = pipe(prompt=prompt, image=low_res_img).images[0]
os.makedirs("misc", exist_ok=True)
upscaled_image.save("upsampled_cat.png")

# Benchmark
n_runs = 20
benchmark(n_runs, "stable_diffusion_512", pipe, (prompt, low_res_img))

================================================
FILE: archive/src/benchmark/pytorch/sd_4x_upscaler_compile.py
================================================
import os
import requests
import copy
import math

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_neuronx
from PIL import Image
from io import BytesIO

import diffusers
from diffusers import StableDiffusionUpscalePipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput
from packaging import version

def apply_neuron_attn_override(diffusers_pkg, get_attn_scores_func, neuron_scaled_dot_product_attention):
    diffusers_version = version.parse(diffusers_pkg.__version__)
    use_new_diffusers = diffusers_version >= version.parse("0.18.0")
    if use_new_diffusers:
        diffusers_pkg.models.attention_processor.Attention.get_attention_scores = get_attn_scores_func
    else:
        diffusers_pkg.models.cross_attention.CrossAttention.get_attention_scores = get_attn_scores_func

    # If PyTorch 2 is available, F.scaled_dot_product_attention will be used, so we need
    # to monkey-patch that too with the Neuron-optimized attention
    if hasattr(F, "scaled_dot_product_attention"):
        F.scaled_dot_product_attention = neuron_scaled_dot_product_attention

def get_attention_scores_neuron(self, query, key, attn_mask):
    if query.size() == key.size():
        attention_scores = cust_badbmm(key, query.transpose(-1, -2), self.scale)
        attention_probs = attention_scores.softmax(dim=1).permute(0, 2, 1)
    else:
        attention_scores = cust_badbmm(query, key.transpose(-1, -2), self.scale)
        attention_probs = attention_scores.softmax(dim=-1)

    return attention_probs

def cust_badbmm(a, b, scale):
    bmm = torch.bmm(a, b)
    scaled = bmm * scale
    return scaled

def neuron_scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=None, is_causal=None):
    orig_shape = None
    if len(query.shape) == 4:
        orig_shape = query.shape

        def to3d(x):
            return x.reshape(-1, x.shape[2], x.shape[3])

        query, key, value = map(to3d, [query, key, value])

    if query.size() == key.size():
        attention_scores = torch.bmm(key, query.transpose(-1, -2)) * (1 / math.sqrt(query.size(-1)))
        attention_probs = attention_scores.softmax(dim=1).permute(0, 2, 1)
    else:
        attention_scores = torch.bmm(query, key.transpose(-1, -2)) * (1 / math.sqrt(query.size(-1)))
        attention_probs = attention_scores.softmax(dim=-1)

    attn_out = torch.bmm(attention_probs, value)

    if orig_shape:
        attn_out = attn_out.reshape(orig_shape[0], orig_shape[1], attn_out.shape[1], attn_out.shape[2])

    return attn_out
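# Editor's sketch (illustrative addition): a quick numerical check, assuming
# PyTorch >= 2.0, that the Neuron-friendly replacement matches
# F.scaled_dot_product_attention on the cross-attention (non-square) path with no
# mask and no dropout. Note: run this before apply_neuron_attn_override patches F.
_q = torch.randn(2, 3, 8, 16)
_k = torch.randn(2, 3, 4, 16)
_v = torch.randn(2, 3, 4, 16)
_ref = F.scaled_dot_product_attention(_q, _k, _v)
_out = neuron_scaled_dot_product_attention(_q, _k, _v)
print(torch.allclose(_ref, _out, atol=1e-5))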
class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, class_labels, cross_attention_kwargs=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states, class_labels, return_dict=False)
        return out_tuple

class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, class_labels, cross_attention_kwargs=None, return_dict=False):
        sample = self.unetwrap(
            sample,
            timestep.float().expand((sample.shape[0],)),
            encoder_hidden_states,
            class_labels,
        )[0]
        return UNet2DConditionOutput(sample=sample)

class NeuronTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.neuron_text_encoder = text_encoder
        self.config = text_encoder.config
        self.dtype = text_encoder.dtype
        self.device = text_encoder.device

    def forward(self, emb, attention_mask=None):
        return [self.neuron_text_encoder(emb)["last_hidden_state"]]

# For saving compiler artifacts
COMPILER_WORKDIR_ROOT = 'stable_diffusion_upscaler_fp32'

# Model ID for SD version pipeline
model_id = "stabilityai/stable-diffusion-x4-upscaler"

# --- Compile CLIP text encoder and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float32)
text_encoder = copy.deepcopy(pipe.text_encoder)
del pipe

# Apply the wrapper to deal with custom return type
text_encoder = NeuronTextEncoder(text_encoder)

# Compile text encoder
# This is used for indexing a lookup table in torch.nn.Embedding,
# so using random numbers may give errors (out of range).
emb = torch.tensor([[49406, 18376, 525, 7496, 49407] + [0] * 72])  # padded to the 77-token context
text_encoder_neuron = torch_neuronx.trace(
    text_encoder.neuron_text_encoder,
    emb,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'),
)

# Save the compiled text encoder
text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
torch.jit.save(text_encoder_neuron, text_encoder_filename)

# delete unused objects
del text_encoder

# --- Compile VAE decoder and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float32)
decoder = copy.deepcopy(pipe.vae.decoder)
del pipe

# Compile vae decoder
decoder_in = torch.randn([1, 4, 128, 128])
decoder_neuron = torch_neuronx.trace(
    decoder,
    decoder_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),
)

# Save the compiled vae decoder
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
torch.jit.save(decoder_neuron, decoder_filename)

# delete unused objects
del decoder

# --- Compile UNet and save ---
pipe = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float32)

# Replace original cross-attention module with custom cross-attention module for better performance
apply_neuron_attn_override(diffusers, get_attention_scores_neuron, neuron_scaled_dot_product_attention)

# Apply double wrapper to deal with custom return type
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))

# Only keep the model being compiled in RAM to minimize memory pressure
unet = copy.deepcopy(pipe.unet.unetwrap)
del pipe

# Compile unet - FP32
sample_1b = torch.randn([1, 7, 128, 128])
timestep_1b = torch.tensor(999).float().expand((1,))
encoder_hidden_states_1b = torch.randn([1, 77, 1024])
class_labels = torch.tensor([20])
example_inputs = sample_1b, timestep_1b, encoder_hidden_states_1b, class_labels

unet_neuron = torch_neuronx.trace(
    unet,
    example_inputs,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),
    compiler_args=["--model-type=unet-inference"]
)

# save compiled unet
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
torch.jit.save(unet_neuron, unet_filename)

# delete unused objects
del unet

# --- Compile VAE post_quant_conv and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float32)
post_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)
del pipe

# Compile vae post_quant_conv
post_quant_conv_in = torch.randn([1, 4, 128, 128])
post_quant_conv_neuron = torch_neuronx.trace(
    post_quant_conv,
    post_quant_conv_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),
)

# Save the compiled vae post_quant_conv
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')
torch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)

# delete unused objects
del post_quant_conv
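# Editor's note (illustrative): the extra class_labels input traced above carries the
# x4 upscaler's noise-level conditioning, which is why the example input is a small
# integer tensor (torch.tensor([20])) rather than random data.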
================================================
FILE: archive/src/benchmark/pytorch/sdxl_base_1024_benchmark.py
================================================
import os
import torch
import torch.nn as nn
import torch_neuronx
from diffusers import DiffusionPipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput
from transformers.models.clip.modeling_clip import CLIPTextModelOutput
import time
import math

# Define datatype
DTYPE = torch.float32


# Specialized benchmarking class for stable diffusion.
# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,
# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.
# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a
# traced Torchscript.
def benchmark(n_runs, test_name, model, model_inputs):
    if not isinstance(model_inputs, tuple):
        model_inputs = (model_inputs,)

    warmup_run = model(*model_inputs)

    latency_collector = LatencyCollector()
    # can't use register_forward_pre_hook or register_forward_hook because
    # StableDiffusionPipeline is not a torch.nn.Module
    for _ in range(n_runs):
        latency_collector.pre_hook()
        res = model(*model_inputs)
        latency_collector.hook()

    p0_latency_ms = latency_collector.percentile(0) * 1000
    p50_latency_ms = latency_collector.percentile(50) * 1000
    p90_latency_ms = latency_collector.percentile(90) * 1000
    p95_latency_ms = latency_collector.percentile(95) * 1000
    p99_latency_ms = latency_collector.percentile(99) * 1000
    p100_latency_ms = latency_collector.percentile(100) * 1000

    report_dict = dict()
    report_dict["Latency P0"] = f'{p0_latency_ms:.1f}'
    report_dict["Latency P50"] = f'{p50_latency_ms:.1f}'
    report_dict["Latency P90"] = f'{p90_latency_ms:.1f}'
    report_dict["Latency P95"] = f'{p95_latency_ms:.1f}'
    report_dict["Latency P99"] = f'{p99_latency_ms:.1f}'
    report_dict["Latency P100"] = f'{p100_latency_ms:.1f}'

    report = f'RESULT FOR {test_name}:'
    for key, value in report_dict.items():
        report += f' {key}={value}'
    print(report)


class LatencyCollector:
    def __init__(self):
        self.start = None
        self.latency_list = []

    def pre_hook(self, *args):
        self.start = time.time()

    def hook(self, *args):
        self.latency_list.append(time.time() - self.start)

    def percentile(self, percent):
        latency_list = self.latency_list
        pos_float = len(latency_list) * percent / 100
        max_pos = len(latency_list) - 1
        pos_floor = min(math.floor(pos_float), max_pos)
        pos_ceil = min(math.ceil(pos_float), max_pos)
        latency_list = sorted(latency_list)
        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]


class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, text_embeds=None, time_ids=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states,
                              added_cond_kwargs={"text_embeds": text_embeds, "time_ids": time_ids},
                              return_dict=False)
        return out_tuple


class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.add_embedding = unetwrap.unet.add_embedding
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, added_cond_kwargs=None, return_dict=False, cross_attention_kwargs=None):
        sample = self.unetwrap(sample,
                               timestep.to(dtype=DTYPE).expand((sample.shape[0],)),
                               encoder_hidden_states,
                               added_cond_kwargs["text_embeds"],
                               added_cond_kwargs["time_ids"])[0]
        return UNet2DConditionOutput(sample=sample)
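# Note on the pattern above (it recurs in all the SD samples in this
# directory): UNetWrap flattens the diffusers keyword-argument interface into
# positional tensor arguments so the module can be traced, and NeuronUNet
# restores the interface the pipeline expects. Expanding the scalar timestep
# to shape (batch_size,) gives the traced graph a fixed-shape tensor input; a
# sketch of the equivalent standalone transformation:
#
#     t = torch.tensor(999).to(dtype=DTYPE).expand((1,))  # shape (1,)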
class TextEncoderOutputWrapper(nn.Module):
    def __init__(self, traceable_text_encoder, original_text_encoder):
        super().__init__()
        self.traceable_text_encoder = traceable_text_encoder
        self.config = original_text_encoder.config
        self.dtype = original_text_encoder.dtype
        self.device = original_text_encoder.device

    def forward(self, text_input_ids, output_hidden_states=True):
        out_tuple = self.traceable_text_encoder(text_input_ids)
        return CLIPTextModelOutput(text_embeds=out_tuple[0],
                                   last_hidden_state=out_tuple[1],
                                   hidden_states=out_tuple[2])


class TraceableTextEncoder(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.text_encoder = text_encoder

    def forward(self, text_input_ids):
        out_tuple = self.text_encoder(text_input_ids, output_hidden_states=True, return_dict=False)
        return out_tuple


# --- Load all compiled models and run pipeline ---
COMPILER_WORKDIR_ROOT = 'sdxl_base_compile_dir_1024'
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt')
text_encoder_2_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder_2/model.pt')
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)

# Load the compiled UNet onto two neuron cores.
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))
device_ids = [0, 1]
pipe.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_filename), device_ids, set_dynamic_batching=False)

# Load other compiled models onto a single neuron core.
pipe.vae.decoder = torch.jit.load(decoder_filename)
pipe.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)
pipe.text_encoder = TextEncoderOutputWrapper(torch.jit.load(text_encoder_filename), pipe.text_encoder)
pipe.text_encoder_2 = TextEncoderOutputWrapper(torch.jit.load(text_encoder_2_filename), pipe.text_encoder_2)

prompt = "a photo of an astronaut riding a horse on mars"
n_runs = 20
benchmark(n_runs, "stable_diffusion_1024", pipe, prompt)


================================================
FILE: archive/src/benchmark/pytorch/sdxl_base_1024_compile.py
================================================
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_neuronx
import math
import copy
import diffusers
from diffusers import DiffusionPipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput
from diffusers.models.attention_processor import Attention
from transformers.models.clip.modeling_clip import CLIPTextModelOutput
from packaging import version


def apply_neuron_attn_override(diffusers_pkg, get_attn_scores_func, neuron_scaled_dot_product_attention):
    diffusers_version = version.parse(diffusers_pkg.__version__)
    use_new_diffusers = diffusers_version >= version.parse("0.18.0")
    if use_new_diffusers:
        diffusers_pkg.models.attention_processor.Attention.get_attention_scores = get_attn_scores_func
    else:
        diffusers_pkg.models.cross_attention.CrossAttention.get_attention_scores = get_attn_scores_func

    # If PyTorch 2 is available, F.scaled_dot_product_attention will be used, so we need to
    # monkey patch that too to be Neuron optimized attention
    if hasattr(F, "scaled_dot_product_attention"):
        F.scaled_dot_product_attention = neuron_scaled_dot_product_attention


# Define datatype
DTYPE = torch.float32


# Optimized attention
def get_attention_scores_neuron(self, query, key, attn_mask):
    if query.size() == key.size():
        attention_scores = custom_badbmm(key, query.transpose(-1, -2), self.scale)
        attention_probs = attention_scores.softmax(dim=1).permute(0, 2, 1)
    else:
        attention_scores = custom_badbmm(query, key.transpose(-1, -2), self.scale)
        attention_probs = attention_scores.softmax(dim=-1)
    return attention_probs


def custom_badbmm(a, b, scale):
    bmm = torch.bmm(a, b)
    scaled = bmm * scale
    return scaled


def neuron_scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=None, is_causal=None):
    orig_shape = None
    if len(query.shape) == 4:
        orig_shape = query.shape

        def to3d(x):
            return x.reshape(-1, x.shape[2], x.shape[3])

        query, key, value = map(to3d, [query, key, value])
    if query.size() == key.size():
        attention_scores = torch.bmm(key, query.transpose(-1, -2)) * (1 / math.sqrt(query.size(-1)))
        attention_probs = attention_scores.softmax(dim=1).permute(0, 2, 1)
    else:
        attention_scores = torch.bmm(query, key.transpose(-1, -2)) * (1 / math.sqrt(query.size(-1)))
        attention_probs = attention_scores.softmax(dim=-1)
    attn_out = torch.bmm(attention_probs, value)
    if orig_shape:
        attn_out = attn_out.reshape(orig_shape[0], orig_shape[1], attn_out.shape[1], attn_out.shape[2])
    return attn_out


# Replace original cross-attention module with custom cross-attention module for better performance
apply_neuron_attn_override(diffusers, get_attention_scores_neuron, neuron_scaled_dot_product_attention)
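# Note on the override above: diffusers moved this method from
# cross_attention.CrossAttention to attention_processor.Attention in v0.18.0,
# which is why apply_neuron_attn_override dispatches on the installed version.
# Patching F.scaled_dot_product_attention is global to the process; a hedged
# sanity check (not in the original script) that the patch took effect:
#
#     assert F.scaled_dot_product_attention is neuron_scaled_dot_product_attention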
added_cond_kwargs={"text_embeds": text_embeds, "time_ids": time_ids}, return_dict=False, ) return out_tuple class NeuronUNet(nn.Module): def __init__(self, unetwrap): super().__init__() self.unetwrap = unetwrap self.config = unetwrap.unet.config self.in_channels = unetwrap.unet.in_channels self.add_embedding = unetwrap.unet.add_embedding self.device = unetwrap.unet.device def forward( self, sample, timestep, encoder_hidden_states, added_cond_kwargs=None, return_dict=False, cross_attention_kwargs=None, ): sample = self.unetwrap( sample, timestep.float().expand((sample.shape[0],)), encoder_hidden_states, added_cond_kwargs["text_embeds"], added_cond_kwargs["time_ids"], )[0] return UNet2DConditionOutput(sample=sample) class TextEncoderOutputWrapper(nn.Module): def __init__(self, traceable_text_encoder, original_text_encoder): super().__init__() self.traceable_text_encoder = traceable_text_encoder self.config = original_text_encoder.config self.dtype = original_text_encoder.dtype self.device = original_text_encoder.device def forward(self, text_input_ids, output_hidden_states=True): out_tuple = self.traceable_text_encoder(text_input_ids) return CLIPTextModelOutput(text_embeds=out_tuple[0], last_hidden_state=out_tuple[1], hidden_states=out_tuple[2]) class TraceableTextEncoder(nn.Module): def __init__(self, text_encoder): super().__init__() self.text_encoder = text_encoder def forward(self, text_input_ids): out_tuple = self.text_encoder(text_input_ids, output_hidden_states=True, return_dict=False) return out_tuple # For saving compiler artifacts COMPILER_WORKDIR_ROOT = 'sdxl_base_compile_dir_1024' # Model ID for SD XL version pipeline model_id = "stabilityai/stable-diffusion-xl-base-1.0" # --- Compile Text Encoders and save --- pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE) # Apply wrappers to make text encoders traceable traceable_text_encoder = copy.deepcopy(TraceableTextEncoder(pipe.text_encoder)) traceable_text_encoder_2 = copy.deepcopy(TraceableTextEncoder(pipe.text_encoder_2)) del pipe text_input_ids_1 = torch.tensor([[49406, 736, 1615, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407]]) text_input_ids_2 = torch.tensor([[49406, 736, 1615, 49407, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]) # Text Encoder 1 neuron_text_encoder = torch_neuronx.trace( traceable_text_encoder, text_input_ids_1, compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder'), ) text_encoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder/model.pt') torch.jit.save(neuron_text_encoder, text_encoder_filename) # Text Encoder 2 neuron_text_encoder_2 = torch_neuronx.trace( traceable_text_encoder_2, text_input_ids_2, compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder_2'), ) text_encoder_2_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'text_encoder_2/model.pt') torch.jit.save(neuron_text_encoder_2, text_encoder_2_filename) # --- 
# --- Compile UNet and save ---
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)

# Replace original cross-attention module with custom cross-attention module for better performance
Attention.get_attention_scores = get_attention_scores_neuron

# Apply double wrapper to deal with custom return type
pipe.unet = NeuronUNet(UNetWrap(pipe.unet))

# Only keep the model being compiled in RAM to minimize memory pressure
unet = copy.deepcopy(pipe.unet.unetwrap)
del pipe

# Compile unet - FP32
sample_1b = torch.randn([1, 4, 128, 128], dtype=DTYPE)
timestep_1b = torch.tensor(999, dtype=DTYPE).expand((1,))
encoder_hidden_states_1b = torch.randn([1, 77, 2048], dtype=DTYPE)
added_cond_kwargs_1b = {"text_embeds": torch.randn([1, 1280], dtype=DTYPE),
                        "time_ids": torch.randn([1, 6], dtype=DTYPE)}
example_inputs = (sample_1b, timestep_1b, encoder_hidden_states_1b,
                  added_cond_kwargs_1b["text_embeds"], added_cond_kwargs_1b["time_ids"],)

unet_neuron = torch_neuronx.trace(
    unet,
    example_inputs,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet'),
    compiler_args=["--model-type=unet-inference"],
)

# Enable asynchronous and lazy loading to speed up model load
torch_neuronx.async_load(unet_neuron)
torch_neuronx.lazy_load(unet_neuron)

# save compiled unet
unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet/model.pt')
torch.jit.save(unet_neuron, unet_filename)

# delete unused objects
del unet
del unet_neuron

# --- Compile VAE decoder and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)
decoder = copy.deepcopy(pipe.vae.decoder)
del pipe

# Compile vae decoder
decoder_in = torch.randn([1, 4, 128, 128], dtype=DTYPE)
decoder_neuron = torch_neuronx.trace(
    decoder,
    decoder_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),
)

# Enable asynchronous loading to speed up model load
torch_neuronx.async_load(decoder_neuron)

# Save the compiled vae decoder
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
torch.jit.save(decoder_neuron, decoder_filename)

# delete unused objects
del decoder
del decoder_neuron

# --- Compile VAE post_quant_conv and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=DTYPE)
post_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)
del pipe

# Compile vae post_quant_conv
post_quant_conv_in = torch.randn([1, 4, 128, 128], dtype=DTYPE)
post_quant_conv_neuron = torch_neuronx.trace(
    post_quant_conv,
    post_quant_conv_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),
)

# Enable asynchronous loading to speed up model load
torch_neuronx.async_load(post_quant_conv_neuron)

# Save the compiled vae post_quant_conv
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')
torch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)

# delete unused objects
del post_quant_conv
del post_quant_conv_neuron
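# A hedged note on the load-time helpers used above: torch_neuronx.async_load
# and torch_neuronx.lazy_load are applied to the traced module before
# torch.jit.save, so the behavior travels with the saved artifact; the
# matching benchmark script then only needs a plain load, e.g.
#
#     unet = torch.jit.load(unet_filename)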
================================================
FILE: archive/src/benchmark/pytorch/sdxl_base_and_refiner_1024_benchmark.py
================================================
import os
import torch
import torch.nn as nn
import torch_neuronx
from diffusers import DiffusionPipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput
import time
import math

# Define datatype
DTYPE = torch.float32


# Specialized benchmarking class for stable diffusion.
# We cannot use any of the pre-existing benchmarking utilities to benchmark E2E stable diffusion performance,
# because the top-level StableDiffusionPipeline cannot be serialized into a single Torchscript object.
# All of the pre-existing benchmarking utilities (in neuronperf or torch_neuronx) require the model to be a
# traced Torchscript.
def benchmark(n_runs, test_name, model, model_inputs):
    if not isinstance(model_inputs, tuple):
        model_inputs = (model_inputs,)

    warmup_run = model(*model_inputs)

    latency_collector = LatencyCollector()
    # can't use register_forward_pre_hook or register_forward_hook because
    # StableDiffusionPipeline is not a torch.nn.Module
    for _ in range(n_runs):
        latency_collector.pre_hook()
        res = model(*model_inputs)
        latency_collector.hook()

    p0_latency_ms = latency_collector.percentile(0) * 1000
    p50_latency_ms = latency_collector.percentile(50) * 1000
    p90_latency_ms = latency_collector.percentile(90) * 1000
    p95_latency_ms = latency_collector.percentile(95) * 1000
    p99_latency_ms = latency_collector.percentile(99) * 1000
    p100_latency_ms = latency_collector.percentile(100) * 1000

    report_dict = dict()
    report_dict["Latency P0"] = f'{p0_latency_ms:.1f}'
    report_dict["Latency P50"] = f'{p50_latency_ms:.1f}'
    report_dict["Latency P90"] = f'{p90_latency_ms:.1f}'
    report_dict["Latency P95"] = f'{p95_latency_ms:.1f}'
    report_dict["Latency P99"] = f'{p99_latency_ms:.1f}'
    report_dict["Latency P100"] = f'{p100_latency_ms:.1f}'

    report = f'RESULT FOR {test_name}:'
    for key, value in report_dict.items():
        report += f' {key}={value}'
    print(report)


class LatencyCollector:
    def __init__(self):
        self.start = None
        self.latency_list = []

    def pre_hook(self, *args):
        self.start = time.time()

    def hook(self, *args):
        self.latency_list.append(time.time() - self.start)

    def percentile(self, percent):
        latency_list = self.latency_list
        pos_float = len(latency_list) * percent / 100
        max_pos = len(latency_list) - 1
        pos_floor = min(math.floor(pos_float), max_pos)
        pos_ceil = min(math.ceil(pos_float), max_pos)
        latency_list = sorted(latency_list)
        return latency_list[pos_ceil] if pos_float - pos_floor > 0.5 else latency_list[pos_floor]


class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, text_embeds=None, time_ids=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states,
                              added_cond_kwargs={"text_embeds": text_embeds, "time_ids": time_ids},
                              return_dict=False)
        return out_tuple


class NeuronUNet(nn.Module):
    def __init__(self, unetwrap):
        super().__init__()
        self.unetwrap = unetwrap
        self.config = unetwrap.unet.config
        self.in_channels = unetwrap.unet.in_channels
        self.add_embedding = unetwrap.unet.add_embedding
        self.device = unetwrap.unet.device

    def forward(self, sample, timestep, encoder_hidden_states, added_cond_kwargs=None, return_dict=False, cross_attention_kwargs=None):
        sample = self.unetwrap(sample,
                               timestep.to(dtype=DTYPE).expand((sample.shape[0],)),
                               encoder_hidden_states,
                               added_cond_kwargs["text_embeds"],
                               added_cond_kwargs["time_ids"])[0]
        return UNet2DConditionOutput(sample=sample)


# Helper function to run both refiner and base pipes and return the final image
def run_refiner_and_base(base, refiner, prompt, n_steps=40, high_noise_frac=0.8, generator=None):
    # The base pipe denoises the first high_noise_frac of the schedule and returns latents
    image = base(
        prompt=prompt,
        num_inference_steps=n_steps,
        denoising_end=high_noise_frac,
        output_type="latent",
        generator=generator,
    ).images
    # The refiner resumes at denoising_start and produces the final image
    image = refiner(
        prompt=prompt,
        num_inference_steps=n_steps,
        denoising_start=high_noise_frac,
        image=image,
    ).images[0]
    return image


# --- Load all compiled models and run pipeline ---
COMPILER_WORKDIR_ROOT = 'sdxl_base_and_refiner_compile_dir_1024'
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
refiner_model_id = "stabilityai/stable-diffusion-xl-refiner-1.0"
unet_base_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet_base/model.pt')
unet_refiner_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet_refiner/model.pt')
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')

# ------- Load base -------
pipe_base = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=DTYPE, low_cpu_mem_usage=True)

# Load the compiled UNet onto two neuron cores.
pipe_base.unet = NeuronUNet(UNetWrap(pipe_base.unet))
device_ids = [0, 1]
pipe_base.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_base_filename), device_ids, set_dynamic_batching=False)

# Load other compiled models onto a single neuron core.
pipe_base.vae.decoder = torch.jit.load(decoder_filename)
pipe_base.vae.post_quant_conv = torch.jit.load(post_quant_conv_filename)

# ------- Load refiner -------
# refiner shares text_encoder_2 and vae with the base
pipe_refiner = DiffusionPipeline.from_pretrained(
    refiner_model_id,
    text_encoder_2=pipe_base.text_encoder_2,
    vae=pipe_base.vae,
    torch_dtype=DTYPE,
    low_cpu_mem_usage=True,
)

# Refiner - load the compiled UNet onto two neuron cores.
pipe_refiner.unet = NeuronUNet(UNetWrap(pipe_refiner.unet))
device_ids = [0, 1]
pipe_refiner.unet.unetwrap = torch_neuronx.DataParallel(torch.jit.load(unet_refiner_filename), device_ids, set_dynamic_batching=False)

# Define how many steps and what % of steps to run on each expert (80/20) here
n_steps = 40
high_noise_frac = 0.8

prompt = "a photo of an astronaut riding a horse on mars"
inputs = (pipe_base, pipe_refiner, prompt, n_steps, high_noise_frac, torch.manual_seed(0),)
n_runs = 50
benchmark(n_runs, "stable_diffusion_1024", run_refiner_and_base, inputs)


================================================
FILE: archive/src/benchmark/pytorch/sdxl_base_and_refiner_1024_compile.py
================================================
import os
import torch
import torch.nn as nn
import torch_neuronx
import copy
from diffusers import DiffusionPipeline
from diffusers.models.unet_2d_condition import UNet2DConditionOutput
from diffusers.models.attention_processor import Attention

# Define datatype
DTYPE = torch.float32


# Optimized attention
def get_attention_scores_neuron(self, query, key, attn_mask):
    if query.size() == key.size():
        attention_scores = custom_badbmm(key, query.transpose(-1, -2), self.scale)
        attention_probs = attention_scores.softmax(dim=1).permute(0, 2, 1)
    else:
        attention_scores = custom_badbmm(query, key.transpose(-1, -2), self.scale)
        attention_probs = attention_scores.softmax(dim=-1)
    return attention_probs


def custom_badbmm(a, b, scale):
    bmm = torch.bmm(a, b)
    scaled = bmm * scale
    return scaled


class UNetWrap(nn.Module):
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states, text_embeds=None, time_ids=None):
        out_tuple = self.unet(sample, timestep, encoder_hidden_states,
                              added_cond_kwargs={"text_embeds": text_embeds, "time_ids": time_ids},
                              return_dict=False)
        return out_tuple
added_cond_kwargs["time_ids"])[0] return UNet2DConditionOutput(sample=sample) # For saving compiler artifacts COMPILER_WORKDIR_ROOT = 'sdxl_base_and_refiner_compile_dir_1024' # Model IDs for SD XL version pipeline base_model_id = "stabilityai/stable-diffusion-xl-base-1.0" refiner_model_id = "stabilityai/stable-diffusion-xl-refiner-1.0" # All components we compile in this script: # 1. unet (base, in fp32) # 2. unet (refiner, in fp32) # 3. vae.decoder (base & refiner) # 4. vae.post_quant_conv (base & refiner) # --- Compile UNet in fp32 (base) and save --- pipe_base = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=DTYPE, low_cpu_mem_usage=True) # Replace original cross-attention module with custom cross-attention module for better performance Attention.get_attention_scores = get_attention_scores_neuron # Apply double wrapper to deal with custom return type pipe_base.unet = NeuronUNet(UNetWrap(pipe_base.unet)) # Only keep the model being compiled in RAM to minimze memory pressure unet = copy.deepcopy(pipe_base.unet.unetwrap) del pipe_base # Compile unet - fp32 (note these tensors are cast to fp32 in UNetWrap) sample_1b = torch.randn([1, 4, 128, 128]) timestep_1b = torch.tensor(999).float().expand((1,)) encoder_hidden_states_1b = torch.randn([1, 77, 2048]) added_cond_kwargs_1b = {"text_embeds": torch.randn([1, 1280]), "time_ids": torch.randn([1, 6])} example_inputs = (sample_1b, timestep_1b, encoder_hidden_states_1b, added_cond_kwargs_1b["text_embeds"], added_cond_kwargs_1b["time_ids"],) unet_neuron = torch_neuronx.trace( unet, example_inputs, compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet_base'), compiler_args=["--model-type=unet-inference"] ) # Enable asynchronous and lazy loading to speed up model load torch_neuronx.async_load(unet_neuron) torch_neuronx.lazy_load(unet_neuron) # save compiled unet unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet_base/model.pt') torch.jit.save(unet_neuron, unet_filename) # delete unused objects del unet del unet_neuron # --- Compile UNet in fp32 (refiner) and save --- pipe_refiner = DiffusionPipeline.from_pretrained(refiner_model_id, torch_dtype=DTYPE, low_cpu_mem_usage=True) # Replace original cross-attention module with custom cross-attention module for better performance Attention.get_attention_scores = get_attention_scores_neuron # Apply double wrapper to deal with custom return type pipe_refiner.unet = NeuronUNet(UNetWrap(pipe_refiner.unet)) # Only keep the model being compiled in RAM to minimze memory pressure unet = copy.deepcopy(pipe_refiner.unet.unetwrap) del pipe_refiner # Compile unet - fp32 - some input shapes are different from base sample_1b = torch.randn([1, 4, 128, 128]) timestep_1b = torch.tensor(999).float().expand((1,)) encoder_hidden_states_1b = torch.randn([1, 77, 1280]) added_cond_kwargs_1b = {"text_embeds": torch.randn([1, 1280]), "time_ids": torch.randn([1, 5])} example_inputs = (sample_1b, timestep_1b, encoder_hidden_states_1b, added_cond_kwargs_1b["text_embeds"], added_cond_kwargs_1b["time_ids"],) unet_neuron = torch_neuronx.trace( unet, example_inputs, compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'unet_refiner'), compiler_args=["--model-type=unet-inference"] ) # Enable asynchronous and lazy loading to speed up model load torch_neuronx.async_load(unet_neuron) torch_neuronx.lazy_load(unet_neuron) # save compiled unet unet_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'unet_refiner/model.pt') torch.jit.save(unet_neuron, unet_filename) # delete unused objects del unet del unet_neuron # --- 
# --- Compile VAE decoder and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=DTYPE, low_cpu_mem_usage=True)
decoder = copy.deepcopy(pipe.vae.decoder)
del pipe

# Compile vae decoder
decoder_in = torch.randn([1, 4, 128, 128], dtype=DTYPE)
decoder_neuron = torch_neuronx.trace(
    decoder,
    decoder_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder'),
)

# Enable asynchronous loading to speed up model load
torch_neuronx.async_load(decoder_neuron)

# Save the compiled vae decoder
decoder_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_decoder/model.pt')
torch.jit.save(decoder_neuron, decoder_filename)

# delete unused objects
del decoder
del decoder_neuron

# --- Compile VAE post_quant_conv and save ---

# Only keep the model being compiled in RAM to minimize memory pressure
pipe = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=DTYPE, low_cpu_mem_usage=True)
post_quant_conv = copy.deepcopy(pipe.vae.post_quant_conv)
del pipe

# Compile vae post_quant_conv
post_quant_conv_in = torch.randn([1, 4, 128, 128], dtype=DTYPE)
post_quant_conv_neuron = torch_neuronx.trace(
    post_quant_conv,
    post_quant_conv_in,
    compiler_workdir=os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv'),
)

# Enable asynchronous loading to speed up model load
torch_neuronx.async_load(post_quant_conv_neuron)

# Save the compiled vae post_quant_conv
post_quant_conv_filename = os.path.join(COMPILER_WORKDIR_ROOT, 'vae_post_quant_conv/model.pt')
torch.jit.save(post_quant_conv_neuron, post_quant_conv_filename)

# delete unused objects
del post_quant_conv
del post_quant_conv_neuron


================================================
FILE: archive/src/benchmark/pytorch/unet_benchmark.py
================================================
import torch
import neuronperf as npf
import neuronperf.torch

# Add to these lists or change as needed
model_name = "UNet"
batch_sizes = [1, 4]
n_models = [1, 2]
workers_per_model = [1, 2]  # optimized for latency or throughput


def get_batch(batch_size):
    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)


if __name__ == "__main__":
    inputs = [get_batch(batch_size) for batch_size in batch_sizes]
    filename = f"{model_name}.json"

    # Benchmark
    print("Benchmarking {}".format(filename))
    reports = npf.torch.benchmark(filename, inputs, n_models=n_models, workers_per_model=workers_per_model)

    # View and save results
    print("======== {} ========".format(filename))
    npf.print_reports(reports)
    npf.write_csv(reports)
    npf.write_json(reports)


================================================
FILE: archive/src/benchmark/pytorch/unet_compile.py
================================================
import torch
import neuronperf as npf
import neuronperf.torch

# Add to these lists or change as needed
model_name = "UNet"
batch_sizes = [1, 4]
pipeline_sizes = [1]


def get_batch(batch_size):
    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)


if __name__ == "__main__":
    # UNet implementation from https://github.com/milesial/Pytorch-UNet
    # load the model
    model = torch.hub.load('milesial/Pytorch-UNet', 'unet_carvana', pretrained=False)
    # load the weights
    state_dict = torch.hub.load_state_dict_from_url('https://github.com/milesial/Pytorch-UNet/releases/download/v3.0/unet_carvana_scale0.5_epoch2.pth', map_location="cpu")
    model.load_state_dict(state_dict)

    inputs = [get_batch(batch_size) for batch_size in batch_sizes]
    filename = f"{model_name}.json"

    # Compile
    print("Compiling {}".format(filename))
    npf.torch.compile(
        model,
        inputs,
        batch_sizes=batch_sizes,
        pipeline_sizes=pipeline_sizes,
        filename=filename,
        model_name=model_name,
    )


================================================
FILE: archive/src/benchmark/pytorch/vgg_benchmark.py
================================================
import torch
import neuronperf as npf
import neuronperf.torch

# Add to these lists or change as needed
model_names = ["vgg11", "vgg16"]
batch_sizes = [1, 8, 64]
n_models = [1, 2]
workers_per_model = [1, 2]  # optimized for latency or throughput


def get_batch(batch_size):
    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)


if __name__ == "__main__":
    for model_name in model_names:
        inputs = [get_batch(batch_size) for batch_size in batch_sizes]
        filename = f"{model_name}.json"

        # Benchmark
        print("Benchmarking {}".format(filename))
        reports = npf.torch.benchmark(filename, inputs, n_models=n_models, workers_per_model=workers_per_model)

        # View and save results
        print("======== {} ========".format(filename))
        npf.print_reports(reports)
        npf.write_csv(reports)
        npf.write_json(reports)


================================================
FILE: archive/src/benchmark/pytorch/vgg_compile.py
================================================
import torch
import torchvision
import neuronperf as npf
import neuronperf.torch

# Add to these lists or change as needed
model_names = ["vgg11", "vgg16"]
batch_sizes = [1, 8, 64]
pipeline_sizes = [1]


def get_batch(batch_size):
    return torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)


if __name__ == "__main__":
    for model_name in model_names:
        model = getattr(torchvision.models, model_name)(pretrained=True)
        inputs = [get_batch(batch_size) for batch_size in batch_sizes]
        filename = f"{model_name}.json"

        # Compile
        print("Compiling {}".format(filename))
        npf.torch.compile(
            model,
            inputs,
            batch_sizes=batch_sizes,
            pipeline_sizes=pipeline_sizes,
            filename=filename,
            model_name=model_name,
        )


================================================
FILE: archive/tensorboard/getting-started-tensorboard-neuron-plugin.rst
================================================
.. _neuron-plugin-tensorboard:

.. meta::
   :noindex:
   :nofollow:
   :description: This page for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.
   :date-modified: 12-02-2025

Neuron Plugin for TensorBoard (Inf1)
====================================

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
--------

This guide is for developers who want to better understand how their model is executed using the Neuron SDK through TensorBoard.

The Neuron plugin for TensorBoard provides metrics on the performance of machine learning tasks accelerated using the Neuron SDK. It is compatible with TensorBoard versions 1.15 and higher. It provides visualizations and profiling results for graphs executed on NeuronCores.

.. note::

   The following information is compatible with Neuron SDK for Inf1. For a walkthrough on the latest version, please check out the guide :ref:`neuronx-plugin-tensorboard`.

.. note::

   Graph visualization is currently only supported for TensorFlow-Neuron. Support for MXNet-Neuron and PyTorch-Neuron visualization will be added in a future release.

Compile the neural network
--------------------------

3. Refer to the following guides on how to compile a graph using the Neuron SDK.
- TensorFlow-Neuron - :ref:`/src/examples/tensorflow/tensorflow_resnet50/resnet50.ipynb`
- PyTorch-Neuron - "Compile model for Neuron" in `PyTorch-Neuron Resnet50 Tutorial`_
- MXNet-Neuron - :ref:`/src/examples/mxnet/resnet50/resnet50.ipynb`

Enable profiling
----------------

In this step, we enable Neuron profile data collection and collect results from executing an inference.

4.1. To start profiling the neural network and collect inference traces, create a directory where profile data will be dumped and set the ``NEURON_PROFILE`` environment variable. In this example, we will assume this directory is ``$HOME/profile``.

.. code:: bash

   mkdir -p $HOME/profile
   export NEURON_PROFILE=$HOME/profile

4.2. Ensure Neuron Tools are executable by setting the ``PATH`` environment variable.

.. code:: bash

   export PATH=/opt/aws/neuron/bin:$PATH

4.3. Execute inference!

.. note::

   Please run the inference script outside of Jupyter notebook. Profiling in Jupyter notebook is not supported at this time.

.. note::

   Please ensure the inference script executes only one inference, as profiling results are currently only supported for a single inference.

For more info on how to execute inference, refer to the following guides:

- TensorFlow-Neuron - :ref:`/src/examples/tensorflow/tensorflow_resnet50/resnet50.ipynb`
- PyTorch-Neuron - "Run inference on Single Core" in :ref:`/src/examples/pytorch/resnet50.ipynb`
- MXNet-Neuron - :ref:`/src/examples/mxnet/resnet50/resnet50.ipynb`

4.4. Check if profiling results were successfully saved. In the directory pointed to by the ``NEURON_PROFILE`` environment variable set in Step 4.1, there should be at least two files, one with the ``.neff`` extension and one with the ``.ntff`` extension. For TensorFlow-Neuron users, the graph file (``.pb``) will also be in this directory.

.. code:: bash

   ls $NEURON_PROFILE

Launch TensorBoard
------------------

In this step, we will process the Neuron profile data and launch TensorBoard.

5.1. Install the Neuron plugin for TensorBoard.

.. include:: /setup/install-templates/inf1/tensorboard-plugin-neuron-pip-install.rst

5.2. After collecting the raw profile data, we need to post-process it to create the log files used by the Neuron plugin. This can be done when launching TensorBoard by passing an extra flag ``--run_neuron_profiler``. Using this flag will create the directory specified by ``--logdir`` and populate it with Neuron plugin data. Please note that the ``NEURON_PROFILE`` environment variable set in Step 4.1 must still point to the same directory as before.

.. code:: bash

   tensorboard --logdir results --run_neuron_profiler

.. note::

   If using TensorBoard >= 2.5, please use the ``--load_fast=false`` option when launching: ``tensorboard --logdir results --run_neuron_profiler --load_fast=false``

5.3. After you see the following message, TensorBoard is ready to use. By default, TensorBoard will be launched at ``localhost:6006`` on the Deployment Instance.

::

   ...
   Running neuron-profile
   Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
   TensorBoard 2.4.1 at http://localhost:6006/ (Press CTRL+C to quit)
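As the startup message itself suggests, an alternative to the SSH port forwarding used in the next section is to bind TensorBoard to all interfaces; a minimal sketch, assuming the instance's security group permits inbound traffic on the port:

.. code:: bash

   tensorboard --logdir results --run_neuron_profiler --bind_all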
View results in TensorBoard
---------------------------

In this step, we will view the Neuron plugin for TensorBoard from a browser on your local development machine.

6.1. Connect to the Deployment Instance while enabling port forwarding. In this example, we assume TensorBoard has been launched using the default address ``localhost:6006`` on the Deployment Instance.

.. code:: bash

   # if Ubuntu-based AMI
   ssh -i ubuntu@ -L 6006:localhost:6006

   # if AL2-based AMI
   ssh -i ec2-user@ -L 6006:localhost:6006

6.2. In a browser, visit |tensorboard_address|.

6.3. In the top navigation bar, switch from ``Graphs`` to ``Neuron``. If it does not show up, please wait a while and refresh the page while the plugin loads. If the issue persists, check the ``Inactive`` dropdown list on the right and check for ``Neuron``.

|image1|

6.4. If TensorBoard failed to find the generated logs, you will see the following message:

|image10|

In this case, please check the console output on the Deployment Instance where TensorBoard was launched for any warnings or error messages, and make sure the version of the ``aws-neuron-tools`` package is compatible.

.. _tensorboard-plugin-visualize-graph:

Visualize graphs executed on Neuron
-----------------------------------

.. _tensorboard-plugin-graph-device:

Show how the graph was partitioned to run on NeuronCores
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To view how the graph was partitioned to run on NeuronCores, select "Device" under "Graph Color Schemes" in the left navigation bar.

|image2|

Each operator will be colored according to the device used. In this example, light blue indicates an operator was executed on CPU, and orange indicates the operator was executed on NeuronCores. Operators that are white may have been optimized by the Neuron compiler and fused into another operation.

.. _tensorboard-plugin-graph-time:

Inspect which operators consume the most time
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also view how long each operator took by changing to the "Compute time" color scheme.

|image3|

This view will show the time taken by each layer, colored according to how much relative time the layer took to compute. A lighter shade of red means that a relatively small portion of compute time was spent in this layer, while a darker red shows that more compute time was used.

.. _tensorboard-plugin-graph-supported-ops:

Check Neuron-supported operators for each framework
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The "Compatibility" color scheme allows you to better understand which operators are currently supported by the Neuron compiler - green for compatible ops, red for incompatible ops, and yellow for subgraphs that contain both compatible and incompatible ops.

|image4|

.. _tensorboard-plugin-graph-filter-device:

Filter view by device
^^^^^^^^^^^^^^^^^^^^^

Additionally, you can choose to filter by CPU and NeuronCores, which will only color ops that match the selected device(s).

|image5|

Expand/collapse subgraphs and view operator details
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each rectangular node in the graph represents a subgraph that can be expanded or collapsed by clicking on the name. Operators will be represented by ellipses, and can be clicked to reveal more information on that operator, such as inputs and execution device.

|image11|

The ``Expand All`` and ``Collapse All`` buttons can be used to expand or collapse every subgraph. When using these features, the positioning of the graph may change when redrawing the new graph. Try using the ``Reset Position`` button and zoom out by scrolling if the graph appears to be missing.

.. _tensorboard-plugin-view-profile:

Viewing the Neuron profile data
-------------------------------

On the right side of the Neuron plugin, information on the profiled inference will be displayed.
.. _tensorboard-plugin-profile-summary:

See performance summary
^^^^^^^^^^^^^^^^^^^^^^^

First is the "Neuron Performance Summary," which gives a quick overview of how Neuron executed the graph, including information on the number of NeuronCores and both on-NeuronCore time and on-CPU time.

|image6|

.. _tensorboard-plugin-profile-nc:

Get a breakdown of time spent per NeuronCore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, the "Neuron Execution" will give more details on how a graph was partitioned for Neuron. Each entry in the table will show the order it was executed in, what type of device was used, the compute time (in microseconds), and the percentage of total time spent. To dive deeper into subgraphs, you can check the "Show Details" box to display the breakdown per NeuronCore.

|image7|

.. _tensorboard-plugin-profile-op:

Get a breakdown of time spent per operator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The "Op Time Table" section shows the cycle count per operator, much like the "Compute time" coloring for graph visualization. This table can be sorted by clicking the column names, and searched using the provided text box in the top right corner. Due to Neuron compiler optimizations, some of the compute may not be associated with any specific operator and will be categorized as ``unknown``. Additionally, time spent moving data to and from NeuronCores will fall under ``(ND_ENGINE_LOAD)``.

|image8|

.. |image1| image:: /images/tb-plugin-img1.png
   :height: 2914
   :width: 5344
   :scale: 10%

.. |image2| image:: /images/tb-plugin-img2.png
   :height: 2914
   :width: 5344
   :scale: 10%

.. |image3| image:: /images/tb-plugin-img3.png
   :height: 2914
   :width: 5344
   :scale: 10%

.. |image4| image:: /images/tb-plugin-img4.png
   :height: 2914
   :width: 5344
   :scale: 10%

.. |image5| image:: /images/tb-plugin-img5.png
   :height: 2914
   :width: 5344
   :scale: 10%

.. |image6| image:: /images/tb-plugin-img6.png
   :height: 2914
   :width: 5344
   :scale: 10%

.. |image7| image:: /images/tb-plugin-img7.png
   :height: 2914
   :width: 5344
   :scale: 10%

.. |image8| image:: /images/tb-plugin-img8.png
   :height: 2914
   :width: 5344
   :scale: 10%

.. |image9| image:: /images/tb-plugin-img9.png
   :height: 2914
   :width: 5344
   :scale: 10%

.. |image10| image:: /images/tb-plugin-img10.png
   :height: 2914
   :width: 5344
   :scale: 10%

.. |image11| image:: /images/tb-plugin-img11.png
   :height: 2826
   :width: 5341
   :scale: 10%

.. _PyTorch-Neuron Resnet50 Tutorial: ../../src/examples/pytorch/resnet50.ipynb

.. |tensorboard_address| raw:: html

   localhost:6006


================================================
FILE: archive/tensorflow/index.rst
================================================
.. _tensorflow-neuron-main:
.. _tensorflow-neuron:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

TensorFlow Neuron
=================

.. warning::

   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

TensorFlow Neuron unlocks high-performance and cost-effective deep learning acceleration on AWS Trainium-based and Inferentia-based Amazon EC2 instances. TensorFlow Neuron enables native TensorFlow models to be accelerated on Neuron devices, so you can use your existing framework application and get started easily with minimal code changes.

.. toctree::
   :maxdepth: 1
   :hidden:

   /archive/tensorflow/tensorflow-setup

.. toctree::
   :maxdepth: 2
   :hidden:

   Inference (Inf2 & Trn1) 
   Inference (Inf1) 
.. card:: Tensorflow NeuronX for Inference on ``Inf2`` & ``Trn1`` / ``Trn1n``
   :link: inference-tensorflow-neuronx
   :link-type: ref
   :class-body: sphinx-design-class-title-small

.. card:: Tensorflow Neuron for Inference on ``Inf1``
   :link: inference-tensorflow-neuron
   :link-type: ref
   :class-body: sphinx-design-class-title-small


================================================
FILE: archive/tensorflow/setup-legacy-inf1-tensorflow.rst
================================================
.. meta::
   :description: Legacy TensorFlow installation guide for AWS Inferentia 1 (Inf1) instances
   :keywords: tensorflow, neuron, inf1, legacy, installation, tensorflow-neuron
   :framework: tensorflow
   :instance-types: inf1
   :status: legacy
   :content-type: legacy-guide
   :date-modified: 2026-03-30

TensorFlow on Inf1 (legacy)
===========================

.. warning::

   **Legacy hardware**: Inf1 instances use NeuronCore v1 with TensorFlow 2.x (``tensorflow-neuron``). For new projects, use **Inf2, Trn1, Trn2, or Trn3** with PyTorch 2.9+ or JAX 0.7+. See :ref:`setup-guide-index` for current setup options.

.. note::

   TensorFlow support for Inf2 has reached end of support as of Neuron SDK 2.29. See :ref:`announce-eos-tensorflow-inf2` for details.

Setup instructions
------------------

For complete Inf1 TensorFlow setup instructions, see the original setup guides:

- :doc:`/archive/tensorflow/tensorflow-neuron/setup/tensorflow-update` - TensorFlow Neuron setup and updates
- :doc:`/archive/tensorflow/tensorflow-neuron-inference` - Inference on Inf1

The setup guides cover:

- Ubuntu 20, Ubuntu 22, and Amazon Linux 2 installation
- DLAMI-based installation
- Manual pip installation
- TensorFlow 2.10.1, 2.9.3, and 2.8.4 versions

Verification
------------

After installation, verify with:

.. code-block:: python

   import tensorflow as tf
   import tensorflow_neuron
   print(f"TensorFlow version: {tf.__version__}")

.. code-block:: bash

   neuron-ls

Next steps
----------

- :doc:`/archive/tensorflow/tensorflow-neuron-inference` - Inference tutorials for Inf1
- :ref:`setup-guide-index` - Current setup options (Inf2, Trn1, Trn2, Trn3)


================================================
FILE: archive/tensorflow/tensorflow-neuron/additional-examples.rst
================================================
.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Additional Examples (``tensorflow-neuron``)
===========================================

.. warning::

   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. toctree::
   :maxdepth: 1
   :hidden:

   AWS Neuron Samples GitHub Repository 

.. include:: /archive/tensorflow/tensorflow-neuron/additional-examples.txt


================================================
FILE: archive/tensorflow/tensorflow-neuron/additional-examples.txt
================================================
* `AWS Neuron Samples GitHub Repository `_


================================================
FILE: archive/tensorflow/tensorflow-neuron/api-auto-replication-api.rst
================================================
.. _tensorflow-ref-auto-replication-python-api:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

TensorFlow 2.x (``tensorflow-neuron``) Auto Multicore Replication (Beta)
========================================================================
.. warning::

   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

The Neuron auto multicore replication Python API enables modifying TensorFlow 2.x traced models so that they can be automatically replicated across multiple cores. For Tensorflow-Serving models and TensorFlow 1.x models, see :ref:`tensorflow-ref-auto-replication-cli-api`.

.. contents:: Table of contents
   :local:
   :depth: 1

TensorFlow 2.x (``tensorflow-neuron TF2.x``) Auto Multicore Replication Python API (Beta)
------------------------------------------------------------------------------------------

Method
^^^^^^

``tensorflow.neuron.auto_multicore``

Description
^^^^^^^^^^^

Converts an existing AWS-Neuron-optimized ``keras.Model`` and returns an auto-replication-tagged, AWS-Multicore-Neuron-optimized ``keras.Model`` that can execute on AWS Machine Learning Accelerators. Like the traced model, the returned ``keras.Model`` will support inference only. Attributes or variables held by the original function or ``keras.Model`` will be dropped.

The auto model replication feature in TensorFlow-Neuron enables you to create a model once, and model-parallel replication then happens automatically. The desired number of cores can be less than the total available NeuronCores on an Inf1 instance but not less than 1. This reduces framework memory usage, as you are not loading the same model multiple times manually. Calls to the returned model will execute the call on each core in a round-robin fashion.

The returned ``keras.Model`` can be exported as SavedModel and served using TensorFlow Serving. Please see the TensorFlow Serving documentation for more information about exporting to saved model and serving using TensorFlow Serving.

Note that the automatic replication will only work on models compiled with pipeline size 1: via ``--neuroncore-pipeline-cores=1``. If auto replication is not enabled, the model will default to replicate on up to 4 cores.

See :ref:`neuron-compiler-cli-reference` for more information about compiler options.

Arguments
^^^^^^^^^

- **func:** The ``keras.Model`` or function to be traced.
- **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of ``tf.Tensor`` objects for tracing the function. When ``example_inputs`` is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect ``func`` to have calling signature ``func(example_inputs)``. Otherwise, the expectation is that inference on ``func`` is done by calling ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``, or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``. The case where ``func`` accepts mixed positional and keyword arguments is currently unsupported.
- **num_cores:** The desired number of cores across which the model will be automatically replicated.

Returns
^^^^^^^

- An AWS-Multicore-Neuron-optimized ``keras.Model``.

Example Python API Usage for TF2.x traced models:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code:: python

   input0 = tf.keras.layers.Input(3)
   dense0 = tf.keras.layers.Dense(3)(input0)
   inputs = [input0]
   outputs = [dense0]
   model = tf.keras.Model(inputs=inputs, outputs=outputs)
   input0_tensor = tf.random.uniform([1, 3])
   model_neuron = tfn.trace(model, input0_tensor)

   num_cores = 4
   multicore_model = tfn.auto_multicore(model_neuron, input0_tensor, num_cores=num_cores)
   multicore_model(input0_tensor)

Example Python API Usage for TF2.x saved models:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code:: python

   from tensorflow.python import saved_model

   input0_tensor = tf.random.uniform([1, 3])
   num_cores = 4
   reload_model = saved_model.load(model_dir)
   multicore_model = tfn.auto_multicore(reload_model, input0_tensor, num_cores=num_cores)

.. _tensorflow-ref-auto-replication-cli-api:

TensorFlow Neuron 2.x (``tensorflow-neuron``) Auto Multicore Replication CLI (Beta)
------------------------------------------------------------------------------------

The Neuron auto multicore replication CLI enables modifying TensorFlow 1.x and TensorFlow 2.x traced saved models so that they can be automatically replicated across multiple cores. By performing this call on TensorFlow saved models, we can support both TensorFlow-Serving and TensorFlow 1.x without significant modifications to the code. Note that the Python API does not support TensorFlow 1.x.

Method
^^^^^^

``tf-neuron-auto-multicore MODEL_DIR --num_cores NUM_CORES --new_model_dir NEW_MODEL_DIR``

Arguments
^^^^^^^^^

- **MODEL_DIR:** The directory of a saved AWS-Neuron-optimized ``keras.Model``.
- **NUM_CORES:** The desired number of cores across which the model will be automatically replicated.
- **NEW_MODEL_DIR:** The directory where the AWS-Multicore-Neuron-optimized ``keras.Model`` will be saved.
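For example, a sketch of a CLI invocation (the paths are hypothetical placeholders):

.. code:: bash

   tf-neuron-auto-multicore ./neuron_model --num_cores 4 --new_model_dir ./neuron_model_multicore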
================================================
FILE: archive/tensorflow/tensorflow-neuron/api-compilation-python-api.rst
================================================
.. _tensorflow-ref-neuron-compile-api:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

TensorFlow 1.x (``tensorflow-neuron``) Compilation API
======================================================

.. warning::

   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

The Neuron compilation API for TensorFlow 1.x enables compilation of a saved model to an Inferentia target.

Method
------

``tensorflow.neuron.saved_model.compile``

Description
-----------

Within the graph or subgraph, the compile method selects and sends Neuron-supported operations to the Neuron compiler for compilation, and saves the compiled artifacts in the graph. Uncompilable operations are kept as original operations for framework execution.

The compiled graph can be exported to saved model and served using TensorFlow Serving. Please see the TensorFlow Serving documentation for more information about exporting to saved model and serving using TensorFlow Serving.

Options can be passed to the Neuron compiler via the compile function. For example, the ``--neuroncore-pipeline-cores`` option directs the Neuron compiler to compile each subgraph to fit in the specified number of NeuronCores. This number can be less than the total available NeuronCores on an Inf1 instance. See :ref:`neuron-compiler-cli-reference` for more information about compiler options.

Arguments
---------

- **model_dir:** The path of the original ``SavedModel``.
- **new_model_dir:** The path to which the Neuron-optimized ``SavedModel`` will be stored.
- **batch_size:** (Optional) Positive integer representing batch size used in inference. The default value is 1.
- **model_shape_feed_dict:** (Optional) Dictionary {str: list} used for inferring tensor shapes. Keys should match model input names. Values are lists of positive integers representing model input tensor shapes.
- **model_feed_dict:** (Optional) Dictionary {str: numpy.array} used for inference. Useful for inferring tensor shapes. Keys should match model input names. Values are numpy arrays that can be fed as inputs to the ``SavedModel``.
- **tags:** (Optional) Iterable of strings to identify the required ``MetaGraphDef``. These should correspond to the tags used when saving the variables using the ``SavedModel`` ``save()`` API. Default is to use the first ``tag_set`` available in the ``SavedModel``.
- **signature_def_key:** (Optional) String specifying the ``signature_def`` to use. Default is to use 'serving_default' or the first ``signature_def`` corresponding to ``tags``.
- **minimum_segment_size:** (Optional) Integer indicating the minimum number of operations in a NeuronOp.
- **no_fuse_ops:** (Optional) None or iterable of strings (unordered) representing names of operations that are forcibly placed on CPU.
- **compiler_args:** (Optional) List of strings representing neuron-cc compiler arguments. Note that these arguments apply to all subgraphs generated by whitelist partitioning. For example, use ``compiler_args=['--neuroncore-pipeline-cores', '4']`` to set the number of NeuronCores per subgraph to 4. See :ref:`neuron-compiler-cli-reference` for more information about compiler options.
- **compiler_workdir:** (Optional) String representing the work directory of the neuron-cc compiler.

Returns
-------

- Dictionary with operator counts before/after optimization.
- Operator count statistics are displayed to show the original count, the post-optimization count, and the number placed on the Neuron runtime. For example:

::

   INFO:tensorflow:Number of operations in TensorFlow session: 3978
   INFO:tensorflow:Number of operations after tf.neuron optimizations: 555
   INFO:tensorflow:Number of operations placed on Neuron runtime: 554

Example Usage
-------------

.. code:: python

   import shutil
   import tensorflow.neuron as tfn

   saved_model_path = ""
   compiled_saved_model_path = ""
   shutil.rmtree(compiled_saved_model_path, ignore_errors=True)
   tfn.saved_model.compile(saved_model_path, compiled_saved_model_path)
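A variant that also passes compiler options through the documented ``compiler_args`` argument (an illustrative sketch; the paths above are left elided as in the original):

.. code:: python

   tfn.saved_model.compile(
       saved_model_path, compiled_saved_model_path,
       batch_size=1,
       compiler_args=['--neuroncore-pipeline-cores', '4'],
   )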
================================================
FILE: archive/tensorflow/tensorflow-neuron/api-reference-guide.txt
================================================

* :ref:`tensorflow-ref-neuron-tracing-api`
* :ref:`tensorflow-ref-neuron-analyze_model-api`
* :ref:`tensorflow-ref-auto-replication-python-api`

================================================
FILE: archive/tensorflow/tensorflow-neuron/api-tfn-analyze-model-api.rst
================================================

.. _tensorflow-ref-neuron-analyze_model-api:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

TensorFlow 2.x (``tensorflow-neuron``) analyze_model API
=========================================================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

Method
------

``tensorflow.neuron.analyze_model``

Description
-----------

Analyzes a ``keras.Model`` or a Python callable that can be decorated by ``tf.function`` for its compatibility with Neuron. It displays supported vs. unsupported operators in the model as well as percentages and counts of each operator, and returns a dictionary with operator statistics.

Arguments
---------

- **func:** The ``keras.Model`` or function to be analyzed.
- **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of ``tf.Tensor`` objects for tracing the function. When ``example_inputs`` is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect ``func`` to have calling signature ``func(example_inputs)``. Otherwise, the expectation is that inference on ``func`` is done by calling ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``, or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``. The case where ``func`` accepts mixed positional and keyword arguments is currently unsupported.

Returns
-------

- A results ``dict`` with these keys: ``'percent_supported'``, ``'supported_count'``, ``'total_count'``, ``'supported_operators'``, ``'unsupported_operators'``, ``'operators'``, ``'operator_count'``.

Example Usage
-------------

.. code:: python

   import tensorflow as tf
   import tensorflow.neuron as tfn

   input0 = tf.keras.layers.Input(3)
   dense0 = tf.keras.layers.Dense(3)(input0)
   model = tf.keras.Model(inputs=[input0], outputs=[dense0])
   example_inputs = tf.random.uniform([1, 3])
   results = tfn.analyze_model(model, example_inputs)
   print(results)

   # expected output
   '''
   BiasAdd
   MatMul
   100.00% of all operations (2 of 2) are supported
   {'percent_supported': 100.0, 'supported_count': 2, 'total_count': 2,
    'supported_operators': {'BiasAdd', 'MatMul'}, 'unsupported_operators': [],
    'operators': ['BiasAdd', 'MatMul'], 'operator_count': {'MatMul': 1, 'BiasAdd': 1}}
   '''
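Because the returned statistics are plain Python data, they can drive a go/no-go decision before tracing. The sketch below continues from the example above; the 50% threshold is an arbitrary illustration, not a recommended value.

.. code:: python

   # Continue from the example above: trace only if enough of the model
   # is supported (the 50% threshold is an arbitrary illustration).
   if results['percent_supported'] >= 50.0:
       model_neuron = tfn.trace(model, example_inputs)
   else:
       print('Unsupported operators:', results['unsupported_operators'])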
================================================
FILE: archive/tensorflow/tensorflow-neuron/api-tracing-python-api.rst
================================================

.. _tensorflow-ref-neuron-tracing-api:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

TensorFlow 2.x (``tensorflow-neuron``) Tracing API
===================================================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

The Neuron tracing API enables tracing TensorFlow 2.x models for deployment on AWS Machine Learning Accelerators.

Method
------

``tensorflow.neuron.trace``

Description
-----------

Traces a ``keras.Model`` or a Python callable that can be decorated by ``tf.function``, and returns an AWS-Neuron-optimized ``keras.Model`` that can execute on AWS Machine Learning Accelerators. Tracing is ideal for a ``keras.Model`` that accepts a list of ``tf.Tensor`` objects and returns a list of ``tf.Tensor`` objects. It is expected that users will provide example inputs, and the ``trace`` function will execute ``func`` symbolically and convert it to a ``keras.Model``.

The returned ``keras.Model`` supports inference only. Attributes or variables held by the original function or ``keras.Model`` will be dropped.

The returned ``keras.Model`` can be exported as a SavedModel and served using TensorFlow Serving. Please see the TensorFlow Serving documentation for more information about exporting to SavedModel and serving using TensorFlow Serving.

The returned ``keras.Model`` has an ``.on_neuron_ratio`` attribute which shows the percentage of ops mapped to Neuron hardware. This calculation ignores PlaceholderOp, IdentityOp, ReadVariableOp, and NoOp.

Options can be passed to the Neuron compiler via the environment variable ``NEURON_CC_FLAGS``. For example, the syntax ``env NEURON_CC_FLAGS="--neuroncore-pipeline-cores=4"`` directs the Neuron compiler to compile each subgraph to fit in the specified number of NeuronCores. This number can be less than the total available NeuronCores on an Inf1 instance. See :ref:`neuron-compiler-cli-reference` for more information about compiler options.

Arguments
---------

- **func:** The ``keras.Model`` or function to be traced.
- **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of ``tf.Tensor`` objects for tracing the function. When ``example_inputs`` is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect ``func`` to have calling signature ``func(example_inputs)``. Otherwise, the expectation is that inference on ``func`` is done by calling ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``, or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict`` (see the sketch after this list). The case where ``func`` accepts mixed positional and keyword arguments is currently unsupported.
- **subgraph_builder_function:** (Optional) A callable with signature ``subgraph_builder_function(node : NodeDef) -> bool`` (``NodeDef`` is defined in tensorflow/core/framework/node_def.proto) that is used as a call-back function to determine which part of the TensorFlow GraphDef given by tracing ``func`` will be placed on Machine Learning Accelerators. If ``subgraph_builder_function`` is not provided, then ``trace`` will automatically place operations on Machine Learning Accelerators or on CPU to maximize execution efficiency. If it is provided, and ``subgraph_builder_function(node)`` returns ``True``, and placing ``node`` on Machine Learning Accelerators will not cause deadlocks during execution, then ``trace`` will place ``node`` on Machine Learning Accelerators. If ``subgraph_builder_function(node)`` returns ``False``, then ``trace`` will place ``node`` on CPU.
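As a minimal sketch of the ``dict`` calling convention described above: the two-input function here is hypothetical, and it is assumed that the traced callable keeps the same keyword-argument signature.

.. code:: python

   import tensorflow as tf
   import tensorflow.neuron as tfn

   # Hypothetical two-input callable; with dict example_inputs,
   # trace invokes it as func(**example_inputs).
   def func(x, y):
       return tf.matmul(x, y) + 1.0

   example_inputs = {
       'x': tf.random.uniform([1, 3]),
       'y': tf.random.uniform([3, 3]),
   }
   func_neuron = tfn.trace(func, example_inputs)
   # Assumes the traced callable keeps the keyword calling convention.
   output = func_neuron(**example_inputs)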
Special Flags
-------------

These are flags that are passed directly to the Neuron tracing API (rather than the Neuron compiler). The flags are still passed via the environment variable ``NEURON_CC_FLAGS``; a sketch of setting them from Python follows this list.

- **workdir:** Example usage: ``NEURON_CC_FLAGS='--workdir ./artifacts'`` will create a folder named ``artifacts`` in the current directory and save artifacts that can be used for debugging.
- **dynamic-batch-size:** Example usage: ``NEURON_CC_FLAGS='--dynamic-batch-size'``. A flag that allows Neuron graphs to consume variable-sized batches of data. Dynamic sizing is restricted to the 0th dimension of a tensor.
- **extract-weights (Beta):** Example usage: ``NEURON_CC_FLAGS='--extract-weights inf1.2xlarge'`` will reduce the compiled model's protobuf size by taking the weights out of the protobuf. This is useful for compiling large models that would otherwise exceed the 2 GB protobuf size limit. This feature is in beta; model performance is not guaranteed, and the flag does not work in combination with ``--neuroncore-pipeline-cores``, ``--dynamic-batch-size``, models with multiple NEFFs, or models that are 4 GB or greater. The flag compiles models for different Neuron instances depending on the instance type passed, and supports all Inf1 instance types.
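A minimal sketch of setting these flags from within a Python script instead of on the command line. It assumes that ``NEURON_CC_FLAGS`` is read from the environment at the time ``trace`` is called, and that multiple flags can be combined in one space-separated string.

.. code:: python

   import os
   import tensorflow as tf
   import tensorflow.neuron as tfn

   # Set tracing flags before calling trace (assumes the environment
   # variable is read when trace is invoked).
   os.environ['NEURON_CC_FLAGS'] = '--dynamic-batch-size --workdir ./artifacts'

   input0 = tf.keras.layers.Input(3)
   dense0 = tf.keras.layers.Dense(3)(input0)
   model = tf.keras.Model(inputs=[input0], outputs=[dense0])
   model_neuron = tfn.trace(model, tf.random.uniform([1, 3]))

   # With --dynamic-batch-size, the 0th (batch) dimension may vary at inference.
   model_neuron(tf.random.uniform([8, 3]))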
Returns
-------

- An AWS-Neuron-optimized ``keras.Model``.

Example Usage
-------------

.. code:: python

   import tensorflow as tf
   import tensorflow.neuron as tfn

   input0 = tf.keras.layers.Input(3)
   dense0 = tf.keras.layers.Dense(3)(input0)
   model = tf.keras.Model(inputs=[input0], outputs=[dense0])
   example_inputs = tf.random.uniform([1, 3])

   # trace
   model_neuron = tfn.trace(model, example_inputs)

   # check to see how much of the model was compiled successfully
   print(model_neuron.on_neuron_ratio)

   model_dir = './model_neuron'
   model_neuron.save(model_dir)
   model_neuron_reloaded = tf.keras.models.load_model(model_dir)

Example Usage with Manual Device Placement Using ``subgraph_builder_function``
------------------------------------------------------------------------------

.. code:: python

   import tensorflow as tf
   import tensorflow.neuron as tfn

   input0 = tf.keras.layers.Input(3)
   dense0 = tf.keras.layers.Dense(3)(input0)
   reshape0 = tf.keras.layers.Reshape([1, 3])(dense0)
   output0 = tf.keras.layers.Dense(2)(reshape0)
   model = tf.keras.Model(inputs=[input0], outputs=[output0])
   example_inputs = tf.random.uniform([1, 3])

   def subgraph_builder_function(node):
       return node.op == 'MatMul'

   model_neuron = tfn.trace(
       model,
       example_inputs,
       subgraph_builder_function=subgraph_builder_function,
   )

.. important::

   Although the old API ``tensorflow.neuron.saved_model.compile`` is still available under tensorflow-neuron 2.x, it supports only the limited capabilities of ``tensorflow.neuron.trace`` and will be deprecated in future releases.

================================================
FILE: archive/tensorflow/tensorflow-neuron/dlc-then-ec2-devflow.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /devflows/inference/dlc-then-ec2-devflow.rst

================================================
FILE: archive/tensorflow/tensorflow-neuron/dlc-then-ecs-devflow.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /devflows/inference/dlc-then-ecs-devflow.rst

================================================
FILE: archive/tensorflow/tensorflow-neuron/dlc-then-eks-devflow.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /devflows/inference/dlc-then-eks-devflow.rst

================================================
FILE: archive/tensorflow/tensorflow-neuron/ec2-then-ec2-devflow.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /devflows/inference/ec2-then-ec2-devflow.rst

================================================
FILE: archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Misc (``tensorflow-neuron``)
============================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. toctree::
   :maxdepth: 1
   :hidden:

   /release-notes/archive/tensorflow/tensorflow-neuron/tensorflow-neuron-v2
   /archive/tensorflow/tensorflow-neuron/tensorflow2-accelerated-ops

.. include:: /archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron.txt

================================================
FILE: archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron.txt
================================================

* :ref:`tensorflow-neuron-rn-v2`
* :ref:`tensorflow-ref-neuron-accelerated-ops`

================================================
FILE: archive/tensorflow/tensorflow-neuron/neo-then-hosting-devflow.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /devflows/inference/neo-then-hosting-devflow.rst

================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.14.2-tensorflow-install.rst
================================================

.. _install-neuron-1.14.2-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install TensorFlow Neuron (Neuron 1.14.2)
=========================================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

..
contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. tab-set:: .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2 Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2 Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. tab-set:: .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2 ================================================ FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.0-tensorflow-install.rst ================================================ .. _install-neuron-1.15.0-tensorflow: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install TensorFlow Neuron (Neuron 1.15.0) ====================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. tab-set:: .. tab-item:: TensorFlow 2.5.0 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: TensorFlow 2.4.2 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: TensorFlow 2.3.3 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: TensorFlow 2.2.3 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: TensorFlow 2.1.4 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: TensorFlow 2.5.0 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: TensorFlow 2.4.2 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: TensorFlow 2.3.3 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: TensorFlow 2.2.3 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: TensorFlow 2.1.4 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. tab-item:: Ubuntu AMI .. 
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. tab-set:: .. tab-item:: TensorFlow 2.5.0 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: TensorFlow 2.4.2 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: Ubuntu DLAMI .. 
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.4.2 .. tab-item:: TensorFlow 2.3.3 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.3.3 .. tab-item:: TensorFlow 2.2.3 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.2.3 .. tab-item:: TensorFlow 2.1.4 .. tab-set:: .. tab-item:: Ubuntu AMI .. 
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-2.1.4 .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=tensorflow-1.15.5 ================================================ FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.1-tensorflow-install.rst ================================================ .. _install-neuron-1.15.1-tensorflow: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install TensorFlow Neuron (Neuron 1.15.1) ========================================= .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
include:: /setup/install-templates/inf1/develop_mode.rst .. tab-set:: .. tab-item:: TensorFlow 2.5.0 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 .. tab-item:: TensorFlow 2.4.2 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2 .. tab-item:: TensorFlow 2.3.3 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3 .. tab-item:: Ubuntu DLAMI .. 
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3 .. tab-item:: TensorFlow 2.2.3 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3 .. tab-item:: TensorFlow 2.1.4 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4 .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. 
         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1

   .. tab-item:: TensorFlow 2.4.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2

   .. tab-item:: TensorFlow 2.3.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1

   .. tab-item:: TensorFlow 2.4.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.4.2

   .. tab-item:: TensorFlow 2.3.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.3.3

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=tensorflow-1.15.5
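.. note::

   The install steps in the tabs above are generated at documentation build time by
   the ``neuronsetuphelper.py`` helper script, parameterized by mode
   (``develop``, ``compile``, ``deploy``), AMI type, OS, and Neuron/framework
   version. As a minimal sketch (assuming a checkout of this repository, so the
   script and release manifest exist at the paths shown), the same instructions can
   be rendered locally:

   .. code-block:: bash

      # Render the develop-mode instructions for TensorFlow 1.15.5 on an Ubuntu
      # non-DLAMI instance, pinned to Neuron release 1.15.1. The generated shell
      # commands are printed to stdout (exact formatting may vary by script version).
      python3 src/helperscripts/neuronsetuphelper.py \
          --file src/helperscripts/neuron-releases-manifest.json \
          --install tensorflow \
          --mode=develop \
          --ami=non-dlami \
          --os=ubuntu \
          --neuron-version=1.15.1 \
          --framework-version=tensorflow-1.15.5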
================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.2-tensorflow-install.rst
================================================

.. _install-neuron-1.15.2-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install TensorFlow Neuron (Neuron 1.15.2)
=========================================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2

   .. tab-item:: TensorFlow 2.4.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

   .. tab-item:: TensorFlow 2.3.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2

   .. tab-item:: TensorFlow 2.4.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

   .. tab-item:: TensorFlow 2.3.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2

   .. tab-item:: TensorFlow 2.4.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.4.2

   .. tab-item:: TensorFlow 2.3.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.3.3

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=tensorflow-1.15.5


================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.16.3-tensorflow-install.rst
================================================

.. _install-neuron-1.16.3-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install TensorFlow Neuron
=========================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. contents:: Table of contents
   :local:
   :depth: 2
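.. note::

   Each tab in the sections below renders pip-based install commands from the
   Neuron release manifest. As a rough sketch only (the package pins shown here are
   illustrative and not taken from the rendered output), a develop-mode install on
   Inf1 typically resolves to commands of this shape:

   .. code-block:: bash

      # Create and activate an isolated Python environment.
      python3 -m venv aws_neuron_venv_tf
      source aws_neuron_venv_tf/bin/activate
      pip install -U pip

      # Install TensorFlow Neuron and the Neuron compiler from the AWS Neuron
      # pip repository (exact versions are resolved by the release manifest).
      pip install tensorflow-neuron "neuron-cc[tensorflow]" \
          --extra-index-url=https://pip.repos.neuron.amazonaws.com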
Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            ..
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=tensorflow-1.15.5 ================================================ FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.0-tensorflow-install.rst ================================================ .. _install-neuron-1.17.0-tensorflow: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install TensorFlow Neuron ========================= .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. include:: /setup/install-templates/inf1/note-setup-cntr.rst .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. tab-set:: .. tab-item:: TensorFlow 2.5.2 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 .. tab-item:: TensorFlow 2.4.3 .. tab-set:: .. tab-item:: Ubuntu AMI .. 
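The install commands on this page are rendered at documentation build time by the ``neuronsetuphelper.py`` script against the release manifest. If you want to reproduce a single rendered block outside the docs build, a minimal sketch (assuming it is run from the repository root; the flags mirror the ``program-output`` directives below):

.. code-block:: python

   import subprocess

   # Render the install instructions for one configuration by calling the
   # same helper the docs build invokes via the program-output directive.
   cmd = [
       "python3", "src/helperscripts/neuronsetuphelper.py",
       "--file", "src/helperscripts/neuron-releases-manifest.json",
       "--install", "tensorflow",
       "--mode=develop", "--ami=non-dlami", "--os=ubuntu",
       "--neuron-version=1.17.0",
   ]
   print(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)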
Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst
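Once the compiler packages are installed, compilation itself happens in your Python session. For the TF 2.x packages above, the documented entry point is ``tfn.trace``; a minimal sketch (illustrative only, on an archived release; the model choice and output path are placeholders):

.. code-block:: python

   import tensorflow as tf
   import tensorflow.neuron as tfn

   # Compile a Keras model ahead of time on the compute instance, then copy
   # the saved artifact to an Inf1 instance for deployment.
   model = tf.keras.applications.ResNet50(weights=None)
   example_input = tf.random.uniform([1, 224, 224, 3])
   model_neuron = tfn.trace(model, example_input)  # invokes the Neuron compiler
   model_neuron.save("resnet50_neuron")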
.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.0 --framework-version=tensorflow-1.15.5

================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.1-tensorflow-install.rst
================================================

.. _install-neuron-1.17.1-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install TensorFlow Neuron
=========================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. contents:: Table of contents
   :local:
   :depth: 2
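Because each of these archived pages pins every command to a single Neuron release, a quick consistency check can catch ``--neuron-version`` flags that drift from the release named in the page's anchor. A minimal sketch (a hypothetical maintenance helper, not part of this repository):

.. code-block:: python

   import pathlib
   import re
   import sys

   def check(path):
       """Compare every --neuron-version flag against the page anchor,
       e.g. '.. _install-neuron-1.17.1-tensorflow:' expects only 1.17.1."""
       text = pathlib.Path(path).read_text()
       m = re.search(r"_install-neuron-([\d.]+)-tensorflow:", text)
       expected = m.group(1) if m else None
       found = re.findall(r"--neuron-version=([\d.]+)", text)
       return expected, sorted({v for v in found if v != expected})

   if __name__ == "__main__":
       expected, mismatched = check(sys.argv[1])
       print(f"expected {expected}; mismatched: {mismatched or 'none'}")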
Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.1

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.1

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.1

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.1 --framework-version=tensorflow-1.15.5

================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.2-tensorflow-install.rst
================================================

.. _install-neuron-1.17.2-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install TensorFlow Neuron
=========================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. contents:: Table of contents
   :local:
   :depth: 2
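Each section below offers the same four OS/AMI variants; only the ``--os`` and ``--ami`` flags differ between tabs. To render the full matrix for one mode in one pass, a minimal sketch (run from the repository root; flag values taken from the directives below):

.. code-block:: python

   import subprocess

   # Render all four (os, ami) tab variants of the develop-mode instructions
   # for this release.
   for os_name in ("ubuntu", "amazonlinux"):
       for ami in ("non-dlami", "dlami"):
           subprocess.run(
               ["python3", "src/helperscripts/neuronsetuphelper.py",
                "--file", "src/helperscripts/neuron-releases-manifest.json",
                "--install", "tensorflow", "--mode=develop",
                f"--ami={ami}", f"--os={os_name}",
                "--neuron-version=1.17.2"],
               check=True)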
================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.2-tensorflow-install.rst
================================================

.. _install-neuron-1.17.2-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install TensorFlow Neuron
=========================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5
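.. note::

   The helper renders a package-install sequence for each tab. As a rough
   sketch of what develop mode produces (illustrative only -- the exact
   package pins come from ``neuron-releases-manifest.json``, and the
   virtual-environment name here is invented for the example):

   .. code-block:: bash

      # Illustrative sketch, not the authoritative output.
      python3 -m venv tensorflow_venv          # hypothetical env name
      source tensorflow_venv/bin/activate
      pip install -U pip
      # Develop mode installs the framework plus the Neuron compiler.
      pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com \
          "tensorflow-neuron==2.5.2.*" neuron-cc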
Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5
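.. note::

   The compile-mode tabs above and the deploy-mode tabs below differ only
   in the ``--mode`` flag passed to the helper; every other flag is
   identical. For example, for the default TensorFlow on an Ubuntu AMI:

   .. code-block:: bash

      # Compile on a compute (non-accelerator) instance:
      python3 src/helperscripts/neuronsetuphelper.py \
          --file src/helperscripts/neuron-releases-manifest.json \
          --install tensorflow --mode=compile --ami=non-dlami \
          --os=ubuntu --neuron-version=1.17.2

      # Deploy on an ML accelerator instance:
      python3 src/helperscripts/neuronsetuphelper.py \
          --file src/helperscripts/neuron-releases-manifest.json \
          --install tensorflow --mode=deploy --ami=non-dlami \
          --os=ubuntu --neuron-version=1.17.2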
Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.5.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2

   .. tab-item:: TensorFlow 2.4.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.4.3

   .. tab-item:: TensorFlow 2.3.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.3.4

   .. tab-item:: TensorFlow 2.2.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.2.3

   .. tab-item:: TensorFlow 2.1.4

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-2.1.4

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=tensorflow-1.15.5
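.. note::

   Deploy mode targets an instance that only runs already-compiled models,
   so the rendered commands omit the compiler. A rough sketch (illustrative
   only; the authoritative output comes from the manifest via the helper):

   .. code-block:: bash

      # Illustrative sketch, not the authoritative output.
      # Framework package only -- no neuron-cc on deployment instances.
      pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com \
          "tensorflow-neuron==2.5.2.*"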
================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.18.0-tensorflow-install.rst
================================================

.. _install-neuron-1.18.0-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install TensorFlow Neuron
=========================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0

   .. tab-item:: TensorFlow 2.6.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

   .. tab-item:: TensorFlow 2.5.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5
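.. note::

   When a tab omits ``--framework-version``, the helper falls back to the
   newest TensorFlow packaged with that Neuron release (2.7.1 for Neuron
   1.18.0, as the first tab above suggests). To pin an older framework,
   pass the flag explicitly:

   .. code-block:: bash

      # Default: newest TensorFlow for Neuron 1.18.0.
      python3 src/helperscripts/neuronsetuphelper.py \
          --file src/helperscripts/neuron-releases-manifest.json \
          --install tensorflow --mode=develop --ami=non-dlami \
          --os=ubuntu --neuron-version=1.18.0

      # Pin TensorFlow 2.6.3 instead.
      python3 src/helperscripts/neuronsetuphelper.py \
          --file src/helperscripts/neuron-releases-manifest.json \
          --install tensorflow --mode=develop --ami=non-dlami \
          --os=ubuntu --neuron-version=1.18.0 \
          --framework-version=tensorflow-2.6.3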
Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0

   .. tab-item:: TensorFlow 2.6.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

   .. tab-item:: TensorFlow 2.5.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0

   .. tab-item:: TensorFlow 2.6.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.6.3

   .. tab-item:: TensorFlow 2.5.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-2.5.3

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=tensorflow-1.15.5
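.. note::

   Within each TensorFlow version, the four OS tabs vary only the
   ``--ami`` and ``--os`` flags:

   .. code-block:: bash

      # Ubuntu AMI            --ami=non-dlami --os=ubuntu
      # Amazon Linux AMI      --ami=non-dlami --os=amazonlinux
      # Ubuntu DLAMI          --ami=dlami     --os=ubuntu
      # Amazon Linux DLAMI    --ami=dlami     --os=amazonlinux
      python3 src/helperscripts/neuronsetuphelper.py \
          --file src/helperscripts/neuron-releases-manifest.json \
          --install tensorflow --mode=deploy --ami=dlami \
          --os=amazonlinux --neuron-version=1.18.0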
================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.19.0-tensorflow-install.rst
================================================

.. _install-neuron-1.19.0-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install TensorFlow Neuron
=========================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0

   .. tab-item:: TensorFlow 2.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

   .. tab-item:: TensorFlow 2.6.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

   .. tab-item:: TensorFlow 2.5.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5
Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0

   .. tab-item:: TensorFlow 2.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

   .. tab-item:: TensorFlow 2.6.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

   .. tab-item:: TensorFlow 2.5.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5
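.. note::

   To regenerate the compile-mode instructions for every pinned TensorFlow
   version on this page in one pass (a convenience sketch; the version
   strings are exactly the ones the tabs above pass to the helper):

   .. code-block:: bash

      for fw in tensorflow-2.7.1 tensorflow-2.6.3 tensorflow-2.5.3 tensorflow-1.15.5; do
          echo "== ${fw} =="
          python3 src/helperscripts/neuronsetuphelper.py \
              --file src/helperscripts/neuron-releases-manifest.json \
              --install tensorflow --mode=compile --ami=non-dlami \
              --os=ubuntu --neuron-version=1.19.0 \
              --framework-version="${fw}"
      done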
Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.8.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0

   .. tab-item:: TensorFlow 2.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.7.1

   .. tab-item:: TensorFlow 2.6.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.6.3

   .. tab-item:: TensorFlow 2.5.3

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-2.5.3

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux AMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Ubuntu DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5

         .. tab-item:: Amazon Linux DLAMI

            .. include :: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5
include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install tensorflow --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=tensorflow-1.15.5


================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-al2023.rst
================================================
.. _tensorflow-neuron-install-prev-al2023:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install Previous TensorFlow Neuron Releases for Amazon Linux 2023 (``tensorflow-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. toctree::
   :maxdepth: 1

This section will assist you in installing previous Neuron releases.

.. tab-set::

   .. tab-item:: Neuron 2.21.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami

   .. tab-item:: Neuron 2.20.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami

   .. tab-item:: Neuron 2.19.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami


================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-u20.rst
================================================
.. _tensorflow-neuron-install-prev-u20:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install Previous TensorFlow Neuron Releases for Ubuntu 20 (``tensorflow-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. toctree::
   :maxdepth: 1

This section will assist you in installing previous Neuron releases.

.. tab-set::

   .. tab-item:: Neuron 2.21.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami

   .. tab-item:: Neuron 2.20.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami

   .. tab-item:: Neuron 2.19.0
      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami


================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-u22.rst
================================================
.. _tensorflow-neuron-install-prev-u22:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install Previous TensorFlow Neuron Releases for Ubuntu 22 (``tensorflow-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. toctree::
   :maxdepth: 1

This section will assist you in installing previous Neuron releases.

.. tab-set::

   .. tab-item:: Neuron 2.21.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami

   .. tab-item:: Neuron 2.20.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami

   .. tab-item:: Neuron 2.19.0

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami


================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev.rst
================================================
.. _install-prev-neuron-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install previous TensorFlow Neuron releases
===========================================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. toctree::
   :maxdepth: 1

   Neuron 1.19.0
   Neuron 1.18.0
   Neuron 1.17.2
   Neuron 1.17.1
   Neuron 1.17.0
   Neuron 1.16.3
   Neuron 1.15.2
   Neuron 1.15.1
   Neuron 1.15.0
   Neuron 1.14.2


================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/tensorflow-install.rst
================================================
.. _install-neuron-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install TensorFlow Neuron
=========================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

..
contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: TensorFlow 2.10.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.9.3 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.8.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.7.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: TensorFlow 2.10.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.9.3 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.8.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.7.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. 
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: TensorFlow 2.10.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.9.3 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.8.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami

   .. tab-item:: TensorFlow 2.7.4

      .. tab-set::

         .. tab-item:: Ubuntu 20 DLAMI Base

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami

         .. tab-item:: Amazon Linux 2 DLAMI Base

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu 20 DLAMI Base

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami

         .. tab-item:: Amazon Linux 2 DLAMI Base

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami


================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/tensorflow-update-u20.rst
================================================
.. _tensorflow-neuron-u20-update:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Update to latest TensorFlow Neuron (``tensorflow-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

If you already have a previous Neuron release installed, this section provides links that will assist you in updating to the latest Neuron release.

.. tab-set::

   .. tab-item:: TensorFlow 2.10.1

      .. include:: /setup/install-templates/inf1/note-setup-general.rst

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami

   .. tab-item:: TensorFlow 2.9.3

      .. include:: /setup/install-templates/inf1/note-setup-general.rst

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami

   .. tab-item:: TensorFlow 2.8.4

      .. include:: /setup/install-templates/inf1/note-setup-general.rst

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami


================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/tensorflow-update-u22.rst
================================================
.. _tensorflow-neuron-u22-update:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Update to latest TensorFlow Neuron (``tensorflow-neuron``)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

If you already have a previous Neuron release installed, this section provides links that will assist you in updating to the latest Neuron release.

.. tab-set::

   .. tab-item:: TensorFlow 2.10.1

      .. include:: /setup/install-templates/inf1/note-setup-general.rst

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami

   .. tab-item:: TensorFlow 2.9.3

      .. include:: /setup/install-templates/inf1/note-setup-general.rst

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami

   .. tab-item:: TensorFlow 2.8.4

      .. include:: /setup/install-templates/inf1/note-setup-general.rst

      .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami


================================================
FILE: archive/tensorflow/tensorflow-neuron/setup/tensorflow-update.rst
================================================
.. _update-neuron-tensorflow:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Update to latest TensorFlow Neuron
==================================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. include:: /setup/install-templates/inf1/note-setup-cntr.rst

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: TensorFlow 2.10.1

      .. tab-set::

         .. tab-item:: Ubuntu 20 DLAMI Base

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            ..
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.9.3 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.8.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.7.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: TensorFlow 2.10.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.9.3 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.8.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.7.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 1.15.5 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: TensorFlow 2.10.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.9.3 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.8.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: TensorFlow 2.7.4 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=2.7.4 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami

   .. tab-item:: TensorFlow 1.15.5

      .. tab-set::

         .. tab-item:: Ubuntu 20 DLAMI Base

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami

         .. tab-item:: Amazon Linux 2 DLAMI Base

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=tensorflow --framework-version=1.15.5 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami


================================================
FILE: archive/tensorflow/tensorflow-neuron/tensorflow2-accelerated-ops.rst
================================================
.. _tensorflow-ref-neuron-accelerated-ops:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

TensorFlow 2.x (``tensorflow-neuron``) Accelerated Python APIs and Graph Ops
=============================================================================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

This page lists the TensorFlow 2.x Python APIs and graph operators that are accelerated by AWS Neuron. The lists are not exhaustive: TensorFlow 2.x Python APIs or graph operators that are not listed here may still be accelerated if they are composed of accelerated primitives; otherwise, they are executed on CPU without significant acceleration. The TensorFlow Neuron integration contains an automatic operator-device-placement mechanism that strives to maximize the execution efficiency of your deep learning models on AWS Machine Learning ASIC instances.
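The result of this placement can be inspected after compilation. The following is a minimal, illustrative sketch, assuming a model already compiled with ``tfn.trace`` and saved to ``./model_neuron_dir`` (a hypothetical path), and assuming fused subgraphs surface as ``NeuronOp`` nodes; it counts op types in the serving graph to show how much of the model runs on Neuron versus CPU:

.. code:: python

   import collections

   import tensorflow as tf

   # Load a model previously compiled with tfn.trace and saved to disk
   # (the path is illustrative).
   loaded = tf.saved_model.load('./model_neuron_dir')

   # Count op types in the serving graph. A well-placed model shows most of
   # its compute fused into one or a few Neuron subgraph ops (the exact op
   # name "NeuronOp" is an assumption here).
   graph_def = loaded.signatures['serving_default'].graph.as_graph_def()
   op_counts = collections.Counter(node.op for node in graph_def.node)
   for op_name, count in op_counts.most_common():
       print(op_name, count)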
Accelerated Python APIs
--------------------------------

.. list-table::
   :header-rows: 1

   * - Module
     - Accelerated Python API
     - Comments
   * - ``tf``
     - ``tf.abs``
     -
   * -
     - ``tf.add``
     -
   * -
     - ``tf.add_n``
     -
   * -
     - ``tf.broadcast_static_shape``
     -
   * -
     - ``tf.cast``
     -
   * -
     - ``tf.constant``
     -
   * -
     - ``tf.convert_to_tensor``
     -
   * -
     - ``tf.cumsum``
     - ``axis`` must be a compile-time constant.
   * -
     - ``tf.einsum``
     -
   * -
     - ``tf.erf``
     -
   * -
     - ``tf.exp``
     -
   * -
     - ``tf.identity``
     -
   * -
     - ``tf.matmul``
     - Uses float16/bfloat16 matmul with float32 accumulation.
   * -
     - ``tf.maximum``
     -
   * -
     - ``tf.minimum``
     -
   * -
     - ``tf.multiply``
     -
   * -
     - ``tf.negative``
     -
   * -
     - ``tf.range``
     - ``start``, ``limit`` and ``delta`` arguments must be compile-time constants.
   * -
     - ``tf.realdiv``
     -
   * -
     - ``tf.reciprocal``
     -
   * -
     - ``tf.reduce_all``
     - ``axis`` must be a compile-time constant.
   * -
     - ``tf.reduce_any``
     - ``axis`` must be a compile-time constant.
   * -
     - ``tf.reduce_max``
     - ``axis`` must be a compile-time constant.
   * -
     - ``tf.reduce_min``
     - ``axis`` must be a compile-time constant.
   * -
     - ``tf.reduce_prod``
     - ``axis`` must be a compile-time constant.
   * -
     - ``tf.reduce_sum``
     - ``axis`` must be a compile-time constant.
   * -
     - ``tf.reshape``
     - ``shape`` argument must be a compile-time constant.
   * -
     - ``tf.rsqrt``
     -
   * -
     - ``tf.scalar_mul``
     -
   * -
     - ``tf.shape``
     -
   * -
     - ``tf.shape_n``
     -
   * -
     - ``tf.sigmoid``
     -
   * -
     - ``tf.size``
     -
   * -
     - ``tf.slice``
     - ``size`` must be a compile-time constant. In addition, either ``begin`` must be a compile-time constant or ``size`` must be non-negative.
   * -
     - ``tf.sqrt``
     -
   * -
     - ``tf.square``
     -
   * -
     - ``tf.squared_difference``
     -
   * -
     - ``tf.squeeze``
     -
   * -
     - ``tf.stack``
     -
   * -
     - ``tf.stop_gradient``
     -
   * -
     - ``tf.strided_slice``
     -
   * -
     - ``tf.tanh``
     -
   * -
     - ``tf.tensordot``
     -
   * -
     - ``tf.to_bfloat16``
     -
   * -
     - ``tf.to_float``
     -
   * -
     - ``tf.truediv``
     -
   * - ``tf.layers``
     - ``tf.layers.batch_normalization``
     -
   * -
     - ``tf.layers.dense``
     -
   * -
     - ``tf.layers.flatten``
     -
   * - ``tf.nn``
     - ``tf.nn.batch_normalization``
     -
   * -
     - ``tf.nn.bias_add``
     -
   * -
     - ``tf.nn.dropout``
     - Always treated as ``tf.identity`` during inference.
   * -
     - ``tf.nn.fused_batch_norm``
     -
   * -
     - ``tf.nn.leaky_relu``
     -
   * -
     - ``tf.nn.relu``
     -
   * -
     - ``tf.nn.relu6``
     -
   * -
     - ``tf.nn.relu_layer``
     -
   * -
     - ``tf.nn.softmax``
     -

Accelerated graph operators
--------------------------------

.. code:: python

   Add AddN AddV2 BatchMatMul BatchMatMulV2 BiasAdd
   Cast Const Cumsum Einsum Erf Exp ExpandDims
   FusedBatchNorm FusedBatchNormV2 FusedBatchNormV3
   Greater Identity LeakyRelu MatMul Max Maximum
   Minimum Mean Mul Neg Pack RealDiv Relu Relu6
   Reshape Rsqrt Sigmoid Softmax Split SplitV Sqrt
   Square SquaredDifference Squeeze StridedSlice
   Sub Sum Tanh Transpose Unpack

The lists share many commonalities with `Available TensorFlow Ops `_. Portions of this page are modifications based on work created and `shared by Google <https://developers.google.com/readme/policies>`_ and used according to terms described in the `Creative Commons 4.0 Attribution License <https://creativecommons.org/licenses/by/4.0/>`_.


================================================
FILE: archive/tensorflow/tensorflow-neuron/tf2_faq.rst
================================================
.. _tf2_faq:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

TensorFlow 2.x FAQ
===================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 1

How do I get started with TensorFlow?
-------------------------------------

The easiest entry point is the tutorials offered by the AWS Neuron team. For beginners, the :ref:`HuggingFace DistilBERT Tutorial ` is a good place to start.

What TensorFlow versions are supported by Neuron?
-------------------------------------------------

AWS Neuron provides well-tested ``tensorflow-neuron`` packages that work with a range of official TensorFlow releases, as long as the version of ``tensorflow-neuron`` matches that of ``tensorflow``. For example, you may install ``tensorflow-neuron==2.3.3.1.0.9999.0`` on top of ``tensorflow==2.3.3`` and expect them to work together. Currently, ``tensorflow-neuron`` works with TensorFlow versions 2.1.4, 2.2.3, 2.3.3, 2.4.2 and 2.5.0.

In a fresh Python environment, ``pip install tensorflow-neuron`` brings in the highest version (2.5.0 as of 07/13/2021), which then pulls ``tensorflow==2.5.0`` into the current environment. If you already have a particular version of TensorFlow 2.x installed, pay attention to the precise version of ``tensorflow-neuron`` and install only the matching one. For example, in an existing Python environment with ``tensorflow==2.3.3`` installed, you may install it with ``pip install tensorflow-neuron==2.3.3.*``, which will reuse the existing TensorFlow installation.
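A quick sanity check of an existing environment might look like the following sketch (the version layout is as described above; ``pkg_resources`` ships with setuptools):

.. code:: python

   import pkg_resources

   # tensorflow-neuron package versions are prefixed with the tensorflow
   # release they were built against, e.g. 2.3.3.1.0.9999.0 pairs with
   # tensorflow 2.3.3.
   tf_version = pkg_resources.get_distribution('tensorflow').version
   tfn_version = pkg_resources.get_distribution('tensorflow-neuron').version

   print('tensorflow        :', tf_version)
   print('tensorflow-neuron :', tfn_version)
   assert tfn_version.startswith(tf_version), (
       'tensorflow-neuron should be built for the installed tensorflow version')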
What operators are supported?
-----------------------------

Due to fundamental backend design changes in the TensorFlow 2.x framework, the concept of "supported graph operators" is no longer well-defined. Please refer to :ref:`Accelerated Python APIs and graph operators ` for a guide to the set of TensorFlow 2.x Python APIs and graph operators that can be accelerated by Neuron.

How do I compile my model?
--------------------------

Compilation is done through a public API called ``tfn.trace``, which resembles the compilation API of the AWS Neuron PyTorch integration. Programmatically, customers can execute the following code:

.. code:: python

   import tensorflow as tf
   import tensorflow.neuron as tfn

   ...
   model = tf.keras.Model(inputs=inputs, outputs=outputs)
   model_neuron = tfn.trace(model, example_inputs)
   model_neuron.save('./model_neuron_dir')
   ...
   model_loaded = tf.saved_model.load('./model_dir')
   predict_func = model_loaded.signatures['serving_default']
   model_loaded_neuron = tfn.trace(predict_func, example_inputs2)
   model_loaded_neuron.save('./model_loaded_neuron_dir')
   ...

How do I deploy my model?
-------------------------

Python tensorflow
^^^^^^^^^^^^^^^^^

Pre-compiled models can be saved and reloaded back into a Python environment using regular TensorFlow model loading APIs, as long as ``tensorflow-neuron`` is installed.

.. code:: python

   import tensorflow as tf

   model = tf.keras.models.load_model('./model_loaded_neuron_dir')
   example_inputs = ...
   output = model(example_inputs)

tensorflow-serving
^^^^^^^^^^^^^^^^^^

Pre-compiled models can be saved into SavedModel format via the TensorFlow SavedModel APIs:

.. code:: python

   import tensorflow as tf
   import tensorflow.neuron as tfn

   ...
   model = tf.keras.Model(inputs=inputs, outputs=outputs)
   model_neuron = tfn.trace(model, example_inputs)
   tf.saved_model.save(model_neuron, './model_neuron_dir/1')

The generated SavedModel ``./model_neuron_dir`` can be loaded into tensorflow-model-server-neuron, which can be installed through apt or yum depending on the operating system. For example, on Ubuntu 18.04 LTS the following commands install and launch a tensorflow-model-server-neuron on a pre-compiled SavedModel (a minimal REST client sketch appears at the end of this FAQ):

.. code:: bash

   sudo apt install tensorflow-model-server-neuron

   # --model_base_path needs to be an absolute path
   tensorflow_model_server_neuron --model_base_path=$(pwd)/model_neuron_dir

Where can I find tutorials and examples?
-----------------------------------------

:ref:`HuggingFace DistilBERT Tutorial ` is a good place to start.

How to debug or profile my model?
---------------------------------

:ref:`AWS Neuron TensorBoard integration ` provides visibility into what is happening inside of the Neuron runtime, and allows more fine-grained (but also more hardware-aware) reasoning on where to improve the performance of machine learning applications.
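Returning to the deployment question above: for a quick end-to-end check of a model served by ``tensorflow_model_server_neuron``, a REST client along the following lines may help. This is illustrative only: it assumes the server was additionally started with ``--rest_api_port=8501``, uses TensorFlow Serving's default model name ``default``, and requires the third-party ``requests`` package.

.. code:: python

   import requests

   # Illustrative request payload; replace with inputs shaped like the
   # example_inputs used at compilation time.
   payload = {'instances': [[0.0] * 128]}

   # Port and model name are assumptions: TensorFlow Serving's REST API
   # defaults the model name to "default", and --rest_api_port=8501 must
   # have been passed when launching the server.
   response = requests.post(
       'http://localhost:8501/v1/models/default:predict', json=payload)
   response.raise_for_status()
   print(response.json()['predictions'])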
================================================
FILE: archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo.rst
================================================
.. _tensorflow-bert-demo:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

[Broken] Running TensorFlow BERT-Large with AWS Neuron
=======================================================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

This example shows a Neuron compatible BERT-Large implementation that is functionally equivalent to the open source BERT-Large model. The demo uses TensorFlow-Neuron and BERT-Large weights fine-tuned for MRPC, and also shows the performance achieved by the Inf1 instance. Users who want to use public BERT SavedModels should also follow the steps described in :ref:`using-public-bert-savedmodels`.

Launch EC2 instances
--------------------

For this demo, launch two EC2 instances:

- a c5.4xlarge instance for compiling the BERT-Large model, and
- an inf1.xlarge instance for running inference.

For both of these instances choose the latest Ubuntu 18 Deep Learning AMI (DLAMI).

.. _compiling-neuron-compatible-bert-large:

Compiling Neuron compatible BERT-Large
--------------------------------------

First connect to the c5.4xlarge instance and update tensorflow-neuron and neuron-cc.

Update compilation EC2 instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Update to the latest Neuron software by executing the instructions at :ref:`install-neuron-tensorflow`.

Note: if the tensorflow-neuron version on your inference instance is lower than 1.15.0.1.0.1333.0, you will need to run this demo on inf1.2xlarge instead of inf1.xlarge.

Compile open source BERT-Large saved model using Neuron compatible BERT-Large implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Neuron software works with TensorFlow saved models. Users should bring their own BERT-Large saved model for this section. This demo runs inference for the MRPC task, so the saved model should be fine-tuned for MRPC. Users who need additional help to fine-tune the model for MRPC or to create a saved model can refer to :ref:`bert-tensorflow-demo-appendix1`.

In the same environment and directory as the bert_demo scripts, run the following:

.. code:: bash

   git clone https://github.com/aws/aws-neuron-sdk
   cd ~/aws-neuron-sdk/src/examples/tensorflow/bert_demo/
   export BERT_LARGE_SAVED_MODEL="/path/to/user/bert-large/savedmodel"
   pip install tensorflow_neuron==1.15.5.2.8.9.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com/
   pip install neuron_cc==1.13.5.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com
   python bert_model.py --input_saved_model $BERT_LARGE_SAVED_MODEL --output_saved_model ./bert-saved-model-neuron --batch_size=6 --aggressive_optimizations

This compiles the BERT-Large model pointed to by ``$BERT_LARGE_SAVED_MODEL`` for an input size of 128 and batch size of 6, and stores the compilation output in ``bert-saved-model-neuron``. Copy this directory to your Inf1 instance for inference. The bert_model.py script encapsulates all the steps necessary for this process; for details on what bert_model.py does, refer to :ref:`bert-tensorflow-demo-appendix2`.

Running the inference demo
--------------------------

Connect to your inf1.xlarge instance and update tensorflow-neuron, aws-neuron-runtime and aws-neuron-tools.

Update inference EC2 instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Update to the latest Neuron software by executing the instructions at :ref:`install-neuron-tensorflow`.

Launching the BERT-Large demo server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Copy the compiled model (``bert-saved-model-neuron``) from your c5.4xlarge instance to your inf1.xlarge instance. Place the model in the same directory as the bert_demo scripts.
Sending requests to server from multiple clients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Wait until the bert demo server is ready to accept requests. Then, on the same inf1.xlarge instance, launch a separate Linux terminal. From the bert_demo directory, execute the following commands:

.. code:: bash

    source activate aws_neuron_tensorflow_p36
    cd ~/aws-neuron-sdk/src/examples/tensorflow/bert_demo/
    for i in {1..96}; do python bert_client.py --cycle 128 & done

This spins up 96 clients, each of which sends 128 inference requests.

Printing latency metrics
~~~~~~~~~~~~~~~~~~~~~~~~

After all your requests have been sent to your server, you can run the following command:

.. code:: bash

    python latency_printer.py

.. _using-public-bert-savedmodels:

Using public BERT SavedModels
-----------------------------

We are now providing a compilation script that has better compatibility with various flavors of BERT SavedModels generated from https://github.com/google-research/bert. Here are the current limitations:

1. You did not change `modeling.py `__
2. The BERT SavedModel is generated using ``estimator.export_saved_model``
3. The BERT SavedModel uses a fixed sequence length of 128 (you may check by running ``saved_model_cli show --dir /path/to/user/bert/savedmodel --all``)
4. ``neuron-cc`` version is at least 1.0.12000.0
5. ``aws-neuron-runtime`` version is at least 1.0.7000.0
6. The ``--batch_size`` argument specified in this script is at most 4

Example usage is shown below:

.. code:: bash

    export BERT_LARGE_SAVED_MODEL="/path/to/user/bert-large/savedmodel"
    cd ~/aws-neuron-sdk/src/examples/tensorflow/bert_demo/
    python bert_no_model.py --input_saved_model $BERT_LARGE_SAVED_MODEL --output_saved_model ./bert-saved-model-neuron --batch_size=1
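To verify the fixed sequence length of 128 (limitation 3 above) from Python instead of ``saved_model_cli``, a sketch like the following can work. Note that ``saved_model_utils`` lives under TensorFlow's internal ``tensorflow.python.tools`` path, and the SavedModel directory below is a placeholder:

.. code:: python

    from tensorflow.python.tools import saved_model_utils

    saved_model_dir = '/path/to/user/bert/savedmodel'  # placeholder path
    meta_graph = saved_model_utils.get_meta_graph_def(saved_model_dir, 'serve')
    signature = meta_graph.signature_def['serving_default']
    for name, tensor_info in signature.inputs.items():
        dims = [d.size for d in tensor_info.tensor_shape.dim]
        print(name, dims)  # sequence inputs should report a fixed length of 128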
.. _bert-tensorflow-demo-appendix1:

Appendix 1
----------

Users who need help fine-tuning BERT-Large for MRPC and creating a saved model may follow the instructions here.

Connect to the c5.4xlarge compilation EC2 instance you started above and download these three items:

1. Clone `this `__ github repo.
2. Download GLUE data as described `here `__. Do not run the finetuning command.
3. Download a desired pre-trained BERT-Large checkpoint from `here `__. This is the model we will fine-tune.

Next, edit run_classifier.py in the cloned bert repo to apply the patch described in the following git diff:

::

    diff --git a/run_classifier.py b/run_classifier.py
    index 817b147..c9426bc 100644
    --- a/run_classifier.py
    +++ b/run_classifier.py
    @@ -955,6 +955,18 @@ def main(_):
             drop_remainder=predict_drop_remainder)

         result = estimator.predict(input_fn=predict_input_fn)
    +    features = {
    +      "input_ids": tf.placeholder(shape=[None, FLAGS.max_seq_length], dtype=tf.int32, name='input_ids'),
    +      "input_mask": tf.placeholder(shape=[None, FLAGS.max_seq_length], dtype=tf.int32, name='input_mask'),
    +      "segment_ids": tf.placeholder(shape=[None, FLAGS.max_seq_length], dtype=tf.int32, name='segment_ids'),
    +      "label_ids": tf.placeholder(shape=[None], dtype=tf.int32, name='label_ids'),
    +      "is_real_example": tf.placeholder(shape=[None], dtype=tf.int32, name='is_real_example'),
    +    }
    +    serving_input_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(features)
    +    estimator._export_to_tpu = False  ## !!important to add this
    +    estimator.export_saved_model(
    +        export_dir_base='./bert_classifier_saved_model',
    +        serving_input_receiver_fn=serving_input_fn)

         output_predict_file = os.path.join(FLAGS.output_dir, "test_results.tsv")
         with tf.gfile.GFile(output_predict_file, "w") as writer:

NOTE: Users who are interested may refer to this `link `__ for additional background information on the patch, but it is not necessary for running this demo.

Then, from the bert_demo directory, run the following:

.. code:: bash

    source activate aws_neuron_tensorflow_p36
    cd ~/aws-neuron-sdk/src/examples/tensorflow/bert_demo/
    export BERT_REPO_DIR="/path/to/cloned/bert/repo/directory"
    export GLUE_DIR="/path/to/glue/data/directory"
    export BERT_BASE_DIR="/path/to/pre-trained/bert-large/checkpoint/directory"
    ./tune_save.sh

A saved model will be created in $BERT_REPO_DIR/bert-saved-model/*random_number*/, where *random_number* is a random number generated for every run. Use this saved model to continue with the rest of the demo.

.. _bert-tensorflow-demo-appendix2:

Appendix 2
----------

For all BERT variants, we currently need to augment the standard Neuron compilation process for performance tuning. In the future, we intend to automate this tuning process. This would allow users to use the standard Neuron compilation process, which requires only a one-line change in user source code (see the sketch at the end of this appendix). The standard compilation process is described in :ref:`/src/examples/mxnet/resnet50/resnet50.ipynb`.

The augmented Neuron compilation process is encapsulated by the bert_model.py script, which performs the following steps:

1. Defines a Neuron compatible implementation of BERT-Large. For inference, this is functionally equivalent to the open source BERT-Large. The changes needed to create a Neuron compatible BERT-Large implementation are described in :ref:`bert-tensorflow-demo-appendix3`.
2. Extracts the BERT-Large weights from the open source saved model pointed to by --input_saved_model and associates them with the Neuron compatible model.
3. Invokes TensorFlow-Neuron to compile the Neuron compatible model for Inferentia using the newly associated weights.
4. Finally, saves the compiled model into the location given by --output_saved_model.
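For reference, the standard compilation process referred to above is a single call in the TensorFlow-Neuron 1.x API. A minimal sketch, with placeholder input and output paths:

.. code:: python

    import tensorflow.neuron as tfn

    # Compile an existing SavedModel for Inferentia in one line
    # (TensorFlow-Neuron 1.x API); both paths are placeholders.
    tfn.saved_model.compile('./bert_saved_model', './bert_saved_model_neuron')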
.. _bert-tensorflow-demo-appendix3:

Appendix 3
----------

The Neuron compatible implementation of BERT-Large is functionally equivalent to the open source version when used for inference. However, the detailed implementation does differ, and here is the list of changes:

1. Data Type Casting: If the original BERT-Large is an FP32 model, bert_model.py contains manually defined cast operators to enable mixed-precision. FP16 is used for multi-head attention and fully-connected layers, and FP32 everywhere else. This will be automated in a future release.
2. Remove Unused Operators: A model typically contains training operators that are not used in inference, including a subset of the reshape operators. Those operators do not affect inference functionality and have been removed.
3. Reimplementation of Selected Operators: A number of operators (mainly mask operators) have been reimplemented to bypass a known compiler issue. This will be fixed in a planned future release.
4. Manually Partition Embedding Ops to CPU: The embedding portion of BERT-Large has been partitioned manually to a subgraph that is executed on the host CPU, without noticeable performance impact. In the near future, we plan to implement this through compiler auto-partitioning, without the need for user intervention.



================================================
FILE: archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/glue_mrpc_dev.tsv
================================================

Quality #1 ID #2 ID #1 String #2 String
1 1355540 1355592 He said the foodservice pie business doesn 't fit the company 's long-term growth strategy . " The foodservice pie business does not fit our long-term growth strategy .
0 2029631 2029565 Magnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war . His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war .
0 487993 487952 The dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat . The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .
1 1989515 1989458 The AFL-CIO is waiting until October to decide if it will endorse a candidate . The AFL-CIO announced Wednesday that it will decide in October whether to endorse a candidate before the primaries .
0 1783137 1782659 No dates have been set for the civil or the criminal trial . No dates have been set for the criminal or civil cases , but Shanley has pleaded not guilty .
1 3039165 3039036 Wal-Mart said it would check all of its million-plus domestic workers to ensure they were legally employed . It has also said it would review all of its domestic employees more than 1 million to ensure they have legal status .
0 1490811 1490840 While dioxin levels in the environment were up last year , they have dropped by 75 percent since the 1970s , said Caswell . The Institute said dioxin levels in the environment have fallen by as much as 76 percent since the 1970s .
1 426112 426210 This integrates with Rational PurifyPlus and allows developers to work in supported versions of Java , Visual C # and Visual Basic .NET. IBM said the Rational products were also integrated with Rational PurifyPlus , which allows developers to work in Java , Visual C # and VisualBasic .Net.
1 1439663 1439808 The top rate will go to 4.45 percent for all residents with taxable incomes above $ 500,000 . For residents with incomes above $ 500,000 , the income-tax rate will increase to 4.45 percent .
1 3147370 3147525 The results appear in the January issue of Cancer , an American Cancer Society journal , being published online today . The results appear in the January issue of Cancer , an American Cancer Society ( news - web sites ) journal , being published online Monday .
1 3300040 3299992 The delegates said raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers . Bin Laden ’ s men pointed out that raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers . 0 524136 524119 " Sanitation is poor ... there could be typhoid and cholera , " he said . " Sanitation is poor , drinking water is generally left behind . . . there could be typhoid and cholera . " 0 969512 969295 The broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 . The technology-laced Nasdaq Composite Index was down 25.36 points , or 1.53 percent , at 1,628.26 . 1 1685339 1685429 The only announced Republican to replace Davis is Rep. Darrell Issa of Vista , who has spent $ 1.71 million of his own money to force a recall . So far the only declared major party candidate is Rep. Darrell Issa , a Republican who has spent $ 1.5 million of his own money to fund the recall . 1 1967578 1967664 The decision to issue new guidance has been prompted by intelligence passed to Britain by the FBI in a secret briefing in late July . Scotland Yard 's decision to issue new guidance has been prompted by new intelligence passed to Britain by the FBI in late July . 1 2047034 2046820 Unable to find a home for him , a judge told mental health authorities they needed to find supervised housing and treatment for DeVries somewhere in California . The judge had told the state Department of Mental Health to find supervised housing and treatment for DeVries somewhere in California . 1 2046630 2046644 The decision came a year after Whipple ended federal oversight of the district 's racial balance , facilities , budget , and busing . The decision came a year after Whipple ended federal oversight of school busing as well as the district 's racial balance , facilities and budget . 0 2221603 2221633 In midafternoon trading , the Nasdaq composite index was up 8.34 , or 0.5 percent , to 1,790.47 . The Nasdaq Composite Index .IXIC dipped 8.59 points , or 0.48 percent , to 1,773.54 . 1 129995 129864 Morgan Stanley raised its rating on the beverage maker to " overweight " from " equal-weight " saying in part that pricing power with its bottlers should improve in 2004 . Morgan Stanley raised its rating on the company to " overweight " from " equal-weight , " saying the beverage maker 's pricing power with bottlers should improve in 2004 . 0 919683 919782 The pound also made progress against the dollar , reached fresh three-year highs at $ 1.6789 . The British pound flexed its muscle against the dollar , last up 1 percent at $ 1.6672 . 0 970740 971209 Friday , Stanford ( 47-15 ) blanked the Gamecocks 8-0 . Stanford ( 46-15 ) has a team full of such players this season . 1 2745055 2745022 Last month Intel raised its revenue guidance for the quarter to between $ 7.6 billion and $ 7.8 billion . At the end of the second quarter , Intel initially predicted sales of between $ 6.9 billion and $ 7.5 billion . 0 2199097 2199072 The driver , Eugene Rogers , helped to remove children from the bus , Wood said . At the accident scene , the driver was " covered in blood " but helped to remove children , Wood said . 
1 1609290 1609098 ONG KONG , July 9 Tens of thousands of demonstrators gathered tonight before the legislature building here to call for free elections and the resignation of Hong Kong 's leader . Tens of thousands of demonstrators gathered yesterday evening to stand before this city 's legislature building and call for free elections and the resignation of Hong Kong 's leader . 1 1597193 1597119 Saddam loyalists have been blamed for sabotaging the nation 's infrastructure , as well as frequent attacks on U.S. soldiers . Hussein loyalists have been blamed for sabotaging the nation 's infrastructure and attacking US soldiers . 1 2758944 2758975 Its closest living relatives are a family frogs called sooglossidae that are found only in the Seychelles in the Indian Ocean . Its closest relative is found in the Seychelles Archipelago , near Madagascar in the Indian Ocean . 0 2584416 2584653 Cooley said he expects Muhammad will similarly be called as a witness at a pretrial hearing for Malvo . Lee Boyd Malvo will be called as a witness Wednesday in a pretrial hearing for fellow sniper suspect John Allen Muhammad . 1 86007 86373 " Instead of pursuing the most imminent and real threats - international terrorists , " Graham said , " this Bush administration chose to settle old scores . " " Instead of pursuing the most imminent and real threats - international terrorists - this Bush administration has chosen to settle old scores , " Graham said . 1 1602860 1602844 He said they lied on a sworn affidavit that requires them to list prior marriages . Morgenthau said the women , all U.S. citizens , lied on a sworn affidavit that requires them to list prior marriages . 1 1201306 1201329 The association said 28.2 million DVDs were rented in the week that ended June 15 , compared with 27.3 million VHS cassettes . The Video Software Dealers Association said 28.2 million DVDs were rented out last week , compared to 27.3 million VHS cassettes . 0 461779 461815 With these assets , Funny Cide has a solid chance to become the first Triple Crown winner since Affirmed in 1978 . Funny Cide is looking to become horse racing 's first Triple Crown winner in a generation . 1 1438666 1438643 Intel was disappointed and assessing its " options in the event Mr. Hamidi resumes his spamming activity against Intel , " spokesman Chuck Mulloy said . Intel spokesman Chuck Mulloy said the company was disappointed and assessing its " options in the event Mr. Hamidi resumes his spamming activity against Intel . " 1 3261484 3261306 Mr Annan also warned the US should not use the war on terror as an excuse to suppress " long-cherished freedoms " . Annan warned that the dangers of extremism after September 11 should not be used as an excuse to suppress " long-cherished " freedoms . 1 1277539 1277527 At community colleges , tuition will jump to $ 2,800 from $ 2,500 . Community college students will see their tuition rise by $ 300 to $ 2,800 or 12 percent . 1 3035788 3035918 He made a point of saying during Tuesdays debate that the Confederate flag was a racist symbol . Though Dean made a point of saying during the debate that the Confederate flag is a racist symbol . 0 132553 132725 Bush wanted " to see an aircraft landing the same way that the pilots saw an aircraft landing , " White House press secretary Ari Fleischer said yesterday . On Tuesday , before Byrd 's speech , Fleischer said Bush wanted ' ' to see an aircraft landing the same way that the pilots saw an aircraft landing . 
0 2259788 2259747 On Monday the Palestinian Prime Minister , Mahmoud Abbas , will report to the Palestinian parliament on his Government 's achievements in its first 100 days in office . Palestinian Prime Minister Mahmoud Abbas must defend the record of his first 100 days in office before Parliament today as the death toll in the occupied territories continues to rise . 0 2307064 2307235 The civilian unemployment rate improved marginally last month -- slipping to 6.1 percent -- even as companies slashed payrolls by 93,000 . The civilian unemployment rate improved marginally last month _ sliding down to 6.1 percent _ as companies slashed payrolls by 93,000 amid continuing mixed signals about the nation 's economic health . 1 3046488 3046824 Per-user pricing is $ 29 for Workplace Messaging , $ 89 for Team Collaboration and $ 35 for Collaborative Learning . Workplace Messaging is $ 29 , Workplace Team Collaboration is $ 89 , and Collaborative Learning is $ 35 . 1 86020 86007 " Instead of pursuing the most imminent and real threats – international terrorism – this Bush administration chose to settle old scores , " Mr. Graham said . " Instead of pursuing the most imminent and real threats - international terrorists , " Graham said , " this Bush administration chose to settle old scores . " 0 1100998 1100441 SARS has killed about 800 people and affected more than 8400 since being detected in China in November . SARS has killed about 800 people and sickened more than 8,400 worldwide , mostly in Asia . 1 2268396 2268480 Authorities had no evidence to suggest the two incidents were connected . There was no immediate evidence that the two incidents were connected , police said . 0 1984039 1983986 " Jeremy 's a good guy , " Barber said , adding : " Jeremy is living the dream life of the New York athlete . He also said Shockey is " living the dream life of a New York athlete . 0 2697659 2697747 Ratliff 's daughters , Margaret and Martha Ratliff , were adopted by Peterson after their mother 's death . Peterson helped raise Ratliff 's two daughters , Margaret and Martha Ratliff , who supported him throughout the trial . 0 2175939 2176090 After losing as much as 84.56 earlier , the Dow Jones industrial average closed up 22.81 , or 0.2 percent , at 9,340.45 . In midday trading , the Dow Jones industrial average lost 68.84 , or 0.7 percent , to 9,248.80 . 1 886618 886456 Rumsfeld , who has been feuding for two years with Army leadership , passed over nine active-duty four-star generals . Rumsfeld has been feuding for a long time with Army leadership , and he passed over nine active-duty four-star generals . 1 588637 588864 Consumers who said jobs are difficult to find jumped from 29.4 to 32.6 , while those claiming work was plentiful slipped from 13 to 12.6 . Consumers who said jobs are difficult to find jumped to 32.6 from 29.4 , while those saying work was plentiful slipped to 12.6 from 13 in April . 0 2252795 2252970 He has no immediate plans for television advertising , believing it is unnecessary this early . A Lieberman aide said there were no immediate plans for television advertising . 1 1756329 1756394 " I think it happened very quickly , " Houston Police Department homicide investigator Phil Yochum said of the crime . " I think it happened very quickly , " said Investigator Phil Yochum of the Houston Police Department 's homicide division . 1 1673112 1673068 United issued a statement saying it will " work professionally and cooperatively with all its unions . 
" Senior vice president Sara Fields said the airline " will work professionally and cooperatively with all our unions . " 1 2357324 2357271 " But they never climb out of the pot of beer again . " It 's just that they never climb out of the beer again . " 1 780408 780363 Chief financial officer Andy Bryant has said that hike had a greater affect volume than officials expected . Bryant has said that hike had a greater effect on demand than officials expected . 1 821523 821385 Robert Liscouski , the Assistant Secretary of Homeland Security for Infrastructure Protection , will oversee NCSD . NCSD 's chief will be Robert Liscouski , the assistant secretary of Homeland Security for Infrastructure Protection . 1 2304696 2304863 HP 's shipments increased 48 percent year-over-year , compared to an increase of 31 percent for Dell . HPs shipments increased 48 per cent year-on-year , compared to an increase of 31 per cent for Dell . 1 2531749 2531607 Chirac , who can pardon a law-breaker , refused Humbert 's request last year but kept in close touch with the family . Chirac , who has the authority to pardon law-breakers , refused Humbert 's request to be allowed to die last year but kept in close touch with the family . 1 3180014 3179967 The charges allege that he was part of the conspiracy to kill and kidnap persons in a foreign country . The government now charges that Sattar conspired with Rahman to kill and kidnap individuals in foreign countries . 1 726966 726945 In the 2002 study , the margin of error ranged from 1.8 to 4.4 percentage points . It has a margin of error of plus or minus three to four percentage points . 1 2638861 2638982 Mr. Clinton 's national security adviser , Sandy Berger , said that the White House wasn 't informed of the FBI activities . Clinton ’ s national security adviser , Sandy Berger , said in an interview that the White House was not informed of the FBI activities . 1 2495223 2495307 " This decision is clearly incorrect , " FTC Chairman Timothy Muris said in a written statement . The decision is " clearly incorrect , " FTC Chairman Tim Muris said . 1 55187 54831 Prosecutors allege that Nichols and co-conspirator Timothy McVeigh worked together to prepare a bomb that destroyed the Alfred P. Murrah Federal Building . Prosecutors allege that Nichols and coconspirator Timothy McVeigh worked together to prepare a 4,000-pound fuel-and-fertilizer bomb that destroyed the Murrah building . 0 2763381 2763517 Terri Schiavo , 39 , is expected to die sometime in the next two weeks in the Tampa-area hospice where she has spent the past several years . Terri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler . 1 1990975 1991132 Secretary of State Colin Powell designated the Chechen leader believed responsible for last year 's hostage standoff in a Moscow theater as a threat to U.S. security Friday . U.S. Secretary of State Colin Powell on Friday designated Chechen rebel leader Shamil Basayev a threat to the security of the United States and to U.S. citizens . 1 2204353 2204418 " Today , we are trying to convey this problem to Russian President Vladimir Putin and US President George W Bush . " " Today , we are trying to convey this problem to Russian President Vladimir Putin ( news - web sites ) and President Bush ( news - web sites ) . 
" 1 60122 60445 That would be a potential setback to Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries . The inquiry may hinder Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries . 1 961836 962243 PeopleSoft also said its board had officially rejected Oracle 's offer . Thursday morning , PeopleSoft 's board rejected the Oracle takeover offer . 0 3140260 3140288 The Dow Jones industrial average ended the day down 10.89 at 9,837.94 , after advancing 111.04 Wednesday . The Dow Jones industrial average fell 10.89 points , or 0.11 percent , to 9,837.94 . 1 1720166 1720115 Cortisol levels in the saliva of day care children were highest and rose most steeply in those judged by day care center personnel to be the shyest . Cortisol levels in the saliva of day-care children were highest and rose most steeply in those whom day-care centre staffed judged to be the shyest . 1 2573262 2573319 " The idea that Tony Abbott is in some way a one-dimensional political head-kicker couldn 't be more wrong , " Mr Howard said . " The idea that Tony Abbott is in some way a one-dimensional political head kicker couldn 't be more wrong . " 0 1353356 1353174 " Biotech products , if anything , may be safer than conventional products because of all the testing , " Fraley said , adding that 18 countries have adopted biotechnology . " Biotech products , if anything , may be safer than conventional products because of all the testing , " said Robert Fraley , Monsanto 's executive vice president . 1 2738677 2738741 The rate of skin cancer has tripled since the 1950s in Norway and Sweden , according to the study . The study also found that skin cancer nearly tripled in Norway and Sweden since the 1950s . 1 1638813 1639087 We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , " Rumsfeld said . Rather , the US acted because the administration saw " existing evidence in a new light , through the prism of our experience on September 11 " . 1 1605350 1605425 Trans fat makes up only 1 percent to 3 percent of the total fat Americans consume , compared with 14 percent for saturated fat . Trans fat accounts for 2.5 percent of Americans ' daily calories , compared to 11 percent to 12 percent for saturated fat . 1 2494149 2494073 However , a recent slide in prices and OPEC 's expectations of a surge in oil inventories have compounded its fears about a further softening of the market . A 14 percent slide in crude prices this month and expectations of a build up in oil inventories compounded OPEC 's fears of a further softening of the market . 1 3023029 3023229 Peterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son . Peterson , 31 , is charged with two counts of first-degree murder in the slayings of his wife , Laci , and their unborn son , Conner . 1 1351550 1351155 Carlson on Tuesday said he would not recuse himself from the case . Service officials said Carlson refused to recuse himself from the case . 1 981185 981234 The program will grow to include ports in Dubai , Turkey and Malaysia , among others . The program will be expanded to include areas of the Middle East such as Dubai , Turkey and Malaysia , Mr. Ridge said . 0 2111629 2111786 McCabe said he was considered a witness , not a suspect . " He is not considered a suspect , " McCabe said . 
1 655498 655391 The woman was exposed to the SARS virus while in the hospital but was not a health care worker , said Dr. Colin D ’ Cunha , Ontario ’ s commissioner of public health . The woman was exposed to the SARS virus while in the hospital but was not a health-care worker , said Dr Colin D 'Cunha , Ontario 's commissioner of public health . 1 533823 533909 He added that those " are not solely American principles , nor are they exclusively Western . " " These are not solely American principles nor are they exclusively Western , " Rumsfeld said . 1 581592 581570 " If we don 't march into Tehran , I think we will be in pretty good shape , " he said . " As long as we don 't march on Tehran , I think we are going to be in pretty good shape , " he said . 0 1010655 1010430 On Saturday , a 149mph serve against Agassi equalled Rusedski 's world record . On Saturday , Roddick equalled the world record with a 149 m.p.h. serve in beating Andre Agassi . 1 2241925 2242066 Chad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new technologies and methods to communicate more quickly and efficiently . Chad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new ways to communicate . 1 2796978 2797024 " APEC leaders are painfully aware that security and prosperity are inseparable , " Thai Prime Minister Thaksin Shinawatra told business leaders . " APEC leaders are painfully aware that security and prosperity are inseparable , " Thaksin said . 0 101746 101775 Danbury prosecutor Warren Murray could not be reached for comment Monday . Prosecutors could not be reached for comment after the legal papers were obtained late Monday afternoon . 1 327839 327748 Wittig resigned last year after being indicted on federal bank fraud charges involving a real estate loan unrelated to Westar business . Wittig resigned in late November about two weeks after being indicted on bank fraud charges in a real estate case unrelated to the company . 0 2988297 2988555 Shattered Glass , " starring Hayden Christensen as Stephen Glass , debuted well with $ 80,000 in eight theaters . " Shattered Glass " _ starring Hayden Christensen as Stephen Glass , The New Republic journalist fired for fabricating stories _ debuted well with $ 80,000 in eight theaters . 1 2217613 2217659 He was arrested Friday night at an Alpharetta seafood restaurant while dining with his wife , singer Whitney Houston . He was arrested again Friday night at an Alpharetta restaurant where he was having dinner with his wife . 0 2128530 2128455 However , EPA officials would not confirm the 20 percent figure . Only in the past few weeks have officials settled on the 20 percent figure . 1 2208376 2208198 University of Michigan President Mary Sue Coleman said in a statement on the university 's Web site , " Our fundamental values haven 't changed . " Our fundamental values haven 't changed , " Mary Sue Coleman , president of the university , said in a statement in Ann Arbor . 1 1980654 1980641 The first products are likely to be dongles costing between US $ 100 and US $ 150 that will establish connections between consumer electronics devices and PCs . The first products will likely be dongles costing $ 100 to $ 150 that will establish connections between consumer electronics devices and PCs . 0 589579 589557 However , Lapidus expects foreign brands ' sales to be up 4 percent , driven by strong truck sales at Honda Motor Co . 
Lapidus expects Ford to be down 5 percent , Chrysler down 10 percent and foreign brands up 4 percent driven by strong truck sales at Honda . 1 1636060 1635946 Michel , who remains in the government , denied that US pressure had provoked the government 's move . Michel , who has stayed in the new government , denied that it was U.S. pressure which had provoked the government 's move . 1 1630585 1630657 Some of the computers also are used to send spam e-mail messages to drum up traffic to the sites . Some are also used to send spam e-mail messages to boost traffic to the sites . 0 447728 447699 Indonesia 's army has often been accused of human rights abuses during GAM 's battle for independence , charges it has generally denied while accusing the separatists of committing rights violations . Indonesia 's army has been accused of human rights abuses during its earlier battles with GAM , charges it has generally denied . 1 1606495 1606619 Bush also hoped to polish his anti-AIDS credentials in Uganda , which has been hailed as an African pioneer in fighting the killer disease . President Bush flies to Uganda Friday hoping to polish his anti- AIDS credentials in a country hailed as an African pioneer in fighting the epidemic . 1 1550897 1550977 Later this year , the command will send trainers with soldiers from four North African nations on patrolling and intelligence gathering missions . This fall the command will send trainers to work with soldiers from four North African nations on patrolling and gathering intelligence . 0 490376 490490 The reports helped overcome investor jitters after the euro briefly hit an all-time high against the dollar Tuesday . Stocks slipped at the open after the euro hit record highs against the dollar . 1 3084554 3084612 Sales for the quarter beat expectations , rising 37 percent year-on-year to 1.76 billion euros . Sales rose 37 per cent year-on-year to 1.76bn , beating expectations . 1 315647 315778 If the MTA 's appeal to a higher court is successful , the $ 2 bus and subway base fare won 't be rolled back . If the MTA 's appeal is successful , the $ 2 bus and subway base fare won 't change . 1 3428298 3428362 Robert Walsh , 40 , remained in critical but stable condition Friday at Staten Island University Hospital 's north campus . Walsh , also 40 , was in critical but stable condition at Staten Island University Hospital last night . 1 2523564 2523358 The Guru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS ( Basic Input Output System ) update and a troubleshooting-assistance feature called Black Box . The µGuru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS update and a troubleshooting-assistance feature called Black Box . 1 2079200 2079131 U.S. corporate bond yield spreads tightened in spotty trading on Friday as Wall Street labored to get back on its feet after the largest power outage ever in North America . U.S. stocks rose slightly on feather-light volume on Friday , as Wall Street regrouped after the biggest-ever power outage in North America . 1 818091 817811 The company said it would issue revised guidance for the full fiscal year next month when it releases its Q2 results . The company said it would renew its guidance for 2003 when it announces its second quarter results in mid-July . 1 1580638 1580663 " I stand 100 percent by it , and I think our intelligence services gave us the correct information at the time . 
" I stand 100 percent by it , and I think that our intelligence services gave us the correct intelligence and information at the time , " Blair said . 0 1919740 1919926 " I don 't know if the person I 'm talking to now may end up being someone else at another time that may not follow the rules , " Parrish said . " I don 't know whether the person I 'm talking to now may end up being someone else , " Parrish said . 1 2748287 2748550 " I think it 's going to be a close vote , but I think the grant proposal is going to win , " McConnell said . " I think it 's going to be a close vote , but I think the grant proposal 's going to win , " said Sen. Mitch McConnell , assistant majority leader . 1 3394891 3394775 Twenty-eight people were believed to have been spending Christmas Day with the caretaker of the St Sophia 's camp , when the mudslide smashed into two cabins . Twenty-seven people were believed to have been spending Christmas Day with the caretaker of Saint Sophia Camp , a Greek Orthodox facility , when the mudslide roared through . 0 2963943 2963880 One , Capt. Doug McDonald , remained hospitalized in critical condition on Thursday . Her 20-year-old sister , Allyson , was severely burned and remained hospitalized in critical condition . 0 1865364 1865251 The United States finally relented during President Bush 's visit to Africa earlier this month . During President Bush 's trip to Africa earlier this month , however , Washington said it would support the increase . 1 263690 263819 " There is no conscious policy of the United States , I can assure you of this , to move the dollar at all , " he said . He also said there is no conscious policy by the United States to move the value of the dollar . 1 283751 283290 It 's the first such drill since the September 11 terrorist attacks on New York and Washington . It is the nation 's first large-scale counterterrorism exercise since the Sept . 11 terrorist attacks . 1 2517014 2516995 Myanmar 's pro-democracy leader Aung San Suu Kyi will return home late Friday but will remain in detention after recovering from surgery at a Yangon hospital , her personal physician said . Myanmar 's pro-democracy leader Aung San Suu Kyi will be kept under house arrest following her release from a hospital where she underwent surgery , her personal physician said Friday . 1 1330643 1330622 According to the Merchant Marine Ministry , the 37-year-old ship is registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands . The Baltic Sky is a 37-year-old ship registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands . 1 3111452 3111428 In an unusual move , the U.S. Patent and Trademark Office is reconsidering a patent affecting Internet pages that critics contend could disrupt millions of Web sites . In an unusual move that critics contend could disrupt millions of Web sites , the U.S. Patent and Trademark Office is reconsidering a patent affecting Internet pages . 0 1167835 1167651 Kansas Department of Health and Environment records show there were 88 abortions performed on girls age 14 and younger last year . Statistics from the Kansas Department of Health and Environment show that 11,844 abortions were performed in the state last year . 0 1423836 1423708 A European Union spokesman said the Commission was consulting EU member states " with a view to taking appropriate action if necessary " on the matter . 
Laos 's second most important export destination - said it was consulting EU member states ' ' with a view to taking appropriate action if necessary ' ' on the matter . 1 2090911 2091154 Waiting crowds filling the streets on both sides overwhelmed the peacekeepers soon after daylight , sweeping past the barbed wire barricades . But waiting crowds filling the streets rushed the bridges soon after daylight , overrunning razor-wire barricades . 1 2265271 2265152 Barry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products not sold in the United States . Barry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products unknown to the American market . 1 3062202 3062308 By skirting the FDA 's oversight , Eagan said , the quality of the imported drugs is " less predictable " than for those obtained in the United States . By skirting the FDA 's oversight , Eagan said the quality of the imported drugs is " less predictable " than U.S. drugs . 1 2155514 2155377 He said : " For the first time there is an easy and affordable way of making this treasure trove of BBC content available to all . " " For the first time , there is an easy and affordable way of making this treasure trove of BBC content available to all , " Dyke said . 1 1552068 1551928 Three such vigilante-style attacks forced the hacker organizer , who identified himself only as " Eleonora [ 67 ] , " to extend the contest until 7 p.m. EST Sunday . Three such vigilante-style attacks forced the hacker organiser , who identified himself only as " Eleonora67 ] , " to extend the contest until 8am ( AEST ) today . 1 936978 937500 Eric Gagne pitched a perfect ninth for his 23rd save in as many opportunities . Gagne struck out two in a perfect ninth inning for his 23rd save . 0 985015 984975 One way or another , Harry Potter And The Order Of The Phoenix will be in your hands by Saturday . Just about everything about " Harry Potter and the Order of the Phoenix " will set records . 1 1430357 1430425 " Allison just proves you don 't need to wait until August or September to have a disaster , " said Josh Lichter , a meteorologist with the Houston-Galveston weather office . " Allison just proves you don 't need to wait until August or September to have a disaster , " Lichter said . 1 3039310 3039413 Today , analysts say , UN members can no longer ignore the shifts since the September 11 2001 attacks . On Wednesday , analysts say , UN members can no longer ignore the shifts since the attacks in the US of September 11 2001 . 1 34513 34742 Police say CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the United States . Mr McKinlay said that CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the US . 1 368067 368018 Chiron already has nearly 20 percent acceptances from PowderJect 's shareholders . Chiron has acceptances from holders of nearly 20 percent of PowderJect shares . 0 611663 611716 Ernst & Young has denied any wrongdoing and plans to fight the allegations . Ernst & Young has denied the SEC 's claims , and called its recommendations " irresponsible " . 1 98432 98657 The attack followed several days of disturbances in the city where American soldiers exchanged fire with an unknown number of attackers as civilians carried out demonstrations against the American presence . 
The attack came after several days of disturbance in the city in which U.S. soldiers exchanged fire with an unknown number of attackers as civilians protested the American presence . 1 3039007 3038845 No company employee has received an individual target letter at this time . She said no company official had received " an individual target letter at this time . " 1 1708040 1708062 Second-quarter results reflected a gain of 10 cents per diluted share , while the 2002 results included a loss of 19 cents per diluted share . The second-quarter results had a non-operating gain of 10 cents a share while the 2002 second-quarter performance had a net non-operating loss of 19 cents a share . 0 1757264 1757375 He allegedly told his ex-wife in an angry phone call that he had no intention of following their new custody agreement . The two had battled over custody and he allegedly told her in an angry phone call that he had no intention of following their new custody agreement . 1 383417 383558 Worldwide , more than 50 million people have seen " Les Miz , " with gross receipts of $ 1.8 billion . Worldwide , Les Misérables has been seen by over 50 million people , with a total gross of over $ 2 billion . 0 2766112 2766084 In fiction : Edward P. Jones ( " The Known World " ) and Scott Spencer ( " A Ship Made of Paper " ) . The fifth nominee for fiction is Scott Spencer , for A Ship Made of Paper . 1 1261116 1261234 " Overwhelmingly the Windows brand really resonated with them . " " Windows was the part of the experience that really resonated with people . " 1 3028143 3028234 The Centers for Medicare and Medicaid Services , the federal agency that runs Medicare , last year began a similar effort for nursing homes . The Centers for Medicare and Medicaid launched a similar consumer tool for nursing homes last year . 0 249699 249623 Vivace was founded in 1999 and has raised over $ 118 million in three rounds of venture financing . During difficult times for technology venture capital , Vivace raised over $ 118 million in three rounds of venture financing . 0 3448488 3448449 The Dow Jones industrial average < .DJI > added 28 points , or 0.27 percent , at 10,557 , hitting its highest level in 21 months . The Dow Jones industrial average < .DJI > rose 49 points , or 0.47 percent , to 10,578 . 1 2749322 2749663 The Democratic candidates also began announcing their fund-raising totals before Wednesday 's deadline to file quarterly reports with the Federal Election Commission . The Democratic candidates also began announcing their fund-raising totals in advance of the deadline today to file quarterly reports with the Federal Election Commission . 0 2204592 2204588 Sun Microsystems Inc. on Thursday said it had added 100 new third-party systems and 100 new components to its Hardware Compatibility List for the Solaris x86 operating system Platform Edition . The vendor has added 100 new third-party systems and 100 new components to the operating system 's Hardware Compatibility List ( HCL ) . 1 2889005 2888954 Prosecutors said PW Marketing violated the state 's 1998 anti-spam law by sending unsolicited e-mail without a toll-free number for recipients to call to stop additional mailings . Prosecutors said PW Marketing violated the 1998 anti-spam law because these unsolicited e-mails were sent without a free call number for recipients to phone to stop additional mailings . 0 1657632 1657619 The Neighbours star and singer spent yesterday resting at her family home in Sydney and will have more tests today . 
Goodrem spent yesterday resting in her family home in Sydney and will have more tests today to determine her exact treatment . 0 555617 555528 The 3 rd Armored Cavalry Regiment is 5,200 strong and the largest combat unit at Fort Carson . Broomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment . 1 2396937 2396818 " The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , " the Fed said in a statement accompanying the unanimous decision . " The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , " the policy-setting Federal Open Market Committee said . 0 2339738 2339771 " It is bad for Symbian , " said Per Lindberg , analyst at Dresdner Kleinwort Wasserstein . " Motorola has displayed clear disloyalty " to Symbian , said Per Lindberg , an analyst at Dresdner Kleinwort Wasserstein in London . 0 1616174 1616206 Bob Richter , a spokesman for House Speaker Tom Craddick , had no comment about the ruling . Bob Richter , spokesman for Craddick , R-Midland , said the speaker had not seen the ruling and could not comment . 1 635783 635802 But Ms Ward said the headroom under its financial covenants was " tight " and that there could be another downgrade if Southcorp breached any of its banking covenants . But Ms Ward said the headroom under its financial covenants was " tight " and that there could be a rating downgrade if Southcorp did breach any banking covenants . 1 3444633 3444733 He added : ``I 've never heard of more reprehensiblebehaviour by a doctor . The Harrisons ’ lawyer Paul LiCalsi said : “ I ’ ve never heard of more reprehensible behaviour by a doctor . 1 555553 555528 Broomhead was assigned to 2nd Squadron , 3rd Armor Cavalry Regiment , based at Fort Carson . Broomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment . 1 1112021 1111925 Other staff members , however , defended the document , saying it would still help policy-makers and the agency improve efforts to address the climate issue . Some E.P.A. staff members defended the document , saying that although pared down it would still help policy makers and the agency address the climate issue . 0 2749410 2749625 President Bush raised a record-breaking $ 49.5 million for his re-election campaign over the last three months , with contributions from 262,000 Americans , the president 's campaign chairman said Tuesday . President Bush has raised $ 83.9 million since beginning his re-election campaign in May , and has $ 70 million of that left to spend , his campaign said Tuesday . 1 1629064 1629043 An episode is declared when the ozone reaches .20 parts per million parts of air for one hour . A Stage 1 episode is declared when ozone levels reach 0.20 parts per million . 1 789691 789665 " He may not have been there , " the defence official said on Thursday . " He may not have been there , " said a defence official speaking on condition of anonymity . 1 844421 844679 The U.N. troops are in Congo to protect U.N. installations and personnel , and they can only fire in self defense and have been unable to stem the violence . The troops - whose mandate is to protect U.N. installations and personnel - can only fire in self-defense and have been unable to stem the violence . 1 58540 58567 North American markets grabbed early gains Monday morning , as earnings season begins to slow and economic indicators take the spotlight . 
North American futures pointed to a strong start to the first trading session of the week Monday , as earnings season slows and economic indicators take the spotlight . 1 781439 781461 Xerox itself paid a $ 10 million fine last year to settle similar SEC charges . Xerox itself previously paid a $ 10-million penalty to settle the SEC accusations . 1 1909579 1909408 " This deal makes sense for both companies , " said National Chief Executive Brian Halla . " This deal makes sense for both companies , " Halla said in a prepared statement . 0 787432 787464 The blasts killed two people and injured more than 150 others . The Atlanta Olympic Games attack killed one woman and injured more than 100 other people . 0 52758 52343 Morrill 's wife , Ellie , sobbed and hugged Bondeson 's sister-in-law during the service . At the service Morrill 's widow , Ellie , sobbed and hugged Bondeson 's sister-in-law as people consoled her . 1 1675025 1675047 Spansion products are to be available from both AMD and Fujitsu , AMD said . Spansion Flash memory solutions are available worldwide from AMD and Fujitsu . 1 2131318 2131372 About 1,500 police will be deployed for the visit . Around 1,500 police are to be deployed at Niigata for the ferry 's visit . 1 325763 325928 Gamarekian told The News she remembers only the woman 's first name - and refused to reveal it . She told the New York Daily News she remembers only the intern 's first name , which she refused to reveal . 1 2638975 2638855 One of the FBI ’ s key operatives , who had a falling out with the bureau , provided an account of the operation at a friend ’ s closed immigration court proceeding . One of the FBI 's key operatives , who has had a falling-out with the bureau , provided an account of the operation at a friend 's closed immigration court proceeding . 1 2198694 2198937 A nationally board certified teacher with a master 's degree , Kelley makes a salary of $ 65,000 in his 30th year . A nationally board certified teacher with a master 's degree , Kelley , in his 30th year teaching , makes $ 65,000 . 1 1825432 1825301 A man arrested for allegedly threatening to shoot and kill a city councilman from Queens was ordered held on $ 100,000 bail during an early morning court appearance Saturday . The Queens man arrested for allegedly threatening to shoot City Councilman Hiram Monserrate was held on $ 100,000 bail Saturday , a spokesman for the Queens district attorney said . 1 2906104 2906322 They were being held Sunday in the Camden County Jail on $ 100,000 bail . They remained in Camden County Jail on Sunday on $ 100,000 bail . 1 722278 722383 Ms Stewart , the chief executive , was not expected to attend . Ms Stewart , 61 , its chief executive officer and chairwoman , did not attend . 0 101747 101777 Christina 's aunt , Shelley Riling , said the defense 's claims were preposterous . Christina 's aunt , Shelley Riling , said she will address the court . 1 2224884 2224819 The Justice Department Aug. 19 gave pre-clearance for the Oct. 7 date for the election to recall Gov. Gray Davis , saying it would not affect minority voting rights . The Justice Department on Aug. 19 sanctioned the Oct. 7 date for recall election , saying it would not affect voting rights . 0 977938 978162 Lord Falconer hailed the changes as " a new beginning as far as the courts , Crown Prosecution Service and police are concerned " . " It 's a new beginning as far as the courts , Crown Prosecution Service and police are concerned , making the criminal justice system work better . 
" 0 1015010 1014963 GE stock closed at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange . GE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange . 1 1513190 1513246 At least 27 US troops have been killed in hostile fire since Bush 's statement . At least 26 American troops have been killed in hostile fire since major combat was officially declared over on May 1 . 1 2385348 2385394 A recent poll showed Edwards with a narrow lead in South Carolina , and he plans a rally there later on Tuesday . A recent poll showed Edwards in a virtual four-way tie at the top in South Carolina , and he plans a rally there later on Tuesday . 1 2317018 2317252 November 17 's last victim was British defence attache Stephen Saunders , who was shot on an Athens road in June 2000 . November 17 's last victim was British defense attache Stephen Saunders , who was shot and killed at point-blank range on a busy Athens road in June 2000 . 0 1831696 1831660 The agency charged that one WD Energy worker discussed false reporting with traders at two other energy companies . The agency found further that a WD Energy employee discussed false reporting with traders at two other energy companies , which the CFTC didn 't identify . 1 1528383 1528083 Zulifquar Ali , a worshipper slightly wounded by shrapnel , said the assailants first targeted the mosque 's security guards . Witness Zulfiqar Ali , who was slightly wounded by shrapnel , said the attackers had focused on the mosque 's guards . 1 917965 918315 For the second year in a row , rises in hospital costs accounted for much of the inflation , accounting for 51 percent of the overall cost increase . For the second year in a row , rises in hospital costs dominated the increase , accounting for 51 percent of the overall cost spiral . 0 3218713 3218830 Q : Can I buy coverage for prescription drugs right away ? Congress has added a new benefit - an option to buy insurance coverage for prescription drugs . 1 221079 221003 The airline also said it has the option to buy 380 more airplanes , orders that would be split evenly between the two manufacturers . The airline has the option to buy 380 more , split evenly between the two manufacturers . 1 2546175 2546198 Dr Mark McClean , Jonathan 's family doctor , said if the drug had been administered earlier Jonathan would have retained more of his brain functions . Dr Mark McClean , the family 's GP , said had the drug been administered to Jonathan earlier , he would have retained more of his brain function . 0 799346 799268 The chain operates more than 3,400 stores , and has annual revenue of about $ 15.8 billion . The chain , which has been under new management since late 1999 , has more than 3,400 stores and $ 15.8 billion in annual revenue . 0 2673104 2673130 All patients developed some or all of the symptoms of E. coli food poisoning : bloody diarrhea , vomiting , abdominal cramping and nausea . Symptoms of the E. coli infection include bloody diarrhea , nausea , vomiting and abdominal cramping . 1 1354501 1354476 Federal regulators have turned from sour to sweet on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings Inc. and Dreyer 's Grand Ice Cream Inc . Federal regulators have changed their minds on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings and Dreyer 's Grand Ice Cream . 1 3070979 3070949 Environmental campaigners are using this weekend ’ s lunar eclipse to highlight the huge increase in light pollution across the UK . 
Environmental campaigners used the eclipse to highlight the surge in light pollution across Britain . 0 1264509 1264471 Available July 7 , the software supports the Solaris , IBM AIX , Red Hat Linux and Windows operating systems . The OpForce product currently works with Solaris , AIX , Red Hat Linux and Windows servers . 1 103280 103431 Justice Minister Martin Cauchon and Prime Minister Jean Chrétien have both said the Liberal government will introduce legislation soon to decriminalize possession of small amounts of pot for personal use . Justice Minister Martin Cauchon and Prime Minister Jean Chretien both have said the government will introduce legislation to decriminalize possession of small amounts of pot . 0 110731 110648 But Chauncey Billups demonstrated he 's also capable of big games , scoring 77 points over the final two games against the Magic . Billups scored 77 points in the final two games of the first-round series against the Magic . 1 2274844 2274714 Kelly killed himself after being exposed as the source for a BBC report which claimed the government had embellished evidence of Iraq 's banned weapons to justify the war . He killed himself after being exposed as the source for a BBC report which claimed the government exaggerated the case for war against Iraq . 0 1050307 1050144 And it 's going to be a wild ride , " said Allan Hoffenblum , a Republican consultant . Now the rest is just mechanical , " said Allan Hoffenblum , a Republican consultant . 1 2810634 2810670 While the Ibrahims had one separation operation , Goodrich and Dr. David Staffenberg plan about three for the Aguirres , with several weeks between each . Instead of one long operation to separate the twins , Goodrich and Dr. David Staffenberg plan about three , with several weeks between each . 1 3073773 3073779 Lay had contended that turning over the documents would violate his Fifth Amendment right against self-incrimination . Lay had refused to turn over the papers , asserting his Fifth Amendment right against self-incrimination . 0 261202 260995 The WHO experts didn 't say how many cases in Hebei were in rural areas . Hebei has reported 191 cases and eight deaths , though the WHO experts did not say how many were in rural areas . 1 1824224 1824209 Nearly 300 mutinous troops who seized a Manila shopping and apartment complex demanding the government resign gave up and retreated peacefully after some 19 hours . Mutinous troops who seized a Manila shopping and apartment complex demanding the government resign ended a 19-hour standoff late Sunday and returned to barracks without a shot fired . 1 548867 548785 In three years , Lend Lease has slipped from a top-five stock , when its share price was around $ 24 , to 37th . In the space of three years , Lend Lease has slipped from a top-five 5 stock when its share price hovered around $ 24 to 37th on the list . 0 2796658 2796682 About two hours later , his body , wrapped in a blanket , was found dumped a few blocks away . Then his body was dumped a few blocks away , found in a driveway on Argyle Road . 1 1808166 1808434 Columbia broke up over Texas upon re-entry on Feb. 1 . Columbia broke apart in the skies above Texas on Feb. 1 . 1 853475 853342 A year or two later , 259 , or 10 per cent , of the youths reported that they had started to smoke , or had taken just a few puffs . Within two years , 259 , or 10 percent , of the youths reported they had started to smoke or had at least taken a few puffs . 
0 977772 977804 The Lord Chancellor was guardian of the Great Seal , used to stamp all official documents from the sovereign . Falconer will hold on , for now , to the Lord Chancellor 's Great Seal , used to sign off instructions from the sovereign . 1 577854 578500 Cindy Yeast , a 50-year-old Washington-area publicist , says she began taking supplements two years ago in part to avoid mild dementia that affects her elderly parents . She started taking supplements two years ago - partly to stave off mild dementia that affects her elderly parents . 1 2829194 2829229 The two are not related , but have referred to each other as father and son . He 's not related to Malvo , but the two have referred to each other as father and son . 1 2074182 2074668 Gibson said last month in a press statement that " neither I nor my film are anti-Semitic . Gibson said in a June statement that he and his film are not anti-Semitic . 0 2758265 2758282 The world 's largest software company said it recognized the difficulty the multiple patches posed for companies , and set out to make it easier for them to apply the updates . The world 's largest software company said it recognized the difficulty the multiple patches posed for companies trying to apply them . 1 1958079 1958143 The Dow Jones industrial average .DJI ended up 64.64 points , or 0.71 percent , at 9,191.09 , according to the latest available data . The blue-chip Dow Jones industrial average .DJI added 38 points , or 0.42 percent , to 9,165 . 1 544217 544325 The vote came just two days after Kurds swept City Council elections , taking the largest single block of votes on the 30-seat council . The vote for mayor followed City Council elections that gave Kurds the largest block of votes on the 30-seat council . 1 2385288 2385256 Large swells and dangerous surf already were being felt along sections of the coast . Already large swells and dangerous surf have arrived along the mid-Atlantic . 0 2324708 2325028 Based on a separate survey of households , the unemployment rate fell in August to 6.1 percent from 6.2 percent . Labor Department analysts discounted a slight improvement in the national unemployment rate , which fell in August to 6.1 percent from 6.2 percent . 1 2139506 2139427 " We will work with the board to ensure a smooth transition . " He said federal regulators would work with the corporation to ensure a " smooth transition . " 1 2965576 2965701 Gasps could be heard in the courtroom when the photo was displayed . Gasps could be heard as the photo was projected onto the screen . 1 2931098 2931144 Gilead had earnings of $ 73.1 million , or 33 cents a share , compared with $ 20.8 million , or 10 cents , in the year-ago quarter . Quarterly profit climbed to $ 73.1 million , or 33 cents a share , from $ 20.8 million , or 10 cents , a year earlier , the company said . 0 644788 644816 " I had one bad stretch of holes that put me out of contention to win , " Woods said . " I had one bad stretch of holes that put me out of contention , " Woods said , referring to his 42 on the front nine Saturday . 0 2551891 2551563 The poll had a margin of error of plus or minus 2 percentage points . It had a margin of sampling error of plus or minus four percentage points and was conducted Thursday through Saturday . 1 1089053 1089297 Sen. Patrick Leahy of Vermont , the committee 's senior Democrat , later said the problem is serious but called Hatch 's suggestion too drastic . Sen. 
Patrick Leahy , the committee 's senior Democrat , later said the problem is serious but called Hatch 's idea too drastic a remedy to be considered . 1 3435735 3435717 The broad Standard & Poor 's 500 < .SPX > eased 0.37 of a point , or 0.03 percent , at 1,121 . The Standard & Poor 's 500 Index < .SPX > slipped 0.26 point , or 0.02 percent , to 1,121.96 . 0 1954 2142 Watertown , Saugus and Framingham also are going smoke-free Monday , joining a growing number of cities around the country . Along with Boston , Watertown , Saugus and Framingham also are going smoke-free Monday . 1 3400796 3400822 That is evident from their failure , three times in a row , to get a big enough turnout to elect a president . Three times in a row , they failed to get a big _ enough turnout to elect a president . 1 1220668 1220801 We firmly believe we have an absolute right to use the common word ' spike ' as the name of our network . " We firmly believe that we have an absolute right to use the common word ' spike ' to name our network . 1 1889954 1889847 Sources who knew of the bidding said last week that cable TV company Comcast Corp. was also looking at VUE . Late last week , sources told Reuters cable TV company Comcast Corp. CMCSA.O also was looking at buying VUE assets . 1 315785 315653 But MTA officials appropriated the money to the 2003 and 2004 budgets without notifying riders or even the MTA board members considering the 50-cent hike , Hevesi found . MTA officials appropriated the surplus money to later years ' budgets without notifying riders or the MTA board members when the 50-cent hike was being considered , he said . 0 1521034 1520582 White , who had suffered kidney failure from years of high blood pressure , died at Cedars-Sinai Medical Center around 9 : 30 a.m. , said manager Ned Shankman . White , who had kidney failure from years of high blood pressure , had been undergoing dialysis and had been hospitalized since a September stroke . 1 2083598 2083810 About 10 percent of high school and 16 percent of elementary students must be proficient at math . In math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient . 1 1910610 1910455 The legal ruling follows three days of intense speculation Hewlett-Packard Co. may be bidding for the company . The legal ruling follows three days of wild volatility in RIM 's stock over speculation that PC giant Hewlett-Packard Co. may be bidding for the company . 1 3113791 3113782 The European Commission , the EU 's antitrust enforcer , is expected to issue its decision next spring — unless a settlement is reached . The European Commission is expected to issue its decision in the case next spring — unless a settlement is reached . 1 3214517 3214483 " So Sebastian did his best to convincingly confess to a crime that he didn 't commit in order to survive , " she told jurors . " Sebastian did his best to confess convincingly to a crime he didn 't do in order to survive , " Ms. Richardson declared . 0 2083612 2083810 Twenty percent of Latino students and 23 percent of black students performed at proficient or higher . In math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient . 1 661390 661218 He is charged in three bombings in Atlanta including a blast at the 1996 Olympics and one in Alabama . He is charged in three bombings in Atlanta - including a blast at the 1996 Olympics - along with the bombing in Alabama . 
1 1269572 1269682 The men were remanded in custody and are due to appear again before court on July 8 . They were remanded in custody and will appear in court again on July 8 . 1 1095780 1095652 " No matter who becomes the sponsor for stock-car racing 's top series , NASCAR will need an all-star event , " Wheeler said in a statement . No matter who becomes the sponsor for stock-car racings top series , NASCAR will need an all-star event , Wheeler said Tuesday . 1 116294 116332 The Phillies were upset that Counsell had stolen second in the sixth inning with Arizona leading 7-1 . The Phillies were apparently upset when Counsell stole during the sixth with the Diamondbacks up 7-1 . 1 941617 941673 He said his hatred for such people grew from these discussions and had helped convince him violence was the answer . His hatred for these people had germinated from these discussions and helped cement his belief that violence was the panacea . 1 2640607 2640576 " There is no need for one deadline for all to create the ASEAN Economic Community , " Thaksin said . Thus , he said , there did not have to one deadline to create the economic community . 1 3310210 3310286 The announcement was made during the recording of a Christmas concert attended by top Vatican cardinals , bishops , and many elite from Italian society , witnesses said . The broadside came during the recording on Saturday night of a Christmas concert attended by top Vatican cardinals , bishops and many elite of Italian society , witnesses said . 1 3376093 3376101 The additional contribution brings total U.S. food aid to North Korea this year to 100,000 tonnes . The donation of 60,000 tons brings the total of U.S. contributions for the year to 100,000 . 1 1549586 1549609 Leon Williams ' body was found inside his third-floor apartment at 196 Bay St. , in Tompkinsville . The dead man , Leon Williams , was found in his third-floor apartment . 1 460211 460445 The player 's eyes were bloodshot and a blood-alcohol test produced a reading of 0.18 - well above Tennessee 's level of presumed intoxication of 0.10 , the report said . He failed a field sobriety test and a blood-alcohol test produced a reading of 0.18 – well above Tennessee 's level of presumed intoxication of 0.10 , the report said . 1 1196962 1197061 But Virgin wants to operate Concorde on routes to New York , Barbados and Dubai . Branson said that his preference would be to operate a fully commercial service on routes to New York , Barbados and Dubai . 0 862804 862715 He tried to fight off officers and was taken to a hospital after a police dog bit him but was later released . Cruz tried to fight off officers and was hospitalized after a police dog bit him , Sgt. Steve Dixon said . 1 1726935 1726879 The announcement , which economists said was not a surprise , may be bittersweet for the millions of Americans without jobs . Economists said the announcement was not a surprise , and politicians said it offered little comfort to the millions of Americans without jobs . 0 331980 332110 Asked if the delegates could leave on Friday , police intelligence chief in Aceh , Surya Dharma , told reporters they could not because they did not have proper permission . Asked if the delegates could leave on Friday , police intelligence chief Surya Dharma told reporters : " Of course they may not go . 1 173879 173832 Dealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid the yen 's rise against the dollar . 
Dealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid ever-falling domestic interest rates . 0 2834988 2835026 Iran has until the end of the month to satisfy the agency it has no plans for nuclear weapons . The Iranians have until the end of the month to answer all the agency 's questions about their past nuclear activities . 1 2587300 2587243 Her father , Florin Cioaba , the king of Transylvania 's Gypsies , had her brought back and she was married against her will . Her father , Roma King Florin Cioaba , had her brought back and she was promptly married against her will . 0 554905 554627 Claire had advanced to the third round of the 76th annual Scripps Howard National Spelling Bee . One by one they strolled to the microphone , all 251 youngsters in the 76th Scripps Howard National Spelling Bee . 1 1912524 1912648 Citigroup Inc . C.N , the world 's largest financial services company , on Wednesday promoted Marjorie Magner to chairman and chief executive of its global consumer group . Citigroup ( C ) on Wednesday named Marjorie Magner chairman and chief executive of its colossal global consumer business . 1 3255597 3255668 " They 've been in the stores for over six weeks , " says Carney . The quarterlies usually stay in stores for between six to eight weeks , " Carney added . 1 629316 629289 Let me just say this : the evidence that we have of weapons of mass destruction was evidence drawn up and accepted by the joint intelligence community . " The evidence that we had of weapons of mass destruction was drawn up and accepted by the Joint Intelligence Committee , " he said . 1 54181 53570 Ridge said no actual explosives or other harmful substances will be used . Ridge said no real explosives or harmful devices will be used in the exercise . 1 723557 724115 Thus far , Stewart 's company appears ready to stand behind her . For now , the company 's management appears to be standing behind Stewart . 0 2607718 2607708 But late Thursday night , the campaign issued a statement saying there would be no news conference and no big announcement . But late yesterday , the campaign and the state Democratic Party said there would be no news conference . 1 753858 753890 There 's also a flaw that results because IE does not implement an appropriate block on a file download dialog box . The second vulnerability is a result of IE not implementing a block on a file download dialog box . 1 587009 586969 Another $ 100-million in savings will come from management layoffs and pay cuts . The airline expects to save another $ 100-million a year through management layoffs and pay cuts . 1 308567 308525 He called on Prime Minister John Howard to establish a royal commission on child sex abuse . The Senate motion also called on Prime Minister John Howard to hold a royal commission into child sex abuse . 0 665419 665612 " We think that the United States of America should support the free speech of all groups , " Mr. White said , objecting to Mr. Olson 's recommendation . We think that the United States of America should support the free speech of all groups , he said . 1 2763517 2763576 Terri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler . The tube was removed Wednesday from Terri Schiavo , 39 , at the Tampa Bay-area hospice where she has lived for several years . 
0 3107118 3107136 After 18 months , Nissen found that Lipitor stopped plaque buildup in the patients ' arteries . After 18 months , the atorvastatin patients had no change in the plaque in their arteries . 1 780604 780466 Toll , Australia 's second-largest transport company , last week offered NZ75 a share for Tranz Rail . Toll last week offered to buy the company for NZ75c a share , or $ NZ158 million . 0 1989213 1989116 " This child was literally neglected to death , " Armstrong County District Attorney Scott Andreassi said . Armstrong County District Attorney Scott Andreassi said the many family photos in the home did not include Kristen . 1 1462409 1462504 Wal-Mart , the nation 's largest private employer , has expanded its antidiscrimination policy to protect gay and lesbian employees , company officials said Tuesday . Wal-Mart Stores Inc . , the nation 's largest private employer , will now include gays and lesbians in its anti-discrimination policy , company officials said Wednesday . 1 260952 260924 Metro , bus and local rail services in France 's four largest towns -- Paris , Lyon , Lille and Marseille -- were severely disrupted , Europe 1 radio reported . Subway , bus and suburban rail services in France 's four largest cities -- Paris , Lyon , Lille and Marseille -- were severely disrupted , transport authorities said . 1 1224743 1225510 In the undergraduate case , Rehnquist said the use of race was not " narrowly tailored " to achieve the university 's asserted interest in diversity . Rehnquist wrote that the system was not narrowly tailored to achieve the interest in educational diversity . 0 3329379 3329416 SP2 is basically about security enhancements to Windows , such as the improved Internet Connection Firewall ( ICF ) . The firewall in the current Windows XP was known as the Internet Connection Firewall ( ICF ) . 1 2362761 2362698 A landslide in central Chungchong province derailed a Seoul-bound train and 28 passengers were injured , television said . In central Chungchong province , a landslide caused a Seoul-bound Saemaeul Express train to derail , injuring 28 people , local television said . 0 1465073 1464854 They will help draft a plan to attack obesity that Kraft will implement over three to four years . The team will help draft a plan by the end of the year to attack obesity . 1 195728 196099 But that amount would probably be impossible to pass in the Senate , where Republican moderates have refused to go above $ 350 billion . Such an amount would probably be unable to summon a majority of the Senate , where Republican moderates have refused to go above $ 350 billion . 1 2587767 2587673 In the clash with police , Lt. Mothana Ali said about 1,000 demonstrators had gone to the station demanding jobs . In Baghdad , police Lieut . Mothana Ali said about 1,000 demonstrators arrived at the station demanding jobs . 0 1490044 1489975 Corixa shares rose 54 cents to $ 7.74 yesterday on the Nasdaq Stock Market . Shares of Corixa rose 54 cents , or about 8 percent , to close at $ 7.74 . 1 958161 957782 Committee approval , expected today , would set the stage for debate on the Senate floor beginning Monday . That would clear the way for debate in the full Senate beginning on Monday . 1 1033204 1033365 O 'Brien was charged with leaving the scene of a fatal accident , a felony . Bishop Thomas O 'Brien , 67 , was booked on a charge of leaving the scene of a fatal accident . 
0 2996241 2996734 Tom Hamilton said his daughter was conscious and alert and in stable condition after the attack Friday morning . Bethany , who remained in stable condition after the attack Friday morning , talked of the attack Saturday . 0 2015389 2015410 The Calgary woman , who is in her twenties , donated blood on Aug. 7 . The woman -- who has no symptoms of illness -- donated blood Aug. 7 . 1 221515 221509 Quattrone lawyer John W. Keker said his client is innocent . In a statement Monday , his lawyer John Keker said ``Frank Quattrone is innocent . 0 2283737 2283794 In the weeks leading up to the execution , several Florida officials received anonymous threatening letters . Several Florida officials connected to the case have received threatening letters , accompanied by rifle bullets . 1 2826681 2826474 The disagreement over online music sales was disclosed in documents filed last week with the judge and made available by the court yesterday . The fight over online music sales was disclosed in documents made available Monday by the court . 1 2249237 2249305 Parson was charged with intentionally causing and attempting to cause damage to protected computers . Parson is charged with one count of intentionally causing damage to a protected computer . 1 389239 389299 " The court and the public need to know much more of the details of the defendant 's seemingly massive fraud , " the judge said . " The court and the public need to know more of the defendants ' seemingly massive fraud , " he said . 1 2652187 2652218 The U.S. Supreme Court will hear arguments on Wednesday on whether companies can be sued under the Americans with Disabilities Act for refusing to rehire rehabilitated drug users . The high court will hear arguments today on whether companies can be sued under the ADA for refusing to rehire rehabilitated drug users . 1 2945693 2945847 The IRS said taxpayers can avoid undelivered checks by having refunds deposited directly into their checking or savings accounts . The IRS said taxpayers can avoid problems with lost or stolen refunds by having refunds deposited directly into personal checking or savings accounts . 1 2065523 2065836 " More than 70,000 men and women from bases in Southern California were deployed in Iraq . In all , more than 70,000 troops based in Southern California were deployed to Iraq . 1 2222998 2223097 BP shares slipped 0.8 percent to 433.50 pence ( $ 6.85 ) each in afternoon trading on the London Stock Exchange . BP shares slipped 48 cents to $ 41.72 Friday in trading on the New York Stock Exchange . 1 2561999 2561941 Because of the accounting charge , the company now says it lost $ 1.04 billion , or 32 cents a share , in the quarter ended June 30 . Including the charge , the Santa Clara , Calif.-based company said Monday it lost $ 1.04 billion , or 32 cents per share , in the period ending June 30 . 0 2324704 2325023 Friday 's report raised new worries that a weak job market could shackle the budding economic recovery despite a slight improvement in the overall unemployment rate . U.S. companies slashed payrolls for a seventh straight month in August , raising new worries that a weak jobs market could shackle the budding economic recovery . 1 2336453 2336545 Federal Emergency Management Administration designated $ 20 million to establish the registry . The registry was launched with $ 20 million from the Federal Emergency Management Agency . 
1 720572 720486 BREAST cancer cases in the UK have hit an all-time high with more than 40,000 women diagnosed with the disease each year , Cancer Re-search UK revealed yesterday . Cases of breast cancer in Britain have reached a record high , with the number of women diagnosed with the disease passing the 40,000 mark for the first time . 1 1605818 1605806 " It was never our intention to sell the product , " said Health Minister Anne McClellan , a skeptic of medical marijuana use . " It was never the intention of us to sell product , " federal Health Minister Anne McLellan said yesterday in Edmonton . 0 2440680 2440474 GM , the world 's largest automaker , has 115,000 active UAW workers and another 340,000 retirees and spouses . They cover more than 300,000 UAW workers and 500,000 retirees and spouses . 0 726399 726078 Rosenthal is hereby sentenced to custody of the Federal Bureau of prisons for one day with credit for time served , " Breyer said to tumultuous cheers in the courtroom . " Rosenthal is hereby sentenced to custody of the Federal Bureau of Prisons for one day with credit for time served . " 1 533903 533818 " We are committed to helping the Iraqi people get on the path to a free society , " Rumsfeld said in a speech to the Council on Foreign Relations . " We are committed to helping the Iraqi people get on the path to a free society , " he said . 1 1166473 1166857 Mr. Young said he was disappointed that the government didn 't see the severe acute respiratory syndrome crisis as worthy of federal disaster-relief money . Young said he was disappointed the government didn 't see the SARS crisis as worthy of federal disaster relief money . 1 144089 143697 The 12-nation currency has risen by 33 percent against the dollar over the past 15 months . The euro is up 9 percent against the dollar in the past six weeks . 1 3439854 3439874 In February 2000 , the officers — Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy — were acquitted of all charges in the killing . The officers -- Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy -- were acquitted in 2000 of state murder charges . 1 3464314 3464302 I was surprised it turned out me talking and the president just listening . " I was surprised it turned out me talking and the president just listening . . . It was mostly a monologue . " 1 2008984 2009175 The state 's House delegation currently consists of 17 Democrats and 15 Republicans . Democrats hold a 17-15 edge in the state 's U.S. House delegation . 0 816867 816831 Freddie also said Leland C. Brendsel will retire as chairman and chief executive and resign from the board . He replaces Leland Brendsel , 61 , who retired as chairman and chief executive . 1 192285 192327 We 'll be listening carefully to the [ IAEA ] director general 's report at the next board meeting . " We 'll be listening carefully to the ( IAEA ) director-general 's report at the next board meeting . " 1 2688145 2688162 In that position , Elias will report to Joe Tucci , president and CEO of EMC . As executive vice president of new ventures , Elias will report to Joe Tucci , EMC 's president and chief executive . 1 3294207 3294290 But with the PM due to leave tomorrow afternoon for personal reasons there was a risk he might not be present when the final decision was made . But with the Prime Minister due to leave tomorrow , a day early , he may not be present when the final decision is made . 
0 205100 205145 A pro-independence radical , Miodrag Zivkovic , of the Liberal Alliance , came in second with 31 percent of the vote . Miodrag Zivkovic , of the Liberal Alliance of Montenegro , won 31 percent of the vote while the independent Dragan Hajdukovic got four percent . 0 3242051 3241897 Mr. Kerkorian tried unsuccessfully to take over Chrysler in 1995 , but did win representation on its board . Kerkorian and Tracinda had also tried to take over Chrysler in 1995 . 0 1076861 1077018 Glover spoke at a news conference that included about 20 relatives of the victims . About 20 family members of the victims were invited to the news conference . 1 2095803 2095786 Drax faced a financial crisis late last year after it lost its most lucrative sales contract , held with insolvent utility TXU Europe . Drax ’ s troubles began late last year when it lost its most lucrative sales contract , with the insolvent utility TXU Europe . 1 2112330 2112376 But I would rather be talking about high standards than low standards . " " I would rather be talking about positive numbers rather than negative . 1 3389318 3389271 It was not immediately known how many people were on flight UTA 141 , which could carry 141 passengers and crew . It was still not known exactly how many people were on the plane , which could carry 141 passengers and crew . 1 698948 698933 The market remains pinned in a narrow range after a powerful rally drove the broad Standard & Poor 's 500 index .SPX up more than 20 percent since mid-March . The market remains pinned in a narrow range after a powerful rally pushed the broad S & P 500 index up more than 20 percent since mid-March . 1 539585 539355 Witnesses said they believed the man planned to crash the Launceston-bound Qantas flight 1737 , which was carrying 47 passengers and six crew . Witnesses believe he wanted to crash Flight 1737 , which had 47 passengers and six crew . 1 684848 684557 As Samudra sat down to hear the indictment , he looked over to his nine lawyers and shouted ``God is Great ' ' three times . As he sat down to hear the indictment , Samudra looked over to his nine lawyers and shouted " Takbir ! " , or " Proclaim ! " , a religious rallying cry . 1 347017 347002 In hardest-hit Taipei , traffic has disappeared from once bustling streets , ubiquitous department stores stand mostly empty and restaurants are eerily quiet . In hardest-hit Taipei , traffic has disappeared from once-bustling streets and department stores and restaurants are virtually empty . 1 1592037 1592076 In a statement , Lee said he " no longer believes that Viacom deliberately intended to trade on my name when naming Spike TV . " Spike Lee no longer believes that Viacom deliberately intended to trade on his name by calling its own venture " Spike TV , " according to a statement read in court Tuesday . 0 3013483 3013540 Singapore Prime Minister Goh Chok Tong says China plays an important role in the integration of Asia , including managing the stresses and strains both within and between countries . HAINAN PROVINCE , China : Singapore Prime Minister Goh Chok Tong said China plays an important role in the integration of Asia . 1 2020252 2020081 The worm attacks Windows computers via a hole in the operating system , an issue Microsoft on July 16 had warned about . The worm attacks Windows computers via a hole in the operating system , which Microsoft warned of 16 July . 0 2614947 2614904 The premium edition adds OfficeFront Page 2003 , Acceleration Server 2000 , and SQL Server 2000 . 
The premium edition adds ISA Server , SQL Server and a specialized edition of BizTalk 2004 . 0 1744257 1744378 In the year-ago quarter , the steelmaker recorded a profit of $ 16.2 million , or 15 cents per share , on sales of $ 1.14 billion . In the second quarter last year , AK Steel reported a profit of $ 16.2 million , or 15 cents a share . 0 1119721 1119714 Sony claimed that the reader 's capacitance sensing technology cannot be fooled by paper copies and does not require cleaning . Its capacitance sensing technology electronically reads a fingerprint ; Sony says it can 't be fooled by paper copies and doesn 't require cleaning . 1 1186754 1187056 Amazon.com shipped out more than a million copies of the new book , making Saturday the largest distribution day of a single item in e-commerce history . Amazon.com shipped more than a million copies by Saturday afternoon , making Saturday the largest distribution day of a single item in e-commerce history . 1 2842562 2842582 The show 's closure affected third-quarter earnings per share by a penny . The company said this impacted earnings by a penny a share . 0 431076 431242 After the two-hour meeting on May 14 , publisher Arthur O. Sulzberger Jr . , executive editor Howell Raines and managing editor Gerald Boyd pledged quick remedies to staff grievances . The committee will make recommendations to Publisher Arthur Sulzberger , Executive Editor Howell Raines and Managing Editor Gerald Boyd . 1 1393764 1393984 It 's been a busy couple of days for security gurus assigned to keep their companies safe and sound . It 's been a busy couple of days for enterprise security gurus tasked with the job of keeping their companies safe and sound . 0 2916199 2916164 Lu reclined in a soft chair wearing a woolly coat near the blackened capsule . " It 's great to be back home , " said Lu , dressed in a woolly coat near the blackened capsule . 1 2530671 2530542 Gov. Bob Riley proposed the budget cuts after Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 . After Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 , Riley forecast significant cuts in state programs . 1 219064 218969 " It is probably not the easiest time to come in and take over the shuttle program , but then again , I look forward to the challenge , " he said . " It 's probably not the easiest time to come in and take over the shuttle program , but I look forward to the challenge , " Parsons told reporters at NASA headquarters . 0 2377289 2377259 Estonia 's place in the European mainstream and safeguard its independence regained in 1991 . Estonia was forcibly incorporated in the Soviet Union in 1940 and regained its independence only in 1991 . 0 2110220 2110199 Franklin County Judge-Executive Teresa Barton said a firefighter was struck by lightning and was taken to the Frankfort Regional Medical Center . A county firefighter , was struck by lightning and was in stable condition at Frankfort Regional Medical Center . 0 1864253 1863810 Police suspected that Shaichat , 20 , had been abducted either by Palestinians or by Israeli Arabs . Nobody claimed responsibility for Schaichat 's death , but police suspect that the 20-year-old soldier was abducted either by Palestinians or Israeli Arabs . 0 3150803 3150839 During this year 's August to October quarter , Lowe 's opened 38 new stores , including two relocations . During the third quarter , Lowe 's opened 38 new stores and now has 932 stores in 45 states . 
0 969381 969512 The technology-laced Nasdaq Composite Index < .IXIC > declined 25.78 points , or 1.56 percent , to 1,627.84 . The broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 . 1 271891 271839 Sony said the PSP would also feature a 4.5-inch LCD screen , Memory Stick expansion slots . It also features a 4.5 in back-lit LCD screen and memory expansion facilities . 0 2829648 2829613 Clinton did not mention that two Democratic senators , Charles Robb of Virginia and Wendell Ford of Kentucky , voted to shelve the McCain bill . Two Democrats , Sen. Charles Robb of Virginia and Wendell Ford of Kentucky , voted with the 40 Republicans . 1 886904 887158 Some of the company 's software developers will join Microsoft , but details haven 't been finalized , said Mike Nash , corporate vice president of Microsoft 's security business unit . Some of the companys software developers will join Microsoft , but details havent been finalized , said Mike Nash , corporate vice president of Microsofts security business unit . 0 2632692 2632767 Wal-Mart has said it plans to open at least 40 Supercenters in the state in the coming years ; analysts expect four or more to be in San Diego County . At least 40 of the outlets will be in California , and analysts expect four or more to be in San Diego County . 1 2240399 2240149 Cintas is battling efforts to unionize 17,000 of its workers and to let unions organize the workers by signing cards , rather than by a lengthy election process . Cintas is battling efforts to unionize 17,000 of its workers and labor 's demands to let its workers organize by signing cards , rather than by a lengthy election process . 1 805457 805985 The opposition would resort to rolling mass action " at strategic times of our choice and without warning to the dictatorship , " he said . " From now onwards we will embark on rolling mass action at strategic times of our choice and without any warning to the dictatorship , " he said . 1 2896308 2896334 Federal Agriculture Minister Warren Truss said the Government still did not know the real reason the sheep were rejected at the Saudi port of Jeddah on August 21 . He said the Government still did not know the real reason the original Saudi buyer pulled out on August 21 . 1 2110775 2110924 Tom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said that scenario is one among many that investigators are considering . Tom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said investigators are considering the scenario . 1 1762569 1762526 Hester said Sanmina was the best fit among several purchase offers the company received from electronics manufacturers and computer makers . Hester said Sanmina 's offer was the best among several Newisys received from electronics manufacturers and computer makers . 0 2706154 2706185 The other inmate fell but Selenski shimmed down the makeshift rope to a second-story roof and used the mattress to scale a razor-wire fence , Fischi said . After the other inmate fell , Selenski used the mattress to scale a 10-foot , razor-wire fence , Fischi said . 1 1057995 1057778 The hearing , expected to last a week , will determine whether Akbar faces a court-martial . The purpose of the hearing is to determine whether Akbar should be court-martialled . 
1 1386884 1386857 He said he has begun a court action to seize Beacon Hill 's assets and has frozen more than $ 13 million Beacon Hill had when it closed . He said he has initiated a forfeiture action in court and frozen more than $ 13 million Beacon Hill had when it closed . 1 3093023 3092996 Speaking for the first time yesterday , Brigitte 's maternal aunt said his family was unaware he had was in prison or that he had remarried . Brigitte 's maternal aunt said his family was unaware he had been sent to prison , or that he had remarried in Sydney . 1 1661381 1661317 " Close co-operation between our law enforcement agencies , close co-operation between our intelligence services lie at the heart of the ongoing fight against terrorism . " Close cooperation between regional law enforcement agencies and intelligence services was at the heart of the fight against terrorism , he said . 0 2926039 2925982 The mother of a Briton held by Colombian guerrillasspoke of her relief yesterday after hearing that he might be freed in the next few weeks . The parents of a Briton being held hostage by Colombian rebels spoke yesterday of their optimism that he would be freed in time for his birthday next month . 0 637168 637447 We strongly disagree with Novell 's position and view it as a desperate measure to curry favor with the Linux community . McBride characterized Novell 's move as " a desperate measure to curry favor with the Linux community . " 1 696677 696932 After more than two years ' detention under the State Security Bureau , the four were found guilty of subversion in Beijing 's No. 1 Intermediate Court last Wednesday . After more than two years in detention by the State Security Bureau , the four were found guilty last Wednesday of subversion . 1 3122429 3122305 Mr Russell , 46 , a coal miner from Brisbane , said : " They are obviously hurting , so we are basically going over there to help them . " " They are obviously hurting so we are basically going over there to help them , " Russell , 46 , said . 1 1348909 1348954 The New York Democrat and former first lady has said she will not run for the White House in 2004 , but has not ruled out a race in later years . The former first lady has said she will not run for the White House in 2004 but has not ruled out a race later on . 0 162203 162101 It does not affect the current Windows Media Player 9.0 Series . Windows Media Player has had security problems before . 0 71501 71627 The seizure took place at 4 a.m. on March 18 , just hours before the first American air assault . The time was about 4 a.m. on March 18 , just hours before the first pinpoint missiles rained down on the capital . 1 2907762 2907649 Donations stemming from the Sept . 11 attacks helped push up contributions to human service organizations and large branches of the United Way by 15 percent and 28.6 percent , respectively . Donations stemming from the Sept . 11 attacks helped push up contributions to human service organizations by 15 percent and to large branches of the United Way by 28.6 percent . 1 2167771 2167744 In May , Mr. Hatfill said he was struck by a vehicle being driven by an FBI employee who was tailing him in Georgetown . Last May , Hatfill was struck by a vehicle being driven by an FBI employee who was tailing him in Washington 's Georgetown neighborhood . 1 3320577 3320553 " I will support a constitutional amendment which would honor marriage between a man and a woman , codify that , " he said . 
" If necessary , I will support a constitutional amendment which would honour marriage between a man and a woman , codify that . " 1 849291 849442 IBM of the US and Infineon Technologies of Germany will today announce a technological development that could threaten multi-billion dollar memory chip markets . IBMof the US andInfineon Technologies of Germany willon Tuesdayannounce a technological development that could threaten multi-billion dollar memory chip markets . 0 763948 763991 Costa 's semifinal opponent is Spaniard Juan Carlos Ferrero , whom he beat in last year 's final . Costa will play Juan Carlos Ferrero next in a rematch of last year 's final . 1 1908763 1908744 A former employee of a local power company pleaded guilty Wednesday to setting off a bomb that knocked out a power substation during the Winter Olympics last year . A former Utah Power meter reader pleaded guilty Wednesday to bombing a power substation during the 2002 Winter Olympics . 0 1876120 1876059 Thyroid hormones are known to help in weight loss by stimulating metabolism - and cutting cholesterol - but come with the unwanted side effect of speeding up the heartbeat . Thyroid hormones are known to help in weight loss by stimulating metabolism , and they can help cut cholesterol too . 1 518089 518133 Judge Craig Doran said it wasn 't his role to determine if Hovan was " an evil man " but maintained that " he has committed an evil act . " Judge Craig Doran said he couldn 't determine if Hovan was " an evil man " but said he " has committed an evil act . " 0 224932 224868 The Hartford shares rose $ 2.88 , or 6.6 percent , to close Monday at $ 46.50 on the New York Stock Exchange . Shares of Hartford rose $ 2.88 to $ 46.50 in New York Stock Exchange composite trading . 1 1771131 1771091 It also offers a built-in NAND flash boot loader so that high-density NAND flash memory can be used without having to install an additional support chip . The S3C2440 has a built-in NAND flash boot loader , for example , so that high-density NAND flash memory can be installed without an additional support chip . 0 2728425 2728251 It decided instead to issue them before the stock market opened Monday after the downgrade of its debt late Friday by Moody 's , the credit rating agency . It decided instead to issue them before the stock market opened Monday to counteract the downgrade of its debt late Friday by Moody 's to one step above junk status . 0 953733 953537 Altria shares fell 2.5 percent or $ 1.11 to $ 42.57 and were the Dow 's biggest percentage loser . Its shares fell $ 9.61 to $ 50.26 , ranking as the NYSE 's most-active issue and its biggest percentage loser . 1 349215 349241 It will be followed in November by a third movie , " The Matrix Revolutions . " The film is the second of a trilogy , which will wrap up in November with " The Matrix Revolutions . " 1 2919853 2919804 Massachusetts regulators and the Securities and Exchange Commission on Tuesday pressed securities fraud charges against Putnam Investments and two of its former portfolio managers for alleged improper mutual fund trading . State and federal securities regulators filed civil charges against Putnam Investments and two portfolio managers in the ever-expanding mutual fund trading scandal . 1 954526 954607 He is blocking them until the Air Force assigns four additional C-130 cargo planes to Gowen Field , an Idaho Air National Guard base in Boise . 
He is holding them up until the Air Force agrees to assign four additional C-130 cargo planes to the Idaho Air National Guard . 1 69773 69792 Cisco pared spending to compensate for sluggish sales . In response to sluggish sales , Cisco pared spending . 0 2823575 2823513 The study , published Monday in the journal Molecular Brain Research , is likely to also apply to humans , its authors said . The study , conducted on the brains of developing mice , was being published today in the journal Molecular Brain Research . 1 2455942 2455978 My decision today is not based on any one event . " Governor Rowland said his decision was " not based on any one event . " 1 131979 131957 Nelson , 27 , is being retried on civil-rights charges stemming from the disturbance which led to Rosenbaum 's death . Nelson , 27 , is being retried on civil rights charges stemming from the disturbance that led to Rosenbaum 's death . 0 2010705 2010779 " The government elements who have been causing trouble are still in place . The government elements who have been causing trouble are still in place , they are attacking us . " 1 54142 53641 Next Monday at about 2 p.m. ( CST ) , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms . Around the same time , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms . 1 1015249 1015204 Wal-Mart Stores Inc . , Kohl 's Corp. , Family Dollar Stores Inc. and Big Lots Inc. were among the merchants posting May sales that fell below Wall Street 's modest expectations . Wal- Mart , Kohl 's Corp. , Family Dollar Stores Inc . , and Big Lots Inc. posted May sales that fell below Wall Street 's modest expectations . 0 753928 753890 The patch also fixes a vulnerability that results because IE does not implement an appropriate block on a file download dialog box . The second vulnerability is a result of IE not implementing a block on a file download dialog box . 1 3022833 3023029 Peterson , a former fertilizer salesman , is charged with murder in the deaths of his 27-year-old wife and the baby boy she was carrying . Peterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son . 0 751520 751373 SPOT products run a Microsoft operating system and the company 's DirectBand radio technology developed with SCA Data Systems . The DirectBand network was developed with the assistance of SCA Data Systems . 0 218848 218851 He replaces Ron Dittemore , who announced his resignation in April . Dittemore announced his plans to resign on April 23 . 1 3181118 3181443 Detectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , of the arrest shortly after Perry was apprehended . Shortly after his arrest , detectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , a medical assistant , about the development . 1 515581 515752 They were among about 40 people attending the traditional Jewish ceremony colored by some non-traditional touches . He said about 40 people attended the traditional Jewish ceremony colored by some nontraditional touches . 1 347022 347003 Taiwan had been relatively free of the viral infection until a fiasco at a Taipei hospital in late April caused the number of infections to skyrocket . Taiwan had been relatively free of the viral infection until a severe outbreak at a Taipei hospital in late April . 1 3311600 3311633 Mr. 
Rowland attended a party in South Windsor for the families of Connecticut National Guard soldiers called to active duty . Rowland was making an appearance at a holiday party for families of Connecticut National Guard soldiers assigned to duty in Iraq and Afghanistan . 0 3439114 3439084 Ross Garber , Rowland 's lawyer , said Tuesday he would attend the meeting and would ask to speak on the issue . Ross Garber , Rowland 's legal counsel , said the governor would have no comment on the condo deal . 0 487951 488007 The euro was at 1.5281 versus the Swiss franc EURCHF = , up 0.2 percent on the session , after hitting its highest since mid-2001 around 1.5292 earlier in the session . The euro was steady versus the Swiss franc after hitting its highest since mid-2001 of 1.5261 earlier in the session . 0 314997 315030 On the stand Wednesday , she said she was referring only to the kissing . On the stand Wednesday , she testified that she was referring to the kissing before the alleged rape . 0 4733 4557 Garner said the group would probably be expanded to include , for example , a Christian and perhaps another Sunni leader . The group has already met several times and Gen. Garner said it probably will be expanded to include a Christian and perhaps another Sunni Muslim leader . 1 2820371 2820525 Blair 's Foreign Secretary Jack Straw was to take his place on Monday to give a statement to parliament on the European Union . Blair 's office said his Foreign Secretary Jack Straw would take his place on Monday to give a statement to parliament on the EU meeting the prime minister attended last week . 1 801552 801516 " There were more people surrounding the clubhouse than the Unabomber 's house up in the hills , " Baker said . " There are more people surrounding the clubhouse than surrounded the Unabomber 's home in the hills . 1 1704987 1705268 Charles O. Prince , 53 , was named as Mr. Weill 's successor . Mr. Weill 's longtime confidant , Charles O. Prince , 53 , was named as his successor . 1 396041 396188 Officials are also meeting with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world . Canadian officials were also expected to meet yesterday with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world . 0 1014983 1014963 GE stock closed Friday at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange . GE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange . 1 2320654 2320666 The Midwestern research center will focus on the development of diagnostic , therapeutic and vaccine products for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague . The Midwestern center will focus on diagnosis , treatment and vaccines for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague . 1 1057876 1057778 The hearing is to determine whether there is enough evidence to order Akbar to a general court-martial proceeding . The purpose of the hearing is to determine whether Akbar should be court-martialled . 0 2116843 2116883 In the United States , heart attacks kill about 460,000 year , in Canada about 80,000 . In the United States , heart attacks kill about 460,000 yearly , according to the National Institutes of Health . 1 1461629 1461781 Ninety-five percent of international cargo to the United States is carried by ship . Ships carry 95 percent of international cargo to the United States . 
0 374015 374162 " It 's a major victory for Maine , and it 's a major victory for other states . The Maine program could be a model for other states . 1 2493369 2493428 News that oil producers were lowering their output starting in November exacerbated a sell-off that was already under way on Wall Street . News that the Organization of Petroleum Exporting Countries was lowering output starting in November exacerbated a stock sell-off already under way yesterday . 1 490355 490378 They note that after several weeks of rallies on upbeat earnings , investors are looking for stronger evidence of a recovery before sending stocks higher . After several weeks of market rallies on upbeat earnings , many investors are looking for more concrete signs of an economic recovery . 1 2691044 2691264 Most economists had expected a more dire report , with many anticipating the fifth month of job losses in six months . Most economists had been expecting a far more dire report , with many expecting to see the fifth month of job losses in six months in September . 1 1831453 1831491 But software license revenues , a measure financial analysts watch closely , decreased 21 percent to $ 107.6 million . License sales , a key measure of demand , fell 21 percent to $ 107.6 million . 1 2380695 2380822 King , brand-name writer , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters . Stephen King , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters from the National Book Foundation . 1 2577517 2577531 The Denver-based natural gas producer and marketer said the inaccurate reporting was discovered after it received a subpoena from the U.S. Commodity Futures Trading Commission . The natural gas producer and marketer said the inaccurate reporting was discovered in response to a subpoena from the U.S. Commodity Futures Trading Commission , or CFTC . 1 3267026 3266930 The steel tariffs , which the U.S. president imposed in March 2002 , will officially end at midnight , instead of March 2005 as initially planned . The U.S. steel tariffs , which Bush imposed in March 2002 , were to officially end at midnight Thursday ( 0500 GMT ) , instead of March 2005 as initially planned . 1 360875 360943 Business Week 's online edition reported on Friday that WorldCom and the SEC could announce a settlement as early as Monday . BusinessWeek Online has learned that the settlement could come as early as Monday , May 19 . 1 162632 162653 Only one of the five buildings in the Baghdad compound of the United Nations Development Program escaped being burned , the UN said on its Web site . Only one of the five buildings in the compound in Baghdad run by the UN Development Program , escaped being burned , the UN said on its Web site . 1 1128884 1128865 Shares of Salix have rocketed 64 percent since Axcan made its first offer on April 10 . Since the initial takeover offer , Salix shares have risen about 35 percent . 1 3264732 3264648 The jury verdict , reached Wednesday after less than four hours of deliberation , followed a 2 week trial , during which Waagner represented himself . The quick conviction followed a 2 1 / 2 week trial , during which the Venango County man represented himself . 1 1721433 1721267 It 's happened five times in the last 11 years : A disaster puts this Southwestern town in the headlines during the summer tourist season . 
It 's happened five times in the last decade : A disaster puts this tourist town in the headlines during summer , its busiest season . 0 146112 146127 The broader Standard & Poor 's 500 Index .SPX edged down 9 points , or 0.98 percent , to 921 . The technology-laced Nasdaq Composite Index < .IXIC > shed 15 points , or 0.98 percent , to 1,492 . 1 389117 389052 The company emphasized that McDonald 's USA does not import any raw beef or hamburger patties from Canada for McDonald 's use in the United States . McDonald 's said in a statement that it does not import any raw beef or hamburger patties from Canada for use in the United States . 1 872784 872834 Gregory Parseghian , a former investment banker , was appointed chief executive . Greg Parseghian was appointed the new chief executive . 0 2977500 2977547 Their contract will expire at 12 : 01 a.m. Wednesday instead of 12 : 01 a.m. Sunday , said Rian Wathen , organizing director for United Food and Commercial Workers Local 700 . " It has outraged the membership , " said Rian Wathen , organizing director of United Food and Commercial Workers Local 700 . 1 3107137 3107119 But plaque volume increased by 2.7 percent in pravastatin patients . The volume of plaque in Pravachol patients ' arteries rose by 3 % . 1 1619244 1619274 Today in the US , the book - kept under wraps by its publishers , G. P. Putnam 's Sons , since its inception - will appear in bookstores . Tomorrow the book , kept under wraps by G. P. Putnam 's Sons since its inception , will appear in bookstores . 0 3061836 3062031 The S & P / TSX composite rose 87.74 points on the week , while the TSX Venture Exchange composite gained 44.49 points . On the week , the Dow Jones industrial average rose 11.56 points , while the Nasdaq Stock Market gained 39.42 points . 1 485999 486011 Ex-KGB agent Putin added that the Beatles were considered ' propaganda of an alien ideology ' . In Soviet times the Beatles ' music " was considered propaganda of an alien ideology .

================================================
FILE: archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/mrpc.proto
================================================
// Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: MIT-0
//
// gRPC interface for the BERT MRPC paraphrase-detection demo: the client
// sends a pair of sentences and receives a paraphrase prediction.

syntax = "proto3";

package mrpc;

service mrpc {
    rpc paraphrase (TextPair) returns (YesNo) {}
}

message TextPair {
    bytes text_a = 1;
    bytes text_b = 2;
}

message YesNo {
    bytes message = 1;
    bytes prediction = 2;
}
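The contract above is simple: the client sends two byte strings, and the server answers with a paraphrase verdict. The following is a minimal client sketch, not part of the original tutorial: the stub module names follow the standard ``grpc_tools.protoc`` output for ``mrpc.proto``, and the ``localhost:8500`` endpoint is an illustrative placeholder.

.. code:: python

   # Illustrative client for the mrpc service defined above (not from the
   # original tutorial). Generate the stubs first, for example with:
   #   python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. mrpc.proto
   import grpc

   import mrpc_pb2
   import mrpc_pb2_grpc


   def is_paraphrase(text_a, text_b, endpoint="localhost:8500"):
       """Send a sentence pair to the demo server and return its prediction."""
       with grpc.insecure_channel(endpoint) as channel:
           stub = mrpc_pb2_grpc.mrpcStub(channel)
           request = mrpc_pb2.TextPair(text_a=text_a.encode("utf-8"),
                                       text_b=text_b.encode("utf-8"))
           reply = stub.paraphrase(request)
           return reply.prediction.decode("utf-8")


   if __name__ == "__main__":
       # A pair drawn from the MRPC data above.
       print(is_paraphrase("GE stock closed at $ 30.65 a share .",
                           "GE 's shares closed at $ 30.65 on Friday ."))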
================================================
FILE: archive/tensorflow/tensorflow-neuron/tutorials/index.rst
================================================
.. _tensorflow-tutorials:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

TensorFlow Tutorials
====================

.. warning::
   This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

Before running a tutorial
-------------------------

You will run the tutorials on an inf1.6xlarge instance running Deep Learning AMI (DLAMI) to enable both compilation and deployment (inference) on the same instance. In a production environment we encourage you to try different instance sizes to optimize to your specific deployment needs.

Follow the instructions at :ref:`tensorflow-tutorial-setup` before running a TensorFlow tutorial on Inferentia. We recommend new users start with the ResNet-50 tutorial.

.. toctree::
   :hidden:

   /archive/tensorflow/tensorflow-neuron/tutorials/tensorflow-tutorial-setup

.. _tensorflow-nlp:

Natural Language Processing
---------------------------

* Tensorflow 2.x - HuggingFace DistilBERT with Tensorflow2 Neuron :ref:`[html] ` :github:`[notebook] `

.. toctree::
   :hidden:

   /archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo
   /src/examples/tensorflow/huggingface_bert/huggingface_bert

.. _tensorflow-utilize-neuron:

Utilizing Neuron Capabilities
-----------------------------

* Tensorflow 2.x - Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving :ref:`[html] `

.. toctree::
   :hidden:

   /src/examples/tensorflow/tensorflow_serving_tutorial.rst

================================================
FILE: archive/tensorflow/tensorflow-neuron/tutorials/k8s_bert_demo/Dockerfile.tfserving_example
================================================
FROM ubuntu:16.04

RUN apt-get update
RUN apt-get install -y wget apt-transport-https ca-certificates awscli
RUN echo "deb https://apt.repos.neuron.amazonaws.com xenial main" > /etc/apt/sources.list.d/neuron.list
RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -
RUN apt-get update
RUN apt-get install -y tensorflow-model-server-neuron
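The Dockerfile above only installs ``tensorflow-model-server-neuron`` from the Neuron apt repository; it bakes in no model and no entrypoint. As a rough usage sketch, assuming a SavedModel on the host (the image tag, model name, and paths are placeholders rather than values from the k8s demo, and the container must be granted a Neuron device):

.. code:: bash

   # Illustrative only -- tag, model name, and paths are placeholders.
   docker build -f Dockerfile.tfserving_example -t tfserving-neuron-example .
   docker run --device=/dev/neuron0 -p 8500:8500 \
       -v /path/to/saved_models:/models tfserving-neuron-example \
       tensorflow_model_server_neuron --port=8500 \
       --model_name=bert_mrpc --model_base_path=/models/bert_mrpc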
toctree:: :maxdepth: 1 :hidden: Natural Language Processing (NLP) Tutorials Utilizing Neuron Capabilities Tutorials .. include:: /archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-neuron.txt ================================================ FILE: archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-neuron.txt ================================================ .. tab-set:: .. tab-item:: Natural Language Processing (NLP) Tutorials * Tensorflow 2.x - HuggingFace Pipelines distilBERT with Tensorflow2 Neuron :ref:`[html] ` :github:`[notebook] ` .. tab-item:: Utilizing Neuron Capabilities Tutorials * Tensorflow 2.x - Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving :ref:`[html] ` .. note:: To use Jupyter Notebook see: * :ref:`setup-jupyter-notebook-steps-troubleshooting` * :ref:`running-jupyter-notebook-as-script` ================================================ FILE: archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-nlp.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Natural Language Processing (NLP) Tutorials (``tensorflow-neuron``) =================================================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. * Tensorflow 2.x - HuggingFace DistilBERT with Tensorflow2 Neuron :ref:`[html] ` :github:`[notebook] ` .. toctree:: :hidden: /archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo /src/examples/tensorflow/huggingface_bert/huggingface_bert ================================================ FILE: archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-utilizing-neuron-capabilities.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Utilizing Neuron Capabilities Tutorials (``tensorflow-neuron``) =============================================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. * Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving :ref:`[html] ` .. note:: To use Jupyter Notebook see: * :ref:`setup-jupyter-notebook-steps-troubleshooting` * :ref:`running-jupyter-notebook-as-script` ================================================ FILE: archive/tensorflow/tensorflow-neuron-inference.rst ================================================ .. _inference-tensorflow-neuron: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Inference on Inf1 (``tensorflow-neuron``) ========================================= .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: Tutorials Additional Examples API Reference Guide Misc .. include:: tensorflow-neuron-inference.txt ================================================ FILE: archive/tensorflow/tensorflow-neuron-inference.txt ================================================ .. 
card:: Setup (``tensorflow-neuron``) :class-body: sphinx-design-class-title-small See :doc:`TensorFlow Neuron setup `. .. dropdown:: Tutorials (``tensorflow-neuron``) :class-title: sphinx-design-class-title-med :animate: fade-in .. include:: /archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-neuron.txt .. dropdown:: Additional Examples (``tensorflow-neuron``) :class-title: sphinx-design-class-title-med :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /archive/tensorflow/tensorflow-neuron/additional-examples.txt .. dropdown:: API Reference Guide (``tensorflow-neuron``) :class-title: sphinx-design-class-title-med :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /archive/tensorflow/tensorflow-neuron/api-reference-guide.txt .. dropdown:: Misc (``tensorflow-neuron``) :class-title: sphinx-design-class-title-med :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron.txt ================================================ FILE: archive/tensorflow/tensorflow-neuronx/api-reference-guide.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 API Reference Guide (``tensorflow-neuronx``) ============================================ .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: /archive/tensorflow/tensorflow-neuronx/tfneuronx-python-tracing-api /archive/tensorflow/tensorflow-neuronx/tf-neuronx-auto-replication-api /archive/tensorflow/tensorflow-neuronx/tfnx-analyze-model-api .. include:: /archive/tensorflow/tensorflow-neuronx/api-reference-guide.txt ================================================ FILE: archive/tensorflow/tensorflow-neuronx/api-reference-guide.txt ================================================ * :ref:`tfneuronx-ref-neuron-tracing-api` * :ref:`tf-neuronx-ref-auto-replication-python-api` * :ref:`tf-neuronx-ref-analyze-model-api` ================================================ FILE: archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Misc (``tensorflow-neuronx``) ============================= .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: /release-notes/archive/tensorflow/tensorflow-neuronx/tensorflow-neuronx .. include:: /archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx.txt ================================================ FILE: archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx.txt ================================================ * :ref:`tensorflow-neuronx-release-notes` ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/index.rst ================================================ .. _tensorflow-neuron-setup: .. _tensorflow-neuronx-main: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained.
:date-modified: 2026-03-11 TensorFlow Setup Guide for Inf2 & Trn1 ====================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 Fresh install ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.8.0-tensorflow-install.rst ================================================ .. _install-neuronx-2.8.0-tensorflow: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install Tensorflow Neuron (Neuron 2.8.0) ======================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. tab-set:: .. tab-item:: Tensorflow 2.10.0 .. tab-set:: .. tab-item:: Amazon Linux 2 AMI .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.0 --neuron-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami .. tab-item:: Ubuntu 20 AMI .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.0 --neuron-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.9.0-tensorflow-install.rst ================================================ .. _install-neuronx-2.9.0-tensorflow: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install Tensorflow Neuron (Neuron 2.9.0) ======================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. tab-set:: .. tab-item:: Tensorflow 2.10.0 .. tab-set:: .. tab-item:: Amazon Linux 2 AMI .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.0 --neuron-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami .. tab-item:: Ubuntu 20 AMI .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.0 --neuron-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-al2.rst ================================================ .. _tensorflow-neuronx-install-prev-al2: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install Previous TensorFlow Neuron Releases for Amazon Linux (``tensorflow-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. 
TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 This section will assist you in installing previous Neuron releases. .. tab-set:: .. tab-item:: Neuron 2.18.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami .. tab-item:: Neuron 2.17.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.17.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami .. tab-item:: Neuron 2.16.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.16.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-al2023.rst ================================================ .. _tensorflow-neuronx-install-prev-al2023: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install Previous TensorFlow NeuronX Releases for Amazon Linux 2023 (``tensorflow-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 This section will assist you in installing previous Neuron releases. .. tab-set:: .. tab-item:: Neuron 2.21.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami .. tab-item:: Neuron 2.20.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami .. tab-item:: Neuron 2.19.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-u20.rst ================================================ .. _tensorflow-neuronx-install-prev-u20: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install Previous TensorFlow Neuron Releases for Ubuntu (``tensorflow-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. 
TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 This section will assist you in installing previous Neuron releases. .. tab-set:: .. tab-item:: Neuron 2.20.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami .. tab-item:: Neuron 2.19.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami .. tab-item:: Neuron 2.18.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-u22.rst ================================================ .. _tensorflow-neuronx-install-prev-u22: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install Previous TensorFlow Neuron Releases for Ubuntu (``tensorflow-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 This section will assist you in installing previous Neuron releases. .. tab-set:: .. tab-item:: Neuron 2.21.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami .. tab-item:: Neuron 2.20.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami .. tab-item:: Neuron 2.19.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install.rst ================================================ .. _install-tensorflow-neuronx: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install TensorFlow 2.x (``tensorflow-neuronx``) =============================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. tab-set:: ..
tab-item:: Tensorflow 2.10.1 .. tab-set:: .. tab-item:: Amazon Linux 2 .. include :: /setup/install-templates/trn1/dlami-notes.rst :start-line: 13 :end-line: 16 .. include :: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 32 :end-line: 33 .. tab-item:: Ubuntu 20 .. include :: /setup/install-templates/trn1/dlami-notes.rst :start-line: 19 :end-line: 22 .. include :: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 35 :end-line: 36 .. tab-item:: Tensorflow 2.9.3 .. tab-set:: .. tab-item:: Amazon Linux 2 .. include :: /setup/install-templates/trn1/dlami-notes.rst :start-line: 13 :end-line: 16 .. include :: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 74 :end-line: 75 .. tab-item:: Ubuntu 20 .. include :: /setup/install-templates/trn1/dlami-notes.rst :start-line: 19 :end-line: 22 .. include :: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 77 :end-line: 78 .. tab-item:: Tensorflow 2.8.4 .. tab-set:: .. tab-item:: Amazon Linux 2 .. include :: /setup/install-templates/trn1/dlami-notes.rst :start-line: 13 :end-line: 16 .. include :: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 80 :end-line: 81 .. tab-item:: Ubuntu 20 .. include :: /setup/install-templates/trn1/dlami-notes.rst :start-line: 19 :end-line: 22 .. include :: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 83 :end-line: 84 .. tab-item:: Tensorflow 2.7.4 .. tab-set:: .. tab-item:: Amazon Linux 2 .. include :: /setup/install-templates/trn1/dlami-notes.rst :start-line: 13 :end-line: 16 .. include :: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 86 :end-line: 87 .. tab-item:: Ubuntu 20 .. include :: /setup/install-templates/trn1/dlami-notes.rst :start-line: 19 :end-line: 22 .. include :: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 89 :end-line: 90 ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-al2-dlami.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 .. _tensorflow-neuronx-al2-dlami-update: Update to latest TensorFlow Neuron (``tensorflow-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. If you already have a previous Neuron release installed, this section provides links to assist you in updating to the latest Neuron release. .. tab-set:: .. tab-item:: Tensorflow 2.10.1 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 122 :end-line: 123 .. tab-item:: Tensorflow 2.9.3 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 125 :end-line: 126 .. tab-item:: Tensorflow 2.8.4 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst ..
include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 128 :end-line: 129 ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-al2.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 .. _tensorflow-neuronx-al2-update: Update to latest TensorFlow Neuron (``tensorflow-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. If you already have a previous Neuron release installed, this section provides links to assist you in updating to the latest Neuron release. .. tab-set:: .. tab-item:: Tensorflow 2.10.1 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 56 :end-line: 57 .. tab-item:: Tensorflow 2.9.3 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 62 :end-line: 63 .. tab-item:: Tensorflow 2.8.4 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 68 :end-line: 69 ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u20-dlami.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 .. _tensorflow-neuronx-u20-dlami-update: Update to latest TensorFlow Neuron (``tensorflow-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. If you already have a previous Neuron release installed, this section provides links to assist you in updating to the latest Neuron release. .. tab-set:: .. tab-item:: Tensorflow 2.10.1 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 131 :end-line: 132 .. tab-item:: Tensorflow 2.9.3 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 134 :end-line: 135 .. tab-item:: Tensorflow 2.8.4 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 137 :end-line: 138 ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u20.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 .. _tensorflow-neuronx-u20-update: Update to latest TensorFlow NeuronX (``tensorflow-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived.
TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. If you already have a previous Neuron release installed, this section provides links to assist you in updating to the latest Neuron release. .. tab-set:: .. tab-item:: Tensorflow 2.10.1 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 59 :end-line: 60 .. tab-item:: Tensorflow 2.9.3 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 65 :end-line: 66 .. tab-item:: Tensorflow 2.8.4 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. include:: /src/helperscripts/installationScripts/python_instructions.txt :start-line: 71 :end-line: 72 ================================================ FILE: archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u22.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 .. _tensorflow-neuronx-u22-update: Update to latest TensorFlow Neuron (``tensorflow-neuronx``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. If you already have a previous Neuron release installed, this section provides links to assist you in updating to the latest Neuron release. .. tab-set:: .. tab-item:: Tensorflow 2.10.1 .. include:: /frameworks/torch/torch-neuronx/setup/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami ================================================ FILE: archive/tensorflow/tensorflow-neuronx/tf-neuronx-auto-replication-api.rst ================================================ .. _tf-neuronx-ref-auto-replication-python-api: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 TensorFlow 2.x (``tensorflow-neuronx``) Auto Multicore Replication (Beta) =========================================================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. The Neuron auto multicore replication Python API enables modifying TensorFlow 2.x models traced by ``tensorflow_neuronx.trace`` so that they can be automatically replicated across multiple cores. ..
contents:: Table of contents :local: :depth: 1 TensorFlow 2.x (``tensorflow-neuron TF2.x``) Auto Multicore Replication Python API (Beta) ------------------------------------------------------------------------------------------- Method ^^^^^^ ``tensorflow.neuron.auto_multicore`` on models traced by ``tensorflow_neuronx.trace`` Description ^^^^^^^^^^^ Converts an existing AWS-Neuron-optimized ``keras.Model`` and returns an auto-replication tagged AWS-Multicore-Neuron-optimized ``keras.Model`` that can execute on AWS Machine Learning Accelerators. Like the traced model, the returned ``keras.Model`` will support inference only. Attributes or variables held by the original function or ``keras.Model`` will be dropped. The auto model replication feature in TensorFlow-Neuron enables you to create a model once and have it replicated across multiple cores automatically. The desired number of cores can be less than the total available NeuronCores on a trn1 or inf2 instance, but not less than 1. This reduces framework memory usage, as you are not loading the same model multiple times manually. Calls to the returned model will execute on each core in a round-robin fashion. The returned ``keras.Model`` can be exported as SavedModel and served using TensorFlow Serving. Please see the TensorFlow Serving documentation for more information about exporting to SavedModel and serving using TensorFlow Serving. Note that the automatic replication will only work on models compiled with a pipeline size of 1 (via ``--neuroncore-pipeline-cores=1``). If auto replication is not enabled, the model will default to replicate on up to 4 cores. See :ref:`neuron-compiler-cli-reference-guide` for more information about compiler options. Arguments ^^^^^^^^^ - **func:** The ``keras.Model`` or function to be traced. - **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of ``tf.Tensor`` objects for tracing the function. When ``example_inputs`` is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect ``func`` to have calling signature ``func(example_inputs)``. Otherwise, the expectation is that inference on ``func`` is done by calling ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``, or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``. The case where ``func`` accepts mixed positional and keyword arguments is currently unsupported. - **num_cores:** The desired number of cores across which the model will be automatically replicated Returns ^^^^^^^ - An AWS-Multicore-Neuron-optimized ``keras.Model``. Example Python API Usage for TF2.x traced models: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code :: python import tensorflow as tf import tensorflow.neuron as tfn import tensorflow_neuronx as tfnx input0 = tf.keras.layers.Input(3) dense0 = tf.keras.layers.Dense(3)(input0) inputs = [input0] outputs = [dense0] model = tf.keras.Model(inputs=inputs, outputs=outputs) input0_tensor = tf.random.uniform([1, 3]) model_neuron = tfnx.trace(model, input0_tensor) # a trn1.2xlarge has 2 neuron cores num_cores = 2 multicore_model = tfn.auto_multicore(model_neuron, input0_tensor, num_cores=num_cores) multicore_model(input0_tensor) Example Python API Usage for TF2.x saved models: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code :: python from tensorflow.python import saved_model input0_tensor = tf.random.uniform([1, 3]) num_cores = 4 reload_model = saved_model.load(model_dir) multicore_model = tfn.auto_multicore(reload_model, input0_tensor, num_cores=num_cores)
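Like a traced model, the replicated model can be exported as a SavedModel for serving. The following is a minimal sketch, assuming the ``multicore_model`` from the example above; the output path is illustrative, and the trailing ``1`` is the numeric version subdirectory that TensorFlow Serving expects:

.. code :: python

    import tensorflow as tf

    # Illustrative path; '1' is the model version directory that
    # TensorFlow Serving scans for under the model base path.
    tf.saved_model.save(multicore_model, './multicore_model/1')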
.. _tensorflow-ref-auto-replication-cli-api-neuronx: TensorFlow Neuron TF2.x (``tensorflow-neuronx TF2.x``) Auto Multicore Replication CLI (Beta) --------------------------------------------------------------------------------------------------------------- The Neuron auto multicore replication CLI enables modifying Tensorflow 2.x traced saved models so that they can be automatically replicated across multiple cores. By performing this call on Tensorflow Saved Models, we can support Tensorflow-Serving without significant modifications to the code. Method ^^^^^^ ``tf-neuron-auto-multicore MODEL_DIR --num_cores NUM_CORES --new_model_dir NEW_MODEL_DIR`` Arguments ^^^^^^^^^ - **MODEL_DIR:** The directory of a saved AWS-Neuron-optimized ``keras.Model``. - **NUM_CORES:** The desired number of cores across which the model will be automatically replicated - **NEW_MODEL_DIR:** The directory where the AWS-Multicore-Neuron-optimized ``keras.Model`` will be saved Example CLI Usage for Tensorflow-Serving saved models: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code :: bash tf-neuron-auto-multicore ./resnet --num_cores 8 --new_model_dir ./modified_resnet ================================================ FILE: archive/tensorflow/tensorflow-neuronx/tfneuronx-python-tracing-api.rst ================================================ .. _tfneuronx-ref-neuron-tracing-api: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 TensorFlow 2.x (``tensorflow-neuronx``) Tracing API ==================================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. The Neuron tracing API enables tracing TensorFlow 2.x models for deployment on trn1 and inf2 AWS machine learning accelerators. Method ------ ``tensorflow_neuronx.trace`` Description ----------- Trace a ``keras.Model`` or a Python callable that can be decorated by ``tf.function``, and return an AWS-Neuron-optimized ``keras.Model`` that can execute on trn1 and inf2 AWS machine learning accelerators. Tracing is ideal for a ``keras.Model`` that accepts a list of ``tf.Tensor`` objects and returns a list of ``tf.Tensor`` objects. It is expected that users will provide example inputs, and the ``trace`` function will execute ``func`` symbolically and convert it to a ``keras.Model``. The returned ``keras.Model`` will support inference only. Attributes or variables held by the original function or ``keras.Model`` will be dropped. The returned ``keras.Model`` can be exported as SavedModel and served using TensorFlow Serving. Please see the TensorFlow Serving documentation for more information about exporting to SavedModel and serving using TensorFlow Serving. The returned ``keras.Model`` has an ``.on_neuron_ratio`` attribute which shows the percentage of ops mapped to Neuron hardware. This calculation ignores PlaceholderOp, IdentityOp, ReadVariableOp and NoOp. Options can be passed to the Neuron compiler via the environment variable ``NEURON_CC_FLAGS``. For example, the syntax ``env NEURON_CC_FLAGS="--workdir ./artifacts"`` directs the Neuron compiler to dump artifacts in the artifacts directory for debugging. See :ref:`neuron-compiler-cli-reference-guide` for more information about compiler options. Arguments --------- - **func:** The ``keras.Model`` or function to be traced.
- **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of ``tf.Tensor`` objects for tracing the function. When ``example_inputs`` is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect ``func`` to have calling signature ``func(example_inputs)``. Otherwise, the expectation is that inference on ``func`` is done by calling ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``, or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``. The case where ``func`` accepts mixed positional and keyword arguments is currently unsupported. - **subgraph_builder_function:** (Optional) A callable with signature ``subgraph_builder_function(node : NodeDef) -> bool`` (``NodeDef`` is defined in tensorflow/core/framework/node_def.proto) that is used as a callback function to determine which part of the TensorFlow GraphDef obtained by tracing ``func`` will be placed on Machine Learning Accelerators. If ``subgraph_builder_function`` is not provided, then ``trace`` will automatically place operations on Machine Learning Accelerators or on CPU to maximize the execution efficiency. If it is provided, and ``subgraph_builder_function(node)`` returns ``True``, and placing ``node`` on Machine Learning Accelerators will not cause deadlocks during execution, then ``trace`` will place ``node`` on Machine Learning Accelerators. If ``subgraph_builder_function(node)`` returns ``False``, then ``trace`` will place ``node`` on CPU. .. _tensorflow-neuronx-special-flags: Special Flags ------------- These are flags that get passed directly to the Neuron tracing API (rather than the Neuron Compiler). The flags are still passed via the environment variable ``NEURON_CC_FLAGS``; a short usage sketch follows the example below. - **workdir:** example usage - ``NEURON_CC_FLAGS='--workdir ./artifacts'`` will create a folder named artifacts in the current directory and save artifacts that can be used for debugging. - **dynamic-batch-size:** example usage - ``NEURON_CC_FLAGS='--dynamic-batch-size'`` A flag to allow Neuron graphs to consume variable sized batches of data. Dynamic sizing is restricted to the 0th dimension of a tensor. - **extract-weights (Beta):** example usage - ``NEURON_CC_FLAGS='--extract-weights trn1.2xlarge'`` will reduce the compiled model's protobuf size by taking the weights out of the protobuf. Useful for compiling large models that would exceed the 2GB protobuf size limit. This feature is in beta. Model performance is not guaranteed, and the flag does not work in combination with ``--neuroncore-pipeline-cores``, ``--dynamic-batch-size``, models with multiple NEFFs, and models that are 16GB or greater. Compiles the model for different Neuron instance types depending on the instance type passed. Supports all trn1 and inf2 instance types except for trn1n. Returns ------- - An AWS-Neuron-optimized ``keras.Model``. Example Usage ------------- .. code:: python import tensorflow as tf import tensorflow_neuronx as tfnx input0 = tf.keras.layers.Input(3) dense0 = tf.keras.layers.Dense(3)(input0) model = tf.keras.Model(inputs=[input0], outputs=[dense0]) example_inputs = tf.random.uniform([1, 3]) model_neuron = tfnx.trace(model, example_inputs) # trace # check to see how much of the model was compiled successfully print(model_neuron.on_neuron_ratio) model_dir = './model_neuron' model_neuron.save(model_dir) model_neuron_reloaded = tf.keras.models.load_model(model_dir)
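The Special Flags described above are set through the same ``NEURON_CC_FLAGS`` environment variable as regular compiler options. A minimal sketch, assuming the ``model`` and ``example_inputs`` from the example above, of enabling dynamic batching and a debug work directory before tracing:

.. code:: python

    import os

    # Both flags are consumed by the tracing API rather than the compiler:
    # '--dynamic-batch-size' permits variable batch sizes along dimension 0,
    # and '--workdir' saves debug artifacts under ./artifacts.
    os.environ['NEURON_CC_FLAGS'] = '--dynamic-batch-size --workdir ./artifacts'

    model_neuron = tfnx.trace(model, example_inputs)

Example Usage with Manual Device Placement Using ``subgraph_builder_function``
------------------------------------------------------------------------------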
.. code:: python import tensorflow as tf import tensorflow_neuronx as tfnx input0 = tf.keras.layers.Input(3) dense0 = tf.keras.layers.Dense(3)(input0) reshape0 = tf.keras.layers.Reshape([1, 3])(dense0) output0 = tf.keras.layers.Dense(2)(reshape0) model = tf.keras.Model(inputs=[input0], outputs=[output0]) example_inputs = tf.random.uniform([1, 3]) def subgraph_builder_function(node): return node.op == 'MatMul' model_neuron = tfnx.trace( model, example_inputs, subgraph_builder_function=subgraph_builder_function, ) ================================================ FILE: archive/tensorflow/tensorflow-neuronx/tfnx-analyze-model-api.rst ================================================ .. _tf-neuronx-ref-analyze-model-api: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 TensorFlow 2.x (``tensorflow-neuronx``) analyze_model API ========================================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. Method ------ ``tensorflow_neuronx.analyze_model`` Description ----------- Analyzes a ``keras.Model`` or a Python callable that can be decorated by ``tf.function`` for its compatibility with Neuron. It displays supported vs. unsupported operators in the model as well as percentages and counts of each operator, and returns a dictionary with operator statistics. Arguments --------- - **func:** The ``keras.Model`` or function to be analyzed. - **example_inputs:** A ``tf.Tensor`` or a tuple/list/dict of ``tf.Tensor`` objects for tracing the function. When ``example_inputs`` is a ``tf.Tensor`` or a list of ``tf.Tensor`` objects, we expect ``func`` to have calling signature ``func(example_inputs)``. Otherwise, the expectation is that inference on ``func`` is done by calling ``func(*example_inputs)`` when ``example_inputs`` is a ``tuple``, or ``func(**example_inputs)`` when ``example_inputs`` is a ``dict``. The case where ``func`` accepts mixed positional and keyword arguments is currently unsupported. Returns ------- - A results ``dict`` with these keys: ``'percent_supported'``, ``'supported_count'``, ``'total_count'``, ``'supported_operators'``, ``'unsupported_operators'``, ``'operators'``, ``'operator_count'``. Example Usage ------------- .. code:: python import tensorflow as tf import tensorflow_neuronx as tfnx input0 = tf.keras.layers.Input(3) dense0 = tf.keras.layers.Dense(3)(input0) model = tf.keras.Model(inputs=[input0], outputs=[dense0]) example_inputs = tf.random.uniform([1, 3]) results = tfnx.analyze_model(model, example_inputs) print(results) # expected output ''' BiasAdd MatMul 100.00% of all operations (2 of 2) are supported {'percent_supported': 100.0, 'supported_count': 2, 'total_count': 2, 'supported_operators': {'BiasAdd', 'MatMul'}, 'unsupported_operators': [], 'operators': ['BiasAdd', 'MatMul'], 'operator_count': {'MatMul': 1, 'BiasAdd': 1}} '''
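A typical use of the returned dictionary is as a quick pre-compilation check. A short sketch, assuming the ``results`` from the example above:

.. code:: python

    # Report any operators that would fall back to CPU before
    # committing to a full tfnx.trace compilation.
    if results['percent_supported'] < 100.0:
        print('Unsupported operators:', results['unsupported_operators'])

================================================ FILE: archive/tensorflow/tensorflow-neuronx/tutorials/tutorial-tensorflowx-serving-NeuronRT-Visible-Cores.rst ================================================ .. _tensorflow-servingx-neuronrt-visible-cores: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving ===================================================== ..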
warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. TensorFlow Serving allows customers to scale up inference workloads across a network. TensorFlow Neuron Serving uses the same API as normal TensorFlow Serving with two differences: (a) the saved model must be compiled for Neuron and (b) the entry point is a different binary named ``tensorflow_model_server_neuronx``. Follow the steps below to install the package using apt-get or dnf. This will be pre-installed in a future release. Install TensorFlow Model Server and Serving API ----------------------------------------------- Follow the steps in the TensorFlow NeuronX installation guide. Then ensure you install using either apt-get or dnf. .. code:: bash sudo apt-get install tensorflow-model-server-neuronx or .. code:: bash sudo dnf install tensorflow-model-server-neuronx You will also need the TensorFlow Serving API (use --no-deps to prevent installation of regular tensorflow). .. code:: bash pip install --no-deps tensorflow_serving_api For the example image preprocessing using Keras preprocessing, the Python Imaging Library Pillow is required: .. code:: bash pip install pillow To work around h5py issue https://github.com/aws/aws-neuron-sdk/issues/220: .. code:: bash pip install "h5py<3.0.0" Export and Compile Saved Model ------------------------------ The following example shows graph construction followed by the addition of a Neuron compilation step before exporting to a SavedModel. .. code:: python import tensorflow as tf import tensorflow_neuronx as tfnx import numpy as np tf.keras.backend.set_learning_phase(0) tf.keras.backend.set_image_data_format('channels_last') image_sizes = [224, 224] model = tf.keras.applications.ResNet50(weights='imagenet') example_inputs = tf.random.uniform([1, *image_sizes, 3], dtype=tf.float32) model_neuron = tfnx.trace(model, example_inputs) # run the model once to define the forward pass and allow for saving model_neuron(example_inputs) tf.keras.models.save_model(model_neuron, './resnet50_neuron/1') Serving Saved Model ------------------- You can now serve the saved model with the ``tensorflow_model_server_neuronx`` binary. To utilize multiple NeuronCores, it is recommended to launch multiple model servers, each listening on its own gRPC port: .. code:: bash export NEURON_RT_VISIBLE_CORES=0 # important to set this environment variable before launching model servers tensorflow_model_server_neuronx --model_name=resnet50_neuron \ --model_base_path=$(pwd)/resnet50_neuron/ --port=8500 # then to run another server on a different neuron core, open another # window and run this, except this time set NEURON_RT_VISIBLE_CORES=1 # and use a different port; you can keep doing this up to the number # of NeuronCores on your machine export NEURON_RT_VISIBLE_CORES=1 tensorflow_model_server_neuronx --model_name=resnet50_neuron \ --model_base_path=$(pwd)/resnet50_neuron/ --port=8501 The compiled model is staged in Neuron DRAM by the server to prepare for inference.
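Optionally, you can confirm that a server has finished loading the model before sending traffic. A small sketch using the model status API from the ``tensorflow_serving_api`` package installed earlier; the model name matches the launch flags above:

.. code:: python

    import grpc
    from tensorflow_serving.apis import get_model_status_pb2
    from tensorflow_serving.apis import model_service_pb2_grpc

    channel = grpc.insecure_channel('localhost:8500')
    stub = model_service_pb2_grpc.ModelServiceStub(channel)

    request = get_model_status_pb2.GetModelStatusRequest()
    request.model_spec.name = 'resnet50_neuron'

    # Prints the state of each loaded version, e.g. AVAILABLE.
    print(stub.GetModelStatus(request))

Generate inference requests to the model server
-----------------------------------------------

Now run inferences via GRPC as shown in the following sample client code:

..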
code:: python import numpy as np import grpc import tensorflow as tf from tensorflow.keras.preprocessing import image from tensorflow.keras.applications.resnet50 import preprocess_input from tensorflow_serving.apis import predict_pb2 from tensorflow_serving.apis import prediction_service_pb2_grpc from tensorflow.keras.applications.resnet50 import decode_predictions tf.keras.backend.set_image_data_format('channels_last') if __name__ == '__main__': channel = grpc.insecure_channel('localhost:8500') stub = prediction_service_pb2_grpc.PredictionServiceStub(channel) img_file = tf.keras.utils.get_file( "./kitten_small.jpg", "https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg") img = image.load_img(img_file, target_size=(224, 224)) img_array = preprocess_input(image.img_to_array(img)[None, ...]) request = predict_pb2.PredictRequest() request.model_spec.name = 'resnet50_neuron' request.inputs['input_1'].CopyFrom( tf.make_tensor_proto(img_array, shape=img_array.shape)) result = stub.Predict(request) prediction = tf.make_ndarray(result.outputs['output_1']) print(decode_predictions(prediction)) ================================================ FILE: archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.rst ================================================ .. _inference-tensorflow-neuronx-tutorials: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Tutorials (``tensorflow-neuronx``) =================================== .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: HuggingFace Roberta-Base /archive/tensorflow/tensorflow-neuronx/tutorials/tutorial-tensorflowx-serving-NeuronRT-Visible-Cores .. include:: /archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.txt ================================================ FILE: archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.txt ================================================ * HuggingFace Roberta-Base :ref:`[html]` :github:`[notebook] ` * :ref:`tensorflow-servingx-neuronrt-visible-cores` .. note:: To use Jupyter Notebook see: * :ref:`setup-jupyter-notebook-steps-troubleshooting` * :ref:`running-jupyter-notebook-as-script` ================================================ FILE: archive/tensorflow/tensorflow-neuronx-inference.rst ================================================ .. _inference-tensorflow-neuronx: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Inference on Inf2 & Trn1/Trn1n (``tensorflow-neuronx``) ======================================================= .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: Tutorials API Reference Guide Misc .. include:: tensorflow-neuronx-inference.txt ================================================ FILE: archive/tensorflow/tensorflow-neuronx-inference.txt ================================================ .. card:: Setup (``tensorflow-neuronx``) :class-body: sphinx-design-class-title-small See :doc:`TensorFlow NeuronX setup `. .. 
dropdown:: Tutorials (``tensorflow-neuronx``) :class-title: sphinx-design-class-title-med :animate: fade-in .. include:: /archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.txt .. dropdown:: API Reference Guide (``tensorflow-neuronx``) :class-title: sphinx-design-class-title-med :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /archive/tensorflow/tensorflow-neuronx/api-reference-guide.txt .. dropdown:: Misc (``tensorflow-neuronx``) :class-title: sphinx-design-class-title-med :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx.txt ================================================ FILE: archive/tensorflow/tensorflow-setup.rst ================================================ .. _tf-setup: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Tensorflow Neuron Setup ======================= .. warning:: This document is archived. TensorFlow is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. include:: tensorflow-setup.txt ================================================ FILE: archive/tensorflow/tensorflow-setup.txt ================================================ .. card:: Tensorflow Neuron (``tensorflow-neuronx``) Setup for Inf2, Trn1/Trn1n Instances :class-body: sphinx-design-class-title-small See :doc:`TensorFlow NeuronX setup `. .. card:: Tensorflow Neuron (``tensorflow-neuron``) Setup for Inf1 Instances :class-body: sphinx-design-class-title-small See :doc:`TensorFlow Neuron setup `. ================================================ FILE: archive/torch-neuron/additional-examples-inference-torch-neuron.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Additional Examples (``torch-neuron``) ====================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: AWS Neuron Samples GitHub Repository .. include:: /archive/torch-neuron/additional-examples-inference-torch-neuron.txt ================================================ FILE: archive/torch-neuron/additional-examples-inference-torch-neuron.txt ================================================ * `AWS Neuron Samples GitHub Repository `_ ================================================ FILE: archive/torch-neuron/api-compilation-python-api.rst ================================================ .. _torch_neuron_trace_api: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 PyTorch-Neuron trace Python API ================================ .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. The PyTorch-Neuron trace Python API provides a method to generate PyTorch models for execution on Inferentia, which can be serialized as TorchScript. It is analogous to the :func:`torch.jit.trace` function in PyTorch. ..
py:function:: torch_neuron.trace(model, example_inputs, **kwargs) The :func:`torch_neuron.trace` method sends operations to the Neuron-Compiler (``neuron-cc``) for compilation and embeds compiled artifacts in a TorchScript graph. Compilation can be done on any EC2 machine with sufficient memory and compute resources. A c5.4xlarge or larger is recommended. Options can be passed to the Neuron compiler via this function. See :ref:`neuron-compiler-cli-reference` for more information about compiler options. This function partitions nodes into operations that are supported by Neuron and operations which are not. Operations which are not supported by Neuron are run on CPU. Graph partitioning can be controlled by the ``subgraph_builder_function``, ``minimum_segment_size``, and ``fallback`` parameters (see below). By default, all supported operations are compiled and run on Neuron. The compiled graph can be saved using the :func:`torch.jit.save` function and restored using the :func:`torch.jit.load` function for inference on Inf1 instances. During inference, the previously compiled artifacts will be loaded into the Neuron Runtime for inference execution. *Required Arguments* :arg ~torch.nn.Module,callable model: The function or module that will be run with the ``example_inputs`` arguments. The arguments and return types must be compatible with :func:`torch.jit.trace`. When a :class:`~torch.nn.Module` is passed to :func:`torch_neuron.trace`, only the :func:`~torch.nn.Module.forward` method is run and traced. :arg tuple example_inputs: A tuple of example inputs that will be passed to the ``model`` while tracing. The resulting trace can be run with inputs of different types and shapes assuming the traced operations support those types and shapes. This parameter may also be a single :class:`torch.Tensor` in which case it is automatically wrapped in a ``tuple``. *Optional Keyword Arguments* :keyword list[str] compiler_args: List of strings representing ``neuron-cc`` compiler arguments. Note that these arguments apply to all subgraphs generated by allowlist partitioning. For example, use :code:`compiler_args=['--neuroncore-pipeline-cores', '4']` to set the number of NeuronCores per subgraph to 4. See :ref:`neuron-compiler-cli-reference` for more information about compiler options. :keyword int compiler_timeout: Timeout in seconds for waiting ``neuron-cc`` to complete. Exceeding this timeout will cause a ``subprocess.TimeoutExpired`` exception. :keyword str compiler_workdir: Work directory used by ``neuron-cc``. Useful for debugging and/or inspecting ``neuron-cc`` logs/IRs. :keyword callable subgraph_builder_function: A function which is evaluated on each node during graph partitioning. This takes in a torch graph operator node and returns a :class:`bool` value of whether it should be included in the fused Neuron graph or not. By default the partitioner selects all operators which are supported by Neuron. :keyword int minimum_segment_size: A parameter used during partitioning. This specifies the minimum number of graph nodes which should be compiled into a Neuron graph (default= :code:`2`). If the number of nodes is smaller than this size, the operations will run on CPU. :keyword float single_fusion_ratio_threshold: A parameter used during partitioning. During partitioning, if a single partition contains a fraction of operations greater than this threshold, only one graph partition will be compiled (default= :code:`0.6`). This is used to avoid compiling many small Neuron graphs.
To force compilation of all graphs to Neuron (even when they are very small), a value of ``1.0`` can be used. :keyword bool fallback: A parameter used to turn off automatic graph partitioning. Indicates whether to attempt to fall back to CPU operations if an operation is not supported by Neuron. By default this is ``True``. If this is set to ``False`` and an operation is not supported by Neuron, this will fail compilation and raise an ``AttributeError``. :keyword bool dynamic_batch_size: A flag to allow Neuron graphs to consume variable sized batches of data. Dynamic sizing is restricted to the 0th dimension of a tensor. :keyword list optimizations: A list of :class:`~torch_neuron.Optimization` passes to apply to the model. :keyword bool separate_weights: A flag to enable compilation of models with over 1.9GB of constant parameters. By default this flag is ``False``. If this is set to ``True`` and the compiler version is not new enough to support the flag, this will raise a ``NotImplementedError``. :keyword \*\*kwargs: All other keyword arguments will be forwarded directly to :func:`torch.jit.trace`. This supports flags like ``strict=False`` in order to allow dictionary outputs. :returns: The traced :class:`~torch.jit.ScriptModule` with embedded compiled Neuron sub-graphs. Operations in this module will run on Neuron unless they are not supported by Neuron or manually partitioned to run on CPU. Note that in ``torch<1.8`` this would return a :class:`~torch.jit.ScriptFunction` if the input was a function type. :rtype: ~torch.jit.ScriptModule, ~torch.jit.ScriptFunction .. py:class:: torch_neuron.Optimization A set of optimization passes that can be applied to the model. .. py:attribute:: FLOAT32_TO_FLOAT16 A post-processing pass that converts all :attr:`torch.float32` tensors to :attr:`torch.float16` tensors. The advantage of this optimization pass is that input/output tensors will be type cast. This reduces the amount of data that will be copied to and from Inferentia hardware. The resulting traced model will accept both :attr:`torch.float32` and :attr:`torch.float16` inputs where the model used :attr:`torch.float32` inputs during tracing. It is only beneficial to enable this optimization if the throughput of a model is highly dependent upon data transfer speed. This optimization is not recommended if the final application will use :attr:`torch.float32` inputs since the :attr:`torch.float16` type cast will occur on CPU during inference. A usage sketch appears under Optimization Passes below. Example Usage ------------- Function Compilation ~~~~~~~~~~~~~~~~~~~~ .. code-block:: python import torch import torch_neuron def foo(x, y): return 2 * x + y # Run `foo` with the provided inputs and record the tensor operations traced_foo = torch.neuron.trace(foo, (torch.rand(3), torch.rand(3))) # `traced_foo` can now be run with the TorchScript interpreter or saved # and loaded in a Python-free environment torch.jit.save(traced_foo, 'foo.pt') traced_foo = torch.jit.load('foo.pt')
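Optimization Passes
~~~~~~~~~~~~~~~~~~~

A minimal sketch of applying the ``FLOAT32_TO_FLOAT16`` pass described above through the ``optimizations`` keyword; the pretrained ResNet-50 model and example input are illustrative:

.. code-block:: python

    import torch
    import torch_neuron
    from torchvision import models

    model = models.resnet50(pretrained=True)
    model.eval()

    image = torch.rand([1, 3, 224, 224])

    # Cast float32 input/output tensors to float16 at the model boundary
    # to reduce the amount of data copied to and from Inferentia.
    model_neuron = torch.neuron.trace(
        model,
        image,
        optimizations=[torch_neuron.Optimization.FLOAT32_TO_FLOAT16],
    )

Module Compilation
~~~~~~~~~~~~~~~~~~

..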
code-block:: python import torch import torch_neuron import torch.nn as nn class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.conv = nn.Conv2d(1, 1, 3) def forward(self, x): return self.conv(x) + 1 n = Net() n.eval() inputs = torch.rand(1, 1, 3, 3) # Trace a specific method and construct `ScriptModule` with # a single `forward` method neuron_forward = torch.neuron.trace(n.forward, inputs) # Trace a module (implicitly traces `forward`) and construct a # `ScriptModule` with a single `forward` method neuron_net = torch.neuron.trace(n, inputs) Pre-Trained Model Compilation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The following is an example usage of the compilation Python API, with default compilation arguments, using a pretrained :class:`torch.nn.Module`: .. code-block:: python import torch import torch_neuron from torchvision import models # Load the model and set it to evaluation mode model = models.resnet50(pretrained=True) model.eval() # Compile with an example input image = torch.rand([1, 3, 224, 224]) model_neuron = torch.neuron.trace(model, image) .. _compiling-models-with-kwargs: Compiling models with torch.jit.trace kwargs ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This example uses the :code:`strict=False` flag to compile a model with dictionary outputs. Similarly, any other keyword argument of :func:`torch.jit.trace` can be passed directly to :func:`torch_neuron.trace` so that it is passed to the underlying trace call. .. code-block:: python import torch import torch_neuron import torch.nn as nn class Model(nn.Module): def __init__(self): super(Model, self).__init__() self.conv = nn.Conv2d(1, 1, 3) def forward(self, x): return {'conv': self.conv(x) + 1} model = Model() model.eval() inputs = torch.rand(1, 1, 3, 3) # Use the strict=False kwarg to compile a model with dictionary outputs # the model output format does not change model_neuron = torch.neuron.trace(model, inputs, strict=False) Dynamic Batching ~~~~~~~~~~~~~~~~ This example uses the :code:`dynamic_batch_size` option to support variable-sized batches at inference time. .. code-block:: python import torch import torch_neuron from torchvision import models # Load the model and set it to evaluation mode model = models.resnet50(pretrained=True) model.eval() # Compile with an example input of batch size 1 image = torch.rand([1, 3, 224, 224]) model_neuron = torch.neuron.trace(model, image, dynamic_batch_size=True) # Execute with a batch of 7 images batch = torch.rand([7, 3, 224, 224]) results = model_neuron(batch) Manual Partitioning ~~~~~~~~~~~~~~~~~~~ The following example uses the optional :code:`subgraph_builder_function` parameter to ensure that only a specific convolution layer is compiled to Neuron. The remaining operations are executed on CPU. .. code-block:: python import torch import torch_neuron import torch.nn as nn class ExampleConvolutionLayer(nn.Module): def __init__(self): super().__init__() self.conv = nn.Conv2d(1, 1, 3) def forward(self, x): return self.conv(x) + 1 class Model(nn.Module): def __init__(self): super().__init__() self.layer = ExampleConvolutionLayer() def forward(self, x): return self.layer(x) * 100 def subgraph_builder_function(node) -> bool: """Select if the node will be included in the Neuron graph""" # Node names are tuples of Module names.
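# For example, an operator inside the layer may carry a name tuple like ('Model', 'layer', 'ExampleConvolutionLayer') (illustrative only; the exact contents depend on the module hierarchy), so checking for the class name selects exactly that layer's operations.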
if 'ExampleConvolutionLayer' in node.name: return True # Ignore all operations not in the example convolution layer return False model = Model() model.eval() inputs = torch.rand(1, 1, 3, 3) # Log output shows that `aten::_convolution` and `aten::add` are compiled # but `aten::mul` is not. This will seamlessly switch between Neuron/CPU # execution in a single graph. neuron_model = torch_neuron.trace( model, inputs, subgraph_builder_function=subgraph_builder_function ) Separate Weights ~~~~~~~~~~~~~~~~ This example uses the :code:`separate_weights` option to support compilation of models larger than 1.9GB. .. code-block:: python import torch import torch_neuron from torchvision import models # Load the model model = models.resnet50(pretrained=True) model.eval() # Compile with an example input image = torch.rand([1, 3, 224, 224]) # the model's output format does not change model_neuron = torch.neuron.trace(model, image, separate_weights=True) ================================================ FILE: archive/torch-neuron/api-core-placement.rst ================================================ .. _torch_core_placement_api: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 PyTorch Neuron (``torch-neuron``) Core Placement API ===================================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. automodule:: placement :module-name: torch_neuron.experimental :members: ================================================ FILE: archive/torch-neuron/api-reference-guide-torch-neuron.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 API Reference Guide (``torch-neuron``) ====================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: PyTorch Neuron trace Python API torch.neuron.DataParallel API /archive/torch-neuron/api-core-placement .. include:: /archive/torch-neuron/api-reference-guide-torch-neuron.txt ================================================ FILE: archive/torch-neuron/api-reference-guide-torch-neuron.txt ================================================ * :ref:`PyTorch Neuron trace Python API ` * :ref:`torch.neuron.DataParallel API ` * :ref:`torch_core_placement_api` ================================================ FILE: archive/torch-neuron/api-torch-neuron-dataparallel-api.rst ================================================ .. _api_torch_neuron_dataparallel_api: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 torch.neuron.DataParallel API ============================= .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. The :func:`torch.neuron.DataParallel` Python API implements data parallelism on :class:`~torch.jit.ScriptModule` models created by the :ref:`torch_neuron_trace_api`.
This function is analogous to :class:`~torch.nn.DataParallel` in PyTorch. The :ref:`torch-neuron-dataparallel-app-note` application note provides an overview of how :func:`torch.neuron.DataParallel` can be used to improve the performance of inference workloads on Inferentia. .. py:function:: torch.neuron.DataParallel(model, device_ids=None, dim=0) Applies data parallelism by replicating the model on available NeuronCores and distributing data across the different NeuronCores for parallelized inference. By default, DataParallel will use all available NeuronCores allocated for the current process for parallelism. DataParallel will apply parallelism on ``dim=0`` if ``dim`` is not specified. DataParallel automatically enables :ref:`dynamic batching ` on eligible models if ``dim=0``. Dynamic batching can be disabled using :func:`torch.neuron.DataParallel.disable_dynamic_batching`. If dynamic batching is not enabled, the batch size at compilation-time must be equal to the batch size at inference-time divided by the number of NeuronCores being used. Specifically, the following must be true when dynamic batching is disabled: ``input.shape[dim] / len(device_ids) == compilation_input.shape[dim]``. DataParallel will throw a warning if dynamic batching cannot be enabled. DataParallel will try to load all of a model's NEFFs onto a single NeuronCore only if all of the NEFFs can fit on a single NeuronCore. DataParallel does not currently support models that have been compiled with :ref:`neuroncore-pipeline`. :func:`torch.neuron.DataParallel` requires PyTorch >= 1.8. *Required Arguments* :arg ~torch.jit.ScriptModule model: Model created by the :ref:`torch_neuron_trace_api` to be parallelized. *Optional Arguments* :arg list device_ids: List of :obj:`int` or ``'nc:#'`` that specify the NeuronCores to use for parallelization (default: all NeuronCores). Refer to the :ref:`device_ids note <device_ids_note>` for a description of how ``device_ids`` indexing works. :arg int dim: Dimension along which the input tensor is scattered across NeuronCores (default ``dim=0``). *Attributes* :arg int num_workers: Number of worker threads used for multithreaded inference (default: ``2 * number of NeuronCores``). :arg int split_size: Size of the input chunks (default: ``max(1, input.shape[dim] // number of NeuronCores)``). .. py:function:: torch.neuron.DataParallel.disable_dynamic_batching() Disables automatic dynamic batching on the DataParallel module. See :ref:`Dynamic batching disabled <dataparallel_example_disable_dynamic_batching_api>` for an example of how DataParallel can be used with dynamic batching disabled. Use as follows: >>> model_parallel = torch.neuron.DataParallel(model_neuron) >>> model_parallel.disable_dynamic_batching() .. _device_ids_note: .. note:: ``device_ids`` uses per-process NeuronCore granularity and zero-based indexing. Per-process granularity means that each Python process "sees" its own view of the world. Specifically, this means that ``device_ids`` only "sees" the NeuronCores that are allocated for the current process. Zero-based indexing means that each Python process will index its allocated NeuronCores starting at 0, regardless of the "global" index of the NeuronCores. Zero-based indexing makes it possible to redeploy the exact same code unchanged in different processes. This behavior is analogous to the ``device_ids`` argument in the PyTorch :class:`~torch.nn.DataParallel` function.
As an example, assume DataParallel is run on an inf1.6xlarge, which contains four Inferentia chips each of which contains four NeuronCores: * If ``NEURON_RT_VISIBLE_CORES`` is not set, a single process can access all 16 NeuronCores. Thus specifying ``device_ids=["nc:0"]`` will correspond to chip0:core0 and ``device_ids=["nc:14"]`` will correspond to chip3:core2. * However, if two processes are launched where: process 1 has ``NEURON_RT_VISIBLE_CORES=0-6`` and process 2 has ``NEURON_RT_VISIBLE_CORES=7-15``, ``device_ids=["nc:14"]`` cannot be specified in either process. Instead, chip3:core2 can only be accessed in process 2. Additionally, chip3:core2 is specified in process 2 with ``device_ids=["nc:7"]``. Furthermore, in process 1, ``device_ids=["nc:0"]`` would correspond to chip0:core0; in process 2 ``device_ids=["nc:0"]`` would correspond to chip1:core3. Examples -------- The following sections provide example usages of the :func:`torch.neuron.DataParallel` module. Default usage ^^^^^^^^^^^^^ .. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-default.rst Specifying NeuronCores ^^^^^^^^^^^^^^^^^^^^^^ .. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-specify-ncs.rst DataParallel with dim != 0 ^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-dim-neq-zero.rst Dynamic batching ^^^^^^^^^^^^^^^^ .. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-dynamic-batching.rst .. _dataparallel_example_disable_dynamic_batching_api: Dynamic batching disabled ^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /archive/torch-neuron/torch-neuron-dataparallel-example-disable-dynamic-batching.rst Full tutorial with torch.neuron.DataParallel ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For an end-to-end tutorial that uses DataParallel, see the :ref:`PyTorch Resnet Tutorial `. ================================================ FILE: archive/torch-neuron/developer-guide-torch-neuron.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Developer Guide (``torch-neuron``) ================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: Running Inference on Variable Input Shapes with Bucketing Data Parallel Inference on PyTorch Neuron /archive/torch-neuron/guides/torch-lstm-support /archive/torch-neuron/guides/core-placement/torch-core-placement .. include:: /archive/torch-neuron/developer-guide-torch-neuron.txt ================================================ FILE: archive/torch-neuron/developer-guide-torch-neuron.txt ================================================ * :ref:`Running Inference on Variable Input Shapes with Bucketing ` * :ref:`Data Parallel Inference on PyTorch Neuron ` * :ref:`torch_neuron_lstm_support` * :ref:`torch_neuron_core_placement_guide` ================================================ FILE: archive/torch-neuron/guides/core-placement/torch-core-placement.rst ================================================ .. _torch_neuron_core_placement_guide: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 PyTorch Neuron (``torch-neuron``) Core Placement ================================================ .. 
warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. This programming guide describes the available techniques and APIs for allocating NeuronCores to a process and placing models onto specific NeuronCores. In order of precedence, the current recommendation is to use the following placement techniques: 1. For most regular models, default core placement should be used in conjunction with ``NEURON_RT_NUM_CORES`` (:ref:`torch_placement_default`). 2. For more specific core placement of NeuronCore Pipelined models, ``NEURONCORE_GROUP_SIZES`` should be used (:ref:`torch_placement_ncg`). 3. Finally, for even more granular control, the beta explicit placement APIs may be used (:ref:`torch_placement_explicit`). .. contents:: Table of Contents :depth: 3 The following guide will assume a machine with 8 NeuronCores: - NeuronCores will use the notation ``nc0``, ``nc1``, etc. - NeuronCore Groups will use the notation ``ncg0``, ``ncg1`` etc. - Models will use the notation ``m0``, ``m1`` etc. NeuronCores, NeuronCore Groups, and model allocations will be displayed in the following format: .. raw:: html :file: images/0-0-legend.svg Note that the actual cores that are visible to the process can be adjusted according to the :ref:`nrt-configuration`. NeuronCore Pipeline ------------------- A key concept for understanding the intent behind certain core placement strategies is NeuronCore Pipelining (see :ref:`neuroncore-pipeline`). NeuronCore Pipelining allows a model to be automatically split into pieces and executed on different NeuronCores. For most models, only 1 NeuronCore will be required for execution. A model will **only** require more than one NeuronCore when using NeuronCore Pipeline. When model pipelining is enabled, the model is split between multiple NeuronCores and data is transferred between them. For example, if the compiler flag ``--neuroncore-pipeline-cores 4`` is used, this splits the model into 4 pieces to be executed on 4 separate NeuronCores. .. _torch_placement_default: Default Core Allocation & Placement ----------------------------------- The most basic requirement of an inference application is to be able to place a single model on a single NeuronCore. More complex applications may use multiple NeuronCores or even multiple processes each executing different models. The important thing to note about designing an inference application is that a single NeuronCore will always be allocated to a single process. *Processes do not share NeuronCores*. Different configurations can be used to ensure that an application process has enough NeuronCores allocated to execute its model(s): - Default: A process will attempt to take ownership of **all NeuronCores** visible on the instance. This should be used when an instance is only running a single inference process since no other process will be allowed to take ownership of any NeuronCores. - ``NEURON_RT_NUM_CORES``: Specifies the **number of NeuronCores** to allocate to the process. This places no restrictions on which NeuronCores will be used; however, the resulting NeuronCores will always be contiguous. This should be used in multi-process applications where each process should only use a subset of NeuronCores. - ``NEURON_RT_VISIBLE_CORES``: Specifies exactly **which NeuronCores** are allocated to the process by index.
Similar to ``NEURON_RT_NUM_CORES``, this can be used in multi-process applications where each process should only use a subset of NeuronCores. This provides more fine-grained control over the exact NeuronCores that are allocated to a given process. - ``NEURONCORE_GROUP_SIZES``: Specifies a number of **NeuronCore Groups** which are allocated to the process. This is described in more detail in the :ref:`torch_placement_ncg` section. See the :ref:`nrt-configuration` for more environment variable details. Example: Default ^^^^^^^^^^^^^^^^ **Python Script**: .. code-block:: python import torch import torch_neuron m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc0 m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc1 .. raw:: html :file: images/0-1-default-2.svg With no environment configuration, the process will take ownership of all NeuronCores. In this example, only two of the NeuronCores are used by the process and the remaining are allocated but left idle. Example: ``NEURON_RT_NUM_CORES`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Environment Setup**: .. code-block:: bash export NEURON_RT_NUM_CORES='2' **Python Script**: .. code-block:: python import torch import torch_neuron m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc0 m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc1 .. raw:: html :file: images/0-2-default-rt-num-cores.svg Since there is no other process on the instance, only the first 2 NeuronCores will be acquired by the process. Models load in a simple linear order to the least used NeuronCores. Example: ``NEURON_RT_VISIBLE_CORES`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Environment Setup**: .. code-block:: bash export NEURON_RT_VISIBLE_CORES='4-5' **Python Script**: .. code-block:: python import torch import torch_neuron m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc4 m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc5 .. raw:: html :file: images/0-3-default-rt-visible-cores.svg Unlike ``NEURON_RT_NUM_CORES``, setting the visible NeuronCores allows the process to take control of a specific contiguous set. This allows an application to have more fine-grained control over where models will be placed. Example: Overlapping Models ^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Environment Setup**: .. code-block:: bash export NEURON_RT_VISIBLE_CORES='0-1' **Python Script**: .. code-block:: python import torch import torch_neuron m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc0 m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt') # Loads to nc0-nc1 m2 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc1 .. raw:: html :file: images/0-4-default-overlap-model-2.svg .. raw:: html :file: images/0-4-default-overlap.svg This shows how models may share NeuronCores, but the default model placement will attempt to evenly distribute NeuronCore usage rather than overlapping all models on a single NeuronCore. Example: Multiple Processes ^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Environment Setup**: .. code-block:: bash export NEURON_RT_NUM_CORES='2' **Python Script**: .. code-block:: python import torch import torch_neuron m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc0 m1 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc1 In this example, if the script is run **twice**, the following allocations will be made: ..
raw:: html :file: images/0-5-default-multiprocess.svg Note that each process will take ownership of as many NeuronCores as is specified by the ``NEURON_RT_NUM_CORES`` configuration. .. _torch_placement_ncg: NEURONCORE_GROUP_SIZES ---------------------- .. important:: Explicit core placement should only be used when a specific performance goal is required. By default, ``torch-neuron`` places models on the **least used** NeuronCores. This should be optimal for most applications. Additionally, ``NEURONCORE_GROUP_SIZES`` will be deprecated in a future release and should be avoided in favor of newer placement methods. Use ``NEURON_RT_NUM_CORES`` or ``NEURON_RT_VISIBLE_CORES`` with default placement if possible (see :ref:`torch_placement_default`). In the current release of the Neuron SDK, the most well-supported method of placing models onto specific NeuronCores is to use the ``NEURONCORE_GROUP_SIZES`` environment variable. This will define a set of "NeuronCore Groups" for the application process. NeuronCore Groups are *contiguous sets of NeuronCores* that are allocated to a given process. Creating groups allows an application to ensure that a model has a defined set of NeuronCores that will always be allocated to it. Note that NeuronCore Groups *can* be used to allocate non-pipelined models (those requiring exactly 1 NeuronCore) to specific NeuronCores, but this is not the primary intended use. The intended use of NeuronCore Groups is to ensure pipelined models (those requiring >1 NeuronCore) have exclusive access to a specific set of contiguous NeuronCores. In the cases where models are being used *without* NeuronCore Pipeline, the general recommendation is to use default placement (see :ref:`torch_placement_default`). The following section demonstrates how ``NEURONCORE_GROUP_SIZES`` can be used and the issues that may arise. Example: Single NeuronCore Group ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In the example where one model requires 4 NeuronCores, the correct environment configuration would be: **Environment Setup**: .. code-block:: bash export NEURONCORE_GROUP_SIZES='4' **Python Script**: .. code-block:: python import torch import torch_neuron m0 = torch.jit.load('model-with-4-neuron-pipeline-cores.pt') # Loads to nc0-nc3 .. raw:: html :file: images/1-ncg-4.svg This is the most basic usage of a NeuronCore Group. The environment setup causes the process to take control of 4 NeuronCores and then the script loads a model compiled with a NeuronCore Pipeline size of 4 to the first group. Example: Multiple NeuronCore Groups ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ With more complicated configurations, the intended use of ``NEURONCORE_GROUP_SIZES`` is to create 1 Group per model with the correct size to ensure that the models are placed on the intended NeuronCores. Similarly, the environment would need to be configured to create a NeuronCore Group for each model: **Environment Setup**: .. code-block:: bash export NEURONCORE_GROUP_SIZES='3,4,1' **Python Script**: .. code-block:: python import torch import torch_neuron m0 = torch.jit.load('model-with-3-neuron-pipeline-cores.pt') # Loads to nc0-nc2 m1 = torch.jit.load('model-with-4-neuron-pipeline-cores.pt') # Loads to nc3-nc6 m2 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc7 .. raw:: html :file: images/2-ncg-3-4-1.svg Issue: Overlapping Models with Differing Model Sizes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When multiple models are loaded to a single NeuronCore Group, this can cause unintended inefficiencies.
A single model is only intended to span a single NeuronCore Group. Applications with many models of varying sizes can be restricted by NeuronCore Group configurations since the most optimal model layout may require more fine-grained control. **Environment Setup**: .. code-block:: bash export NEURONCORE_GROUP_SIZES='2,2' **Python Script**: .. code-block:: python import torch import torch_neuron m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt') # Loads to nc0-nc1 m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt') # Loads to nc2-nc3 m2 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc0 m3 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc2 m4 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc0 .. raw:: html :file: images/3-models-m4-0-warning.svg .. raw:: html :file: images/3-models-m2-0-m3-2.svg .. raw:: html :file: images/3-ncg-2-2.svg Here, ``NEURONCORE_GROUP_SIZES`` does not generate an optimal layout because placement strictly follows the layout of NeuronCore Groups. A potentially more optimal layout would be to place ``m4`` onto ``nc1``. In this case, since a pipelined model will not be able to have exclusive access to a set of NeuronCores, the default NeuronCore placement (no NeuronCore Groups specified) would more evenly distribute the models. Also note here that this is an example of where the order of model loads affects which model is assigned to which NeuronCore Group. If the order of the load statements is changed, models may be assigned to different NeuronCore Groups. Issue: Incompatible Model Sizes ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Another problem occurs when attempting to place a model which does not evenly fit into a single group: **Environment Setup**: .. code-block:: bash export NEURONCORE_GROUP_SIZES='2,2' **Python Script**: .. code-block:: python import torch import torch_neuron m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt') # Loads to nc0-nc1 m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt') # Loads to nc2-nc3 m2 = torch.jit.load('model-with-3-neuron-pipeline-cores.pt') # Loads to nc0-nc2 .. raw:: html :file: images/4-models-m2-0-2-warning.svg .. raw:: html :file: images/3-ncg-2-2.svg The model will be placed *across* NeuronCore Groups since there is no obvious group to assign the model to according to the environment variable configuration. Depending on the individual model and application requirements, the placement here may not be optimal. Issue: Multiple Model Copies ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ It is common in inference serving applications to use multiple replicas of a single model across different NeuronCores. This allows the hardware to be fully utilized to maximize throughput. In this scenario, when using NeuronCore Groups, the only way to replicate a model on multiple NeuronCores is to create a *new model* object. In the example below, 4 model loads are performed to place a model in each NeuronCore Group. **Environment Setup**: .. code-block:: bash export NEURONCORE_GROUP_SIZES='2,2,2,2' **Python Script**: .. code-block:: python import torch import torch_neuron models = list() for _ in range(4): model = torch.jit.load('model-with-2-neuron-pipeline-cores.pt') models.append(model) .. raw:: html :file: images/3-ncg-2-2-2-2-copies.svg The largest consequence of this type of model allocation is that the application code is responsible for routing inference requests to models.
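As an illustration only, a minimal round-robin dispatcher over the ``models`` list built above might look like the following sketch. The ``dispatch`` helper is hypothetical application code, not a ``torch-neuron`` API:

.. code-block:: python

    import itertools
    import threading

    # Cycle over the replica handles created above (one per NeuronCore Group)
    replicas = itertools.cycle(models)
    lock = threading.Lock()

    def dispatch(batch):
        # Hypothetical application-level routing: selecting the next replica
        # is serialized, while the executions themselves may run concurrently
        with lock:
            model = next(replicas)
        return model(batch)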
There are a variety of ways to implement the inference switching, but in all cases the routing logic needs to be implemented in the application code. Issue Summary ^^^^^^^^^^^^^ The use of ``NEURONCORE_GROUP_SIZES`` has the following problems: - **Variable Sized Models**: Models which require crossing NeuronCore Group boundaries may be placed poorly. This means the group configuration limits the sizes of models that can be loaded. - **Model Load Order**: Models are loaded to NeuronCore Groups greedily. This means that the order of model loads can potentially negatively affect application performance by causing unintentional overlap. - **Implicit Placement**: NeuronCore Groups cannot be explicitly chosen in the application code. - **Manual Replication**: When loading multiple copies of a model to different NeuronCore Groups, this requires that multiple model handles are used. .. _torch_placement_explicit: Explicit Core Placement ------------------------------------- To address the limitations of ``NEURONCORE_GROUP_SIZES``, a new set of APIs has been added which allows specific NeuronCores to be chosen by the application code. These can be found in the :ref:`torch_neuron_core_placement_api` documentation. Example: Manual Core Selection ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The most direct usage of the placement APIs is to manually select the start NeuronCore that each model is loaded to. This will automatically use as many NeuronCores as is necessary for that model (1 for most models, >1 for NeuronCore Pipeline models). **Environment Setup**: .. code-block:: bash export NEURON_RT_NUM_CORES='4' **Python Script**: .. code-block:: python import torch import torch_neuron # NOTE: Order of loads does NOT matter with torch_neuron.experimental.neuron_cores_context(2): m1 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt') # Loads to nc2-nc3 with torch_neuron.experimental.neuron_cores_context(0): m2 = torch.jit.load('model-with-3-neuron-pipeline-cores.pt') # Loads to nc0-nc2 with torch_neuron.experimental.neuron_cores_context(0): m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt') # Loads to nc0-nc1 with torch_neuron.experimental.neuron_cores_context(3): m3 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads to nc3 .. raw:: html :file: images/5-models-m2-0-2-m3-3.svg .. raw:: html :file: images/5-placement.svg Note that this directly solves the ``NEURONCORE_GROUP_SIZES`` issues of: - **Variable Sized Models**: Now since models are directly placed on the NeuronCores requested by the application, there is no disconnect between the model sizes and NeuronCore Group sizes. - **Model Load Order**: Since the NeuronCores are explicitly selected, there is no need to be careful about the order in which models are loaded since they can be placed deterministically regardless of the load order. - **Implicit Placement**: Similarly, explicit placement means there is no chance that a model will end up being allocated to an incorrect NeuronCore Group. Example: Automatic Multicore ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Using explicit core placement, it is possible to replicate a model to multiple NeuronCores simultaneously. This means that a single model object within Python can utilize all available NeuronCores (or NeuronCores allocated to the process). **Environment Setup**: .. code-block:: bash export NEURON_RT_NUM_CORES='8' **Python Script**: ..
code-block:: python import torch import torch_neuron with torch_neuron.experimental.multicore_context(): m0 = torch.jit.load('model-with-1-neuron-pipeline-cores.pt') # Loads replications to nc0-nc7 .. raw:: html :file: images/6-multicore.svg This addresses the last ``NEURONCORE_GROUP_SIZES`` issue of: - **Manual Replication**: Since models can be automatically replicated to multiple NeuronCores, this means that applications no longer need to implement routing logic and perform multiple loads. This API has the secondary benefit that the exact same loading logic can be used on an ``inf1.xlarge`` or an ``inf1.6xlarge``. In either case, it will use all of the NeuronCores that are visible to the process. This means that no special logic needs to be coded for different instance types. Example: Explicit Replication ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Replication is also possible with the :func:`~torch_neuron.experimental.neuron_cores_context` API. The number of replications is chosen by ``replications = floor(nc_count / cores_per_model)``. **Environment Setup**: .. code-block:: bash export NEURON_RT_NUM_CORES='8' **Python Script**: .. code-block:: python import torch import torch_neuron with torch_neuron.experimental.neuron_cores_context(start_nc=2, nc_count=4): m0 = torch.jit.load('model-with-2-neuron-pipeline-cores.pt') # Loads replications to nc2-nc5 .. raw:: html :file: images/7-replication.svg ================================================ FILE: archive/torch-neuron/guides/torch-lstm-support.rst ================================================ .. _torch_neuron_lstm_support: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Developer Guide - PyTorch Neuron (``torch-neuron``) |LSTM| Support ================================================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. The ``torch-neuron`` package can support |LSTM| operations and yield high performance on both fixed-length and variable-length sequences. Most network configurations can be supported, with the exception of those that require |PackedSequence| usage outside of |LSTM| or |pad_packed_sequence| operations. Neuron must guarantee that the shapes can remain fixed throughout the network. The following sections describe which scenarios can and cannot be supported. Supported Usage --------------- Fixed-Length Sequences ~~~~~~~~~~~~~~~~~~~~~~ In normal usage of an |LSTM|, the inputs and outputs are expected to have a fixed sequence length. This is the most basic usage of an |LSTM| but may not be applicable to applications where the input sequence length may vary. .. code-block:: python import torch import torch_neuron class Network(torch.nn.Module): def __init__(self): super().__init__() self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7) def forward(self, inputs): output, (ht, ct) = self.lstm(inputs) return output, (ht, ct) # Example Inputs seq_len, batch_size, input_size = 5, 2, 3 inputs = torch.rand(seq_len, batch_size, input_size) # Trace torch_neuron.trace(Network(), (inputs,)) Packed Input, Padded Output, *Pre-Sorted* Inputs ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A common usage of an |LSTM| is when the input sequence sizes vary according to the input sequence lengths (such as tokenized text).
For example, the following sentences could result in two different sequence lengths after tokenization: .. code-block:: python # Input text = [ 'Hello, sailor', 'Example', ] # ... Tokenization ... # Result tokens = [ [101, 7592, 1010, 11803, 102], [101, 2742, 102, 0, 0], ] lengths = [5, 3] Because the lengths are different, the final |LSTM| state will be dependent upon the lengths of each sequence in the batch. Torch provides a way to deal with these types of sequences by densely packing batches into a |PackedSequence|. The most common way this is constructed is by using the |pack_padded_sequence| utility function prior to feeding inputs into the |LSTM|. Packing the above sequences would result in the following data and batch size tensors. .. code-block:: python data = [101, 101, 7592, 2742, 1010, 102, 11803, 102] batch_sizes = [2, 2, 2, 1, 1] In addition to correctly computing the final |LSTM| state, using a packed sequence instead of a padded sequence also improves model performance on CPU. On Neuron, where computation is fixed to the maximum length ahead of time, **this does not improve performance**. When an |LSTM| is processing a |PackedSequence|, it must do so in descending sorted-length order. To ensure that sequences are sorted, |pack_padded_sequence| provides an ``enforce_sorted`` flag. When ``enforce_sorted`` is ``True``, the input is *already expected* to contain sequences sorted by length in a decreasing order along the batch dimension. Note that this must be enforced in the application-level code but is only relevant when batch size > 1. The following network can compile successfully because the input and output to the network are guaranteed to have a fixed shape. The input shape is expected to be a padded tensor and the output tensor is expected to be padded to the maximum sequence length using the |pad_packed_sequence| function call: .. code-block:: python :emphasize-lines: 14 import torch import torch_neuron class Network(torch.nn.Module): def __init__(self): super().__init__() self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7) def forward(self, inputs, lengths): packed_input = torch.nn.utils.rnn.pack_padded_sequence( inputs, lengths=lengths, enforce_sorted=True, ) packed_result, (ht, ct) = self.lstm(packed_input) padded_result, _ = torch.nn.utils.rnn.pad_packed_sequence(packed_result) return padded_result, ht, ct # Example Inputs seq_len, batch_size, input_size = 5, 2, 3 inputs = torch.rand(seq_len, batch_size, input_size) lengths = torch.tensor([seq_len] * batch_size) # Trace torch_neuron.trace(Network(), (inputs, lengths)) Packed Input, Padded Output, *Unsorted* Inputs ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When ``enforce_sorted`` is ``False``, the input will be sorted unconditionally. This causes some CPU overhead on Neuron because unsupported operators will be inserted into the graph, such as ``aten::sort`` and ``aten::scatter_``. The ``aten::lstm`` operation can still be supported, but it will be less efficient than when ``enforce_sorted`` is ``True``. The following code is able to be traced, but results in the sorting operations running on CPU. This is not problematic in this case because the ``aten::sort`` and ``aten::scatter_`` are executed on CPU at the very beginning of the graph just prior to Neuron execution. Like the previous example, the call to |pad_packed_sequence| ensures that the output has a fixed shape based on the maximum sequence length. ..
code-block:: python :emphasize-lines: 14 import torch import torch_neuron class Network(torch.nn.Module): def __init__(self): super().__init__() self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7) def forward(self, inputs, lengths): packed_input = torch.nn.utils.rnn.pack_padded_sequence( inputs, lengths=lengths, enforce_sorted=False, ) packed_result, (ht, ct) = self.lstm(packed_input) padded_result, _ = torch.nn.utils.rnn.pad_packed_sequence(packed_result) return padded_result, ht, ct # Example Inputs seq_len, batch_size, input_size = 5, 2, 3 inputs = torch.rand(seq_len, batch_size, input_size) lengths = torch.tensor([seq_len] * batch_size) # Trace trace = torch_neuron.trace(Network(), (inputs, lengths)) Packed Inputs, Final Hidden & Cell State Only ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When **only** the final |LSTM| hidden & cell state is used, it does not matter if the inputs are packed or unpacked since these state tensors will not vary in size. .. code-block:: python :emphasize-lines: 16,17 import torch import torch_neuron class Network(torch.nn.Module): def __init__(self): super().__init__() self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7) def forward(self, inputs, lengths): packed_input = torch.nn.utils.rnn.pack_padded_sequence( inputs, lengths=lengths, enforce_sorted=True, ) packed_output, (ht, ct) = self.lstm(packed_input) return ht, ct # Example Inputs seq_len, batch_size, input_size = 5, 2, 3 inputs = torch.rand(seq_len, batch_size, input_size) lengths = torch.tensor([seq_len] * batch_size) # Trace trace = torch_neuron.trace(Network(), (inputs, lengths)) Note that when the ``packed_output`` is unused, it does not need to be passed to |pad_packed_sequence| to enable the |LSTM| to be compiled. Unsupported Usage ----------------- Neuron does not support the use of a |PackedSequence| outside of the |LSTM| operation and the |pad_packed_sequence| operation. This is because the shape of a |PackedSequence| can vary depending on the input data. This is incompatible with the Neuron restriction that all tensor sizes must be known at compilation time. When a |PackedSequence| is used only by an |LSTM| or |pad_packed_sequence| operation, Neuron *can guarantee* the size of the intermediary tensors by padding on behalf of the application. This means that if the |PackedSequence| is used by a different operation or returned from the network, either all of the |LSTM| operations will be executed on CPU or the network compilation will fail. |PackedSequence| Returned ~~~~~~~~~~~~~~~~~~~~~~~~~ The following is unsupported because the |PackedSequence| result of the |LSTM| is returned by the network: .. code-block:: python :emphasize-lines: 14 class Network(torch.nn.Module): def __init__(self): super().__init__() self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7) def forward(self, inputs, lengths): packed_input = torch.nn.utils.rnn.pack_padded_sequence( inputs, lengths=lengths, enforce_sorted=False, ) packed_result, (ht, ct) = self.lstm(packed_input) return packed_result.data, ht, ct **Behavior**: In this case, compilation fails and the following warning is generated: .. code-block:: text Operator "aten::lstm" consuming a PackedSequence input can only be supported when its corresponding PackedSequence output is unused or unpacked using "aten::_pad_packed_input".
Found usage by "prim::Return" **Resolution**: To avoid this error, the ``packed_result`` should be padded prior to being returned from the network by using |pad_packed_sequence|. Invalid |PackedSequence| Usage ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The following is unsupported because the |PackedSequence| result of the |LSTM| is used by a non-LSTM operator: .. code-block:: python :emphasize-lines: 14 class Network(torch.nn.Module): def __init__(self): super().__init__() self.lstm = torch.nn.LSTM(input_size=3, hidden_size=7) def forward(self, inputs, lengths): packed_input = torch.nn.utils.rnn.pack_padded_sequence( inputs, lengths=lengths, enforce_sorted=False, ) packed_result, (ht, ct) = self.lstm(packed_input) return torch.max(packed_result.data) **Behavior**: In this case, compilation fails and the following warning is generated: .. code-block:: text Operator "aten::lstm" consuming a PackedSequence input can only be supported when its corresponding PackedSequence output is unused or unpacked using "aten::_pad_packed_input". Found usage by "aten::max" **Resolution**: To avoid this error, the ``packed_result`` should be padded prior to being used in :func:`~torch.max` by using |pad_packed_sequence|. .. |LSTM| replace:: :class:`~torch.nn.LSTM` .. |PackedSequence| replace:: :class:`~torch.nn.utils.rnn.PackedSequence` .. |pack_padded_sequence| replace:: :func:`~torch.nn.utils.rnn.pack_padded_sequence` .. |pad_packed_sequence| replace:: :func:`~torch.nn.utils.rnn.pad_packed_sequence` ================================================ FILE: archive/torch-neuron/index.rst ================================================ .. _torch-neuron-main: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 PyTorch Neuron (torch-neuron) — Archived ========================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer actively developed. For new workloads, use TorchNeuron Native or torch-neuronx. See :doc:`/frameworks/torch/index` for current PyTorch support. PyTorch Neuron (``torch-neuron``) was the original PyTorch integration for AWS Inferentia (Inf1) instances. This package supported inference workloads on the NeuronCore v1 architecture. .. contents:: Table of contents :local: :depth: 2 API Reference ------------- .. toctree:: :maxdepth: 1 api-reference-guide-torch-neuron api-compilation-python-api api-core-placement api-torch-neuron-dataparallel-api Developer Guide --------------- .. toctree:: :maxdepth: 1 developer-guide-torch-neuron troubleshooting-guide Tutorials --------- .. toctree:: :maxdepth: 1 tutorials/tutorials-inference-torch-neuron Setup ----- .. toctree:: :maxdepth: 1 setup/pytorch-install setup/pytorch-update Misc ---- .. toctree:: :maxdepth: 1 additional-examples-inference-torch-neuron misc-inference-torch-neuron ================================================ FILE: archive/torch-neuron/inference-torch-neuron.rst ================================================ .. _inference-torch-neuron: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-13 Inference with ``torch-neuron`` (Inf1) ====================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer actively developed. For new workloads, use TorchNeuron Native or torch-neuronx. See :doc:`/frameworks/torch/index` for current PyTorch support. ..
toctree:: :maxdepth: 1 :hidden: Tutorials Additional Examples API Reference Guide Developer Guide Misc .. card:: Setup (``torch-neuron``) :link: setup-torch-neuron :link-type: ref :class-body: sphinx-design-class-title-small .. dropdown:: Tutorials (``torch-neuron``) :class-title: sphinx-design-class-title-small :animate: fade-in :name: torch-neuronx-training-tutorials .. include:: /archive/torch-neuron/tutorials/tutorials-inference-torch-neuron.txt .. dropdown:: Additional Examples (``torch-neuron``) :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /archive/torch-neuron/additional-examples-inference-torch-neuron.txt .. dropdown:: API Reference Guide (``torch-neuron``) :class-title: sphinx-design-class-title-small :animate: fade-in .. include:: /archive/torch-neuron/api-reference-guide-torch-neuron.txt .. dropdown:: Developer Guide (``torch-neuron``) :class-title: sphinx-design-class-title-small :animate: fade-in .. include:: /archive/torch-neuron/developer-guide-torch-neuron.txt .. dropdown:: Misc (``torch-neuron``) :class-title: sphinx-design-class-title-small :animate: fade-in * :ref:`neuron-cc-ops-pytorch` * :ref:`pytorch-neuron-inference-troubleshooting` * :ref:`pytorch-neuron-rn` ================================================ FILE: archive/torch-neuron/misc-inference-torch-neuron.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Misc (``torch-neuron``) ======================= .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: /release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-pytorch /archive/torch-neuron/troubleshooting-guide /release-notes/components/pytorch .. include:: /archive/torch-neuron/misc-inference-torch-neuron.txt ================================================ FILE: archive/torch-neuron/misc-inference-torch-neuron.txt ================================================ * :ref:`neuron-cc-ops-pytorch` * :ref:`pytorch-neuron-inference-troubleshooting` * :ref:`pytorch-neuron-rn` ================================================ FILE: archive/torch-neuron/placement.py ================================================ """ .. warning:: The following functionality is beta and **will not be supported** in future releases of the Neuron SDK. This module serves only as a preview for future functionality. In future releases, equivalent functionality may be moved directly to the :code:`torch_neuron` module and will no longer be available in the :code:`torch_neuron.experimental` module. Functions which enable placement of :class:`torch.jit.ScriptModule` to specific NeuronCores. Two sets of functions are provided which can be used interchangeably but have different performance characteristics and advantages: - The :func:`~torch_neuron.experimental.multicore_context` & :func:`~torch_neuron.experimental.neuron_cores_context` functions are context managers that allow a model to be placed on a given NeuronCore at :func:`torch.jit.load` time. These functions are the most efficient way of loading a model since the model is loaded directly to a NeuronCore. The alternative functions described below require that a model is unloaded from one core and then reloaded to another. 
- The :func:`~torch_neuron.experimental.set_multicore` & :func:`~torch_neuron.experimental.set_neuron_cores` functions allow a model that has already been loaded to a NeuronCore to be moved to a different NeuronCore. This functionality is less efficient than directly loading a model to a NeuronCore within a context manager but allows device placement to be fully dynamic at runtime. This is analogous to the :meth:`torch.nn.Module.to` function for device placement. .. important:: A prerequisite to enable placement functionality is that the loaded :class:`torch.jit.ScriptModule` has already been compiled with the :func:`torch_neuron.trace` API. Attempting to place a regular :class:`torch.nn.Module` onto a NeuronCore prior to compilation will do nothing. """ import contextlib def set_neuron_cores(trace: 'torch.jit.ScriptModule', start_nc: int=-1, nc_count: int=-1): """ Set the NeuronCore start/count for all Neuron subgraphs in a torch Module. This will unload the model from an existing NeuronCore if it is already loaded. *Requires Torch 1.8+* Arguments: trace: A torch module which contains one or more Neuron subgraphs. start_nc: The starting NeuronCore index where the Module is placed. The value ``-1`` automatically loads to the optimal NeuronCore (least used). Note that this index is always relative to NeuronCores visible to this process. nc_count: The number of NeuronCores to use. The value ``-1`` will load a model to exactly the number of cores required by that model (1 for most models, >1 when using NeuronCore Pipeline). If ``nc_count`` is greater than the number of NeuronCores required by the model, the model will be replicated across multiple NeuronCores. ``(replications = floor(nc_count / cores_per_model))`` Raises: RuntimeError: If the Neuron runtime cannot be initialized. ValueError: If the ``nc_count`` is an invalid number of NeuronCores. Examples: *Single Load*: Move a model to the first visible NeuronCore after loading. >>> model = torch.jit.load('example_neuron_model.pt') >>> torch_neuron.experimental.set_neuron_cores(model, start_nc=0, nc_count=1) >>> model(example) # Executes on NeuronCore 0 >>> model(example) # Executes on NeuronCore 0 >>> model(example) # Executes on NeuronCore 0 *Multiple Core Replication*: Replicate a model to 2 NeuronCores after loading. This allows a single :class:`torch.jit.ScriptModule` to use multiple NeuronCores by running round-robin executions. >>> model = torch.jit.load('example_neuron_model.pt') >>> torch_neuron.experimental.set_neuron_cores(model, start_nc=2, nc_count=2) >>> model(example) # Executes on NeuronCore 2 >>> model(example) # Executes on NeuronCore 3 >>> model(example) # Executes on NeuronCore 2 *Multiple Model Load*: Move and pin 2 models to separate NeuronCores. This causes each :class:`torch.jit.ScriptModule` to always execute on a specific NeuronCore. >>> model1 = torch.jit.load('example_neuron_model.pt') >>> torch_neuron.experimental.set_neuron_cores(model1, start_nc=2) >>> model2 = torch.jit.load('example_neuron_model.pt') >>> torch_neuron.experimental.set_neuron_cores(model2, start_nc=0) >>> model1(example) # Executes on NeuronCore 2 >>> model1(example) # Executes on NeuronCore 2 >>> model2(example) # Executes on NeuronCore 0 >>> model2(example) # Executes on NeuronCore 0 """ def set_multicore(trace: 'torch.jit.ScriptModule'): """ Loads all Neuron subgraphs in a torch Module to all visible NeuronCores. 
This loads each Neuron subgraph within a :class:`torch.jit.ScriptModule` to multiple NeuronCores without requiring multiple calls to :func:`torch.jit.load`. This allows a single :class:`torch.jit.ScriptModule` to use multiple NeuronCores for concurrent threadsafe inferences. Executions use a round-robin strategy to distribute across NeuronCores. This will unload the model from an existing NeuronCore if it is already loaded. *Requires Torch 1.8+* Arguments: trace: A torch module which contains one or more Neuron subgraphs. Raises: RuntimeError: If the Neuron runtime cannot be initialized. Examples: *Multiple Core Replication*: Move a model across all visible NeuronCores after loading. This allows a single :class:`torch.jit.ScriptModule` to use all NeuronCores by running round-robin executions. >>> model = torch.jit.load('example_neuron_model.pt') >>> torch_neuron.experimental.set_multicore(model) >>> model(example) # Executes on NeuronCore 0 >>> model(example) # Executes on NeuronCore 1 >>> model(example) # Executes on NeuronCore 2 """ @contextlib.contextmanager def neuron_cores_context(start_nc: int=-1, nc_count: int=-1): """ A context which sets the NeuronCore start/count for all Neuron subgraphs. Any calls to :func:`torch.jit.load` will cause any underlying Neuron subgraphs to load to the specified NeuronCores within this context. This context manager only needs to be used during the model load. After loading, inferences do not need to occur in this context in order to use the correct NeuronCores. Note that this context is *not* threadsafe. Using multiple core placement contexts from multiple threads may not correctly place models. Arguments: start_nc: The starting NeuronCore index where the Module is placed. The value ``-1`` automatically loads to the optimal NeuronCore (least used). Note that this index is always relative to NeuronCores visible to this process. nc_count: The number of NeuronCores to use. The value ``-1`` will load a model to exactly the number of cores required by that model (1 for most models, >1 when using NeuronCore Pipeline). If ``nc_count`` is greater than the number of NeuronCores required by the model, the model will be replicated across multiple NeuronCores. ``(replications = floor(nc_count / cores_per_model))`` Raises: RuntimeError: If the Neuron runtime cannot be initialized. ValueError: If the ``nc_count`` is an invalid number of NeuronCores. Examples: *Single Load*: Directly load a model from disk to the first visible NeuronCore. >>> with torch_neuron.experimental.neuron_cores_context(start_nc=0, nc_count=1): >>> model = torch.jit.load('example_neuron_model.pt') >>> model(example) # Executes on NeuronCore 0 >>> model(example) # Executes on NeuronCore 0 >>> model(example) # Executes on NeuronCore 0 *Multiple Core Replication*: Directly load a model from disk to 2 NeuronCores. This allows a single :class:`torch.jit.ScriptModule` to use multiple NeuronCores by running round-robin executions. >>> with torch_neuron.experimental.neuron_cores_context(start_nc=2, nc_count=2): >>> model = torch.jit.load('example_neuron_model.pt') >>> model(example) # Executes on NeuronCore 2 >>> model(example) # Executes on NeuronCore 3 >>> model(example) # Executes on NeuronCore 2 *Multiple Model Load*: Directly load 2 models from disk and pin them to separate NeuronCores. This causes each :class:`torch.jit.ScriptModule` to always execute on a specific NeuronCore. 
>>> with torch_neuron.experimental.neuron_cores_context(start_nc=2): >>> model1 = torch.jit.load('example_neuron_model.pt') >>> with torch_neuron.experimental.neuron_cores_context(start_nc=0): >>> model2 = torch.jit.load('example_neuron_model.pt') >>> model1(example) # Executes on NeuronCore 2 >>> model1(example) # Executes on NeuronCore 2 >>> model2(example) # Executes on NeuronCore 0 >>> model2(example) # Executes on NeuronCore 0 """ @contextlib.contextmanager def multicore_context(): """ A context which loads all Neuron subgraphs to all visible NeuronCores. This loads each Neuron subgraph within a :class:`torch.jit.ScriptModule` to multiple NeuronCores without requiring multiple calls to :func:`torch.jit.load`. This allows a single :class:`torch.jit.ScriptModule` to use multiple NeuronCores for concurrent threadsafe inferences. Executions use a round-robin strategy to distribute across NeuronCores. Any calls to :func:`torch.jit.load` will cause any underlying Neuron subgraphs to load to the specified NeuronCores within this context. This context manager only needs to be used during the model load. After loading, inferences do not need to occur in this context in order to use the correct NeuronCores. Note that this context is *not* threadsafe. Using multiple core placement contexts from multiple threads may not correctly place models. Raises: RuntimeError: If the Neuron runtime cannot be initialized. Examples: *Multiple Core Replication*: Directly load a model to all visible NeuronCores. This allows a single :class:`torch.jit.ScriptModule` to use all NeuronCores by running round-robin executions. >>> with torch_neuron.experimental.multicore_context(): >>> model = torch.jit.load('example_neuron_model.pt') >>> model(example) # Executes on NeuronCore 0 >>> model(example) # Executes on NeuronCore 1 >>> model(example) # Executes on NeuronCore 2 """ ================================================ FILE: archive/torch-neuron/setup/index.rst ================================================ .. _setup-torch-neuron-archived: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Setup Guide for Inf1 ==================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 Fresh install Update to latest release Install previous releases /archive/torch-neuron/setup/pytorch-install-cxx11 ================================================ FILE: archive/torch-neuron/setup/prev-releases/neuron-1.14.2-pytorch-install.rst ================================================ .. _install-neuron-1.14.2-pytorch: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install PyTorch Neuron (Neuron 1.14.2) ====================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.14.2 --framework-version=pytorch-1.5.1 ================================================ FILE: archive/torch-neuron/setup/prev-releases/neuron-1.15.0-pytorch-install.rst ================================================ .. _install-neuron-1.15.0-pytorch: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install PyTorch Neuron (Neuron 1.15.0) ====================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.0 --framework-version=pytorch-1.5.1 ================================================ FILE: archive/torch-neuron/setup/prev-releases/neuron-1.15.1-pytorch-install.rst ================================================ .. _install-neuron-1.15.1-pytorch: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install PyTorch Neuron (Neuron 1.15.1) ====================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. 
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.1 --framework-version=pytorch-1.5.1 ================================================ FILE: archive/torch-neuron/setup/prev-releases/neuron-1.15.2-pytorch-install.rst ================================================ .. _install-neuron-1.15.2-pytorch: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install PyTorch Neuron (Neuron 1.15.2) ====================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.7.1 .. tab-item:: PyTorch 1.5.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.15.2 --framework-version=pytorch-1.5.1 ================================================ FILE: archive/torch-neuron/setup/prev-releases/neuron-1.16.1-pytorch-install.rst ================================================ .. _install-neuron-1.16.1-pytorch: .. 
meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install PyTorch Neuron (Neuron 1.16.1)
======================================

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.1 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.1 --framework-version=pytorch-1.5.1
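
.. note::
   The install commands on this page are generated at build time by the ``program-output`` directives above. If you are reading the RST source rather than the rendered page, a minimal sketch of how to reproduce one rendered block is to run the helper script from the repository root with the same flags the directive uses (script and manifest paths are the ones referenced above):

   .. code-block:: bash

      # Render the develop-mode instructions for Ubuntu (non-DLAMI) at Neuron 1.16.1.
      # Flags mirror the program-output directive above.
      python3 src/helperscripts/neuronsetuphelper.py \
          --file src/helperscripts/neuron-releases-manifest.json \
          --install pytorch --mode=develop \
          --ami=non-dlami --os=ubuntu --neuron-version=1.16.1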

================================================
FILE: archive/torch-neuron/setup/prev-releases/neuron-1.16.2-pytorch-install.rst
================================================

.. _install-neuron-1.16.2-pytorch:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install PyTorch Neuron (Neuron 1.16.2)
======================================

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.2

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.2

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.2

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.2 --framework-version=pytorch-1.5.1
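
.. note::
   The rendered instructions for these archived Inf1 releases follow the standard pattern: packages are installed from the Neuron pip repository, with exact version pins taken from the release manifest. A representative sketch only; the authoritative pins for Neuron 1.16.2 are whatever the ``program-output`` blocks above rendered, not this example:

   .. code-block:: bash

      # Representative Inf1 PyTorch install pattern (pins come from the manifest,
      # not from this sketch).
      pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
      pip install torch-neuron neuron-cc[tensorflow] torchvision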

================================================
FILE: archive/torch-neuron/setup/prev-releases/neuron-1.16.3-pytorch-install.rst
================================================

.. _install-neuron-1.16.3-pytorch:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install PyTorch Neuron (Neuron 1.16.3)
======================================

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.16.3 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.16.3 --framework-version=pytorch-1.5.1
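
.. note::
   Each page in this archive drives the same helper script with a different ``--neuron-version``; the set of valid releases and framework pins lives in the JSON manifest referenced by ``--file``. A minimal sketch for inspecting which entries the manifest exposes, without assuming anything about its schema:

   .. code-block:: bash

      # List the top-level keys of the release manifest (schema not assumed here).
      python3 -c "import json; print(list(json.load(open('src/helperscripts/neuron-releases-manifest.json'))))"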

================================================
FILE: archive/torch-neuron/setup/prev-releases/neuron-1.17.2-pytorch-install.rst
================================================

.. _install-neuron-1.17.2-pytorch:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install PyTorch Neuron (Neuron 1.17.2)
======================================

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: PyTorch 1.10.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: PyTorch 1.10.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: PyTorch 1.10.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.9.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            ..
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.17.2 --framework-version=pytorch-1.5.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.17.2 --framework-version=pytorch-1.5.1 ================================================ FILE: archive/torch-neuron/setup/prev-releases/neuron-1.18.0-pytorch-install.rst ================================================ .. _install-neuron-1.18.0-pytorch: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install PyTorch Neuron (Neuron 1.18.0) ====================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: PyTorch 1.10.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 .. tab-item:: PyTorch 1.9.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.9.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.9.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
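The install instructions in the tabs below are generated at documentation build time: each ``program-output`` directive runs the ``neuronsetuphelper.py`` helper against the Neuron release manifest and captures its console output. As a rough sketch (assuming the command is run from the repository root, where the helper and manifest paths resolve), the same output can be reproduced manually:

.. code-block:: bash

   # Sketch: render the develop-mode install steps for Neuron 1.18.0 on a
   # non-DLAMI Ubuntu instance. Paths are relative to the repository root.
   python3 src/helperscripts/neuronsetuphelper.py \
       --file src/helperscripts/neuron-releases-manifest.json \
       --install pytorch \
       --mode=develop \
       --ami=non-dlami \
       --os=ubuntu \
       --neuron-version=1.18.0

   # To pin an older framework release, add --framework-version, for example:
   #   --framework-version=pytorch-1.9.1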
.. tab-set::

   .. tab-item:: PyTorch 1.10.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: PyTorch 1.10.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: PyTorch 1.10.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.9.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.7.1

   .. tab-item:: PyTorch 1.5.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.18.0 --framework-version=pytorch-1.5.1

================================================
FILE: archive/torch-neuron/setup/prev-releases/neuron-1.19.0-pytorch-install.rst
================================================

.. _install-neuron-1.19.0-pytorch:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install PyTorch Neuron (Neuron 1.19.0)
======================================

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst
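Each pair of tabs below corresponds to one combination of ``--ami`` (``non-dlami`` or ``dlami``) and ``--os`` (``ubuntu`` or ``amazonlinux``) passed to the helper. A minimal sketch of iterating that matrix for Neuron 1.19.0 (flag values taken from the directives below; assumed to run from the repository root):

.. code-block:: bash

   # Sketch: enumerate the AMI/OS combinations behind the tabs on this page.
   for ami in non-dlami dlami; do
       for os in ubuntu amazonlinux; do
           python3 src/helperscripts/neuronsetuphelper.py \
               --file src/helperscripts/neuron-releases-manifest.json \
               --install pytorch --mode=develop \
               --ami=$ami --os=$os --neuron-version=1.19.0
       done
   done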
.. tab-set::

   .. tab-item:: PyTorch 1.11.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0

   .. tab-item:: PyTorch 1.10.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

   .. tab-item:: PyTorch 1.11.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0

   .. tab-item:: PyTorch 1.10.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

   .. tab-item:: PyTorch 1.11.0

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0

   .. tab-item:: PyTorch 1.10.2

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.10.2

   .. tab-item:: PyTorch 1.9.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.9.1

   .. tab-item:: PyTorch 1.8.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.8.1

   .. tab-item:: PyTorch 1.7.1

      .. tab-set::

         .. tab-item:: Ubuntu AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux AMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Ubuntu DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

         .. tab-item:: Amazon Linux DLAMI

            .. include:: /setup/install-templates/inf1/note-setup-general.rst

            .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=1.19.0 --framework-version=pytorch-1.7.1

================================================
FILE: archive/torch-neuron/setup/prev-releases/neuron-2.3.0-pytorch-install.rst
================================================

.. _install-neuron-2.3.0-pytorch:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install PyTorch Neuron (Neuron 2.3.0)
=====================================

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst
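The three sections of this page differ only in the ``--mode`` flag (``develop``, ``compile``, or ``deploy``) passed to the helper. A sketch that renders all three flows for Neuron 2.3.0 in one pass (assumed to run from the repository root; the exact output depends on the manifest contents):

.. code-block:: bash

   # Sketch: generate develop, compile, and deploy instructions back to back.
   for mode in develop compile deploy; do
       python3 src/helperscripts/neuronsetuphelper.py \
           --file src/helperscripts/neuron-releases-manifest.json \
           --install pytorch --mode=$mode \
           --ami=non-dlami --os=ubuntu --neuron-version=2.3.0
   done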
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.3.0 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.3.0 .. tab-item:: PyTorch 1.10.2 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.10.2 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.10.2 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.10.2 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.10.2 .. tab-item:: PyTorch 1.9.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.9.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.9.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.9.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.9.1 .. tab-item:: PyTorch 1.8.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.8.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.8.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.8.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.8.1 .. tab-item:: PyTorch 1.7.1 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.7.1 Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.11.0 .. tab-set:: .. tab-item:: Ubuntu AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.3.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.3.0

    .. tab-item:: PyTorch 1.10.2

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.10.2

    .. tab-item:: PyTorch 1.9.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.9.1

    .. tab-item:: PyTorch 1.8.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.8.1

    .. tab-item:: PyTorch 1.7.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.7.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

    .. tab-item:: PyTorch 1.11.0

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.3.0

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0

            .. tab-item:: Ubuntu DLAMI
                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.3.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.3.0

    .. tab-item:: PyTorch 1.10.2

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.10.2

    .. tab-item:: PyTorch 1.9.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.9.1

    .. tab-item:: PyTorch 1.8.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.8.1

    .. tab-item:: PyTorch 1.7.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.3.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.3.0 --framework-version=pytorch-1.7.1


================================================
FILE: archive/torch-neuron/setup/prev-releases/neuron-2.4.0-pytorch-install.rst
================================================

.. _install-neuron-2.4.0-pytorch:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install PyTorch Neuron (Neuron 2.4.0)
======================================

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::
    .. tab-item:: PyTorch 1.11.0

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.4.0

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.4.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.4.0

    .. tab-item:: PyTorch 1.10.2

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

    .. tab-item:: PyTorch 1.9.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

    .. tab-item:: PyTorch 1.8.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

    .. tab-item:: PyTorch 1.7.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

    .. tab-item:: PyTorch 1.11.0

        .. tab-set::

            .. tab-item:: Ubuntu AMI
                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.4.0

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.4.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.4.0

    .. tab-item:: PyTorch 1.10.2

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

    .. tab-item:: PyTorch 1.9.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

    .. tab-item:: PyTorch 1.8.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

    .. tab-item:: PyTorch 1.7.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst
.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

    .. tab-item:: PyTorch 1.11.0

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.4.0

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.4.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.4.0

    .. tab-item:: PyTorch 1.10.2

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.10.2

    .. tab-item:: PyTorch 1.9.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.9.1

    .. tab-item:: PyTorch 1.8.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.8.1

    .. tab-item:: PyTorch 1.7.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.4.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.4.0 --framework-version=pytorch-1.7.1


================================================
FILE: archive/torch-neuron/setup/prev-releases/neuron-2.5.0-pytorch-install.rst
================================================

.. _install-neuron-2.5.0-pytorch:
.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install PyTorch Neuron (Neuron 2.5.0)
======================================

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. contents:: Table of contents
   :local:
   :depth: 2

Develop on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/develop_mode.rst

.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

    .. tab-item:: PyTorch 1.12.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.5.0

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.5.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.5.0

    .. tab-item:: PyTorch 1.11.0

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

    .. tab-item:: PyTorch 1.10.2

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

    .. tab-item:: PyTorch 1.9.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

    .. tab-item:: PyTorch 1.8.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

    .. tab-item:: PyTorch 1.7.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=develop --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.7.1

Compile on compute instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/compile_mode.rst

.. tab-set::

    .. tab-item:: PyTorch 1.12.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.5.0

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.5.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.5.0

    .. tab-item:: PyTorch 1.11.0

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

    .. tab-item:: PyTorch 1.10.2

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

    .. tab-item:: PyTorch 1.9.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

    .. tab-item:: PyTorch 1.8.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

    .. tab-item:: PyTorch 1.7.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.7.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=compile --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.7.1

Deploy on AWS ML accelerator instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: /setup/install-templates/inf1/deploy_mode.rst
.. include:: /setup/install-templates/inf1/note-setup-libnrt-warning.rst

.. tab-set::

    .. tab-item:: PyTorch 1.12.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.5.0

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.5.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.5.0

    .. tab-item:: PyTorch 1.11.0

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.11.0

    .. tab-item:: PyTorch 1.10.2

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.10.2

    .. tab-item:: PyTorch 1.9.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.9.1

    .. tab-item:: PyTorch 1.8.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Ubuntu DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

            .. tab-item:: Amazon Linux DLAMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst

                .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.8.1

    .. tab-item:: PyTorch 1.7.1

        .. tab-set::

            .. tab-item:: Ubuntu AMI

                .. include:: /setup/install-templates/inf1/note-setup-general.rst
program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux AMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=non-dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.7.1 .. tab-item:: Ubuntu DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=ubuntu --neuron-version=2.5.0 --framework-version=pytorch-1.7.1 .. tab-item:: Amazon Linux DLAMI .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/neuronsetuphelper.py --file src/helperscripts/neuron-releases-manifest.json --install pytorch --mode=deploy --ami=dlami --os=amazonlinux --neuron-version=2.5.0 --framework-version=pytorch-1.7.1

================================================
FILE: archive/torch-neuron/setup/pytorch-install-cxx11.rst
================================================

.. _pytorch-install-cxx11:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Install with support for cxx11 ABI
==================================

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

.. warning::
   The intended user of this guide is using a custom built version of ``torch`` or compiling a non-python application which must be built using the cxx11 ABI. *Most applications do not require this specialized distribution.* For regular installation instructions see: :ref:`Fresh install <install-neuron-pytorch>`

The standard ``torch-neuron`` packages (which are normally installed according to the :ref:`Fresh install <install-neuron-pytorch>` guide) are compiled with the pre-cxx11 ABI and linked against the pre-cxx11 ``libtorch``. These compilation options ensure that the ``torch-neuron`` ABI matches the *publicly* released version of the ``torch`` package that is installed from the default PyPI index.

To support applications with specific ABI requirements, Neuron distributes packages which are linked against the cxx11 version of ``libtorch``. These ``torch-neuron`` packages are built using the ``-D_GLIBCXX_USE_CXX11_ABI=1`` compilation flag.

The only difference between these packages and the standard packages is the torch plugin library contained within the package. This is the ``libtorchneuron.so`` library located in the ``torch_neuron/lib/`` package directory. All other libraries and python files within the packages are identical. This means that these cxx11-compatible packages are drop-in replacements in environments that are incompatible with the standard releases of ``torch-neuron``. Behavior is identical whether compiling models or executing inferences.
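Because the plugin library is the only component that differs, it can be useful to locate ``libtorchneuron.so`` inside an installed environment (for example, when pointing a non-python build at it). The following one-liner is an illustrative sketch, not part of the original guide; it assumes ``torch-neuron`` is already installed via ``pip``:

.. code:: bash

   # Print the location of the Neuron plugin library within site-packages
   python3 -c "import os, torch_neuron; print(os.path.join(os.path.dirname(torch_neuron.__file__), 'lib', 'libtorchneuron.so'))"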
Installation
^^^^^^^^^^^^

All versions of the library are available to download from the following pip index:

::

   https://pip.repos.neuron.amazonaws.com/cxx11

To install a wheel, it is recommended to use the ``--no-deps`` flag since versions of ``torch`` compiled using the cxx11 ABI are not distributed on this index.

::

   pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 torch-neuron --no-deps

Specific versions of ``torch-neuron`` with cxx11 ABI support can be installed just like standard versions of ``torch-neuron``.

::

   pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 "torch-neuron>=1.8" --no-deps
   pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 "torch-neuron==1.9.1" --no-deps
   pip install --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 "torch-neuron<1.10" --no-deps

.. important::
   This pip index does not include a distribution of ``torch`` compiled with the new cxx11 ABI. The intent of this index is *only* to provide Neuron SDK wheels. The version of ``torch`` that is distributed on the default PyPI index is compiled with the old pre-cxx11 ABI. If a cxx11 ``torch-neuron`` package is installed *with* dependencies using the *default* PyPI index, then the installed version of ``torch`` will be using the pre-cxx11 ABI and ``torch-neuron`` will be using the cxx11 ABI. This ABI mismatch will lead to errors in both python usage and at link time for non-python applications.

FAQ
^^^

When should I use a cxx11 torch-neuron wheel?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Distributions compiled with the new cxx11 ABI should only be used in the following cases:

1. You have built your own version of ``torch`` which uses the new cxx11 ABI and need a corresponding version of ``torch-neuron`` that is compatible.
2. You are compiling an application against a ``libtorch`` which uses the cxx11 ABI and would like to include ``libtorchneuron.so`` as well. Torch distributes these cxx11 ``libtorch`` libraries with a ``libtorch-cxx11`` prefix. Example:

   ::

      https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.10.2%2Bcpu.zip

Can I download a library/header zip file similar to the torch distribution?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently ``torch-neuron`` does not distribute a bundled library ``.zip`` with only library/header files. The recommended alternative when compiling ``libtorchneuron.so`` into a non-python application is to install the ``torch-neuron`` wheel using ``pip`` according to the installation instructions. Then use the ``libtorchneuron.so`` library from within the python ``site-packages`` directory.

A second alternative to isolate the package contents from a python environment is to download the wheel and unpack the contents:

.. code:: bash

   pip download --extra-index-url=https://pip.repos.neuron.amazonaws.com/cxx11 torch-neuron --no-deps
   wheel unpack torch_neuron-*.whl

If the exact version of the ``torch-neuron`` package is known and no python/pip is available in the build environment, an alternative is to fetch the package file directly and ``unzip`` the wheel:

.. code::

   wget https://pip.repos.neuron.amazonaws.com/cxx11/torch-neuron/torch_neuron-<version>-py3-none-any.whl
   unzip torch_neuron-<version>-py3-none-any.whl

.. _pytorch-cxx11-versioning:

How can I know which ABI torch-neuron is using?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Packages which use the pre-cxx11 ABI have no local identifier and use the following version scheme:

::

   <torch version>.<neuron version>
Packages which use the cxx11 ABI have a ``+cxx11`` local identifier and use the following version scheme:

::

   <torch version>.<neuron version>+cxx11

This allows the ABI to be validated by inspecting the local identifier (or version suffix). Example:

::

   1.8.1.0.0.0.0+cxx11
   1.9.1.0.0.0.0+cxx11
   1.10.2.0.0.0.0+cxx11

How can I know which ABI torch is using?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``torch`` python package provides an API that allows you to check if the underlying ``libtorch`` was compiled with the cxx11 ABI:

.. code:: python

   import torch
   torch.compiled_with_cxx11_abi()  # True/False

Currently ``torch-neuron`` does not have an equivalent API. If the cxx11 ABI was used, it will be visible in the version string (See :ref:`pytorch-cxx11-versioning`).

Troubleshooting
^^^^^^^^^^^^^^^

What python errors could I see if I mix ABI versions?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using a version of ``torch`` compiled with the cxx11 ABI will trigger an error in the python interpreter when importing a version of ``torch-neuron`` using the old (pre-cxx11) ABI from the standard index. This will manifest as an error when the ``import torch_neuron`` statement is executed.

::

   Traceback (most recent call last):
     File "/python3.7/site-packages/torch_neuron/__init__.py", line 64, in <module>
       _register_extension()
     File "/python3.7/site-packages/torch_neuron/__init__.py", line 60, in _register_extension
       torch.ops.load_library(neuron_op_filename)
     File "/python3.7/site-packages/torch/_ops.py", line 110, in load_library
       ctypes.CDLL(path)
     File "/python3.7/ctypes/__init__.py", line 364, in __init__
       self._handle = _dlopen(self._name, mode)
   OSError: /python3.7/site-packages/torch_neuron/lib/libtorchneuron.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_

Similarly, using the standard pre-cxx11 version of ``torch`` with the cxx11 version of ``torch-neuron`` will also cause an error upon import.

::

   Traceback (most recent call last):
     File "/python3.7/site-packages/torch_neuron/__init__.py", line 79, in <module>
       _register_extension()
     File "/python3.7/site-packages/torch_neuron/__init__.py", line 75, in _register_extension
       torch.ops.load_library(neuron_op_filename)
     File "/python3.7/site-packages/torch/_ops.py", line 110, in load_library
       ctypes.CDLL(path)
     File "/python3.7/ctypes/__init__.py", line 364, in __init__
       self._handle = _dlopen(self._name, mode)
   OSError: /python3.7/site-packages/torch_neuron/lib/libtorchneuron.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

In either of these cases, the remedy is to ensure that the ABI of the ``torch`` distribution matches the ABI of the ``torch-neuron`` distribution.
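As a quick sanity check before importing, the reported torch ABI can be compared against the ``+cxx11`` version suffix described in the versioning FAQ above. This is an illustrative sketch, not part of the original guide:

.. code:: bash

   # Compare torch's compiled ABI against the torch-neuron version suffix
   TORCH_ABI=$(python3 -c "import torch; print(torch.compiled_with_cxx11_abi())")
   NEURON_VER=$(pip show torch-neuron | awk '/^Version:/{print $2}')
   case "${NEURON_VER}" in
     *+cxx11) NEURON_ABI=True ;;
     *)       NEURON_ABI=False ;;
   esac
   echo "torch cxx11: ${TORCH_ABI} | torch-neuron version: ${NEURON_VER}"
   [ "${TORCH_ABI}" = "${NEURON_ABI}" ] || echo "WARNING: torch/torch-neuron ABI mismatch"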
What compiler/linking errors could I see if I mix ABI versions?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you link an application which uses the old (pre-cxx11) ABI ``libtorchneuron.so`` with a cxx11 version of ``torch``, this will trigger a link error.

::

   libtorchneuron.so: undefined reference to `torch::detail::class_base::class_base(std::string const&, std::string const&, std::string, std::type_info const&, std::type_info const&)'
   libtorchneuron.so: undefined reference to `c10::Error::Error(c10::SourceLocation, std::string)'
   libtorchneuron.so: undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&)'
   libtorchneuron.so: undefined reference to `c10::ClassType::getMethod(std::string const&) const'
   libtorchneuron.so: undefined reference to `c10::ivalue::ConstantString::create(std::string)'
   libtorchneuron.so: undefined reference to `c10::DeviceTypeName(c10::DeviceType, bool)'
   libtorchneuron.so: undefined reference to `torch::jit::parseSchema(std::string const&)'
   libtorchneuron.so: undefined reference to `unsigned short caffe2::TypeMeta::_typeMetaData<std::string>()'
   libtorchneuron.so: undefined reference to `c10::Warning::warn(c10::SourceLocation const&, std::string const&, bool)'
   libtorchneuron.so: undefined reference to `torch::jit::parseSchemaOrName(std::string const&)'
   libtorchneuron.so: undefined reference to `c10::Symbol::fromQualString(std::string const&)'
   libtorchneuron.so: undefined reference to `c10::Error::Error(std::string, std::string, void const*)'
   libtorchneuron.so: undefined reference to `c10::detail::infer_schema::make_function_schema(std::string&&, std::string&&, c10::ArrayRef, c10::ArrayRef)'
   libtorchneuron.so: undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&)'
   libtorchneuron.so: undefined reference to `torch::jit::canonicalSchemaString(c10::FunctionSchema const&)'

Similarly, an error will also occur in the opposite scenario where the cxx11 ``libtorchneuron.so`` library is used with the pre-cxx11 ``libtorch``:

::

   libtorchneuron.so: undefined reference to `c10::ivalue::ConstantString::create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
   libtorchneuron.so: undefined reference to `torch::jit::parseSchemaOrName(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
   libtorchneuron.so: undefined reference to `c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
   libtorchneuron.so: undefined reference to `c10::Error::Error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, void const*)'
   libtorchneuron.so: undefined reference to `torch::jit::canonicalSchemaString[abi:cxx11](c10::FunctionSchema const&)'
   libtorchneuron.so: undefined reference to `torch::detail::class_base::class_base(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::type_info const&, std::type_info const&)'
   libtorchneuron.so: undefined reference to `c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
   libtorchneuron.so: undefined reference to `c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
   libtorchneuron.so: undefined reference to `c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef, c10::ArrayRef)'
   libtorchneuron.so: undefined reference to `torch::jit::parseSchema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
   libtorchneuron.so: undefined reference to `c10::DeviceTypeName[abi:cxx11](c10::DeviceType, bool)'
   libtorchneuron.so: undefined reference to `c10::Symbol::fromQualString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
   libtorchneuron.so: undefined reference to `unsigned short caffe2::TypeMeta::_typeMetaData<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >()'
   libtorchneuron.so: undefined reference to `c10::ClassType::getMethod(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const'
   libtorchneuron.so: undefined reference to `c10::Warning::warn(c10::SourceLocation const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)'

In either of these cases, the remedy is to ensure that the ABI of the ``libtorch`` distribution matches the ABI of the ``libtorchneuron.so`` distribution. The ``torch`` ABI must match the ``torch-neuron`` ABI or an error will occur.

================================================ FILE: archive/torch-neuron/setup/pytorch-install-prev-al2.rst ================================================ .. _pytorch-neuron-install-prev-al2: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install Previous PyTorch Neuron Releases for Amazon Linux (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 This section will assist you in installing previous Neuron releases. .. tab-set:: .. tab-item:: Neuron 2.18.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.18.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.17.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.17.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.16.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.16.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/torch-neuron/setup/pytorch-install-prev-al2023.rst ================================================ .. _pytorch-neuron-install-prev-al2023: .. Install previous PyTorch Neuron releases for Amazon Linux 2023 - archived Use the tabs below to install a specific previous Neuron SDK release. Select the Neuron version you need. .. tab-set:: .. tab-item:: Neuron 2.21.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.20.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.19.0 ..
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/torch-neuron/setup/pytorch-install-prev-u20.rst ================================================ .. _pytorch-neuron-install-prev-u20: .. Install previous PyTorch Neuron releases for Ubuntu 20.04 - archived Use the tabs below to install a specific previous Neuron SDK release. Select the Neuron version you need. .. tab-set:: .. tab-item:: Neuron 2.21.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.20.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.19.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/torch-neuron/setup/pytorch-install-prev-u22.rst ================================================ .. _pytorch-neuron-install-prev-u22: .. Install previous PyTorch Neuron releases for Ubuntu 22.04 - archived Use the tabs below to install a specific previous Neuron SDK release. Select the Neuron version you need. .. tab-set:: .. tab-item:: Neuron 2.21.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.21.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.20.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.20.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami .. tab-item:: Neuron 2.19.0 .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --neuron-version=2.19.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/torch-neuron/setup/pytorch-install-prev.rst ================================================ .. _install-prev-neuron-pytorch: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install previous PyTorch Neuron releases (``torch-neuron``) ============================================================ .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. 
include:: /setup/install-templates/inf1/note-setup-cntr.rst .. toctree:: :maxdepth: 1 Neuron 2.5.0 Neuron 2.4.0 Neuron 2.3.0 Neuron 1.19.0 Neuron 1.18.0 Neuron 1.17.2 Neuron 1.16.3 Neuron 1.16.2 Neuron 1.16.1 Neuron 1.15.2 Neuron 1.15.1 Neuron 1.15.0 Neuron 1.14.2 ================================================ FILE: archive/torch-neuron/setup/pytorch-install.rst ================================================ .. _install-neuron-pytorch: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Install PyTorch Neuron (``torch-neuron``) ========================================= .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. include:: /setup/install-templates/inf1/note-setup-cntr.rst .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.12.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.11.0 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.10.2 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. 
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.9.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.12.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.11.0 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. 
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.10.2 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.9.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.12.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. 
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.12.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.11.0 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.10.2 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.10.2 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami .. tab-item:: PyTorch 1.9.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.9.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/torch-neuron/setup/pytorch-update-al2-dlami.rst ================================================ .. _pytorch-neuron-al2-update: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Update to latest PyTorch Neuron (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. If you already have a previous Neuron release installed, this section provides links that will assist you in updating to the latest Neuron release. .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. include:: /setup/install-templates/inf1/note-setup-general.rst ..
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=dlami-framework ================================================ FILE: archive/torch-neuron/setup/pytorch-update-al2023.rst ================================================ .. _pytorch-neuron-al2023-update: .. Update PyTorch Neuron (torch-neuron) on Amazon Linux 2023 - archived If you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands. .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. include:: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/torch-neuron/setup/pytorch-update-u20-dlami.rst ================================================ .. _pytorch-neuron-u20-update: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Update to latest PyTorch Neuron (``torch-neuron``) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. If you already have a previous Neuron release installed, this section provides links that will assist you in updating to the latest Neuron release. .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. include:: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=dlami-framework ================================================ FILE: archive/torch-neuron/setup/pytorch-update-u20.rst ================================================ .. _pytorch-neuron-u20-update: .. Update PyTorch Neuron (torch-neuron) on Ubuntu 20.04 - archived If you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands. .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. include:: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/torch-neuron/setup/pytorch-update-u22.rst ================================================ .. _pytorch-neuron-u22-update: .. Update PyTorch Neuron (torch-neuron) on Ubuntu 22.04 - archived If you already have a previous Neuron release installed, select the PyTorch version tab below to get the update commands. .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. include:: /setup/install-templates/inf1/note-setup-general.rst ..
program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami ================================================ FILE: archive/torch-neuron/setup/pytorch-update.rst ================================================ .. _update-neuron-pytorch: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Update to latest PyTorch Neuron (``torch-neuron``) ================================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. include:: /setup/install-templates/inf1/note-setup-cntr.rst .. contents:: Table of contents :local: :depth: 2 Develop on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/develop_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Compile on compute instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/compile_mode.rst .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=compile --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami Deploy on AWS ML accelerator instance ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/deploy_mode.rst .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. tab-set:: .. tab-item:: PyTorch 1.13.1 .. tab-set:: .. tab-item:: Ubuntu 20 DLAMI Base .. include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=non-dlami .. tab-item:: Amazon Linux 2 DLAMI Base .. 
include :: /setup/install-templates/inf1/note-setup-general.rst .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --mode=deploy --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=non-dlami

================================================
FILE: archive/torch-neuron/torch-neuron-dataparallel-example-default.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

The default DataParallel use mode will replicate the model on all available NeuronCores in the current process. The inputs will be split on ``dim=0``.

.. code-block:: python

   import torch
   import torch_neuron
   from torchvision import models

   # Load the model and set it to evaluation mode
   model = models.resnet50(pretrained=True)
   model.eval()

   # Compile with an example input
   image = torch.rand([1, 3, 224, 224])
   model_neuron = torch.neuron.trace(model, image)

   # Create the DataParallel module
   model_parallel = torch.neuron.DataParallel(model_neuron)

   # Create a batched input
   batch_size = 5
   image_batched = torch.rand([batch_size, 3, 224, 224])

   # Run inference with a batched input
   output = model_parallel(image_batched)

================================================
FILE: archive/torch-neuron/torch-neuron-dataparallel-example-dim-neq-zero.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

In this example we run DataParallel inference using four NeuronCores and ``dim = 2``. Because ``dim != 0``, dynamic batching is not enabled. Consequently, the DataParallel inference-time batch size must be four times the compile-time batch size. DataParallel will generate a warning that dynamic batching is disabled because ``dim != 0``.

.. code-block:: python

   import torch
   import torch_neuron

   # Create an example model
   class Model(torch.nn.Module):
       def __init__(self):
           super().__init__()
           self.conv = torch.nn.Conv2d(3, 3, 3)

       def forward(self, x):
           return self.conv(x) + 1

   model = Model()
   model.eval()

   # Compile with an example input
   image = torch.rand([1, 3, 8, 8])
   model_neuron = torch.neuron.trace(model, image)

   # Create the DataParallel module using 4 NeuronCores and dim = 2
   model_parallel = torch.neuron.DataParallel(model_neuron, device_ids=[0, 1, 2, 3], dim=2)

   # Create a batched input
   # Note that image_batched.shape[dim] / len(device_ids) == image.shape[dim]
   batch_size = 4 * 8
   image_batched = torch.rand([1, 3, batch_size, 8])

   # Run inference with a batched input
   output = model_parallel(image_batched)

================================================
FILE: archive/torch-neuron/torch-neuron-dataparallel-example-disable-dynamic-batching.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

In the following example, we use :func:`torch.neuron.DataParallel.disable_dynamic_batching` to disable dynamic batching. We provide an example of a batch size that will not work when dynamic batching is disabled as well as an example of a batch size that does work when dynamic batching is disabled.

.. code-block:: python

   import torch
   import torch_neuron
   from torchvision import models

   # Load the model and set it to evaluation mode
   model = models.resnet50(pretrained=True)
   model.eval()

   # Compile with an example input
   image = torch.rand([1, 3, 224, 224])
   model_neuron = torch.neuron.trace(model, image)

   # Create the DataParallel module and use 4 NeuronCores
   model_parallel = torch.neuron.DataParallel(model_neuron, device_ids=[0, 1, 2, 3], dim=0)

   # Disable dynamic batching
   model_parallel.disable_dynamic_batching()

   # Create a batched input (this won't work)
   batch_size = 8
   image_batched = torch.rand([batch_size, 3, 224, 224])

   # This will fail because dynamic batching is disabled and
   # image_batched.shape[dim] / len(device_ids) != image.shape[dim]
   # output = model_parallel(image_batched)

   # Create a batched input (this will work)
   batch_size = 4
   image_batched = torch.rand([batch_size, 3, 224, 224])

   # This will work because
   # image_batched.shape[dim] / len(device_ids) == image.shape[dim]
   output = model_parallel(image_batched)

================================================
FILE: archive/torch-neuron/torch-neuron-dataparallel-example-dynamic-batching.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

In the following example, we use the :func:`torch.neuron.DataParallel` module to run inference using several different batch sizes without recompiling the Neuron model.

.. code-block:: python

   import torch
   import torch_neuron
   from torchvision import models

   # Load the model and set it to evaluation mode
   model = models.resnet50(pretrained=True)
   model.eval()

   # Compile with an example input
   image = torch.rand([1, 3, 224, 224])
   model_neuron = torch.neuron.trace(model, image)

   # Create the DataParallel module
   model_parallel = torch.neuron.DataParallel(model_neuron)

   # Create batched inputs and run inference on the same model
   batch_sizes = [2, 3, 4, 5, 6]
   for batch_size in batch_sizes:
       image_batched = torch.rand([batch_size, 3, 224, 224])

       # Run inference with a batched input
       output = model_parallel(image_batched)

================================================
FILE: archive/torch-neuron/torch-neuron-dataparallel-example-specify-ncs.rst
================================================

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

The following example uses the ``device_ids`` argument to use the first three NeuronCores for DataParallel inference.
.. code-block:: python

   import torch
   import torch_neuron
   from torchvision import models

   # Load the model and set it to evaluation mode
   model = models.resnet50(pretrained=True)
   model.eval()

   # Compile with an example input
   image = torch.rand([1, 3, 224, 224])
   model_neuron = torch.neuron.trace(model, image)

   # Create the DataParallel module, run on the first three NeuronCores
   # Equivalent to model_parallel = torch.neuron.DataParallel(model_neuron, device_ids=[0, 1, 2])
   model_parallel = torch.neuron.DataParallel(model_neuron, device_ids=['nc:0', 'nc:1', 'nc:2'])

   # Create a batched input
   batch_size = 5
   image_batched = torch.rand([batch_size, 3, 224, 224])

   # Run inference with a batched input
   output = model_parallel(image_batched)

================================================
FILE: archive/torch-neuron/troubleshooting-guide.rst
================================================

.. _pytorch-neuron-inference-troubleshooting:

.. meta::
   :noindex:
   :nofollow:
   :description: This content is archived and no longer maintained.
   :date-modified: 2026-03-11

Troubleshooting Guide for PyTorch Neuron (``torch-neuron``)
===========================================================

.. warning::
   This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`.

Patching PyTorch version 1.13 for CVEs
--------------------------------------

PyTorch version 1.13 has the following CVEs:

- CVE-2025-32434
- CVE-2024-31580
- CVE-2024-31583

To patch PyTorch version 1.13, run the following on a CPU instance with Ubuntu 22 AMI (it takes 30 minutes on a c5.4xlarge):

::

   git clone --recursive https://github.com/pytorch/pytorch -b v1.13.1
   cd pytorch
   git cherry-pick b5c3a17c2c207ebefcb85043f0cf94be9b2fef81
   git cherry-pick 9c7071b0e324f9fb68ab881283d6b8d388a4bcd2
   wget https://github.com/user-attachments/files/22013116/patch_v113.txt
   git apply patch_v113.txt

To build the pip wheel, see `build steps `_. A condensed version is provided below. Install Miniconda by following `installation steps `_ and run the following commands:

::

   source ~/miniconda3/bin/activate
   conda create --name conda_py39 python=3.9
   conda activate conda_py39
   conda install astunparse numpy==1.19.5 ninja pyyaml setuptools cmake cffi typing_extensions future six requests dataclasses
   conda install mkl mkl-include
   # CUDA only: Add LAPACK support for the GPU if needed
   conda install -c pytorch magma-cuda110 # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo
   sudo apt install cmake g++
   export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
   PYTORCH_BUILD_VERSION=1.13.2 PYTORCH_BUILD_NUMBER=1 python setup.py bdist_wheel
   # the PyTorch pip wheel will be in dist directory

General Torch-Neuron issues
---------------------------

If you see an error about "Unknown builtin op: neuron::forward_1" like below, please ensure that the import line ``import torch_neuron`` (which registers the Neuron custom operation) is in the inference script before ``torch.jit.load`` is used.

::

   Unknown builtin op: neuron::forward_1.
   Could not find any similar ops to neuron::forward_1. This op may not exist or may not be currently supported in TorchScript.

TorchVision related issues
--------------------------

If you encounter an error like below, it is because torchvision versions >= 0.7 are not compatible with Torch-Neuron 1.5.1. Please downgrade torchvision to version 0.6.1:

::

   E AttributeError: module 'torch.jit' has no attribute '_script_if_tracing'

2GB protobuf limit related issues
---------------------------------

If you encounter an error like below, it is because the model size is larger than 2GB. To compile such large models, use the :ref:`separate_weights=True ` flag. Note: ensure that you have the latest version of the compiler installed to support this flag. You can upgrade neuron-cc using :code:`python3 -m pip install neuron-cc[tensorflow] -U --force --extra-index-url=https://pip.repos.neuron.amazonaws.com`

::

   E google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'

torch.jit.trace issues
----------------------

The :doc:`Trace API ` uses the PyTorch :func:`torch.jit.trace` function to generate :class:`~torch.jit.ScriptModule` models for execution on Inferentia. Consequently, to execute your PyTorch model on Inferentia it must be torch-jit-traceable. You can try modifying your underlying PyTorch model code to make it traceable. If it's not possible to change your model code, you can :ref:`write a wrapper around your model <wrapping-non-traceable-models>` that makes it torch-jit-traceable to compile it for Inferentia. Please visit :func:`torch.jit.trace` to review the properties that a model must have to be torch-jit-traceable.

The PyTorch-Neuron trace API :func:`torch_neuron.trace` accepts :code:`**kwargs` for :func:`torch.jit.trace`. For example, you can use the :code:`strict=False` flag to :ref:`compile models with dictionary outputs `.

.. _wrapping-non-traceable-models:

Compiling models with outputs that are not torch-jit-traceable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To enable compilation of models with non torch-jit-traceable outputs, you can use a technique that involves writing a wrapper that converts the model's output into a form that is torch-jit-traceable. You can then compile the wrapped model for Inferentia using :func:`torch_neuron.trace`.

The following example uses a wrapper to compile a model with non torch-jit-traceable outputs. This model cannot be compiled for Inferentia in its current form because it outputs a list of tuples and tensors, which is not torch-jit-traceable.

.. code-block:: python

   import torch
   import torch_neuron
   import torch.nn as nn

   class Model(nn.Module):
       def __init__(self):
           super(Model, self).__init__()
           self.conv = nn.Conv2d(1, 1, 3)

       def forward(self, x):
           a = self.conv(x) + 1
           b = self.conv(x) + 2
           c = self.conv(x) + 3

           # An output that is a list of tuples and tensors is not torch-traceable
           return [(a, b), c]

   model = Model()
   model.eval()

   inputs = torch.rand(1, 1, 3, 3)

   # Try to compile the model
   model_neuron = torch.neuron.trace(model, inputs)  # ERROR: This cannot be traced, we must change the output format

To compile this model for Inferentia, we can write a wrapper around the model to convert its outputs into a tuple of tensors, which is torch-jit-traceable.

.. code-block:: python

   class NeuronCompatibilityWrapper(nn.Module):
       def __init__(self):
           super(NeuronCompatibilityWrapper, self).__init__()
           self.model = Model()

       def forward(self, x):
           out = self.model(x)
           # An output that is a tuple of tuples and tensors is torch-jit-traceable
           return tuple(out)

Now, we can successfully compile the model for Inferentia using the :code:`NeuronCompatibilityWrapper` wrapper as follows:

.. code-block:: python

   model = NeuronCompatibilityWrapper()
   model.eval()

   # Compile the traceable wrapped model
   model_neuron = torch.neuron.trace(model, inputs)

If the model's outputs must be in the original form, a second wrapper can be used to transform the outputs after compilation for Inferentia. The following example uses the :code:`OutputFormatWrapper` wrapper to convert the compiled model's output back into the original form of a list of tuples and tensors.

.. code-block:: python

   class OutputFormatWrapper(nn.Module):
       def __init__(self):
           super(OutputFormatWrapper, self).__init__()
           self.traceable_model = NeuronCompatibilityWrapper()

       def forward(self, x):
           out = self.traceable_model(x)
           # Return the output in the original format of Model()
           return list(out)

   model = OutputFormatWrapper()
   model.eval()

   # Compile the traceable wrapped model
   model.traceable_model = torch.neuron.trace(model.traceable_model, inputs)

Compiling a submodule in a model that is not torch-jit-traceable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following example shows how to compile a submodule that is part of a non torch-jit-traceable model. In this example, the top-level model :code:`Outer` uses a dynamic flag, which is not torch-jit-traceable. However, the submodule :code:`Inner` is torch-jit-traceable and can be compiled for Inferentia.

.. code-block:: python

   import torch
   import torch_neuron
   import torch.nn as nn

   class Inner(nn.Module):
       def __init__(self):
           super().__init__()
           self.conv = nn.Conv2d(1, 1, 3)

       def forward(self, x):
           return self.conv(x) + 1

   class Outer(nn.Module):
       def __init__(self):
           super().__init__()
           self.inner = Inner()

       def forward(self, x, add_offset: bool = False):
           base = self.inner(x)
           if add_offset:
               return base + 1
           return base

   model = Outer()
   inputs = torch.rand(1, 1, 3, 3)

   # Compile the traceable wrapped submodule
   model.inner = torch.neuron.trace(model.inner, inputs)

   # TorchScript the model for serialization
   script = torch.jit.script(model)
   torch.jit.save(script, 'model.pt')
   loaded = torch.jit.load('model.pt')

Alternatively, for usage scenarios in which the model configuration is static during inference, the dynamic flags can be hardcoded in a wrapper to make the model torch-jit-traceable and enable compiling the entire model for Inferentia. In this example, we assume the :code:`add_offset` flag is always :code:`True` during inference, so we can hardcode this conditional path in the :code:`Static` wrapper to remove the dynamic behavior and compile the entire model for Inferentia.

.. code-block:: python

   class Static(nn.Module):
       def __init__(self):
           super().__init__()
           self.outer = Outer()

       def forward(self, x):
           # hardcode `add_offset=True`
           output = self.outer(x, add_offset=True)
           return output

   model = Static()

   # We can now compile the entire model because `add_offset=True` is hardcoded in the Static wrapper
   model_neuron = torch.neuron.trace(model, inputs)

================================================ FILE: archive/torch-neuron/tutorials/neuroncore_pipeline_pytorch.rst ================================================ .. _pytorch-tutorials-neuroncore-pipeline-pytorch: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Using NeuronCore Pipeline with PyTorch Tutorial ================================================================ .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only.
For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of Contents :local: :depth: 2 Overview -------- In this tutorial we will benchmark the latency of a Hugging Face Transformers model deployed in model pipeline parallel mode using the NeuronCore Pipeline feature. We will compare the results with the usual data parallel (multi-worker) deployment. We compile a pretrained BERT base model and run the benchmarking locally. To enable faster environment setup, we will run both compilation and deployment (inference) on a single inf1.6xlarge instance. You can take similar steps to recreate the benchmark on other instance sizes, such as inf1.xlarge. If you already have an Inf1 instance environment ready, this tutorial is available as a Jupyter notebook at :pytorch-neuron-src:`neuroncore_pipeline_pytorch.ipynb ` and instructions can be viewed at: .. toctree:: :maxdepth: 1 /src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb Instructions for how to set up the environment and run the tutorial are available in the next sections. .. _pytorch-neuroncore-pipeline-pytorch-env-setup: Setup The Environment --------------------- Launch an Inf1 instance by following the steps below; make sure to choose an inf1.6xlarge instance. .. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst .. _pytorch-neuroncore-pipeline-pytorch-run-tutorial: Run The Tutorial ---------------- After connecting to the instance from the terminal, clone the Neuron GitHub repository to the EC2 instance and then change the working directory to the tutorial directory: .. code:: git clone https://github.com/aws/aws-neuron-sdk.git cd aws-neuron-sdk/src/examples/pytorch The Jupyter notebook is available as a file with the name :pytorch-neuron-src:`neuroncore_pipeline_pytorch.ipynb `. You can either run the Jupyter notebook from a browser or run it as a script from the terminal: * **Running tutorial from browser** * First set up and launch the Jupyter notebook on your local browser by following instructions at :ref:`Running Jupyter Notebook Browser` * Open the Jupyter notebook from the menu and follow the instructions You can also view the Jupyter notebook at: .. toctree:: :maxdepth: 1 /src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb .. _pytorch-neuroncore-pipeline-pytorch-cleanup-instances: Clean up your instance/s ------------------------ After you've finished with the instance/s that you created for this tutorial, you should clean up by terminating the instance/s; follow the instructions at `Clean up your instance `_. ================================================ FILE: archive/torch-neuron/tutorials/pytorch-tutorial-setup.rst ================================================ .. _pytorch-tutorial-setup: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 PyTorch Tutorial Setup ====================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. #. Launch an Inf1.6xlarge Instance: .. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst #. Set up a development environment: * Enable or install PyTorch-Neuron: :ref:`install-neuron-pytorch`. #. Run tutorial in Jupyter notebook: * Follow the instructions at :ref:`Setup Jupyter notebook ` to: #. Start the Jupyter Notebook on the instance #.
Run the Jupyter Notebook from your local browser * Connect to the instance from the terminal, clone the Neuron GitHub repository to the Inf1 instance, and then change the working directory to the tutorial directory: .. code:: git clone https://github.com/aws/aws-neuron-sdk.git cd aws-neuron-sdk/src/examples/pytorch * Locate the tutorial notebook file (.ipynb file) under ``aws-neuron-sdk/src/examples/pytorch`` * From your local browser, open the tutorial notebook from the menu and follow the instructions. ================================================ FILE: archive/torch-neuron/tutorials/transformers-marianmt.rst ================================================ .. _pytorch-tutorials-marianmt: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 PyTorch HuggingFace MarianMT Tutorial ===================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of Contents :local: :depth: 2 Overview -------- In this tutorial you will compile and deploy the `HuggingFace MarianMT `_ model for sequence-to-sequence language translation on an Inf1 instance. To enable faster environment setup, you will run the tutorial on an inf1.6xlarge instance to enable both compilation and deployment (inference) on the same instance. In a production environment we encourage you to try different instance sizes to optimize for your specific deployment needs. If you have already launched an Inf1 instance and have the Neuron PyTorch DLAMI environment ready, the tutorial is available as a Jupyter notebook at :pytorch-neuron-src:`transformers-marianmt.ipynb ` and instructions can be viewed at: .. toctree:: :maxdepth: 1 /src/examples/pytorch/transformers-marianmt.ipynb Instructions for how to set up the Neuron PyTorch environment and run the tutorial as a Jupyter notebook are available in the next sections. .. _pytorch-marianmt-env-setup: Setup The Environment --------------------- Launch an Inf1 instance by following the steps below; make sure to choose an inf1.6xlarge instance. .. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst .. _pytorch-marianmt-run-tutorial: Run The Tutorial ---------------- After connecting to the instance from the terminal, clone the Neuron GitHub repository to the EC2 instance and then change the working directory to the tutorial directory: .. code:: git clone https://github.com/aws/aws-neuron-sdk.git cd aws-neuron-sdk/src/examples/pytorch The Jupyter notebook is available as a file with the name :pytorch-neuron-src:`transformers-marianmt.ipynb ` that you can run from a browser: * **Running tutorial from browser** * First set up and launch the Jupyter notebook on your local browser by following instructions at :ref:`Running Jupyter Notebook Browser` * Open the Jupyter notebook from the menu and follow the instructions You can also view the Jupyter notebook at: .. toctree:: :maxdepth: 1 /src/examples/pytorch/transformers-marianmt.ipynb .. _marianmt-cleanup-instances: Clean up your instance/s ------------------------ After you've finished with the instance/s that you created for this tutorial, you should clean up by terminating the instance/s; follow the instructions at `Clean up your instance `_.
================================================ FILE: archive/torch-neuron/tutorials/tutorial-libtorch.rst ================================================ .. _pytorch-tutorials-libtorch: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 LibTorch C++ Tutorial ========================= .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of Contents :local: :depth: 2 Overview -------- This tutorial demonstrates the use of `LibTorch `_ with Neuron, the SDK for Amazon Inf1, Inf2 and Trn1 instances. By the end of this tutorial, you will understand how to write a native C++ application that performs inference on EC2 Inf1, Inf2 and Trn1 instances. We will use an inf1.6xlarge and a pretrained BERT-Base model to determine if one sentence is a paraphrase of another. Verify that this tutorial is running in a virtual environment that was set up according to the `Torch-Neuronx Installation Guide ` or `Torch-Neuron Installation Guide `. Notes ----- The tutorial has been tested on Inf1, Inf2, and Trn1 instances running Ubuntu. Run the tutorial ---------------- This tutorial is self-contained. It produces similar output to :ref:`[html] ` :pytorch-neuron-src:`[notebook] `. Note: The tutorial will use about 8.5 GB of disk space. Ensure you have sufficient space before beginning. Right-click and copy :download:`this link address to the tutorial archive`. .. code:: bash wget tar xvf libtorch_demo.tar.gz Your directory tree should now look like this: :: libtorch_demo ├── bert_neuronx │ ├── compile.py │ └── detect_instance.py ├── clean.sh ├── core_count │ ├── build.sh │ └── main.cpp ├── example_app │ ├── build.sh │ ├── core_count.hpp │ ├── example_app.cpp │ ├── README.txt │ ├── utils.cpp │ └── utils.hpp ├── neuron.patch ├── run_tests.sh ├── setup.sh ├── tokenizer.json └── tokenizers_binding ├── build_python.sh ├── build.sh ├── remote_rust_tokenizer.h ├── run_python.sh ├── run.sh ├── tokenizer.json ├── tokenizer_test ├── tokenizer_test.cpp └── tokenizer_test.py This tutorial uses the `HuggingFace Tokenizers `_ library implemented in Rust. Install Cargo, the package manager for the Rust programming language. +----------------------------------+----------------------------------+ | Ubuntu | Amazon Linux 2023 | +----------------------------------+----------------------------------+ | .. code-block:: bash | .. code-block:: bash | | | | | sudo apt install -y cargo | sudo dnf install -y cargo | +----------------------------------+----------------------------------+ Run the setup script to download additional dependencies and build the app. (This may take a few minutes to complete.) .. literalinclude:: tutorial_source_instructions/run_libtorch.sh :language: bash :lines: 6-7 :: ... + PATH_NEURON_LIB=/opt/aws/neuron/lib/ + g++ utils.cpp example_app.cpp -o ../example-app -O2 -D_GLIBCXX_USE_CXX11_ABI=0 -I../libtorch/include -L../tokenizers_binding/lib -L/opt/aws/neuron/lib/ -L../libtorch/lib -Wl,-rpath,libtorch/lib -Wl,-rpath,tokenizers_binding/lib -Wl,-rpath,/opt/aws/neuron/lib/ -ltokenizers -ltorchneuron -ltorch_cpu -lc10 -lpthread -lnrt ~/libtorch_demo Successfully completed setup .. _libtorch-benchmark: Benchmark --------- The setup script should have compiled and saved a PyTorch model for Neuron (``bert_neuron_b6.pt``).
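For reference, the compile step performed by the setup script is conceptually similar to the Python sketch below. The checkpoint name (``bert-base-cased-finetuned-mrpc``), sequence length, and batch size are assumptions based on this tutorial's paraphrase task, not the exact contents of ``bert_neuronx/compile.py``.

.. code-block:: python

    import torch
    import torch_neuron
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Assumption: an MRPC-finetuned BERT-Base paraphrase classifier
    name = 'bert-base-cased-finetuned-mrpc'
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)
    model.eval()

    # Fixed-shape example inputs: a batch of 6 sentence pairs padded to 128 tokens
    pairs = [('The company HuggingFace is based in New York City',
              "HuggingFace's headquarters are situated in Manhattan")] * 6
    encoded = tokenizer([a for a, _ in pairs], [b for _, b in pairs],
                        max_length=128, padding='max_length',
                        truncation=True, return_tensors='pt')
    example = (encoded['input_ids'], encoded['attention_mask'],
               encoded['token_type_ids'])

    # Compile for Neuron and save the artifact that the C++ example app loads
    model_neuron = torch_neuron.trace(model, example)
    model_neuron.save('bert_neuron_b6.pt')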
Run the provided sanity tests to ensure everything is working properly. .. literalinclude:: tutorial_source_instructions/run_libtorch.sh :language: bash :lines: 10 :: Running tokenization sanity checks. None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used. Tokenizing: 100%|██████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:00<00:00, 15021.69it/s] Python took 0.67 seconds. Sanity check passed. Begin 10000 timed tests. .......... End timed tests. C++ took 0.226 seconds. Tokenization sanity checks passed. Running end-to-end sanity check. The company HuggingFace is based in New York City HuggingFace's headquarters are situated in Manhattan not paraphrase: 10% paraphrase: 90% The company HuggingFace is based in New York City Apples are especially bad for your health not paraphrase: 94% paraphrase: 6% Sanity check passed. Finally, run the example app directly to benchmark the BERT model. .. note:: You can safely ignore the warning about ``None of PyTorch, TensorFlow >= 2.0, ...``. This occurs because the test runs in a small virtual environment that doesn't require the full frameworks. .. literalinclude:: tutorial_source_instructions/run_libtorch.sh :language: bash :lines: 13 :: Getting ready................ Benchmarking................ Completed 32000 operations in 43 seconds => 4465.12 pairs / second ==================== Summary information: ==================== Batch size = 6 Num neuron cores = 16 Num runs per neuron core = 2000 **Congratulations!** By now you should have successfully built and used a native C++ application with LibTorch. Troubleshooting --------------- * In the event of SIGBUS errors, you may have insufficient disk space for the creation of temporary model files at runtime. Consider clearing space or mounting additional disk storage. * In the event of a Neuron runtime failure, confirm that the Neuron kernel module is loaded using ``sudo modprobe neuron``. .. _libtorch-cleanup: ================================================ FILE: archive/torch-neuron/tutorials/tutorial-torchserve.rst ================================================ .. _pytorch-tutorials-torchserve: .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 BERT TorchServe Tutorial ======================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. contents:: Table of Contents :local: :depth: 2 Overview -------- This tutorial demonstrates the use of `TorchServe `_ with Neuron, the SDK for Amazon Inf1 instances. By the end of this tutorial, you will understand how TorchServe can be used to serve a model backed by EC2 Inf1 instances. We will use a pretrained BERT-Base model to determine if one sentence is a paraphrase of another. Verify that this tutorial is running in a virtual environment that was set up according to the `Torch-Neuronx Installation Guide ` or `Torch-Neuron Installation Guide `. .. _torchserve-compile: Run the tutorial ---------------- Open a terminal, log into your remote instance, and activate a PyTorch virtual environment (see the :ref:`PyTorch Installation Guide `). To complete this tutorial, you will need a compiled BERT model.
If you have already completed the HuggingFace Pretrained BERT tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` then you already have the necessary file. Otherwise, you can set up your environment as shown below and then run :download:`trace_bert_neuron.py ` to obtain a traced BERT model. You should now have a compiled ``bert_neuron_b6.pt`` file, which is required going forward. Open a shell on the instance you prepared earlier and create a new directory named ``torchserve``. Copy your compiled model from the previous tutorial into this new directory. .. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 4-6 :: bert_neuron_b6.pt Prepare a new Python virtual environment with the necessary Neuron and TorchServe components. Use a virtual environment to keep (most of) the various tutorial components isolated from the rest of the system in a controlled way. .. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 8 Install the system requirements for TorchServe. .. tab-set:: .. tab-item:: Amazon Linux 2023 DLAMI Base .. code-block:: bash sudo dnf install jq java-11-amazon-corretto-headless sudo alternatives --config java sudo alternatives --config javac .. tab-item:: Ubuntu 20 DLAMI Base .. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 10 .. code:: bash java -version :: openjdk version "11.0.17" 2022-10-18 OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu218.04) OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu218.04, mixed mode, sharing) .. code:: bash javac -version :: javac 11.0.17 Verify that TorchServe is now available. .. code:: bash torchserve --version :: TorchServe Version is 0.7.0 .. _torchserve-setup: Setup TorchServe ---------------- During this tutorial, you will need to download a few files onto your instance. The simplest way to accomplish this is to paste the download links provided above each file into a ``wget`` command. (We don't provide the links directly because they are subject to change.) For example, right-click and copy the download link for ``config.json`` shown below. .. literalinclude:: /src/examples/pytorch/torchserve/config.json :language: JSON :caption: :download:`config.json ` Now execute the following in your shell: .. code:: bash wget ls :: bert_neuron_b6.pt config.json Download the `custom handler script `_ that will eventually respond to inference requests. .. literalinclude:: /src/examples/pytorch/torchserve/handler_bert.py :language: python :caption: :download:`handler_bert.py ` :linenos: Next, we need to associate the handler script with the compiled model using ``torch-model-archiver``. Run the following commands in your terminal: .. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 12-16 .. note:: If you modify your model or a dependency, you will need to rerun the archiver command with the ``-f`` flag appended to update the archive. The result of the above will be a ``.mar`` file inside the ``model_store`` directory. .. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 18 :: bert-max_length128-batch_size6.mar This file is essentially an archive associated with a fixed version of your model along with its dependencies (e.g., the handler code). .. note:: The version specified in the ``torch-model-archiver`` command can be appended to REST API requests to access a specific version of your model.
For example, if your model was hosted locally on port 8080 and named "bert", the latest version of your model would be available at ``http://localhost:8080/predictions/bert``, while version 1.0 would be accessible at ``http://localhost:8080/predictions/bert/1.0``. We will see how to perform inference using this API in Step 6. Create a `custom config `_ file to set some parameters. This file will be used to configure the server at launch when we run ``torchserve --start``. .. literalinclude:: /src/examples/pytorch/torchserve/torchserve.config :language: properties :caption: :download:`torchserve.config ` .. note:: This will cause TorchServe to bind on all interfaces. For security in real-world applications, you’ll probably want to use port 8443 and `enable SSL `_. .. _torchserve-run: Run TorchServe -------------- It's time to start the server. Typically we'd want to launch this in a separate console, but for this demo we’ll just redirect output to a file. .. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 20 Verify that the server seems to have started okay. .. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 22 :: { "status": "Healthy" } .. note:: If you get an error when trying to ping the server, you may have tried before the server was fully launched. Check ``torchserve.log`` for details. Use the Management API to instruct TorchServe to load our model. .. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 24-26 :: { "status": "Model \"bert-max_length128-batch_size6\" Version: 1.0 registered with 4 initial workers" } .. note:: Any additional attempts to configure the model after the initial curl request will cause the server to return a 409 error. You’ll need to stop/start/configure the server to realize any changes. The ``MAX_BATCH_DELAY`` is a timeout value that determines how long to wait before processing a partial batch. This is why the handler code needs to check the batch dimension and potentially add padding. TorchServe will instantiate the number of model handlers indicated by ``INITIAL_WORKERS``, so this value controls how many models we will load onto Inferentia in parallel. This tutorial was performed on an inf1.xlarge instance (one Inferentia chip), so there are four NeuronCores available. If you want to control worker scaling more dynamically, `see the docs `_. .. warning:: If you attempt to load more models than NeuronCores available, one of two things will occur. Either the extra models will fit in device memory but performance will suffer, or you will encounter an error on your initial inference. You shouldn't set ``INITIAL_WORKERS`` above the number of NeuronCores. However, you may want to use fewer cores if you are using the :ref:`neuroncore-pipeline` feature. It looks like everything is running successfully at this point, so it's time for an inference. Create the ``infer_bert.py`` file below on your instance. .. literalinclude:: /src/examples/pytorch/torchserve/infer_bert.py :language: python :caption: :download:`infer_bert.py ` :linenos: This script will send a ``batch_size`` number of requests to our model. In this example, we are using a model that estimates the probability that one sentence is a paraphrase of another. The script sends positive examples in the first half of the batch and negative examples in the second half. Execute the script in your terminal. .. 
literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 28 :: 1 ['paraphrase'] 3 ['not paraphrase'] 4 ['not paraphrase'] 0 ['paraphrase'] 5 ['not paraphrase'] 2 ['paraphrase'] We can see that the first three threads (0, 1, 2) all report ``paraphrase``, as expected. If we instead modify the script to send an incomplete batch and then wait for the timeout to expire, the excess padding results will be discarded. .. _torchserve-benchmark: Benchmark TorchServe -------------------- We've seen how to perform a single batched inference, but how many inferences can we process per second? A separate upcoming tutorial will document performance tuning to maximize throughput. In the meantime, we can still perform a simple naïve stress test. The code below will spawn 64 worker threads, with each thread repeatedly sending a full batch of data to process. A separate thread will periodically print throughput and latency measurements. .. literalinclude:: /src/examples/pytorch/torchserve/benchmark_bert.py :language: python :caption: :download:`benchmark_bert.py ` :linenos: Run the benchmarking script. .. literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 30 :: pid 28523: current throughput 0.0, latency p50=0.000 p90=0.000 pid 28523: current throughput 617.7, latency p50=0.092 p90=0.156 pid 28523: current throughput 697.3, latency p50=0.082 p90=0.154 pid 28523: current throughput 702.8, latency p50=0.081 p90=0.149 pid 28523: current throughput 699.1, latency p50=0.085 p90=0.147 pid 28523: current throughput 703.8, latency p50=0.083 p90=0.148 pid 28523: current throughput 699.3, latency p50=0.083 p90=0.148 ... **Congratulations!** By now you should have successfully served a batched model over TorchServe. You can now shut down TorchServe. ..
literalinclude:: tutorial_source_instructions/run_torchserve_u20.sh :language: bash :lines: 32 ================================================ FILE: archive/torch-neuron/tutorials/tutorial_source_instructions/run_libtorch.sh ================================================ #!/bin/bash set -eExuo # Run the setup script cd aws-neuron-sdk/src/examples/pytorch sudo apt install -y cargo cd libtorch_demo chmod +x setup.sh && ./setup.sh # Run sanity checks ./run_tests.sh bert_neuron_b6.pt # Benchmark ./example-app bert_neuron_b6.pt ================================================ FILE: archive/torch-neuron/tutorials/tutorial_source_instructions/run_torchserve_u20.sh ================================================ #!/bin/bash set -eExuo cd aws-neuron-sdk/src/examples/pytorch cd torchserve python trace_bert_neuronx.py ls pip install transformers==4.52.* torchserve==0.7.0 torch-model-archiver==0.7.0 captum==0.6.0 sudo apt install openjdk-11-jdk -y mkdir model_store MAX_LENGTH=$(jq '.max_length' config.json) BATCH_SIZE=$(jq '.batch_size' config.json) MODEL_NAME=bert-max_length$MAX_LENGTH-batch_size$BATCH_SIZE torch-model-archiver --model-name "$MODEL_NAME" --version 1.0 --serialized-file ./bert_neuron_b6.pt --handler "./handler_bert_neuronx.py" --extra-files "./config.json" --export-path model_store ls model_store torchserve --start --ncs --model-store model_store --ts-config torchserve.config >torchserve.log 2>&1 sleep 10 curl http://127.0.0.1:8080/ping MAX_BATCH_DELAY=5000 # ms timeout before a partial batch is processed INITIAL_WORKERS=2 # Number from table above curl -X POST "http://localhost:8081/models?url=$MODEL_NAME.mar&batch_size=$BATCH_SIZE&initial_workers=$INITIAL_WORKERS&max_batch_delay=$MAX_BATCH_DELAY" python infer_bert.py python benchmark_bert.py torchserve --stop ================================================ FILE: archive/torch-neuron/tutorials/tutorials-inference-torch-neuron.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Tutorials for Inference with torch-neuron (Inf1) ==================================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. .. toctree:: :maxdepth: 1 :hidden: Computer Vision Tutorials Natural Language Processing (NLP) Tutorials Utilizing Neuron Capabilities Tutorials .. include:: /archive/torch-neuron/tutorials/tutorials-inference-torch-neuron.txt ================================================ FILE: archive/torch-neuron/tutorials/tutorials-inference-torch-neuron.txt ================================================ .. tab-set:: .. tab-item:: Computer Vision Tutorials * ResNet-50 tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * PyTorch YOLOv4 tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` .. tab-item:: Natural Language Processing (NLP) Tutorials * HuggingFace pretrained BERT tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * HuggingFace pretrained BERT tutorial with shared weights :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * Bring your own HuggingFace pretrained BERT container to Sagemaker Tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * LibTorch C++ tutorial :ref:`[html] ` * TorchServe tutorial :ref:`[html] ` * HuggingFace MarianMT tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` ..
tab-item:: Utilizing Neuron Capabilities Tutorials * BERT TorchServe tutorial :ref:`[html] ` * NeuronCore Pipeline tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` .. note:: To use Jupyter Notebook see: * :ref:`setup-jupyter-notebook-steps-troubleshooting` * :ref:`running-jupyter-notebook-as-script` ================================================ FILE: archive/torch-neuron/tutorials/tutorials-torch-neuron-computervision.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Computer Vision Tutorials (``torch-neuron``) ============================================ .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. * ResNet-50 tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * PyTorch YOLOv4 tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` ================================================ FILE: archive/torch-neuron/tutorials/tutorials-torch-neuron-nlp.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Natural Language Processing (NLP) Tutorials (``torch-neuron``) ============================================================== .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. * HuggingFace pretrained BERT tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * HuggingFace pretrained BERT tutorial with shared weights :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * Bring your own HuggingFace pretrained BERT container to Sagemaker Tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` * LibTorch C++ tutorial :ref:`[html] ` * TorchServe tutorial :ref:`[html] ` * HuggingFace MarianMT tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` .. toctree:: :hidden: :maxdepth: 1 /src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb /src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert_shared_weights.ipynb /src/examples/pytorch/byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb tutorial-libtorch tutorial-torchserve transformers-marianmt ================================================ FILE: archive/torch-neuron/tutorials/tutorials-utilizing-neuron-capabilities.rst ================================================ .. meta:: :noindex: :nofollow: :description: This content is archived and no longer maintained. :date-modified: 2026-03-11 Utilizing Neuron Capabilities Tutorials ======================================= .. warning:: This document is archived. torch-neuron (Inf1) is no longer officially supported by the AWS Neuron SDK. It is provided for reference only. For current framework support, see :doc:`/frameworks/index`. * BERT TorchServe tutorial :ref:`[html] ` * NeuronCore Pipeline tutorial :ref:`[html] ` :pytorch-neuron-src:`[notebook] ` .. 
toctree:: :hidden: tutorial-torchserve /src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb ================================================ FILE: archive/transformers-neuronx/api-reference-guide.rst ================================================ ================================================ FILE: archive/transformers-neuronx/api-reference-guide.txt ================================================ ================================================ FILE: archive/transformers-neuronx/developer-guide.rst ================================================ .. _tn_developer_guide: .. meta:: :noindex: :nofollow: :description: This topic is currently archived and not maintained. It is provided for reference only. Transformers Neuron Developer Guide (``transformers-neuronx``) ============================================================== .. toctree:: :maxdepth: 1 :hidden: /archive/transformers-neuronx/transformers-neuronx-developer-guide /archive/transformers-neuronx/transformers-neuronx-developer-guide-for-continuous-batching .. include:: /libraries/transformers-neuronx/developer-guide.txt ================================================ FILE: archive/transformers-neuronx/developer-guide.txt ================================================ * :ref:`transformers_neuronx_developer_guide` ================================================ FILE: archive/transformers-neuronx/index.rst ================================================ .. _transformers_neuronx_archive_readme: .. meta:: :noindex: :nofollow: :description: This topic is currently archived and not maintained. It is provided for reference only. Transformers NeuronX (``transformers-neuronx``) ============================================== .. toctree:: :maxdepth: 1 :hidden: Setup Developer Guide Tutorials Misc .. include:: /archive/transformers-neuronx/transformers-neuronx.txt ================================================ FILE: archive/transformers-neuronx/setup/index.rst ================================================ .. _transformers-neuronx-setup: .. meta:: :noindex: :nofollow: :description: This topic is currently archived and not maintained. It is provided for reference only. Transformers NeuronX Setup (``transformers-neuronx``) ===================================================== If you have already set up your environment to run PyTorch NeuronX, you just need to install the Transformers NeuronX library using the following instruction. .. code-block:: pip install transformers-neuronx --extra-index-url=https://pip.repos.neuron.amazonaws.com If you are starting from scratch, the Neuron Multi Framework DLAMI is recommended, as it comes pre-installed with a Transformers NeuronX virtual environment. You can refer to the :ref:`instructions to launch a Neuron instance using Multi Framework DLAMI `. ================================================ FILE: archive/transformers-neuronx/transformers-neuronx-api-reference.rst ================================================ ================================================ FILE: archive/transformers-neuronx/transformers-neuronx-developer-guide-for-continuous-batching.rst ================================================ .. _transformers_neuronx_developer_guide_for_cb: .. meta:: :noindex: :nofollow: :description: This topic is currently archived and not maintained. It is provided for reference only.
Transformers NeuronX (``transformers-neuronx``) Developer Guide for Continuous Batching ======================================================================================= Transformers NeuronX is integrated with vLLM to enable continuous batching for high-throughput LLM serving and inference. This guide aims to help users get started with continuous batching for Transformers NeuronX and vLLM by providing: - :ref:`Transformers NeuronX ` An overview of Transformers NeuronX. - :ref:`cb-overview` The continuous batching procedure implemented by Transformers NeuronX and vLLM. - :ref:`cb-install` Installation and usage instructions for Transformers NeuronX and vLLM. - :ref:`cb-release-221-features` A showcase of new features in Transformers NeuronX and vLLM. - :ref:`cb-faq` .. _cb-tnx-overview: Transformers NeuronX (``transformers-neuronx``) ----------------------------------------------- Transformers NeuronX for Trn1 and Inf2 is a software package that enables PyTorch users to perform large language model (LLM) :ref:`performant inference ` on second-generation Neuron hardware (See: :ref:`NeuronCore-v2 `). The :ref:`Neuron performance page ` lists expected inference performance for commonly used Large Language Models. .. _cb-overview: Continuous Batching with Transformers NeuronX and vLLM ------------------------------------------------------ Transformers NeuronX implements the following operational flow with vLLM for continuous batching support: 1. Context encode multiple prompts using virtual dynamic batching. 2. Decode all sequences simultaneously until a sequence generates an EOS token. 3. Evict the finished sequence and insert a new prompt encoding. 4. Resume the decoding process, repeating steps 2 and 3 until all sequences are decoded. .. _cb-supported-model-architectures: Supported Model Architectures ----------------------------- Transformers NeuronX supports continuous batching for models compatible with the following Hugging Face classes: - ``LlamaForCausalLM`` - ``MistralForCausalLM`` .. _cb-install: Install vLLM and Get Started with Offline Inference --------------------------------------------------- Neuron maintains a fork of vLLM (v0.6.2) that contains the necessary changes to support inference with Transformers NeuronX. Neuron is working with the vLLM community to upstream these changes to make them available in a future version. Install vLLM ^^^^^^^^^^^^ First install ``neuronx-cc`` and the ``transformers-neuronx`` packages. Then install the vLLM fork from source: .. code-block:: bash git clone -b v0.6.x-neuron https://github.com/aws-neuron/upstreaming-to-vllm.git cd upstreaming-to-vllm pip install -r requirements-neuron.txt VLLM_TARGET_DEVICE="neuron" pip install -e . .. note:: Please note the vLLM ``pip`` package from PyPI is not compatible with Neuron. To work with Neuron, install vLLM from source as outlined above. .. note:: The currently supported version of PyTorch for Neuron installs ``triton`` version ``2.1.0``. This is incompatible with ``vllm >= 0.5.3``. You may see an error ``cannot import name 'default_dump_dir...``. To work around this, run ``pip install --upgrade triton==3.0.0`` after installing the vLLM wheel. If Neuron packages are detected correctly in the installation process, ``vllm-0.1.dev2830+g22c56ee.neuron216`` will be installed (the ``neuron`` version depends on the installed ``neuronx-cc`` version).
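As a quick sanity check after installation, you can confirm that the installed ``vllm`` package carries the ``neuron`` suffix. This is a minimal sketch; the exact version string will vary with your ``neuronx-cc`` install.

.. code-block:: python

    # Verify the Neuron fork of vLLM is the one installed
    import vllm

    print(vllm.__version__)  # e.g. 0.1.dev2830+g22c56ee.neuron216
    assert "neuron" in vllm.__version__, "PyPI vLLM detected; reinstall from the Neuron fork"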
Run Offline Batched Inference with Transformers NeuronX and vLLM ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In the following example we demonstrate how to perform continuous batching with a Llama model. .. note:: Since Llama models are gated, please accept the Llama Community License Agreement and request access to the model. Then use a Hugging Face user access token to download the model. .. code-block:: python from vllm import LLM, SamplingParams # Sample prompts. prompts = [ "Hello, my name is", "The president of the United States is", "The capital of France is", "The future of AI is", ] # Create a sampling params object. sampling_params = SamplingParams(temperature=0.8, top_p=0.95) # Create an LLM. llm = LLM( model="meta-llama/Meta-Llama-3.1-8B-Instruct", max_num_seqs=8, # The max_model_len and block_size arguments are required to be same as max sequence length, # when targeting neuron device. Currently, this is a known limitation in continuous batching # support in transformers-neuronx. max_model_len=128, block_size=128, # The device can be automatically detected when AWS Neuron SDK is installed. # The device argument can be either unspecified for automated detection, or explicitly assigned. device="neuron", tensor_parallel_size=2) # Generate texts from the prompts. The output is a list of RequestOutput objects # that contain the prompt, generated text, and other information. outputs = llm.generate(prompts, sampling_params) # Print the outputs. for output in outputs: prompt = output.prompt generated_text = output.outputs[0].text print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}") Run the API Server ^^^^^^^^^^^^^^^^^^ To run the OpenAI-compatible API server in vLLM, run either command below: .. code-block:: bash vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --tensor-parallel-size 32 --max-num-seqs 4 --max-model-len 2048 --block-size 8 .. code-block:: bash python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --tensor-parallel-size 32 --max-num-seqs 4 --max-model-len 2048 --block-size 8 .. _cb-release-221-features: New Features in Neuron Release 2.21 ----------------------------------- Neuron's vLLM integration with Transformers NeuronX is tested using a public fork of vLLM v0.6.2. New features and enhancements introduced in this fork will be described below. Neuron's intent is to upstream these features to vLLM as soon as possible after release. Prior to upstreaming, these features can be accessed in the AWS Neuron GitHub repository https://github.com/aws-neuron/upstreaming-to-vllm/tree/v0.6.x-neuron. **Neuron Release 2.21 Features for the v0.6.2 vLLM Neuron Fork** - :ref:`Sequence bucketing ` configuration for context encoding and token generation. - :ref:`Granular NeuronConfig control ` in vLLM entrypoints. - Inference support for :ref:`speculative decoding `. - Inference support for :ref:`EAGLE speculative decoding `. **Neuron Release 2.20 Features** - Multi-node inference support for larger models. Example scripts are included in `vLLM `_ . - Direct loading of Hugging Face-compatible checkpoints without creation of a ``-split`` directory. .. _cb-sequence-bucketing: Sequence Bucketing ^^^^^^^^^^^^^^^^^^ To configure buckets, set the following environment variables. Refer to the `developer guide `_ for details on how to configure the values. These environment variables need to be set before starting the vLLM server or instantiating the ``LLM`` object. 
- ``NEURON_CONTEXT_LENGTH_BUCKETS``: Bucket sizes for context encoding. - ``NEURON_TOKEN_GEN_BUCKETS``: Bucket sizes for token generation. For example: ``export NEURON_CONTEXT_LENGTH_BUCKETS="128,512,1024"`` .. _cb-neuron-config-override: NeuronConfig Override ^^^^^^^^^^^^^^^^^^^^^ The default ``NeuronConfig`` in vLLM uses the latest optimizations from the Neuron SDK. However, you can override the default values or add a new configuration from the `developer guide `_ by setting the ``override_neuron_config`` parameter while creating the ``LLM`` object. .. code-block:: python llm = LLM( model="meta-llama/Meta-Llama-3.1-8B-Instruct", max_num_seqs=8, max_model_len=128, block_size=128, device="neuron", tensor_parallel_size=32, # Override or update the NeuronConfig override_neuron_config={"shard_over_sequence":True}) While standing up the API server, set the ``override-neuron-config`` argument. For example: .. code-block:: bash python3 -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-8B-Instruct --tensor-parallel-size 32 --max-num-seqs 4 --max-model-len 2048 --block-size 8 --override-neuron-config {\"shard_over_sequence\":\"True\"} .. _cb-quantization: Quantization ^^^^^^^^^^^^ To use `int8 weight storage `_, set the environment variable ``NEURON_QUANT_DTYPE`` to ``s8``. .. _cb-speculative-decoding: Speculative Decoding ^^^^^^^^^^^^^^^^^^^^ Speculative decoding is a token generation optimization technique that uses a small draft model to generate ``K`` tokens autoregressively and a larger target model to determine which draft tokens to accept, all in a combined forward pass. For more information on speculative decoding, please see `[Leviathan, 2023] `_ and `[Chen et al., 2023] `_. Speculative decoding is now available for inference with Transformers NeuronX and vLLM: .. code-block:: python from vllm import LLM, SamplingParams # Sample prompts. prompts = [ "Hello, my name is", "The president of the United States is", "The capital of France is", "The future of AI is", ] # Create a sampling params object. sampling_params = SamplingParams(temperature=0.8, top_p=0.95) # Create an LLM. llm = LLM( model="meta-llama/Meta-Llama-3.1-70B-Instruct", speculative_model="meta-llama/Llama-3.2-1B-Instruct", # The max_model_len, speculative_max_model_len, and block_size arguments are required to be same as max sequence length, # when targeting neuron device. Currently, this is a known limitation in continuous batching # support in transformers-neuronx. max_model_len=128, block_size=128, speculative_max_model_len=128, dtype="bfloat16", max_num_seqs=4, num_speculative_tokens=4, # The device can be automatically detected when AWS Neuron SDK is installed. # The device argument can be either unspecified for automated detection, or explicitly assigned. device="neuron", tensor_parallel_size=32, use_v2_block_manager=True, ) outputs = llm.generate(prompts, sampling_params) # Print the outputs. for output in outputs: prompt = output.prompt generated_text = output.outputs[0].text print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}") .. note:: Please ensure that the selected target and draft model are from the same model family. For example, if the target model is an instruction-tuned Llama model, the draft model must also be a lower-capacity instruction-tuned Llama model. ..
_cb-eagle-speculative-decoding: EAGLE Speculative Decoding ^^^^^^^^^^^^^^^^^^^^^^^^^^ Extrapolation Algorithm for Greater Language-model Efficiency (EAGLE) extends the speculative decoding technique described above by: - Utilizing a specially trained EAGLE draft model that predicts feature outputs through an Autoregression Head and next token outputs through an LM Head. - Reducing sampling uncertainty by using the next autoregressively sampled token and a current feature map as draft model inputs. For more information on EAGLE, please see `[Li et al., 2024] `_. EAGLE speculative decoding can be applied without changes to the speculative decoding code sample above. Transformers NeuronX and vLLM will recognize a draft model as an EAGLE draft when ``is_eagle: True`` is set in the model's Hugging Face ``config.json`` file. .. _cb-faq: Frequently Asked Questions -------------------------- **Is PagedAttention supported in the vLLM integration?** No, PagedAttention is not currently supported. It will be supported in a future Neuron release. ================================================ FILE: archive/transformers-neuronx/transformers-neuronx-developer-guide.rst ================================================ .. _transformers_neuronx_developer_guide: .. meta:: :noindex: :nofollow: :description: This topic is currently archived and not maintained. It is provided for reference only. Transformers NeuronX (``transformers-neuronx``) Developer Guide ================================================================ Transformers NeuronX for Trn1 and Inf2 is a software package that enables PyTorch users to perform large language model (LLM) :ref:`performant inference ` on second-generation Neuron hardware (See: :ref:`NeuronCore-v2 `). The :ref:`Neuron performance page ` lists expected inference performance for commonly used Large Language Models. Introduction ------------ The `Transformers NeuronX repository `_ contains the source code of the AWS Neuron Transformers integration project. As it stands now, it mainly serves the purpose of running transformer decoder inference (autoregressive sampling) workflows on the Neuron platform. Note: This project is **actively** in development. The Neuron team is still heavily modifying the Neuron optimized module classes. The functionality provided in this repository will not maintain long-term API stability until version >= 1.0.0. For applications willing to reuse code from this repository, we recommend treating the Neuron optimized module implementations as samples, and pinning the version of the main library package ``torch-neuronx`` to avoid breaking interface changes as new features are developed. Checkpoint compatibility with HuggingFace Transformers ------------------------------------------------------ ``transformers-neuronx`` is checkpoint-compatible with HuggingFace Transformers. While the Neuron team reimplemented some HuggingFace Transformers models from scratch for the purpose of maximizing the execution efficiency of transformer decoders on Neuron, the implementations are done with maximizing compatibility in mind, meaning one can train transformer decoder models, say GPT2, using the standard HuggingFace Transformers library, and then construct an inference-optimized decoder model using transformers-neuronx's ``GPT2ForSampling`` class, as sketched below.
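For illustration, here is a minimal sketch of that round trip, assuming a standard ``gpt2`` checkpoint; the compile-time arguments shown are example values described later in this guide.

.. code-block:: python

    from transformers import GPT2LMHeadModel
    from transformers_neuronx import GPT2ForSampling

    # Save a standard Hugging Face Transformers checkpoint (e.g. after training)
    GPT2LMHeadModel.from_pretrained('gpt2').save_pretrained('gpt2-checkpoint')

    # Construct the inference-optimized Neuron decoder from the same checkpoint
    model = GPT2ForSampling.from_pretrained(
        'gpt2-checkpoint',
        batch_size=1,     # example compile-time settings; see
        tp_degree=2,      # "Compile-time Configurations" below
        n_positions=128,
        amp='f16',
    )
    model.to_neuron()  # load the weights and compile for NeuronCores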
If training was done with other libraries such as MegatronLM, then it is still possible to convert the obtained checkpoint to the standard HuggingFace Transformers checkpoint format, and then move on to transformers-neuronx's optimized decoder implementations. Neuron optimized transformer decoders implemented in XLA High Level Operations (HLO) ------------------------------------------------------------------------------------ Due to the stateful nature of the autoregressive sampling computation, an efficient implementation of autoregressive sampling using the Neuron SDK requires rewriting the model forward function into a pure-function computation running on fixed-shape tensors. Furthermore, we want the pure-function computation to be implemented in a compiled language so that the Neuron compiler can perform extensive code analysis and optimization. We chose XLA High Level Operations (HLO) as the compiled language for implementing Neuron optimized transformer decoder classes. The source code of these classes contains Python functions written in a syntax called "PyHLO", the name of a Neuron internal tool for writing/compiling the HLO language in Python. As an example, a "language model head" implemented in PyHLO may look like the following. :: class LmHeadHlo: ... def lm_head(self, scribe): dtype = self.dtype hidden_size = self.hidden_size n_active_tokens = self.n_active_tokens batch_size = self.batch_size vocab_size = self.vocab_size hidden = dtype[hidden_size, n_active_tokens, batch_size].Parameter(parameter_number=0) weight = dtype[hidden_size, vocab_size].Parameter(parameter_number=1) rhs_size = n_active_tokens * batch_size hidden = dtype[hidden_size, rhs_size].Reshape(hidden) dot_dims = dict(lhs_contracting_dimensions=[0], rhs_contracting_dimensions=[0]) logits = dtype[vocab_size, rhs_size].Dot(weight, hidden, dot_dimension_numbers=dot_dims) return dtype[vocab_size, n_active_tokens, batch_size].Reshape(logits) ... The ``transformers_neuronx.compiler.compile_py_func`` function can convert the Python ``lm_head`` function into ``HloModuleProto``, a valid input format for the ``neuronx-cc`` compiler. Tensor-parallelism support -------------------------- For transformer decoders used in large language models, tensor-parallelism is necessary as it provides a way to shard the models' large weight matrices onto multiple NeuronCores and to have NeuronCores work collaboratively on the same matrix-multiply operation. transformers-neuronx's tensor-parallelism support makes heavy use of collective operations such as all-reduce, which is supported natively by the Neuron runtime. There are some principles for setting the tensor-parallelism degree (the number of NeuronCores participating in sharded matrix multiply operations) for Neuron-optimized transformer decoder models. 1. The number of attention heads needs to be divisible by the tensor-parallelism degree. 2. The total data size of model weights and key-value caches needs to be smaller than 16 GB times the tensor-parallelism degree. 3. Currently, the Neuron runtime supports tensor-parallelism degrees 1, 2, 8, and 32 on Trn1 and supports tensor-parallelism degrees 1, 2, 4, 8, and 24 on Inf2. Some examples: 1. ``facebook/opt-13b`` has 40 attention heads, and when running at batch size 1 and float16 precision the model requires ~29 GB of memory; therefore, a ``trn1.2xlarge`` with 32 GB device memory is sufficient. 2.
``facebook/opt-30b`` has 56 attention heads, and at batch size 1 and float16 precision the model requires ~66 GB of memory; therefore, it can run on 8 NeuronCores on one ``trn1.32xlarge`` using 128 GB device memory. 3. ``gpt2-xl`` has 25 attention heads and requires ~4 GB of memory at bfloat16 precision. It can only run without tensor parallelism (degree 1), since 25 is not divisible by any larger supported degree. Features -------- Compile-time Configurations --------------------------- Transformers Neuron models support a variety of compile-time configurations that can be used to tune model performance. All models support the following configurations: - ``batch_size``: The batch size to compile a model for. Once the batch size has been set, this is the only size that is supported at inference time. Neuron uses ahead-of-time compilation to achieve high performance, which requires that the compiled artifact shapes be known at compilation time. - ``n_positions``: The maximum number of positions (or sequence length) to allow during generation. This parameter directly controls the width of the KV cache. This parameter should be set to the maximum expected sequence length for the end application. - ``tp_degree``: This parameter controls the number of tensor parallel shards to split the model into. Each shard will execute on a separate NeuronCore. To minimize latency, it is recommended to set the tensor parallelism to be equal to the number of NeuronCores that are available on an instance. - ``amp``: This allows a model's weights and compute to be cast to a different type. The options are ``'bf16'``, ``'f16'``, or ``'f32'``. For models trained in ``float32``, the 16-bit mixed precision options (``'bf16'``, ``'f16'``) generally provide sufficient accuracy while significantly improving performance. - ``context_length_estimate``: This parameter controls the maximum sequence length of the prompt/context handling compute graph. This parameter is not supported in ``GPTNeoXForSampling`` and ``GPTJForSampling``. .. code-block:: python from transformers_neuronx import NeuronAutoModelForCausalLM model = NeuronAutoModelForCausalLM.from_pretrained( 'gpt2', # Uses the GPT2 checkpoint from https://huggingface.co/gpt2 batch_size=1, # Allow inference with batch size 1 inputs n_positions=128, # Allow a maximum size of 128 prompt & output tokens tp_degree=2, # Shard the model weights & compute across 2 NeuronCores amp='f16', # Downcast the weights & compute to float16 context_length_estimate=64, # Build an optimized context encoding network for a maximum prompt size of 64 ) model.to_neuron() # Load/compile the model Checkpoint support and automatic model selection ------------------------------------------------ *New in release 2.18* Transformers Neuron now supports a greater variety of checkpoints, including older PyTorch binary checkpoints and newer `safetensors`_ checkpoints. For improved load speed and reduced host memory consumption, it is recommended to always use ``safetensors`` by default. Both regular and sharded variants of checkpoints are supported. It is no longer recommended to use the ``save_pretrained_split`` function, which was used in older Transformers Neuron examples. In addition to supporting standard checkpoint formats, Transformers Neuron provides an AutoModel class ``NeuronAutoModelForCausalLM`` which can be used to load the correct model without explicitly importing the architecture-specific class. .. _safetensors: https://github.com/huggingface/safetensors ..
code-block:: python from transformers_neuronx import NeuronAutoModelForCausalLM # Loads: https://huggingface.co/bigscience/bloom-560m bloom = NeuronAutoModelForCausalLM.from_pretrained('bigscience/bloom-560m') bloom.to_neuron() # Loads: https://huggingface.co/openlm-research/open_llama_3b_v2 llama = NeuronAutoModelForCausalLM.from_pretrained('openlm-research/open_llama_3b_v2') llama.to_neuron() # This is equivalent to the following: from transformers_neuronx import BloomForSampling model = BloomForSampling.from_pretrained('bigscience/bloom-560m') model.to_neuron() from transformers_neuronx import LlamaForSampling llama = LlamaForSampling.from_pretrained('openlm-research/open_llama_3b_v2') llama.to_neuron() .. note:: Advanced features of Hugging Face Hub access are not supported. This includes private repositories (which require access tokens) and branches. In order to support more advanced repository downloads, please download the model to a local directory and load it from there. Hugging Face generate() API support ----------------------------------- Transformers Neuron models support the Hugging Face `generate() `__ API via the ``HuggingFaceGenerationModelAdapter`` adapter class. In the following example we demonstrate how to run sampling with temperature using the ``GPT2`` model: .. code-block:: python import torch from transformers import AutoTokenizer, AutoConfig from transformers_neuronx import GPT2ForSamplingWithContextBroadcasting, HuggingFaceGenerationModelAdapter # Create and compile the Neuron model model = GPT2ForSamplingWithContextBroadcasting.from_pretrained('gpt2') model.to_neuron() # Use the `HuggingFaceGenerationModelAdapter` to access the generate API config = AutoConfig.from_pretrained('gpt2') model = HuggingFaceGenerationModelAdapter(config, model) # Get a tokenizer and example input tokenizer = AutoTokenizer.from_pretrained('gpt2') tokenizer.pad_token_id = tokenizer.eos_token_id tokenizer.padding_side = 'left' text = "Hello, I'm a language model," encoded_input = tokenizer(text, return_tensors='pt', padding=True) # Run inference using temperature with torch.inference_mode(): model.reset_generation() generated_sequence = model.generate( input_ids=encoded_input.input_ids, attention_mask=encoded_input.attention_mask, do_sample=True, max_length=256, temperature=0.7, ) print([tokenizer.decode(tok) for tok in generated_sequence]) Note: As the Hugging Face generation API can expand the input's batch dimension based on different generation configurations, we need to compile the Neuron model with a compile-time ``batch_size`` that differs from the runtime ``batch_size`` (the batch dimension of the inputs to the generation API). - If ``do_sample=True``, ``compile_batch_size = runtime_batch_size x num_return_sequences x beam_size`` - Otherwise, ``compile_batch_size = runtime_batch_size x num_return_sequences`` Neuron Persistent Cache ------------------------ The Neuron Persistent Cache is now enabled for Transformers Neuron by default. Model artifacts that have been compiled once will be cached and reused on successive runs when possible. Model artifacts will only be reused when compiling with the same compiler version (neuronx-cc), model configurations, and compiler flags. It also includes other features (e.g., using an S3 bucket as the cache backend). For more detailed information, see the :ref:`Persistent cache documentation `. .. _int8_weight_storage_support: int8 weight storage support --------------------------- Transformers Neuron supports int8 weight storage for the ``GPT2`` model class.
int8 weight storage can be used to reduce memory bandwidth usage and improve model performance. int8 weight storage support for additional model classes will be added in an upcoming release. In the following example we demonstrate how to apply int8 weight storage to the ``GPT2`` model via the ``QuantizationConfig`` and ``NeuronConfig`` configs: .. code-block:: python import torch from transformers import AutoTokenizer from transformers_neuronx import GPT2ForSamplingWithContextBroadcasting, NeuronConfig, QuantizationConfig # Set the weight storage config to use int8 quantization and bf16 dequantization neuron_config = NeuronConfig( quant=QuantizationConfig(quant_dtype='s8', dequant_dtype='bf16'), ) # Create and compile the Neuron model model = GPT2ForSamplingWithContextBroadcasting.from_pretrained( 'gpt2', amp='bf16', # NOTE: When using quantization, amp type must match dequant type neuron_config=neuron_config ) model.to_neuron() # Get a tokenizer and example input tokenizer = AutoTokenizer.from_pretrained('gpt2') text = "Hello, I'm a language model," encoded_input = tokenizer(text, return_tensors='pt') # Run inference with torch.inference_mode(): generated_sequence = model.sample(encoded_input.input_ids, sequence_length=256, start_ids=None) print([tokenizer.decode(tok) for tok in generated_sequence]) Parallel Input Prompt Context Encoding -------------------------------------- Transformers Neuron supports parallel input prompt context encoding for the ``GPT2`` model class. Parallel context encoding can be used to significantly reduce the latency of the input prompt context encoding before the autoregressive decoder token generation loop. Parallel context encoding support for additional model classes will be added in an upcoming release. The ``GPT2ForSamplingWithContextBroadcasting`` class has a ``context_length_estimate`` variable that determines the number of input prompt tokens that will be processed in parallel. For optimal results, this should be set to a power of 2 that is closest to the most frequently seen input prompt length. In the following example we demonstrate how to apply parallel context encoding to the ``GPT2`` model via the ``GPT2ForSamplingWithContextBroadcasting`` class. In this example, we set ``context_length_estimate`` to 256, a power of 2 that covers the length of the input prompt (97 tokens). .. code-block:: python import torch from transformers import AutoTokenizer from transformers_neuronx import GPT2ForSamplingWithContextBroadcasting # Create and compile the Neuron model model = GPT2ForSamplingWithContextBroadcasting.from_pretrained( 'gpt2', context_length_estimate=256 # Create an optimized network which handles prompts up to 256 tokens ) model.to_neuron() # Get a tokenizer and example input tokenizer = AutoTokenizer.from_pretrained('gpt2') text = "Hello, I'm a generative AI language model. Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. It is powered by large models that are pre-trained on vast amounts of data and commonly referred to as foundation models (FMs). With generative AI on AWS, you can reinvent your applications, create entirely new customer experiences, drive unprecedented levels of productivity, and transform your business.
" encoded_input = tokenizer(text, return_tensors='pt') # Run inference with torch.inference_mode(): generated_sequence = model.sample(encoded_input.input_ids, sequence_length=256) print([tokenizer.decode(tok) for tok in generated_sequence]) The ``GPT2ForSamplingWithContextBroadcasting`` class can also process an input prompt that has a different batch size from the batch size of the autoregressive decoder output. For example, an input prompt with batch size = 1 can be used to produce an output of batch size = 5 to generate multiple suggestions for the same input prompt. The input prompt batch size can be specified using the ``prompt_batch_size`` argument and the autoregressive decoder output batch size can be specified using the ``batch_size`` argument. In the following example we demonstrate how to apply parallel context encoding to the ``GPT2`` model to generate 5 outputs for a single input. .. code-block:: python import torch from transformers import AutoTokenizer from transformers_neuronx import GPT2ForSamplingWithContextBroadcasting # Create and compile the Neuron model model = GPT2ForSamplingWithContextBroadcasting.from_pretrained( 'gpt2', prompt_batch_size=1, # This allows prompt and output batch to vary batch_size=5, context_length_estimate=256 ) model.to_neuron() # Get a tokenizer and example input tokenizer = AutoTokenizer.from_pretrained('gpt2') text = "Hello, I'm a generative AI language model. Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. It is powered by large models that are pre-trained on vast amounts of data and commonly referred to as foundation models (FMs). With generative AI on AWS, you can reinvent your applications, create entirely new customer experiences, drive unprecedented levels of productivity, and transform your business. " encoded_input = tokenizer(text, return_tensors='pt') # Run inference with torch.inference_mode(): generated_sequence = model.sample(encoded_input.input_ids, sequence_length=256) for i, output in enumerate(generated_sequence): print('-' * 50) print(f'Batch {i} output:') print(tokenizer.decode(output)) Serialization support --------------------- Transformers NeuronX supports model serialization (model saving and loading) for all models except the ``GPTJForSampling`` and ``GPTNeoXForSampling``` model classes. In the following example we demonstrate how to save and load the compiled artifacts for the ``GPT2`` model: .. code-block:: python import torch from transformers import AutoTokenizer from transformers_neuronx import GPT2ForSamplingWithContextBroadcasting # Create and compile the Neuron model model = GPT2ForSamplingWithContextBroadcasting.from_pretrained('gpt2') model.to_neuron() # Save the compiled Neuron model model.save('gpt2-compiled-artifacts') # Load the Neuron model model = GPT2ForSamplingWithContextBroadcasting.from_pretrained('gpt2') # Load the compiled Neuron artifacts model.load('gpt2-compiled-artifacts') # Since prior artifacts are loaded, this skips compilation model.to_neuron() # Get a tokenizer and example input tokenizer = AutoTokenizer.from_pretrained('gpt2') text = "Hello, I'm a language model," encoded_input = tokenizer(text, return_tensors='pt') # Run inference with torch.inference_mode(): generated_sequence = model.sample(encoded_input.input_ids, sequence_length=256, start_ids=None) print([tokenizer.decode(tok) for tok in generated_sequence]) Transformers NeuronX also supports the serialization of presharded weights. 
This reduces future model load time by saving a transformed and sharded set of weights as a new safetensors checkpoint. When this checkpoint is loaded, the sharding and transformations normally done by Transformers NeuronX are skipped, reducing model load time significantly. Saving presharded weights is only available when ``on_device_embedding`` is enabled. In the following example we demonstrate how to save and load presharded weights along with compiled artifacts for a Llama model:

.. code-block:: python

    from transformers_neuronx import LlamaForSampling
    from transformers_neuronx import NeuronConfig
    from transformers import AutoTokenizer

    neuron_config = NeuronConfig(on_device_embedding=True)

    # Create and compile the Neuron model
    model_neuron = LlamaForSampling.from_pretrained('openlm-research/open_llama_3b', batch_size=1, tp_degree=8, n_positions=128, neuron_config=neuron_config)
    model_neuron.to_neuron()

    # Save the presharded weights and compiled artifacts to a directory
    model_neuron.save('llama-artifacts', sharded_weights=True)
    del model_neuron

    # Use the presharded checkpoint to reduce model load time
    model_neuron_presharded = LlamaForSampling.from_pretrained('llama-artifacts', batch_size=1, tp_degree=8, n_positions=128, neuron_config=neuron_config)

    # Load the compiled artifacts to skip compilation
    model_neuron_presharded.load('llama-artifacts')
    model_neuron_presharded.to_neuron()

CPU Compilation Support
-----------------------

Transformers NeuronX supports compilation on CPU. CPU compilation is compatible with model serialization and weight presharding, and is available for all models except the ``GPTJForSampling`` and ``GPTNeoXForSampling`` model classes. To compile on CPU, replace the initial call to ``to_neuron()`` with ``cpu_compile()``. In the following example we demonstrate how to compile the Llama model on CPU:

.. code-block:: python

    from transformers_neuronx import LlamaForSampling
    from transformers_neuronx import NeuronConfig
    from transformers import AutoTokenizer

    neuron_config = NeuronConfig(on_device_embedding=True)

    # Create and compile the model on CPU
    model_neuron = LlamaForSampling.from_pretrained('openlm-research/open_llama_3b', batch_size=1, tp_degree=8, n_positions=128, neuron_config=neuron_config)
    model_neuron.cpu_compile()  # instead of model_neuron.to_neuron()

    # Save the weights and compiled artifacts to a directory
    model_neuron.save('llama-artifacts')

To use the saved artifacts generated by CPU compilation on a Neuron device:

.. code-block:: python

    from transformers_neuronx import LlamaForSampling
    from transformers_neuronx import NeuronConfig
    from transformers import AutoTokenizer

    neuron_config = NeuronConfig(on_device_embedding=True)

    # Use the presharded checkpoint to reduce model load time
    model_neuron_presharded = LlamaForSampling.from_pretrained('llama-artifacts', batch_size=1, tp_degree=8, n_positions=128, neuron_config=neuron_config)

    # Load the compiled artifacts to skip compilation
    model_neuron_presharded.load('llama-artifacts')

    # Now use the CPU-compiled artifacts to run the model
    model_neuron_presharded.to_neuron()

Compilation worker count support
--------------------------------

Transformers NeuronX supports configuring the compilation worker count for all models. This setting controls how many workers execute HLO graph compilation tasks in parallel. A lower setting reduces CPU memory utilization when compiling a model, but increases compilation time. This setting is useful to prevent out-of-CPU-memory errors when compiling large models.
By default, the number of workers is equal to the total number of HLO graphs required for compilation. The compilation worker count applies to both the CPU compilation flow (``cpu_compile()``) and the Neuron device compilation flow (``to_neuron()``). To set the compilation worker count, use the ``compilation_worker_count`` argument in ``NeuronConfig``. The following sample shows how to compile the graphs one at a time:

.. code-block:: python

    neuron_config = NeuronConfig(compilation_worker_count=1)

Grouped-query attention (GQA) support [Beta]
--------------------------------------------

Transformers Neuron supports grouped-query attention (GQA) models for the ``Llama`` and ``Mistral`` model classes. Multiple sharding strategies for the K/V cache are available to satisfy different constraints:

- ``GQA.SHARD_OVER_HEADS`` distributes the K/V caches along the head dimension. It can only be used when the number of K/V heads is a multiple of the tensor-parallelism degree. This is the default configuration.
- ``GQA.SHARD_OVER_BATCH`` distributes the K/V caches along the batch dimension. It can only be used when the batch size is a multiple of the tensor-parallelism degree. This can be useful for large-batch inference.
- ``GQA.REPLICATED_HEADS`` replicates the K/V heads. It can be used when neither the batch size nor the number of K/V heads is divisible by the tensor-parallelism degree. This can be useful for low-latency, small-batch inference.
- ``GQA.ALL_GATHER_HEADS`` evenly splits the K/V heads across all NeuronCores. This is optimized for large-batch inference of GQA models without replication.

.. _mistral_gqa_code_sample:

In the following example we demonstrate how to configure these distributed inference strategies and perform inference with the ``Mistral`` model:

.. code-block:: python

    import torch
    from transformers import AutoTokenizer
    from transformers_neuronx import MistralForSampling, GQA, NeuronConfig

    # Set the GQA sharding strategy to shard over heads
    neuron_config = NeuronConfig(
        group_query_attention=GQA.SHARD_OVER_HEADS
    )

    # Create and compile the Neuron model
    model_neuron = MistralForSampling.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', amp='bf16', neuron_config=neuron_config)
    model_neuron.to_neuron()

    # Get a tokenizer and example input
    tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')
    text = "[INST] What is your favourite condiment? [/INST]"
    encoded_input = tokenizer(text, return_tensors='pt')

    # Run inference
    with torch.inference_mode():
        generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=256, start_ids=None)
    print([tokenizer.decode(tok) for tok in generated_sequence])

Repeated Ngram Filtering
------------------------

Repeated Ngram Filtering reduces redundant n-gram phrases within the generated text. It uses the same API as the `HuggingFace API for NoRepeatedNGram `__. Set the parameter ``no_repeat_ngram_size`` to the size of the n-gram phrases to be filtered and pass it to the sampling function, as in ``model.sample(input_ids, no_repeat_ngram_size=3)``.
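For instance, extending the ``Mistral`` example above (reusing ``model_neuron``, ``tokenizer``, and ``encoded_input``), repeated 3-grams can be filtered during sampling:

.. code-block:: python

    # Filter any 3-gram that would otherwise repeat within the generated text
    with torch.inference_mode():
        generated_sequence = model_neuron.sample(
            encoded_input.input_ids,
            sequence_length=256,
            no_repeat_ngram_size=3,
        )
    print([tokenizer.decode(tok) for tok in generated_sequence])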
On-device sampling support [Beta]
---------------------------------

Transformers NeuronX supports on-device sampling for all models except Mixtral models. The feature can be enabled by setting ``on_device_generation`` in ``NeuronConfig`` to an instance of ``GenerationConfig``. The example in the following section demonstrates how to use on-device generation for a ``Llama`` model using ``top_k``, ``top_p``, ``top_p_min_tokens`` and ``temperature``.

Top-K on-device sampling support [Beta]
---------------------------------------

Transformers Neuron supports Top-K sampling on-device for all models except Mixtral models. In the following example, we demonstrate how to use on-device Top-K for the ``Llama`` model via the ``GenerationConfig`` and ``NeuronConfig`` configs:

.. code-block:: python

    import torch
    from transformers_neuronx import LlamaForSampling
    from transformers_neuronx.config import NeuronConfig, GenerationConfig
    from transformers import AutoTokenizer

    neuron_config = NeuronConfig(
        on_device_generation=GenerationConfig(max_length=128, top_k=10, top_p=0.9, top_p_min_tokens=1, temperature=0.9, do_sample=True)
    )

    # Create and compile the Neuron model
    model_neuron = LlamaForSampling.from_pretrained('openlm-research/open_llama_3b', batch_size=1, tp_degree=8, n_positions=128, neuron_config=neuron_config)
    model_neuron.to_neuron()

    # Get a tokenizer and example input
    tokenizer = AutoTokenizer.from_pretrained('openlm-research/open_llama_3b')
    text = "Hello, I'm a language model,"
    encoded_input = tokenizer(text, return_tensors='pt')

    # Run inference
    with torch.inference_mode():
        generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=128, top_k=10)
    print([tokenizer.decode(tok) for tok in generated_sequence])

By default, transformers-neuronx uses the same fixed sampling parameters for all sequences across all invocations of the model when on-device generation is enabled. New sampling parameters can be supplied per model invocation by enabling the ``dynamic`` feature in the ``GenerationConfig``. Different sampling parameters can also be supplied for each sequence in the batch by using the ``per_batch_line`` feature. When using this feature, it is recommended to limit the number of tokens considered during sampling across all sequences by setting ``global_top_k`` to a reasonably low number (e.g. 250); this prevents poor performance when computing ``top_p`` tokens over a large vocabulary without any prior filtering. When using ``per_batch_line``, the ``top_k``, ``top_p``, ``top_p_min_tokens`` and ``temperature`` parameters accept lists with one value per sequence in the batch. In the following example, we demonstrate how to use the ``dynamic`` and ``per_batch_line`` features together:
.. code-block:: python

    import torch
    from transformers_neuronx import LlamaForSampling
    from transformers_neuronx.config import NeuronConfig, GenerationConfig
    from transformers import AutoTokenizer

    batch_size = 2

    generation_config = GenerationConfig(
        max_length=128,
        dynamic=True,
        per_batch_line=True,
        do_sample=True,
        top_k=[1] * batch_size,
        top_p=[1.0] * batch_size,
        top_p_min_tokens=[1] * batch_size,
        temperature=[1.0] * batch_size,
        global_top_k=256
    )

    neuron_config = NeuronConfig(
        on_device_generation=generation_config
    )

    # Create and compile the Neuron model
    model_neuron = LlamaForSampling.from_pretrained('openlm-research/open_llama_3b', batch_size=2, tp_degree=8, n_positions=128, neuron_config=neuron_config)
    model_neuron.to_neuron()

    # Get a tokenizer and example input
    tokenizer = AutoTokenizer.from_pretrained('openlm-research/open_llama_3b')
    tokenizer.pad_token = tokenizer.eos_token
    text = ["Hello, I'm a language model,", "Hello, I'm also a language model,"]
    encoded_input = tokenizer(text, return_tensors='pt', padding=True)

    # Run inference
    with torch.inference_mode():
        generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=128)
    print([tokenizer.decode(tok) for tok in generated_sequence])

    # Use different settings for each sequence in the batch
    # Supported because we use `generation_config.per_batch_line = True`
    generation_config.top_k = [1, 20]
    generation_config.top_p = [1.0, 0.9]
    generation_config.top_p_min_tokens = [1, 1]
    generation_config.temperature = [1.0, 0.9]

    # Update the generation configuration dynamically
    # Supported because we use `generation_config.dynamic = True`
    model_neuron.update_generation_config(generation_config)

    with torch.inference_mode():
        generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=128)
    print([tokenizer.decode(tok) for tok in generated_sequence])

Running inference with multiple models
--------------------------------------

Multiple transformers-neuronx models can be loaded at the same time as long as the total number of consumed NeuronCores is less than or equal to the total number of NeuronCores on the instance. For example, three models with ``tp_degree=8`` can be loaded and run in parallel on an inf2.48xlarge, which has 24 NeuronCores. The ``NEURON_RT_NUM_CORES`` and ``NEURON_RT_VISIBLE_CORES`` environment variables can be used to allocate the necessary number of NeuronCores to each process so that multiple transformers-neuronx models can run in parallel. See the :ref:`torch_neuronx_core_placement_guide` section for additional information about how to use these environment variables.

Note that when multiple models are used on a single instance, the number of host threads should be reduced to avoid contention on the host side. Assume the Neuron instance (e.g. trn1) has 192 CPU cores: if one of the models keeps all CPU cores busy, the remaining models suffer significant performance degradation. As a result, the number of threads for each model should be limited to a share of the available cores by setting the ``OMP_NUM_THREADS`` environment variable. For example, if there are 192 CPU cores available and four ``tp_degree=8`` models are used, export ``OMP_NUM_THREADS=48`` for each process to avoid contention.
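As a hypothetical sketch (the core range, thread count, and model are illustrative, and the variables must be set before the Neuron runtime is initialized), each process can pin its own NeuronCores and host threads like this:

.. code-block:: python

    import os

    # Illustrative values: this process uses NeuronCores 0-7 (tp_degree=8) and
    # 48 host threads. Set these before any Neuron model is loaded; other
    # processes would use disjoint core ranges (e.g. '8-15', '16-23').
    os.environ['NEURON_RT_VISIBLE_CORES'] = '0-7'
    os.environ['OMP_NUM_THREADS'] = '48'

    from transformers_neuronx import NeuronAutoModelForCausalLM

    model = NeuronAutoModelForCausalLM.from_pretrained('gpt2', tp_degree=8)
    model.to_neuron()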
Streamer
--------

LLMs generate tokens in an auto-regressive loop. A ``model.sample`` call waits until the full sequence has been generated before returning the response. It is possible to emit each output token as soon as it is generated by using a streamer object. A streamer is an object with two methods: ``put`` and ``end``. The transformers library provides several predefined streamers, such as ``TextIteratorStreamer``. The following example shows how to define a streamer and use it in transformers-neuronx:

.. code-block:: python

    import torch
    from transformers import AutoTokenizer
    from transformers_neuronx import MistralForSampling, GQA
    import transformers
    from time import time

    # Create a custom streamer inherited from transformers.generation.streamers.BaseStreamer
    class CustomStreamer(transformers.generation.streamers.BaseStreamer):

        def __init__(self) -> None:
            self.reset()

        def reset(self):
            self.token_latencies = []
            self.iter = 0
            self.now = time()

        def put(self, tokens):
            now = time()
            token_latency = now - self.now
            print(f"Iteration {self.iter:4d}: Latency [s] {token_latency:6.3f} -- Token {tokens}")
            self.now = now
            self.iter += 1
            self.token_latencies.append(token_latency)

        def end(self):
            print("First 10 token latencies:", self.token_latencies[:10])

    # Create and compile the Neuron model
    model_neuron = MistralForSampling.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', amp='bf16')
    model_neuron.to_neuron()

    # Get a tokenizer and example input
    tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')
    text = "[INST] What is your favourite condiment? [/INST]"
    encoded_input = tokenizer(text, return_tensors='pt')

    streamer = CustomStreamer()

    # Run inference
    with torch.inference_mode():
        generated_sequence = model_neuron.sample(encoded_input.input_ids, sequence_length=256, start_ids=None, streamer=streamer)
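Since any object implementing ``put`` and ``end`` works here, the predefined streamers from the transformers library can plug in the same way. As a sketch (reusing ``model_neuron``, ``tokenizer`` and ``encoded_input`` from above; an untested pairing shown for illustration only), ``TextIteratorStreamer`` lets you consume decoded text as it is produced:

.. code-block:: python

    from threading import Thread
    from transformers import TextIteratorStreamer

    # TextIteratorStreamer implements put()/end() and buffers decoded text
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

    def generate():
        with torch.inference_mode():
            model_neuron.sample(encoded_input.input_ids, sequence_length=256, streamer=streamer)

    # Run sampling in a background thread and print text as it arrives
    Thread(target=generate).start()
    for text_chunk in streamer:
        print(text_chunk, end='', flush=True)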
Stopping Criteria
-----------------

We can define custom stopping criteria to stop the autoregressive loop early. For example, to stop generation after 0.5 seconds, we can define and use a stopping criteria class as follows:

.. code-block:: python

    import torch
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
    from transformers_neuronx import MistralForSampling, GQA, NeuronConfig
    from transformers_neuronx.stopping_criteria import StoppingCriteria, StoppingCriteriaList
    from time import time
    from typing import List, Optional, Callable

    class MaxTimeCriteria(StoppingCriteria):
        """
        This class can be used to stop generation whenever the full generation exceeds some amount of time.
        By default, the time starts being counted when you initialize this object.
        You can override this by passing an `initial_timestamp`.

        Args:
            max_time (`float`):
                The maximum allowed time in seconds for the generation.
            initial_timestamp (`float`, *optional*, defaults to `time()`):
                The start of the generation allowed time.
        """

        def __init__(self, max_time: float, initial_timestamp: Optional[float] = None):
            self.max_time = max_time
            self.initial_timestamp = time() if initial_timestamp is None else initial_timestamp

        def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
            dt = time() - self.initial_timestamp
            end_condition = dt > self.max_time
            if end_condition:
                print("Stopping!")
            return end_condition

    # Create a streamer. This can also be a custom streamer inherited from transformers.generation.streamers.BaseStreamer
    class CustomStreamer(transformers.generation.streamers.BaseStreamer):

        def __init__(self) -> None:
            self.reset()

        def reset(self):
            self.token_latencies = []
            self.iter = 0
            self.now = time()

        def put(self, tokens):
            now = time()
            token_latency = now - self.now
            print(f"Iteration {self.iter:4d}: Latency [s] {token_latency:6.3f} -- Token {tokens}")
            self.now = now
            self.iter += 1
            self.token_latencies.append(token_latency)

        def end(self):
            pass

    # Create and compile the Neuron model
    model_neuron = MistralForSampling.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2', amp='bf16')
    model_neuron.to_neuron()

    # Get a tokenizer and example input
    tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.2')
    text = "[INST] What is your favourite condiment? [/INST]"
    encoded_input = tokenizer(text, return_tensors='pt')

    # Add stopping criteria to stop after 0.5 seconds
    stopping_criteria_list = StoppingCriteriaList([MaxTimeCriteria(0.5)])

    streamer = CustomStreamer()

    # Run inference
    with torch.inference_mode():
        model_neuron.sample(input_ids=encoded_input.input_ids, sequence_length=256, stopping_criteria_list=stopping_criteria_list, streamer=streamer)

Speculative sampling [Beta]
---------------------------

Transformers Neuron supports speculative sampling for the ``Llama`` and ``GPT2`` model classes. In speculative sampling, a smaller draft model speculates future tokens, which are then sent to the larger target model to be accepted or rejected. For more detailed information, see the original proposal by DeepMind titled `Accelerating Large Language Model Decoding with Speculative Sampling `__. Our implementation of speculative sampling is lossless.

In addition to standalone draft models, we also support `Eagle draft models `__. Currently we only support Eagle v1.

In the following example, we demonstrate how to perform speculative sampling with multinomial sampling using the ``Llama`` model:
.. code-block:: python

    import torch
    from transformers import LlamaTokenizer
    from transformers_neuronx import NeuronAutoModelForCausalLM, NeuronConfig, GenerationConfig
    from transformers_neuronx.fused_speculation import FusedSpeculativeDecoder

    # Specify the paths to the draft and target models
    draft = '/home/ubuntu/Llama-2-7b-chat-hf'
    target = '/home/ubuntu/Llama-2-70b-chat-hf'

    # Specify generation parameters
    gen_kwargs = {
        "top_k": 50,
        "top_p": 0.9,
        "do_sample": True,
        "temperature": 0.7,
    }

    # Load the draft model
    draft_neuron_model = NeuronAutoModelForCausalLM.from_pretrained(
        draft,
        n_positions=1024,
        batch_size=1,
        tp_degree=32,
        amp='bf16',
        neuron_config=NeuronConfig(
            padding_side="right",
            attention_layout="BSH",
            collectives_layout="BSH",
            on_device_embedding=True,
            on_device_generation=GenerationConfig(**gen_kwargs),
        ),
    )
    draft_neuron_model.to_neuron()

    # Load the target model
    target_neuron_model = NeuronAutoModelForCausalLM.from_pretrained(
        target,
        n_positions=1024,
        batch_size=1,
        tp_degree=32,
        amp='bf16',
        neuron_config=NeuronConfig(
            padding_side="right",
            attention_layout="BSH",
            collectives_layout="BSH",
            on_device_embedding=True,
            on_device_generation=GenerationConfig(**gen_kwargs),
        ),
    )
    target_neuron_model.to_neuron()

    # Compile the speculative sampling model
    # Here we set the speculation length to 4
    fsd = FusedSpeculativeDecoder(
        draft_neuron_model,
        target_neuron_model,
        4,
    )
    fsd.to_neuron()

    # Initialize the tokenizer and text prompt
    tokenizer = LlamaTokenizer.from_pretrained(target)
    prompt = "Hello, I'm a generative AI language model."
    inputs = tokenizer(prompt, return_tensors="pt")

    # Call speculative sampling on the given input
    response = fsd.sample(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        sequence_length=30,
    )

    # Decode the response
    generated_text = tokenizer.decode(response[0])
    print(f"\nDecoded tokens: {generated_text}")

The following sample shows how to enable EAGLE speculation. To get the EAGLE draft model to work, manually copy the LM head weights from the target model to the draft model. Additionally, you need to rename the keys in the draft model's ``state_dict`` to match those in the target model:
.. code-block:: python

    import torch
    from transformers import LlamaTokenizer
    from transformers_neuronx import NeuronAutoModelForCausalLM, NeuronConfig, GenerationConfig
    from transformers_neuronx.fused_speculation import FusedSpeculativeDecoder

    # Specify the paths to the draft and target models
    # The Eagle draft model can be downloaded from the Eagle website
    draft = '/home/ubuntu/EAGLE-llama2-chat-70B'
    target = '/home/ubuntu/Llama-2-70b-chat-hf'

    # Specify generation parameters
    gen_kwargs = {
        "top_k": 50,
        "top_p": 0.9,
        "do_sample": True,
        "temperature": 0.7,
    }

    # Load the draft model
    draft_neuron_model = NeuronAutoModelForCausalLM.from_pretrained(
        draft,
        n_positions=1024,
        batch_size=1,
        tp_degree=32,
        amp='bf16',
        neuron_config=NeuronConfig(
            is_eagle_draft=True,
            has_pre_attention_norm=False,
            # The above two configs are needed for Eagle
            padding_side="right",
            attention_layout="BSH",
            collectives_layout="BSH",
            on_device_embedding=True,
            on_device_generation=GenerationConfig(**gen_kwargs),
        ),
    )
    draft_neuron_model.to_neuron()

    # Load the target model
    target_neuron_model = NeuronAutoModelForCausalLM.from_pretrained(
        target,
        n_positions=1024,
        batch_size=1,
        tp_degree=32,
        amp='bf16',
        neuron_config=NeuronConfig(
            is_eagle_target=True,
            # The above config is needed for Eagle
            padding_side="right",
            attention_layout="BSH",
            collectives_layout="BSH",
            on_device_embedding=True,
            on_device_generation=GenerationConfig(**gen_kwargs),
        ),
    )
    target_neuron_model.to_neuron()

    # Compile the speculative sampling model
    # Here we set the speculation length to 4
    fsd = FusedSpeculativeDecoder(
        draft_neuron_model,
        target_neuron_model,
        4,
    )
    fsd.to_neuron()

    # The rest of the flow is the same as the previous example

QKV Weight Fusion
-----------------

Concatenating a model's query, key and value weight matrices often achieves better performance because larger matrices allow for more efficient data movement and compute. QKV weight fusion can be enabled by setting ``fuse_qkv=True`` in the ``NeuronConfig``:

.. code-block:: python

    neuron_config = NeuronConfig(fuse_qkv=True)

Attention Layout
----------------

The intermediate tensor layouts in a model's attention layer affect the compiler's optimization opportunities and thus the model's performance. Using the ``(batch, sequence, hidden)`` (or ``BSH``) layout for attention often achieves better performance, since it enables better overlapping of compute with collectives and can reduce transposes. We intend to enable ``BSH`` attention by default in a future release. For now, the ``BSH`` attention layout can be enabled by setting ``attention_layout="BSH"`` in the ``NeuronConfig``:

.. code-block:: python

    neuron_config = NeuronConfig(attention_layout="BSH")

Bucketing
---------

LLM inference is a generative process that can produce variable-length sequences. This poses a problem, because the Neuron compiler produces executables that expect statically shaped inputs and outputs. To make LLMs work with different shapes, transformers-neuronx generates buckets and applies padding wherever it is required. There are at least two sets of buckets for each LLM inference that can be set by the user: 1) context encoding (prefill) buckets and 2) output token generation buckets.

**Token generation buckets**

In token generation, tokens are generated iteratively. At each token position, the transformer only needs to attend to the previous tokens. In a naive implementation with static shapes, however, attention would cover the entire KV cache (the full sequence length). To avoid this wasted compute, we use token generation buckets.
Token generation buckets determine the attention lengths. For instance, if the maximum sequence length is 1024 tokens and the current token is at position 120, there is no need to attend to all 1024 positions in the current step. We can use token generation buckets to attend to different portions of the KV cache. By default, token generation buckets are powers of 2 starting from 128 tokens (i.e. 128, 256, 512, up to the sequence length). In the example above, bucket 128 would be used for position 120, which significantly reduces the wasted compute. Users can change these buckets by setting a list for ``n_positions`` (see the example below). Otherwise, if a single number is given for ``n_positions`` (the sequence length) instead of a list, the powers-of-2 buckets starting from 128 are used. The last bucket is ``n_positions`` (the sequence length), even if it is not a power of 2.

**Context encoding buckets**

The prompt tokens can be processed in parallel. As a result, we need to set bucket sizes for the different estimated lengths of input prompts. We can specify these context bucket sizes using the ``context_length_estimate`` argument. In general, it is better for all buckets to be multiples of 256 tokens, but adding too many buckets increases device memory consumption and adds extra latency for bucket switching. Usually, powers of 2 starting from 128 tokens are used for context encoding buckets. If the total sequence length (``n_positions``) is beyond 2048 tokens, it is desirable to add extra buckets at multiples of 512 or 1024 tokens. It is not recommended to add buckets at multiples of 256 tokens or smaller for context buckets beyond 2k, to avoid bucket switching latency. At runtime, the smallest bucket which fits the input context is used. By default, the context encoding buckets are set to half of the output-token buckets. Adding extra context buckets reduces wasted compute and improves performance; however, the extra executables consume additional device memory.

Notice that the default output token generation buckets work well for a wide range of applications, while the ideal context encoding buckets depend on the specific use case. For instance, if all requests have a context length of about 1500 +/- 500 tokens, adding more buckets close to 1500 can reduce context encoding time. In this example, adding buckets of 1024, 1280, 1536, 1792 and 2048 tokens (a spacing of 256 tokens) could help. Moreover, the largest context encoding bucket should be larger than the largest context length; otherwise, performance degrades significantly. To set context encoding and token generation buckets manually:

.. code-block:: python

    context_length_estimate = [1024, 1280, 1536, 1792, 2048] # The best context estimate depends on the use case
    n_positions = [128, 256, 512, 1024, 2048, 3072] # Usually the default buckets are appropriate

    model = NeuronAutoModelForCausalLM.from_pretrained(
        'gpt2',
        batch_size=1,
        n_positions=n_positions,
        tp_degree=2,
        amp='f16',
        context_length_estimate=context_length_estimate,
    )

Multi-node inference support (TP/PP)
------------------------------------

Prerequisite: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.html

When models are too large to fit on a single node, Transformers NeuronX multi-node inference (tensor parallel and pipeline parallel) can be used to shard model weights across multiple Neuron instances (only supported on Trn1 and Trn1n).
Single node inference code can easily be extended to multi-node inference. Note that Transformers NeuronX currently doesn't support multi-node tensor parallel and pipeline parallel at the same time; when pipeline parallel is used, tensor parallelism has to stay within a node (TP<=32 on Trn1/Trn1n). In the sections below, we first outline the sample code for single node execution and then provide instructions to migrate the code to multi-node tensor parallel or multi-node pipeline parallel.

To start with, the code below is a single node script, running the open_llama_3b model with a tensor parallel degree of 32:

.. code-block:: python

    import torch
    from transformers import AutoTokenizer, AutoConfig
    from transformers_neuronx import LlamaForSampling, HuggingFaceGenerationModelAdapter

    # Create and compile the Neuron model
    model = LlamaForSampling.from_pretrained("openlm-research/open_llama_3b", tp_degree=32)
    model.to_neuron()

    # Use the `HuggingFaceGenerationModelAdapter` to access the generate API
    config = AutoConfig.from_pretrained("openlm-research/open_llama_3b")
    model = HuggingFaceGenerationModelAdapter(config, model)

    # Get a tokenizer and example input
    tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b")
    tokenizer.pad_token_id = tokenizer.eos_token_id
    tokenizer.padding_side = 'left'
    text = "Hello, I'm a language model,"
    encoded_input = tokenizer(text, return_tensors='pt', padding=True)

    # Run inference using temperature
    with torch.inference_mode():
        model.reset_generation()
        generated_sequence = model.generate(
            input_ids=encoded_input.input_ids,
            attention_mask=encoded_input.attention_mask,
            do_sample=True,
            max_length=256,
            temperature=0.7,
        )
    print([tokenizer.decode(tok) for tok in generated_sequence])

Command line:

.. code-block:: bash

    python3 multi_node_dev_example.py

**Multi-Node Tensor Parallel**

Compared to single node tensor parallel, multi-node tensor parallel shards the model weights in the same way, but across more cores spanning multiple nodes. It also requires that each node's ``model.forward()`` receives exactly the same input; otherwise there will be unexpected behavior (runtime failures, wrong output).

Configurations (environment variables to be configured on each node):

- ``NEURON_RT_ROOT_COMM_ID``: the master node's ``<ip>:<port>``
- ``NEURON_RANK_ID``: rank of the node, where 0 means the master node
- ``NEURON_LOCAL_TP``: the local tensor parallel degree on each node

Example: Change the single node script to use ``tp_degree=64`` (2 nodes). Set ``torch.manual_seed`` to ensure the sampling loop running on each node samples the same token as the next input.
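A minimal sketch of those two changes to the single node script above (``tp_degree=64`` assumes two Trn1 nodes with 32 NeuronCores each; the seed value is arbitrary but must match on every node):

.. code-block:: python

    import torch

    # A fixed seed keeps the per-node sampling loops in lockstep, so every
    # node feeds the same next token into model.forward()
    torch.manual_seed(0)

    # tp_degree=64 spans two nodes; NEURON_LOCAL_TP=32 keeps 32 cores per node
    model = LlamaForSampling.from_pretrained("openlm-research/open_llama_3b", tp_degree=64)
    model.to_neuron()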
Node 1 command line:

.. code-block:: bash

    NEURON_RT_ROOT_COMM_ID=10.1.201.64:63423 NEURON_RANK_ID=0 NEURON_LOCAL_TP=32 python3 multi_node_dev_example.py

Node 2 command line (same as Node 1 but with ``NEURON_RANK_ID`` set to 1):

.. code-block:: bash

    NEURON_RT_ROOT_COMM_ID=10.1.201.64:63423 NEURON_RANK_ID=1 NEURON_LOCAL_TP=32 python3 multi_node_dev_example.py

You can also refer to the `Tutorial `__ to run the Llama 3.1 405B multi-node 16k tutorial with multi-node tensor parallel.

**Multi-Node Pipeline Parallel**

While the weight tensors are sharded with tensor parallelism, pipeline parallelism can be used to partition the layers across different nodes; the intermediate (hidden) tensors are transferred from one pipeline stage (set of nodes) to the next, and the final output is sent from the last pipeline stage back to the first. Compared to multi-node tensor parallel, on non-zero ranks the ``model.forward`` in pipeline parallel falls back to a while loop and blocks on the input broadcast from the master node.

Configurations (environment variables to be configured on each node):

- ``NEURON_RT_ROOT_COMM_ID``: the master node's ``<ip>:<port>``
- ``CPU_COMM_ID``: similar to ``NEURON_RT_ROOT_COMM_ID``, but must be set to a different port
- ``NEURON_RANK_ID``: rank of the node, where 0 means the master node
- ``NEURON_PP_STAGES``: number of pipeline stages (nodes)

Example: Keep the original single node script with ``tp_degree=32``.

Node 1 command line:

.. code-block:: bash

    NEURON_PP_STAGES=2 CPU_COMM_ID=10.1.201.64:8989 NEURON_RT_ROOT_COMM_ID=10.1.201.64:63423 NEURON_RANK_ID=0 python3 multi_node_dev_example.py

Node 2 command line (same as Node 1 but with ``NEURON_RANK_ID`` set to 1):

.. code-block:: bash

    NEURON_PP_STAGES=2 CPU_COMM_ID=10.1.201.64:8989 NEURON_RT_ROOT_COMM_ID=10.1.201.64:63423 NEURON_RANK_ID=1 python3 multi_node_dev_example.py

Long sequence length support up to 128k
---------------------------------------

**Flash Attention**

With the integration of the FlashAttention kernel, developers can use longer sequence lengths for LLAMA models. The Flash Attention kernel is used automatically, without any additional configuration, when the input sequence length is greater than 8k. Refer to the `Tutorial `__ for usage of a 32k sequence length on a variation of the LLAMA3-8B model.

**Flash Decoding**

Flash Decoding (FD) is a technique that significantly speeds up attention during inference, especially for long-context tasks in large language models (LLMs) with GQA.

.. image:: ./flash_decoding.gif
   :alt: Flash Decoding
   :width: 800px
   :align: center

With the integration of FD, developers can achieve faster inference with larger sequence lengths and batch sizes by reducing KV cache replication. Refer to the `Tutorial `__ on flash decoding usage for 128k sequence length sampling. Flash decoding can be enabled by setting the flag ``shard_over_sequence=True`` in ``NeuronConfig``:

.. code-block:: python

    neuron_config = NeuronConfig(shard_over_sequence=True)

Note that you can skip the first AllGather introduced by flash decoding at the cost of duplicated Q weights. This is only recommended for relatively small models (i.e. 3B, 8B) and large batch sizes:

.. code-block:: python

    neuron_config = NeuronConfig(shard_over_sequence=True, duplicate_q_weight_sos=True)

**Known limitations and FAQs**

- Flash decoding is expected to have performance degradation (PTL) for smaller sequence lengths and batch sizes. We recommend flash decoding when **batch size x sequence length > 16k**
- Flash decoding support is not enabled for the following features:

  - Speculative Decoding
  - Multi Head Attention (MHA) models

================================================
FILE: archive/transformers-neuronx/transformers-neuronx-misc.rst
================================================

.. _transformers-neuronx-misc:

.. meta::
   :noindex:
   :nofollow:
   :description: This topic is currently archived and not maintained. It is provided for reference only.

Misc (``transformers-neuronx``)
===============================

================================================
FILE: archive/transformers-neuronx/transformers-neuronx-misc.txt
================================================

* :ref:`transformers-neuronx-rn`

================================================
FILE: archive/transformers-neuronx/transformers-neuronx-tutorials.rst
================================================

.. _transformers_neuronx_tutorials:
.. meta::
   :noindex:
   :nofollow:
   :description: This topic is currently archived and not maintained. It is provided for reference only.

Transformers NeuronX Tutorials
==============================

.. toctree::
   :maxdepth: 1
   :hidden:

   Hugging Face meta-llama/Llama-2-13b autoregressive sampling on Inf2 & Trn1
   Hugging Face facebook/opt-13b autoregressive sampling on Inf2 & Trn1
   Hugging Face facebook/opt-30b autoregressive sampling on Inf2 & Trn1
   Hugging Face facebook/opt-66b autoregressive sampling on Inf2

.. include:: /libraries/transformers-neuronx/transformers-neuronx-tutorials.txt

================================================
FILE: archive/transformers-neuronx/transformers-neuronx-tutorials.txt
================================================

* `Hugging Face meta-llama/Llama-2-13b autoregressive sampling on Inf2 & Trn1 `_
* `Hugging Face facebook/opt-13b autoregressive sampling on Inf2 & Trn1 `_
* `Hugging Face facebook/opt-30b autoregressive sampling on Inf2 & Trn1 `_
* `Hugging Face facebook/opt-66b autoregressive sampling on Inf2 `_

================================================
FILE: archive/transformers-neuronx/transformers-neuronx.txt
================================================

.. dropdown:: Setup (``transformers-neuronx``)
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. include:: /libraries/transformers-neuronx/setup/index.rst

.. dropdown:: Developer Guide (``transformers-neuronx``)
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. include:: /libraries/transformers-neuronx/developer-guide.txt

.. dropdown:: Tutorials (``transformers-neuronx``)
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. include:: /libraries/transformers-neuronx/transformers-neuronx-tutorials.txt

.. dropdown:: Misc (``transformers-neuronx``)
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. include:: /libraries/transformers-neuronx/transformers-neuronx-misc.txt

================================================
FILE: archive/tutorials/finetune_t5.rst
================================================

.. _torch-hf-t5-finetune:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.

Fine-tune T5 model on Trn1
==========================

.. note::

   This page was archived on 7/31/2025.

In this tutorial, we show how to fine-tune a Hugging Face (HF) T5 model using the HF trainer API. This example fine-tunes a `T5 model for a text-summarization `__ task on the CNN/DailyMail dataset.

.. contents:: Table of Contents
   :local:
   :depth: 2

.. include:: /frameworks/torch/torch-neuronx/tutorials/note-performance.txt

Setup and compilation
---------------------

Before running the tutorial, please follow the installation instructions at: :ref:`Install PyTorch Neuron on Trn1 `

Please set the storage of the instance to *512GB* or more if you also want to run through the BERT pretraining and GPT pretraining tutorials.

For all the commands below, make sure you are in the virtual environment that you have created above before you run the commands:

.. code:: shell

   source ~/aws_neuron_venv_pytorch/bin/activate

First we install a recent version of the HF transformers, scikit-learn and evaluate packages in our environment, and download the source matching the installed version.
In this example, we chose version 4.26.0 and the text summarization example from the HF transformers source:

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_setup_code.sh
   :language: shell
   :lines: 5-9

Single-worker training
----------------------

We will run the text-summarization fine-tuning task following the example in README.md located in the path ``~/transformers/examples/pytorch/summarization``. We use full BF16 casting with ``XLA_USE_BF16=1`` to enable best performance.

First, paste the following script into your terminal to create a "run.sh" file and change it to executable:

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh
   :language: shell
   :lines: 7-46

We optionally precompile the model and training script using `neuron_parallel_compile `__ to warm up the persistent graph cache (Neuron Cache) such that the actual run has fewer compilations (faster run time):

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh
   :language: shell
   :lines: 49

Note: For these auto-regressive models, do not run the ``predict_with_generate`` method during the precompile step. This is because the ``neuron_parallel_compile`` utility runs the training script in graph extraction mode, with no actual execution of the graph, so the outputs at each step are invalid. Since the auto-regressive generation at each step depends on the output of the previous step, the generate step would fail on these invalid outputs.

Precompilation is optional and only needs to be done once unless hyperparameters such as batch size are modified. After the optional precompilation, the actual run will be faster with minimal additional compilations.

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh
   :language: shell
   :lines: 51

If precompilation was not done, the first execution of ./run.sh will be slower due to serial compilations. Rerunning the same script a second time shows quicker execution, as the compiled graphs are already stored in the persistent cache. Running the above script runs the T5-small fine-tuning on a single process.

**Note:** As you may have noticed, we are not running ``predict_with_generate`` as part of training. This is because ``predict_with_generate`` requires auto-regressive sampling, where the inputs to the decoder are created by appending outputs of previous steps. This causes the inputs to the decoder to change shape, resulting in a new graph at each step. In other words, the current ``generate`` API provided by HF transformers leads to repeated compilations. We are working on building a Neuron-friendly version of the ``generate`` API, which will be made available as part of a future release and will enable running ``predict_with_generate`` as part of the training script.

As a workaround, we can run ``predict_with_generate`` on CPU after the model is trained. Once training is completed, a trained checkpoint is saved; we can load the trained model and run ``predict_with_generate`` to compute the final accuracy. To do so, in run_summarization.py, add the following lines before ``transformers`` gets imported (i.e. before all the ``import`` statements):

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh
   :language: python
   :lines: 55-59

You can now run the following, and it should run the predict method on the CPU device:

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_single_worker_training_code.sh
   :language: shell
   :lines: 67-78

Note: To run on CPU, we need to make sure that NEURON_NUM_DEVICES is set to 0. This ensures that no XLA devices are created and the trainer uses the default device (CPU).
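As a sketch of that change (the exact lines ship with the tutorial source file included above), it amounts to setting the variable before anything else is imported:

.. code:: python

   # Set before all other imports so that no XLA devices are created
   # and the HF Trainer falls back to the default device (CPU)
   import os
   os.environ["NEURON_NUM_DEVICES"] = "0"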
.. _multi_worker_training:

Multi-worker Training
---------------------

The above script runs one worker on one NeuronCore. To run on multiple cores, first add these lines to the top of run_summarization.py to disable Distributed Data Parallel (DDP) when using torchrun (see the Known issues and limitations section below):

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_modify_run_summarization_code.sh
   :language: python
   :lines: 8-10

Then launch the run_summarization.py script with torchrun using the --nproc_per_node=N option to specify the number of workers (N=2 for trn1.2xlarge, and N=2, 8, or 32 for trn1.32xlarge). The following example runs 2 workers. Paste the following script into your terminal to create a "run_2w.sh" file and change it to executable:

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_multi_worker_training_code.sh
   :language: shell
   :lines: 7-46

Again, we optionally precompile the model and training script using neuron_parallel_compile to warm up the persistent graph cache (Neuron Cache), ignoring the results from this precompile run as it is only for extracting and compiling the XLA graphs:

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_multi_worker_training_code.sh
   :language: python
   :lines: 49

Precompilation is optional and only needs to be done once unless hyperparameters such as batch size are modified. After the optional precompilation, the actual run will be faster with minimal additional compilations.

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_multi_worker_training_code.sh
   :language: python
   :lines: 51

During the run, you will notice that the "Total train batch size" is now 8 and the "Total optimization steps" is now half the number for one-worker training. Also, if you open ``neuron-top`` in a separate terminal, you should see 2 cores being utilized.

To train the T5-large model, you can set the ``model_name_or_path`` argument to ``t5-large``. Please note that currently, running ``t5-large`` on a trn1-2xl machine can result in ``HOST OOM`` during compilation. Hence, it is recommended to run ``t5-large`` model training on a trn1-32xl machine.

On a trn1-32xl machine, you can create a run_32w.sh on the terminal using the following commands:

.. literalinclude:: tutorial_source_code/t5_finetuning/t5_finetuning_32_worker_training_code.sh
   :language: shell
   :lines: 7-46

You can now follow the same steps as listed above. This script runs a t5-large model training by launching 32 data-parallel workers.

.. _t5_known_issues:

Known issues and limitations
----------------------------

The following are currently known issues:

- Long compilation times: this can be alleviated with the ``neuron_parallel_compile`` tool, which extracts graphs from a short trial run and compiles them in parallel ahead of the actual run, as shown above.
- T5-Large compilation causing processes to get killed on trn1-2xl: it is recommended to run ``t5-large`` model training on a trn1-32xl machine, as this avoids CPU OOM and also provides faster training by making use of 32 data-parallel workers.
================================================
FILE: archive/tutorials/finetuning_llama2_7b_ptl.rst
================================================

.. _llama2_7b_tp_zero1_ptl_finetune_tutorial:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.

Fine-tuning Llama2 7B with tensor parallelism and ZeRO-1 optimizer using Neuron PyTorch-Lightning
=================================================================================================

This tutorial shows how to fine-tune Llama2 7B with tensor parallelism and ZeRO-1 using Neuron PyTorch-Lightning APIs. For pre-training information and additional context, see the Llama2 7B Tutorial and :ref:`Neuron PT-Lightning Developer Guide `.

Setting up the environment
^^^^^^^^^^^^^^^^^^^^^^^^^^

For this experiment, we will use AWS ParallelCluster with at least four trn1.32xlarge compute nodes. To set up a cluster and prepare it for use, see `Train your model on ParallelCluster `__. To set up the packages on the head node of the cluster, see :ref:`Install PyTorch Neuron on Trn1 `.

Install the ``neuronx-distributed`` package inside the virtual environment using the following command:

.. code:: ipython3

   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com

Next, download the scripts for fine-tuning.

1. Create a directory to hold the experiments.

.. code:: ipython3

   mkdir -p ~/examples/tp_zero1_llama2_7b_hf_finetune_ptl
   cd ~/examples/tp_zero1_llama2_7b_hf_finetune_ptl

2. Download training scripts for the experiments.

.. code:: ipython3

   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/data_module.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/module_llama.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/tp_zero1_llama2_7b_hf_finetune_ptl.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/tp_zero1_llama2_7b_hf_finetune_ptl.sh
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lightning/finetune_config/config.json
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/lr.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/modeling_llama_nxd.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/requirements.txt
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/requirements_ptl.txt
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/training_utils.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/main/examples/training/llama/convert_checkpoints.py

3. Install the additional requirements and give the right permissions to the shell script.
.. code:: ipython3

   python3 -m pip install -r requirements.txt
   python3 -m pip install -r requirements_ptl.txt  # Currently we're supporting Lightning version 2.4.0
   python3 -m pip install optimum-neuron==0.0.18 nltk  # Additional dependencies for evaluation
   python3 -m pip install --no-warn-conflicts transformers==4.32.1  # Pin transformers version 4.32.1
   chmod +x tp_zero1_llama2_7b_hf_finetune_ptl.sh

Download the Llama2-7B pre-trained checkpoint from HuggingFace.

1. Create a Python script ``get_model.py`` with the following lines:

.. code:: ipython3

   import torch
   from transformers.models.llama.modeling_llama import LlamaForCausalLM

   model = LlamaForCausalLM.from_pretrained("NousResearch/Llama-2-7b-hf")
   torch.save(model.state_dict(), "llama-7b-hf-pretrained.pt")

2. Run the download script and the conversion script to pull and convert the checkpoint. Note that the conversion script requires a large amount of memory, so log in to a compute node to run it:

.. code:: ipython3

   ssh compute1-dy-training-0-1
   source ~/aws_neuron_venv_pytorch/bin/activate
   cd ~/examples/tp_zero1_llama2_7b_hf_finetune_ptl
   python3 get_model.py
   python3 convert_checkpoints.py --tp_size 8 --convert_from_full_model --config config.json --input_dir llama-7b-hf-pretrained.pt --output_dir llama7B-pretrained/pretrained_weight

3. (Optional) If you are loading the checkpoint from a different directory, set the checkpoint path by adding the following flag to ``tp_zero1_llama2_7b_hf_finetune_ptl.sh``:

* ``--pretrained_ckpt``. This provides the path to the pre-trained checkpoint to be loaded.

Then, set the dataset for the fine-tuning job. In this example, we will use Dolly, an open source dataset of instruction-following records on categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.

.. code-block:: json

   {
       "instruction": "Alice's parents have three daughters: Amy, Jessy, and what's the name of the third daughter?",
       "context": "",
       "response": "The name of the third daughter is Alice"
   }

Configure the following flags in ``tp_zero1_llama2_7b_hf_finetune_ptl.sh``:

.. code:: ipython3

   --data_dir "databricks/databricks-dolly-15k" \
   --task "open_qa"

At this point, you are all set to start fine-tuning.

Running fine-tuning
^^^^^^^^^^^^^^^^^^^

By this step, the cluster is all set up for running experiments. Before running training, first pre-compile the graphs using :ref:`neuron_parallel_compile `. Run the command below:

.. code:: ipython3

   sbatch --exclusive \
   --nodes 1 \
   --wrap="srun neuron_parallel_compile bash $(pwd)/tp_zero1_llama2_7b_hf_finetune_ptl.sh"

This script uses a tensor-parallel size of 8, which automatically sets the ZeRO-1 sharding degree to 4 (32 workers / tensor_parallel_size).

Note: You can use any number of nodes in this case by adjusting the number of nodes in the above Slurm command accordingly. However, the number of nodes used in the parallel_compile command should be the same as the number used in the actual training run. This is because, as the number of nodes changes, the data-parallel degree changes too; more workers then participate in operations like gradient all-reduce, which results in new graphs getting created.

After the graphs are compiled, you can run training and observe how the loss goes down. Before starting the actual fine-tuning, prepare the dataset:

.. code:: ipython3

   python3 -c "import nltk; nltk.download('punkt')"

To run the training, run the above command without ``neuron_parallel_compile``:
.. code:: ipython3

   sbatch --exclusive \
   --nodes 1 \
   --wrap="srun bash $(pwd)/tp_zero1_llama2_7b_hf_finetune_ptl.sh"

At the end of fine-tuning, evaluation runs once on a test data split by generating sentences and calculating ROUGE scores. The final evaluation results and ROUGE score are then printed in your terminal.

Checkpointing
^^^^^^^^^^^^^

To enable checkpoint saving, add the following flags to ``tp_zero1_llama2_7b_hf_finetune_ptl.sh``:

* ``--save_checkpoint`` Enables checkpoint saving.
* ``--checkpoint_freq`` Number of steps between checkpoint saves.
* ``--checkpoint_dir`` Directory to save the checkpoint to.
* ``--num_kept_checkpoint`` Number of checkpoints to keep; older checkpoints beyond this number are deleted. Set to -1 to keep all saved checkpoints.
* ``--save_load_xser`` Saves and loads with torch_xla serialization to reduce save time. We recommend enabling xser for significantly faster save and load times. Note that if the checkpoint is saved with xser, it can only be loaded with xser, and vice versa.

To enable checkpoint loading, add the following flags to ``tp_zero1_llama2_7b_hf_finetune_ptl.sh``:

* ``--resume_ckpt`` Resumes from a saved checkpoint.
* ``--load_step`` The step to retrieve the checkpoint from.
* ``--checkpoint_dir`` Directory to load the checkpoint from.
* ``--save_load_xser`` Saves and loads with torch_xla serialization to reduce load time. We recommend enabling xser for significantly faster save and load times. Note that if the checkpoint is saved with xser, it can only be loaded with xser, and vice versa.

================================================
FILE: archive/tutorials/gpt3_neuronx_nemo_megatron_pretraining.rst
================================================

.. _gpt3_neuronx_nemo_megatron_pretraining:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.

Launch a GPT-3 pretraining job using neuronx-nemo-megatron
==========================================================

Archived tutorials for GPT-3 pretraining using neuronx-nemo-megatron:

* `Launch a GPT-3 23B pretraining job using neuronx-nemo-megatron `_
* `Launch a GPT-3 46B pretraining job using neuronx-nemo-megatron `_
* `Launch a GPT-3 175B pretraining job using neuronx-nemo-megatron `_

================================================
FILE: archive/tutorials/megatron_gpt_pretraining.rst
================================================

.. _megatron_gpt_pretraining:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.

Megatron GPT Pretraining
========================

.. note::

   This page was archived on 7/31/2025.

In this example, we will compile and train a Megatron GPT model on a single instance or on multiple instances using ParallelCluster with the NxD Training library. The example has the following main sections:

.. contents:: Table of contents
   :local:
   :depth: 2

Setting up the environment
--------------------------

ParallelCluster Setup
^^^^^^^^^^^^^^^^^^^^^

In this example, we will use 8 instances with ParallelCluster. Please follow the instructions here to create a cluster: `Train your model on ParallelCluster `_

ParallelCluster automates the creation of trn1 clusters and provides the SLURM job management system for scheduling and managing distributed training jobs. Please note that the home directory on your ParallelCluster head node will be shared with all of the worker nodes via NFS.
Install Dependencies
^^^^^^^^^^^^^^^^^^^^

Once you have launched a trn1 instance or ParallelCluster, please follow this guide on how to install the latest Neuron packages: `PyTorch Neuron Setup Guide `_.

Next, we will need to install NxD Training and its dependencies. Please see the following installation guide for installing NxD Training: :ref:`NxDT Installation Guide `

Download the dataset
--------------------

This tutorial makes use of a preprocessed Wikipedia dataset that is stored in S3. The dataset can be downloaded to your cluster or instance by running the following commands on the head node or your trn1 instance:

.. code-block:: bash

   export DATA_DIR=~/examples_datasets/gpt2
   mkdir -p ${DATA_DIR} && cd ${DATA_DIR}
   wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json
   wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt
   aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.bin . --no-sign-request
   aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.idx . --no-sign-request
   aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/license.txt . --no-sign-request

Pre-compile the model
---------------------

By default, PyTorch Neuron uses a just-in-time (JIT) compilation flow that sequentially compiles all of the neural network compute graphs as they are encountered during a training job. The compiled graphs are cached in a local compiler cache so that subsequent training jobs can leverage the compiled graphs and avoid compilation (so long as the graph signatures and Neuron version have not changed). An alternative to the JIT flow is to use the included ``neuron_parallel_compile`` command to perform ahead-of-time (AOT) compilation. In the AOT compilation flow, the compute graphs are first identified and extracted during a short simulated training run, and the extracted graphs are then compiled and cached using parallel compilation, which is considerably faster than the JIT flow.

First, clone the open-source ``neuronx-distributed-training`` library:

.. code:: ipython3

   git clone https://github.com/aws-neuron/neuronx-distributed-training
   cd neuronx-distributed-training/examples

Now, ensure that you are using the proper config file in the ``conf/`` directory. In the ``train.sh`` file, ensure that the ``CONF_FILE`` variable is properly set to the config for the model you want to use. In our case, it will be ``megatron_gpt_config``. The default config here is a 6.7B parameter model, but users can also add their own ``conf/*.yaml`` files and run different configs and hyperparameters if desired. Please see :ref:`Config Overview ` for examples and usage for the ``.yaml`` config files.

Next, run the following commands to launch an AOT pre-compilation job on your instance:

.. code-block:: bash

   export COMPILE=1
   ./train.sh

The compile output and logs will be shown directly in the terminal and you will see a message similar to this:

.. code-block:: bash

   2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total graphs: 22
   2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total successful compilations: 22
   2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total failed compilations: 0

When you see this message, your compilation has completed successfully.

.. note:: The number of graphs will differ based on package versions, models, and other factors. This is just an example.
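Since the compiled graphs are cached locally, a quick way to sanity-check the result of precompilation is to look at the compiler cache itself. The path below is the usual default; it may differ if your setup overrides the cache location:

.. code-block:: bash

   # Inspect the local compiler cache (default path; can be overridden via NEURON_CC_FLAGS)
   ls /var/tmp/neuron-compile-cache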
If you are using ParallelCluster, then you will need to update the ``conf/megatron_gpt_config.yaml`` with:

.. code-block:: yaml

   num_nodes: 8

Then to run the compile job:

.. code-block:: bash

   export COMPILE=1
   sbatch --exclusive \
       --nodes 8 \
       --cpus-per-task 128 \
       --wrap="srun ./train.sh"

Once you have launched the precompilation job, run the ``squeue`` command to view the SLURM job queue on your cluster. If you have not recently run a job on your cluster, it may take 4-5 minutes for the requested trn1.32xlarge nodes to be launched and initialized. Once the job is running, squeue should show output similar to the following:

.. code-block:: bash

   JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
   10 compute1 wrap ubuntu R 5:11 8 compute1-dy-queue1-i1-[0-7]

You can view the output of the precompilation job by examining the file named ``slurm-ZZ.out``, where ZZ represents the JOBID of your job in the squeue output above.

.. code-block:: bash

   tail -f slurm-10.out

Once the precompilation job is complete, you should see a message in the logs similar to the output shown above:

.. code-block:: bash

   2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total graphs: 22
   2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total successful compilations: 22
   2024-08-11 23:04:08.000738: INFO ||PARALLEL_COMPILE||: Total failed compilations: 0

At this point, you can press ``CTRL-C`` to exit the tail command.

Training the model
------------------

The pre-training job is launched in almost exactly the same way as the compile job. We now turn off the ``COMPILE`` environment variable and run the same training script to start pre-training.

On a single instance:

.. code-block:: bash

   export COMPILE=0
   ./train.sh

If you are using ParallelCluster:

.. code-block:: bash

   export COMPILE=0
   sbatch --exclusive \
       --nodes 8 \
       --cpus-per-task 128 \
       --wrap="srun ./train.sh"

As outlined above, you can again use the ``squeue`` command to view the job queue, and also monitor the job in the same way with the ``tail`` command to see the training logs. Once the model is loaded onto the Trainium accelerators and training has commenced, you will begin to see output indicating the job progress:

Example:

.. code-block:: bash

   Epoch 0: 0%| | 189/301501 [59:12<1573:03:24, 18.79s/it, loss=7.75, v_num=3-16, reduced_train_loss=7.560, global_step=188.0, consumed_samples=24064.0]
   Epoch 0: 0%| | 190/301501 [59:30<1572:41:13, 18.79s/it, loss=7.74, v_num=3-16, reduced_train_loss=7.560, global_step=189.0, consumed_samples=24192.0]
   Epoch 0: 0%| | 191/301501 [59:48<1572:21:28, 18.79s/it, loss=7.73, v_num=3-16, reduced_train_loss=7.910, global_step=190.0, consumed_samples=24320.0]
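If you prefer a quick text-based check over scanning the full log, you can pull the reported loss values straight out of the Slurm output file. The job ID in the file name below is illustrative:

.. code-block:: bash

   # Print the five most recent reduced training loss values from the job log
   grep -o 'reduced_train_loss=[0-9.]*' slurm-10.out | tail -n 5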
Monitoring Training
-------------------

Tensorboard monitoring
^^^^^^^^^^^^^^^^^^^^^^

In addition to the text-based job monitoring described in the previous section, you can also use standard tools such as TensorBoard to monitor training job progress. To view an ongoing training job in TensorBoard, you first need to identify the experiment directory associated with your ongoing job. This will typically be the most recently created directory under ``~/neuronx-distributed-training/examples/nemo_experiments/megatron_gpt/``. Once you have identified the directory, cd into it, and then launch TensorBoard:

.. code-block:: bash

   cd ~/neuronx-distributed-training/examples/nemo_experiments/megatron_gpt/
   tensorboard --logdir ./

With TensorBoard running, you can then view the TensorBoard dashboard by browsing to ``http://localhost:6006`` on your local machine. If you cannot access TensorBoard at this address, please make sure that you have port-forwarded TCP port 6006 when SSH'ing into the head node:

.. code-block:: bash

   ssh -i YOUR_KEY.pem ubuntu@HEAD_NODE_IP_ADDRESS -L 6006:127.0.0.1:6006

neuron-top / neuron-monitor / neuron-ls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The `neuron-top `_ tool can be used to view useful information about NeuronCore utilization, vCPU and RAM utilization, and loaded graphs on a per-node basis. To use neuron-top during an ongoing training job, first SSH into one of your compute nodes from the head node (if using ParallelCluster), and then run ``neuron-top``:

.. code-block:: bash

   ssh compute1-dy-queue1-i1-1 # to determine which compute nodes are in use, run the squeue command
   neuron-top

Similarly, once you are logged into one of the active compute nodes, you can also use other Neuron tools such as `neuron-monitor `_ and `neuron-ls `_ to capture performance and utilization statistics and to understand NeuronCore allocation.

Troubleshooting Guide
---------------------

For issues with NxD Training, please see: :ref:`NxD Training Known Issues `

For ParallelCluster issues see: `AWS ParallelCluster Troubleshooting `_

================================================
FILE: archive/tutorials/multinode-training-model-profiling.rst
================================================

.. meta::
   :description: Learn how to use Neuron Explorer to analyze performance during multi-node training on AWS Trainium instances with SLURM job scheduling
   :date-modified: 12/02/2025

Profiling Multi-Node Training Jobs with Neuron Explorer
========================================================

This tutorial demonstrates how to use Neuron Explorer to analyze performance during multi-node training on AWS Trainium instances. We will run a scaled-down version of the :doc:`NxD Training Llama3 8B tutorial ` across 2 nodes, capture performance traces, and visualize them using Perfetto. We will run training with reduced steps and layers so that compilation and profiling complete quickly.

Prerequisites
-------------

* Access to a multi-node Trainium cluster (2 nodes in this example)
* Neuron SDK installed and configured along with :doc:`NxD Training library installation `
* Review of the :doc:`NxD Training Llama3 8B tutorial `
* Familiarity with SLURM job scheduling

Setup and Configuration
-----------------------

Step 1: Initial Setup
~~~~~~~~~~~~~~~~~~~~~~

A. Download the dataset script:

.. code-block:: bash

   # Download get_dataset.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/get_dataset.py

B. Create a directory for the dataset and get the corresponding config file:

.. code-block:: bash

   mkdir ~/examples_datasets/ && cd ~/examples_datasets/
   # Download config.json
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_zero1_llama_hf_pretrain/8B_config_llama3/config.json ~/

C. Get the tokenizer using the following code snippet:

.. code-block:: python

   # tokenizer.py
   from huggingface_hub import login
   from transformers import AutoTokenizer

   login(token='YourHuggingFaceToken')
   tokenizer = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B')
   tokenizer.save_pretrained(".")

.. code-block:: bash

   python3 tokenizer.py

D. Run ``get_dataset.py``:

.. code-block:: bash

   python3 ~/get_dataset.py --llama-version 3

E. Clone the neuronx-distributed-training git repo:
.. code-block:: bash

   cd ~
   git clone https://github.com/aws-neuron/neuronx-distributed-training.git
   cd ~/neuronx-distributed-training/examples

Step 2: Modify the Configuration Files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Update the training configuration to minimize runtime while still generating useful profiling data:

1. In ``hf_llama3_8B_config.yaml``, make the following changes:

.. code-block:: yaml

   max_steps: 5 # Run only 5 steps for faster turnaround
   num_layers: 2 # Reduce model depth to 2 layers
   num_nodes: 2 # Run only 2 nodes
   global_batch_size: 32 # Set a relatively smaller GBS to avoid large trace volume

These changes ensure the job compiles and runs quickly while still exercising the profiler.

2. In ``train.sh``, set the configuration file name:

.. code-block:: bash

   CONF_FILE=hf_llama3_8B_config

This ensures the job runs with your modified config.

Step 3: Compile the Model
~~~~~~~~~~~~~~~~~~~~~~~~~

Before training, the model must be compiled into Neuron Executable Files (NEFFs). To do this:

.. code-block:: bash

   export COMPILE=1
   export CONF_FILE=hf_llama3_8B_config
   sbatch --exclusive \
       --nodes=2 \
       --cpus-per-task=128 \
       --wrap="srun ./train.sh"

* ``COMPILE=1`` tells the script to run in compile-only mode.
* ``--nodes=2`` requests 2 Trainium nodes for compilation.
* ``srun ./train.sh`` launches the job via Slurm across the allocated nodes.

.. note:: The first compilation may take some time depending on the model size. Once compiled, NEFFs are cached for reuse in later training runs.

Step 4: Run the Training Job with Profiling Enabled
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that compilation is done, we can run the training job while enabling Neuron Explorer:

.. code-block:: bash

   export COMPILE=0
   export CONF_FILE=hf_llama3_8B_config
   NEURON_RT_INSPECT_DEVICE_PROFILE=1 NEURON_RT_INSPECT_ENABLE=1 \
   NEURON_RT_INSPECT_OUTPUT_DIR=./output \
   sbatch --exclusive \
       --nodes=2 \
       --cpus-per-task=128 \
       --wrap="srun ./train.sh"

Here's what's happening:

* ``COMPILE=0``: Use precompiled NEFFs instead of recompiling.
* ``NEURON_RT_INSPECT_ENABLE=1``: Turns on runtime inspection for profiling.
* ``NEURON_RT_INSPECT_DEVICE_PROFILE=1``: Also captures device-level profiles from the NeuronCores.
* ``NEURON_RT_INSPECT_OUTPUT_DIR=./output``: All profiler logs will be saved into the ``./output`` directory.
* Slurm runs the job across 2 nodes with 128 CPUs per task.

At the end of this step, you should see an output directory containing runtime inspection logs from each node.

Step 5: Generate a Perfetto Profile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Neuron Explorer produces raw trace data. To visualize it, convert the logs into a Perfetto-compatible trace file:

1. Run the Neuron Explorer CLI:

.. code-block:: bash

   neuron-profile view -d ./output --output-format perfetto

This command consolidates the logs and generates a Perfetto-compatible trace file.

Step 6: Visualize in Perfetto
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Download the generated trace file to your local machine (see the sketch after this list).
2. Open the Perfetto UI.
3. Drag and drop the trace file into the browser window.

You'll now see a timeline view of your training job, including kernel execution, operator scheduling, and activity across NeuronCores. This visualization helps you identify compute vs. memory bottlenecks, idle time, and overall efficiency of the training job.
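As a sketch of the download step, you might copy the trace from the cluster head node with ``scp``. The key file, host address, and trace file name below are placeholders; the exact file name produced by ``neuron-profile view`` may differ on your system:

.. code-block:: bash

   # Run from your local machine; key, host, and trace path are placeholders
   scp -i YOUR_KEY.pem ubuntu@HEAD_NODE_IP_ADDRESS:~/neuronx-distributed-training/examples/output/trace.perfetto-trace .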
Step 7: Understanding the System Level Profile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the profile is loaded in Perfetto, you'll see both nodes (2 in our case) along with their workers, listed on the left-hand side as process IDs (PIDs). Each worker captures the same trace, so expanding any one of them will give you the information you need.

The key runtime event to focus on is the Neuron Runtime API call named ``nc_exec_running``. This API is responsible for executing a Neuron Executable File (NEFF) on the NeuronCores. If you hover over or click on one of these calls, Perfetto will display details about which NEFF is being executed. While you may see other runtime API calls, our primary interest is in ``nc_exec_running`` since it directly represents the model execution on Neuron hardware.

.. image:: /tools/profiler/images/multinode-training-1.png

In the example trace shown, the calls to ``nc_exec_running`` appear back-to-back with no significant delays in between. This indicates that, at a system level, the runtime is efficiently dispatching work to NeuronCores. The ``model_name`` field in the arguments section will display the name of the NEFF being used in the corresponding ``nc_exec_running``.

Step 8: Linking to Device-Level Profiles
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since the NEFF name is visible from the ``nc_exec_running`` API call, we can now visualize the profile for that NEFF, that is, how the model performs on a given NeuronCore. For this, on your Trainium cluster, navigate to your compile cache directory (if you are following this tutorial, it is set as ``compiler_cache_url`` in the config.yaml file). Search for the respective module directory based on the name, and you will see artifacts in that directory as shown below:

.. code-block:: text

   ├── compile_flags.json
   ├── model.done
   ├── model.hlo_module.pb
   └── model.neff

================================================
FILE: archive/tutorials/nxd-source-code/gpt_neox_tp_zero1/gpt_neox_20b.sh
================================================

#!/bin/bash set -eExuo cd ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/ ln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/adamw_fp32_optim_params.py ./ ln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/get_dataset.py ./ ln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/requirements.txt ./ python3 -m pip install -r requirements.txt python3 get_dataset.py PATH=$PATH:/opt/slurm/bin/ sbatch --exclusive \ --nodes 4 \ --cpus-per-task 128 \ --wrap="srun neuron_parallel_compile bash $(pwd)/tp_dp_gpt_neox_20b_hf_pretrain.sh" sbatch --exclusive \ --nodes 4 \ --cpus-per-task 128 \ --wrap="srun bash $(pwd)/tp_dp_gpt_neox_20b_hf_pretrain.sh"

================================================
FILE: archive/tutorials/nxd-source-code/gpt_neox_tp_zero1/gpt_neox_6_9b.sh
================================================

#!/bin/bash set -eExuo cd ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_6.9b_hf_pretrain/ ln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/adamw_fp32_optim_params.py ./ ln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/get_dataset.py ./ ln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/common/requirements.txt ./ ln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/modeling_gpt_neox_nxd.py ./ ln -sf ~/neuronx-distributed/examples/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/utils.py ./ python3 -m pip install
-r requirements.txt python3 get_dataset.py PATH=$PATH:/opt/slurm/bin/ sbatch --exclusive \ --nodes 4 \ --wrap="srun neuron_parallel_compile bash $(pwd)/tp_dp_gpt_neox_6.9b_hf_pretrain.sh" sbatch --exclusive \ --nodes 4 \ --wrap="srun bash $(pwd)/tp_dp_gpt_neox_6.9b_hf_pretrain.sh" ================================================ FILE: archive/tutorials/nxd-source-code/llama_tp_pp_ptl/llama_2_13b.sh ================================================ #!/bin/bash set -eExuo cd ~/neuronx-distributed/examples/training/llama/lightning chmod +x run_llama_13b_tp_pp_ptl.sh mkdir 13B_config cp ~/neuronx-distributed/examples/training/llama/tp_pp_llama_hf_pretrain/13B_config_llama2/config.json ./13B_config sudo rm -rf /home/ubuntu/.cache/ pip install --upgrade filelock python3 get_dataset.py --llama-version 2 PATH=$PATH:/opt/slurm/bin/ sbatch --exclusive \ --nodes 32 \ --cpus-per-task 128 \ --wrap="srun neuron_parallel_compile bash $(pwd)/run_llama_13b_tp_pp_ptl.sh" sbatch --exclusive \ --nodes 32 \ --cpus-per-task 128 \ --wrap="srun bash $(pwd)/run_llama_13b_tp_pp_ptl.sh" ================================================ FILE: archive/tutorials/nxd-source-code/llama_tp_pp_ptl/llama_2_70b.sh ================================================ #!/bin/bash set -eExuo cd ~/neuronx-distributed/examples/training/llama/lightning chmod +x run_llama_70b_tp_pp_ptl.sh mkdir 70B_config cp ~/neuronx-distributed/examples/training/llama/tp_pp_llama_hf_pretrain/70B_config_llama2/config.json ./70B_config sudo rm -rf /home/ubuntu/.cache/ pip install --upgrade filelock python3 get_dataset.py --llama-version 2 PATH=$PATH:/opt/slurm/bin/ sbatch --exclusive \ --nodes 32 \ --cpus-per-task 128 \ --wrap="srun neuron_parallel_compile bash $(pwd)/run_llama_70b_tp_pp_ptl.sh" sbatch --exclusive \ --nodes 32 \ --cpus-per-task 128 \ --wrap="srun bash $(pwd)/run_llama_70b_tp_pp_ptl.sh" ================================================ FILE: archive/tutorials/nxd-source-code/llama_tp_pp_ptl/llama_2_7b.sh ================================================ #!/bin/bash set -eExuo cd ~/neuronx-distributed/examples/training/llama/tp_zero1_llama_hf_pretrain chmod +x tp_zero1_llama2_7B_hf_pretrain.sh ln -sf 7B_config_llama2/config.json ./ sudo rm -rf /home/ubuntu/.cache/ pip install --upgrade filelock python3 get_dataset.py --llama-version 2 PATH=$PATH:/opt/slurm/bin/ sbatch --exclusive \ --nodes 4 \ --cpus-per-task 128 \ --wrap="srun neuron_parallel_compile bash $(pwd)/tp_zero1_llama2_7B_hf_pretrain.sh" sbatch --exclusive \ --nodes 4 \ --cpus-per-task 128 \ --wrap="srun bash $(pwd)/tp_zero1_llama2_7B_hf_pretrain.sh" ================================================ FILE: archive/tutorials/nxd-source-code/llama_tp_pp_ptl/llama_tp_pp_ptl_setup.sh ================================================ #!/bin/bash set -eExuo cd ~/neuronx-distributed/examples/training/llama/lightning ln -sf ~/neuronx-distributed/examples/training/llama/get_dataset.py ./ ln -sf ~/neuronx-distributed/examples/training/llama/lr.py ./ ln -sf ~/neuronx-distributed/examples/training/llama/modeling_llama_nxd.py ./ ln -sf ~/neuronx-distributed/examples/training/llama/requirements.txt ./ ln -sf ~/neuronx-distributed/examples/training/llama/requirements_ptl.txt ./ ln -sf ~/neuronx-distributed/examples/training/llama/training_utils.py ./ python3 -m pip install -r requirements.txt python3 -m pip install -r requirements_ptl.txt # Currently we're supporting Lightning version 2.1.0 ================================================ FILE: archive/tutorials/ssd300_demo/requirements.txt 
================================================
numpy>1.18.5
tensorflow_neuron==1.15.5.2.8.9.0
neuron_cc==1.13.5.0
tensorflow-serving-api==1.15.0
torch>=1.0,<2.0
torchvision<1.0
matplotlib<4.0
Cython<0.29
pycocotools==2.0.1

================================================
FILE: archive/tutorials/ssd300_demo/ssd300_demo.rst
================================================

.. _tensorflow-ssd300:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.

Running SSD300 with AWS Neuron
==============================

.. note:: This page was archived on 7/31/2025.

*Update 11/16: The model checkpoint link*\ https://api.ngc.nvidia.com/v2/models/nvidia/ssdpyt_fp32/versions/1/files/nvidia_ssdpyt_fp32_20190225.pt\ *is currently broken and the AWS Neuron team is working on providing an alternative source.*

This demo shows a Neuron compatible SSD300 implementation that is functionally equivalent to the open source SSD300 model. This demo uses TensorFlow-Neuron and the PyTorch SSD300 model and checkpoint (https://pytorch.org/hub/nvidia_deeplearningexamples_ssd/), and also shows the performance achieved by the Inf1 instance.

Table of Contents
-----------------

1. Launch EC2 instance and update AWS Neuron SDK software
2. Generating Neuron compatible SSD300 TensorFlow SavedModel - Convert open source PyTorch SSD300 model and checkpoint into Neuron compatible SSD300 TensorFlow SavedModel
3. Evaluate the generated SSD300 TensorFlow SavedModel for both accuracy and performance - Running threaded inference through the COCO 2017 validation dataset

Launch EC2 instances and update tensorflow-neuron and neuron-cc
---------------------------------------------------------------

For this demo, launch one inf1.xlarge EC2 instance. We recommend using the latest Ubuntu 18 Deep Learning AMI (DLAMI). Please configure your ubuntu16/ubuntu18/yum repo following the steps in :ref:`install-neuron-tensorflow` in order to install ``tensorflow-model-server-neuron``.

Generating Neuron compatible SSD300 TensorFlow SavedModel
---------------------------------------------------------

First, connect to your inf1.xlarge instance.

Compile open source PyTorch SSD300 model and checkpoint into Neuron compatible SSD300 TensorFlow SavedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the same directory ssd300_demo, run the following:

1. Create venv and install dependencies

.. code:: bash

   sudo apt update
   sudo apt install g++ python3-dev python3-venv unzip
   sudo apt install tensorflow-model-server-neuron
   python3 -m venv env
   source ./env/bin/activate
   pip install pip setuptools --upgrade
   pip install -r ./requirements.txt --extra-index-url=https://pip.repos.neuron.amazonaws.com

2. Clone NVIDIA's DeepLearningExamples repo that contains PyTorch SSD300.

.. code:: bash

   git clone https://github.com/NVIDIA/DeepLearningExamples.git
   cd DeepLearningExamples
   git checkout a644350589f9abc91b203f73e686a50f5d6f3e96
   cd ..

3. Download PyTorch SSD300 checkpoint file.

.. code:: bash

   curl -LO https://api.ngc.nvidia.com/v2/models/nvidia/ssdpyt_fp32/versions/1/files/nvidia_ssdpyt_fp32_20190225.pt

4. Download COCO 2017 validation set and annotations.

.. code:: bash

   curl -LO http://images.cocodataset.org/zips/val2017.zip
   unzip ./val2017.zip
   curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip
   unzip ./annotations_trainval2017.zip
5. Convert PyTorch SSD300 model and checkpoint into a Neuron-compatible TensorFlow SavedModel.

.. code:: bash

   python ssd300_model.py --torch_checkpoint=./nvidia_ssdpyt_fp32_20190225.pt --output_saved_model=./ssd300_tf_neuron/1

This converts the PyTorch SSD300 model and checkpoint to a Neuron-compatible TensorFlow SavedModel using tensorflow-neuron and neuron-cc. The compilation output is stored in ``./ssd300_tf_neuron``.

6. Launch the ``tensorflow-model-server-neuron`` gRPC server at default port 8500 in the background.

.. code:: bash

   tensorflow_model_server_neuron --model_base_path=$(pwd)/ssd300_tf_neuron &

7. In the client, evaluate the Neuron-compatible TensorFlow SavedModel for both accuracy and performance. Note that this client by default assumes a ``tensorflow-model-server-neuron`` server listening at ``localhost:8500``. On inf1.xlarge, the expected throughput is 100 images/second once the server is fully warmed up, and the expected mean average precision (mAP) is 0.253.

.. code:: bash

   python ssd300_evaluation_client.py --val2017=./val2017 --instances_val2017_json=./annotations/instances_val2017.json

8. After running the demo, please clean up resources allocated in the Neuron runtime by gracefully killing the ``tensorflow_model_server_neuron`` process, e.g.,

.. code:: bash

   killall tensorflow_model_server_neuron

================================================
FILE: archive/tutorials/ssd300_demo/ssd300_detection.py
================================================

import argparse import json import pkg_resources from distutils.version import LooseVersion import numpy as np from PIL import Image import matplotlib.pyplot as plt import matplotlib.patches as patches import tensorflow as tf import tensorflow.neuron as tfn def main(): parser = argparse.ArgumentParser() parser.add_argument('--image', required=True, help='Path to image that is to be detected. Support jpeg and png format.') parser.add_argument('--image_with_detections', required=True, help='Path to save image after detection (with bounding boxes drawn). Png format.') parser.add_argument('--saved_model', required=True, help='TensorFlow SSD300 SavedModel') parser.add_argument('--score_threshold', type=float, default=0.15, help='Minimum required score for drawing a bounding box') parser.add_argument('--instances_val2017_json', default=None, help='Json file that contains labeling information') parser.add_argument('--save_results', default=None) parser.add_argument('--disable_version_check', action='store_true') args = parser.parse_args() if not args.disable_version_check: tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version) if tfn_version < LooseVersion('1.15.0.1.0.1333.0'): raise RuntimeError( 'tensorflow-neuron version {} is too low for this demo. 
Please upgrade ' 'by "pip install -U tensorflow-neuron --index-url=https://pip.repos.neuron.amazonaws.com"'.format(tfn_version)) with open(args.image, 'rb') as f: img_jpg_bytes = f.read() model_feed_dict = {'batch_image': [img_jpg_bytes]} predictor = tf.contrib.predictor.from_saved_model(args.saved_model) results = predictor(model_feed_dict) if args.save_results is not None: np.savez(args.save_results, **results) boxes_np = results['boxes'] scores_np = results['scores'] classes_np = results['classes'] if args.instances_val2017_json is not None: with open(args.instances_val2017_json) as f: annotate_json = json.load(f) label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])} plt.switch_backend('agg') fig, ax = plt.subplots(1) ax.imshow(Image.open(args.image).convert('RGB')) wanted = scores_np[0] > args.score_threshold for xywh, label_no_bg in zip(boxes_np[0][wanted], classes_np[0][wanted]): rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none') ax.add_patch(rect) rx, ry = rect.get_xy() rx = rx + rect.get_width() / 2.0 if args.instances_val2017_json is not None: ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10, ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', alpha=0.5)) plt.savefig(args.image_with_detections) plt.close(fig) if __name__ == '__main__': main() ================================================ FILE: archive/tutorials/ssd300_demo/ssd300_evaluation.py ================================================ import argparse import os import json import glob from concurrent import futures import time import pkg_resources from distutils.version import LooseVersion import numpy as np import tensorflow as tf import tensorflow.neuron as tfn from pycocotools.cocoeval import COCOeval from DeepLearningExamples.PyTorch.Detection.SSD.src.coco import COCO from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import dboxes300_coco from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import SSDTransformer from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import COCODetection def get_val_dataset(val_annotate, val_coco_root): dboxes = dboxes300_coco() val_trans = SSDTransformer(dboxes, (300, 300), val=True) val_coco = COCODetection(val_coco_root, val_annotate, val_trans) return val_coco def main(): parser = argparse.ArgumentParser() parser.add_argument('--saved_model', required=True, help='TensorFlow SSD300 SavedModel') parser.add_argument('--val2017', required=True, help='Path to COCO 2017 validation dataset') parser.add_argument('--instances_val2017_json', required=True, help='Json file that contains labeling information') parser.add_argument('--num_sessions', type=int, default=1, help='Number of tensorflow sessions') parser.add_argument('--num_threads', type=int, default=4, help='Number of threads') parser.add_argument('--throughput_interval', type=int, default=10, help='Interval for counting throughput') parser.add_argument('--save_results', default=None) parser.add_argument('--disable_version_check', action='store_true') args = parser.parse_args() if not args.disable_version_check: tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version) if tfn_version < LooseVersion('1.15.0.1.0.1333.0'): raise RuntimeError( 'tensorflow-neuron version {} is too low for this demo. 
Please upgrade ' 'by "pip install -U tensorflow-neuron --index-url=https://pip.repos.neuron.amazonaws.com"'.format(tfn_version)) predictor_list = [tf.contrib.predictor.from_saved_model(args.saved_model) for _ in range(args.num_sessions)] val_dataset = get_val_dataset(args.instances_val2017_json, args.val2017) inv_map = {v: k for k, v in val_dataset.label_map.items()} model_feed_dict_list = [] for img_id in val_dataset.img_keys: img_path = os.path.join(args.val2017, val_dataset.images[img_id][0]) with open(img_path, 'rb') as f: img_jpg_bytes = f.read() model_feed_dict_list.append({'batch_image': [img_jpg_bytes]}) latency_list = [] throughput_list = [] def predict(pred, model_feed_dict): start = time.time() result = pred(model_feed_dict) latency_list.append(time.time() - start) return result def performance(): last_num_infer = len(latency_list) while len(latency_list) < len(model_feed_dict_list): current_num_infer = len(latency_list) throughput = (current_num_infer - last_num_infer) / args.throughput_interval throughput_list.append(throughput) p50 = 0.0 p90 = 0.0 if latency_list: p50 = np.percentile(latency_list, 50) p90 = np.percentile(latency_list, 90) print('pid {}: current throughput {}, latency p50={:.3f} p90={:.3f}'.format(os.getpid(), throughput, p50, p90)) last_num_infer = current_num_infer time.sleep(args.throughput_interval) executor = futures.ThreadPoolExecutor(max_workers=(args.num_sessions*args.num_threads)+1) performance_future = executor.submit(performance) eval_futures = [] for idx, model_feed_dict in enumerate(model_feed_dict_list): eval_fut = executor.submit(predict, predictor_list[idx%len(predictor_list)], model_feed_dict) eval_futures.append(eval_fut) waited_results = [] for idx, eval_fut in enumerate(eval_futures): if idx % 100 == 0: print('evaluating image {}/{}'.format(idx, len(eval_futures))) waited_results.append(eval_fut.result()) eval_results = [] for idx, (img_id, results) in enumerate(zip(val_dataset.img_keys, waited_results)): boxes = results['boxes'] for box, label, prob in zip(results['boxes'][0], results['classes'][0], results['scores'][0]): res = [img_id, box[0], box[1], box[2], box[3], prob, inv_map[label+1]] # +1 to account for background eval_results.append(res) performance_future.result() coco_gt = COCO(annotation_file=args.instances_val2017_json) coco_dt = coco_gt.loadRes(np.array(eval_results).astype(np.float32)) coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox') coco_eval.evaluate() coco_eval.accumulate() coco_eval.summarize() if args.save_results is not None: np.save(args.save_results, coco_eval.stats) if __name__ == '__main__': main() ================================================ FILE: archive/tutorials/ssd300_demo/ssd300_evaluation_client.py ================================================ import argparse import os import json import glob from concurrent import futures import time import subprocess from distutils.version import LooseVersion import numpy as np import tensorflow as tf import grpc from tensorflow_serving.apis import predict_pb2 from tensorflow_serving.apis import prediction_service_pb2_grpc from pycocotools.cocoeval import COCOeval from DeepLearningExamples.PyTorch.Detection.SSD.src.coco import COCO from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import dboxes300_coco from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import SSDTransformer from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import COCODetection def get_val_dataset(val_annotate, val_coco_root): dboxes = dboxes300_coco() val_trans = 
SSDTransformer(dboxes, (300, 300), val=True) val_coco = COCODetection(val_coco_root, val_annotate, val_trans) return val_coco def main(): parser = argparse.ArgumentParser() parser.add_argument('--server_address', default='localhost:8500', help='tensorflow-model-server-neuron grpc address') parser.add_argument('--model_name', default='default', help='Serving model name') parser.add_argument('--val2017', required=True, help='Path to COCO 2017 validation dataset') parser.add_argument('--instances_val2017_json', required=True, help='Json file that contains labeling information') parser.add_argument('--num_threads', type=int, default=4, help='Number of threads') parser.add_argument('--throughput_interval', type=int, default=10, help='Interval for counting throughput') parser.add_argument('--save_results', default=None) args = parser.parse_args() channel = grpc.insecure_channel(args.server_address) stub = prediction_service_pb2_grpc.PredictionServiceStub(channel) val_dataset = get_val_dataset(args.instances_val2017_json, args.val2017) inv_map = {v: k for k, v in val_dataset.label_map.items()} request_list = [] for img_id in val_dataset.img_keys: img_path = os.path.join(args.val2017, val_dataset.images[img_id][0]) with open(img_path, 'rb') as f: img_jpg_bytes = f.read() data = np.array([img_jpg_bytes], dtype=object) data = tf.contrib.util.make_tensor_proto(data, shape=data.shape) request = predict_pb2.PredictRequest() request.model_spec.name = args.model_name request.inputs['batch_image'].CopyFrom(data) request_list.append(request) latency_list = [] throughput_list = [] def predict(request): start = time.time() result = stub.Predict(request).outputs latency_list.append(time.time() - start) return result def performance(): last_num_infer = len(latency_list) while len(latency_list) < len(request_list): current_num_infer = len(latency_list) throughput = (current_num_infer - last_num_infer) / args.throughput_interval throughput_list.append(throughput) p50 = 0.0 p90 = 0.0 if latency_list: p50 = np.percentile(latency_list, 50) p90 = np.percentile(latency_list, 90) print('pid {}: current throughput {}, latency p50={:.3f} p90={:.3f}'.format(os.getpid(), throughput, p50, p90)) last_num_infer = current_num_infer time.sleep(args.throughput_interval) executor = futures.ThreadPoolExecutor(max_workers=args.num_threads+1) performance_future = executor.submit(performance) eval_futures = [] for idx, request in enumerate(request_list): eval_fut = executor.submit(predict, request) eval_futures.append(eval_fut) waited_results = [] for idx, eval_fut in enumerate(eval_futures): if idx % 100 == 0: print('evaluating image {}/{}'.format(idx, len(eval_futures))) waited_results.append(eval_fut.result()) eval_results = [] for idx, (img_id, results) in enumerate(zip(val_dataset.img_keys, waited_results)): results = {key: tf.make_ndarray(value) for key, value in results.items()} boxes = results['boxes'] for box, label, prob in zip(results['boxes'][0], results['classes'][0], results['scores'][0]): res = [img_id, box[0], box[1], box[2], box[3], prob, inv_map[label+1]] # +1 to account for background eval_results.append(res) performance_future.result() coco_gt = COCO(annotation_file=args.instances_val2017_json) coco_dt = coco_gt.loadRes(np.array(eval_results).astype(np.float32)) coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox') coco_eval.evaluate() coco_eval.accumulate() coco_eval.summarize() if args.save_results is not None: np.save(args.save_results, coco_eval.stats) if __name__ == '__main__': main() 
================================================ FILE: archive/tutorials/ssd300_demo/ssd300_model.py ================================================ import sys import os import argparse import time import itertools from functools import partial from collections import Counter import json import shutil import pkg_resources from distutils.version import LooseVersion import numpy as np import tensorflow as tf from tensorflow.core.framework import attr_value_pb2 import tensorflow.neuron as tfn import torch def decode_jpeg_resize(input_tensor, image_size): # decode jpeg tensor = tf.image.decode_png(input_tensor, channels=3) # resize decoded_shape = tf.shape(tensor) tensor = tf.cast(tensor, tf.float32) decoded_shape_hw = decoded_shape[0:2] decoded_shape_hw_float32 = tf.cast(decoded_shape_hw, tf.float32) tensor = tf.image.resize(tensor, image_size) # normalize tensor -= np.array([0.485, 0.456, 0.406]).astype(np.float32) * 255.0 return tensor, decoded_shape_hw_float32[::-1] def preprocessor(input_tensor, image_size): with tf.name_scope('Preprocessor'): tensor, bbox_scale_hw = tf.map_fn( partial(decode_jpeg_resize, image_size=image_size), input_tensor, dtype=(tf.float32, tf.float32), back_prop=False, parallel_iterations=16) return tensor, bbox_scale_hw def tf_Conv2d(input_tensor, module, first_conv=False): np_dtype = input_tensor.dtype.as_numpy_dtype kernel_np = module.weight.detach().numpy().transpose([2, 3, 1, 0]) if first_conv: kernel_np /= (np.array([0.229, 0.224, 0.225]).astype(np.float32) * 255.0)[:, np.newaxis] kernel = tf.constant(kernel_np.astype(np_dtype)) if any(module.padding): pad_h, pad_w = module.padding padding = [[0, 0], [pad_h, pad_h], [pad_w, pad_w], [0, 0]] input_tensor = tf.pad(input_tensor, padding) stride_h, stride_w = module.stride tensor = tf.nn.conv2d(input_tensor, kernel, strides=[1, stride_h, stride_w, 1], padding='VALID') if module.bias is not None: bias = tf.constant(module.bias.detach().numpy().astype(np_dtype)) tensor = tf.nn.bias_add(tensor, bias) return tensor def tf_BatchNorm2d(input_tensor, module): def _norm_np(ts): return ts.astype(input_tensor.dtype.as_numpy_dtype) mean = _norm_np(module.running_mean.detach().numpy()) offset = _norm_np(module.bias.detach().numpy()) inv_std = np.sqrt(module.running_var.detach().numpy() + module.eps) scale_inv_std = _norm_np(module.weight.detach().numpy() / inv_std) return scale_inv_std * (input_tensor - mean) + offset def tf_MaxPool2d(input_tensor, module): pad = module.padding tensor = tf.pad(input_tensor, [[0, 0], [pad, pad], [pad, pad], [0, 0]]) return tf.nn.max_pool2d(tensor, ksize=module.kernel_size, strides=module.stride, padding='VALID') def tf_Bottleneck(input_tensor, module): tensor = tf_Conv2d(input_tensor, module.conv1) tensor = tf_BatchNorm2d(tensor, module.bn1) tensor = tf.nn.relu(tensor) tensor = tf_Conv2d(tensor, module.conv2) tensor = tf_BatchNorm2d(tensor, module.bn2) tensor = tf.nn.relu(tensor) tensor = tf_Conv2d(tensor, module.conv3) tensor = tf_BatchNorm2d(tensor, module.bn3) if module.downsample is not None: input_tensor = tf_Conv2d(input_tensor, module.downsample[0]) input_tensor = tf_BatchNorm2d(input_tensor, module.downsample[1]) return tf.nn.relu(input_tensor + tensor) def tf_SequentialBottleneck(tensor, seq, resnet): with tf.name_scope('{}.Sequential'.format(seq)): for idx, module in enumerate(resnet[seq]): with tf.name_scope('{}.BasicBlock'.format(idx)): tensor = tf_Bottleneck(tensor, module) return tensor def tf_bbox_view(detection_feed, modules, ndim): results = [] for idx, (tensor, mod) in 
enumerate(zip(detection_feed, modules)): with tf.name_scope('branch{}'.format(idx)): tensor = tf_Conv2d(tensor, mod) tensor = tf.transpose(tensor, [0, 3, 1, 2]) tensor = tf.cast(tensor, tf.float32) shape = tensor.shape.as_list() batch_size = -1 if shape[0] is None else shape[0] new_shape = [batch_size, ndim, np.prod(shape[1:]) // ndim] results.append(tf.reshape(tensor, new_shape)) tensor = tf.concat(results, axis=-1) return tensor def tf_feature_extractor(input_tensor, resnet): with tf.name_scope('FeatureExtractor'): with tf.name_scope('0.Conv2d'): tensor = tf_Conv2d(input_tensor, resnet[0], first_conv=True) with tf.name_scope('1.BatchNorm2d'): tensor = tf_BatchNorm2d(tensor, resnet[1]) with tf.name_scope('2.ReLU'): tensor = tf.nn.relu(tensor) with tf.name_scope('3.MaxPool2d'): tensor = tf_MaxPool2d(tensor, resnet[3]) tensor = tf_SequentialBottleneck(tensor, 4, resnet) tensor = tf_SequentialBottleneck(tensor, 5, resnet) tensor = tf_SequentialBottleneck(tensor, 6, resnet) tensor = tf.cast(tensor, tf.float16) return tensor def tf_box_predictor(tensor, ssd300_torch): with tf.name_scope('BoxPredictor'): detection_feed = [tensor] for idx, block in enumerate(ssd300_torch.additional_blocks): with tf.name_scope('{}.Sequential'.format(idx)): tensor = tf_Conv2d(tensor, block[0]) tensor = tf_BatchNorm2d(tensor, block[1]) tensor = tf.nn.relu(tensor) tensor = tf_Conv2d(tensor, block[3]) tensor = tf_BatchNorm2d(tensor, block[4]) tensor = tf.nn.relu(tensor) detection_feed.append(tensor) with tf.name_scope('Boxes'): loc = tf_bbox_view(detection_feed, ssd300_torch.loc, ndim=4) with tf.name_scope('Probabilities'): conf = tf_bbox_view(detection_feed, ssd300_torch.conf, ndim=ssd300_torch.label_num) return loc, conf @tfn.fuse(batch_size=1, dynamic_batch_size=True) def tf_ssd300(input_tensor, ssd300_torch): with tf.name_scope('SSD300'): tensor = tf_feature_extractor(input_tensor, ssd300_torch.feature_extractor.feature_extractor) loc, conf = tf_box_predictor(tensor, ssd300_torch) return loc, conf def scale_back_batch(bboxes_in, scores_in, scale_xy, scale_wh, dboxes_xywh): """ Do scale and transform from xywh to ltrb suppose input Nx4xnum_bbox Nxlabel_numxnum_bbox """ with tf.name_scope('ScaleBackBatch'): bboxes_in = tf.transpose(bboxes_in, [0, 2, 1]) scores_in = tf.transpose(scores_in, [0, 2, 1]) bboxes_xy = bboxes_in[:, :, :2] bboxes_wh = bboxes_in[:, :, 2:] bboxes_xy *= scale_xy bboxes_wh *= scale_wh bboxes_xy = bboxes_xy * dboxes_xywh[:, :, 2:] + dboxes_xywh[:, :, :2] bboxes_wh = tf.exp(bboxes_wh) * dboxes_xywh[:, :, 2:] bboxes_wh_half = 0.5 * bboxes_wh bboxes_lt = bboxes_xy - bboxes_wh_half bboxes_rb = bboxes_xy + bboxes_wh_half bboxes_in = tf.concat([bboxes_lt, bboxes_rb], axis=-1) return bboxes_in, tf.nn.softmax(scores_in, axis=-1) def select_nms_outputs(input_tensors): boxes_xywh, scores, classes, valid_detections = input_tensors return boxes_xywh[:valid_detections], scores[:valid_detections], classes[:valid_detections] def postprocessor(ploc_ts, plabel_ts, bbox_scale_hw_ts, scale_xy, scale_wh, dboxes_xywh): with tf.name_scope('Postprocessor'): ploc_ts = tf.cast(ploc_ts, tf.float32) plabel_ts = tf.cast(plabel_ts, tf.float32) bboxes_ts, probs_ts = scale_back_batch(ploc_ts, plabel_ts, scale_xy, scale_wh, dboxes_xywh) bboxes_ts = bboxes_ts[:, :, tf.newaxis, :] probs_ts = probs_ts[:, :, 1:] nms_outputs = tf.image.combined_non_max_suppression( bboxes_ts, probs_ts, max_output_size_per_class=200, max_total_size=200, iou_threshold=0.5, score_threshold=0.05, pad_per_class=False, clip_boxes=False, 
name='CombinedNonMaxSuppression', ) nmsed_boxes_x0y0x1y1, nmsed_scores, nmsed_classes, valid_detections = nms_outputs nmsed_boxes_x0y0 = nmsed_boxes_x0y0x1y1[..., :2] nmsed_boxes_x1y1 = nmsed_boxes_x0y0x1y1[..., 2:] bbox_scale_hw_ts = bbox_scale_hw_ts[:, tf.newaxis, :] nmsed_boxes_xy = nmsed_boxes_x0y0 * bbox_scale_hw_ts nmsed_boxes_wh = (nmsed_boxes_x1y1 - nmsed_boxes_x0y0) * bbox_scale_hw_ts nmsed_boxes_xywh = tf.concat([nmsed_boxes_xy, nmsed_boxes_wh], axis=-1) nmsed_boxes_xywh, nmsed_scores, nmsed_classes = tf.map_fn( select_nms_outputs, (nmsed_boxes_xywh, nmsed_scores, nmsed_classes, valid_detections), dtype=(tf.float32, tf.float32, tf.float32), back_prop=False, parallel_iterations=16) return nmsed_boxes_xywh, nmsed_scores, nmsed_classes class DefaultBoxes(object): def __init__(self, fig_size, feat_size, steps, scales, aspect_ratios, scale_xy=0.1, scale_wh=0.2): self.feat_size = feat_size self.fig_size = fig_size self.scale_xy_ = scale_xy self.scale_wh_ = scale_wh # According to https://github.com/weiliu89/caffe # Calculation method slightly different from paper self.steps = steps self.scales = scales fk = fig_size/np.array(steps) self.aspect_ratios = aspect_ratios self.default_boxes = [] # size of feature and number of feature for idx, sfeat in enumerate(self.feat_size): sk1 = scales[idx]/fig_size sk2 = scales[idx+1]/fig_size sk3 = np.sqrt(sk1*sk2) all_sizes = [(sk1, sk1), (sk3, sk3)] for alpha in aspect_ratios[idx]: w, h = sk1*np.sqrt(alpha), sk1/np.sqrt(alpha) all_sizes.append((w, h)) all_sizes.append((h, w)) for w, h in all_sizes: for i, j in itertools.product(range(sfeat), repeat=2): cx, cy = (j+0.5)/fk[idx], (i+0.5)/fk[idx] self.default_boxes.append((cx, cy, w, h)) self.dboxes = np.array(self.default_boxes) self.dboxes = self.dboxes.clip(min=0, max=1) # For IoU calculation self.dboxes_ltrb = self.dboxes.copy() self.dboxes_ltrb[:, 0] = self.dboxes[:, 0] - 0.5 * self.dboxes[:, 2] self.dboxes_ltrb[:, 1] = self.dboxes[:, 1] - 0.5 * self.dboxes[:, 3] self.dboxes_ltrb[:, 2] = self.dboxes[:, 0] + 0.5 * self.dboxes[:, 2] self.dboxes_ltrb[:, 3] = self.dboxes[:, 1] + 0.5 * self.dboxes[:, 3] @property def scale_xy(self): return self.scale_xy_ @property def scale_wh(self): return self.scale_wh_ def __call__(self, order="ltrb"): if order == "ltrb": return self.dboxes_ltrb if order == "xywh": return self.dboxes def dboxes300_coco(): figsize = 300 feat_size = [38, 19, 10, 5, 3, 1] steps = [8, 16, 32, 64, 100, 300] # use the scales here: https://github.com/amdegroot/ssd.pytorch/blob/master/data/config.py scales = [21, 45, 99, 153, 207, 261, 315] aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]] dboxes = DefaultBoxes(figsize, feat_size, steps, scales, aspect_ratios) return dboxes def main(): parser = argparse.ArgumentParser() parser.add_argument('--torch_checkpoint', required=True, help='Path to PyTorch SSD300 model checkpoint') parser.add_argument('--output_saved_model', required=True, help='Output TensorFlow SavedModel that runs on Inferentia') parser.add_argument('--disable_version_check', action='store_true') args = parser.parse_args() if os.path.exists(args.output_saved_model): raise OSError('SavedModel dir {} already exists'.format(args.output_saved_model)) if not args.disable_version_check: neuroncc_version = LooseVersion(pkg_resources.get_distribution('neuron-cc').version) if neuroncc_version < LooseVersion('1.0.18000'): raise RuntimeError( 'neuron-cc version {} is too low for this demo. 
Please upgrade ' 'by "pip install -U neuron-cc --index-url=https://pip.repos.neuron.amazonaws.com"'.format(neuroncc_version)) tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version) if tfn_version < LooseVersion('1.15.3.1.0.1900.0'): raise RuntimeError( 'tensorflow-neuron version {} is too low for this demo. Please upgrade ' 'by "pip install -U tensorflow-neuron --index-url=https://pip.repos.neuron.amazonaws.com"'.format(tfn_version)) sys.path.append(os.getcwd()) from DeepLearningExamples.PyTorch.Detection.SSD.src import model as torch_ssd300_model ssd300_torch = torch_ssd300_model.SSD300() ckpt = torch.load(args.torch_checkpoint, map_location=torch.device('cpu')) ssd300_torch.load_state_dict(ckpt['model']) ssd300_torch.eval() input_tensor = tf.placeholder(tf.string, [None]) image_tensor, bbox_scale_hw_tensor = preprocessor(input_tensor, [300, 300]) dboxes = dboxes300_coco() dboxes_xywh = dboxes(order="xywh")[np.newaxis, ...] ploc_tensor, plabel_tensor = tf_ssd300(image_tensor, ssd300_torch) boxes_tensor, scores_tensor, classes_tensor = postprocessor( ploc_tensor, plabel_tensor, bbox_scale_hw_tensor, dboxes.scale_xy, dboxes.scale_wh, dboxes_xywh) outputs = { 'boxes': boxes_tensor, 'scores': scores_tensor, 'classes': classes_tensor, } sess = tf.Session() try: sess.run(outputs) except: pass for op in sess.graph.get_operations(): if op.type == 'NeuronOp': if not op.get_attr('executable'): raise AttributeError( 'Neuron executable (neff) is empty. Please check neuron-cc is installed and working properly ' '("pip install neuron-cc --force --index-url=https://pip.repos.neuron.amazonaws.com" ' 'to force reinstall neuron-cc).') model_config = op.node_def.attr['model_config'].list if model_config.i: model_config.i[0] = 1 else: model_config.i.extend([1, 1, 1, 10]) op._set_attr('model_config', attr_value_pb2.AttrValue(list=model_config)) tf.saved_model.simple_save(sess, args.output_saved_model, {'batch_image': input_tensor}, outputs) if __name__ == '__main__': main()

================================================
FILE: archive/tutorials/training-gpt-neox-20b.rst
================================================

.. _gpt_neox_20b_tp_zero1_tutorial:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently unsupported and not maintained. It is provided for reference only.

Training GPT-NeoX 20B with Tensor Parallelism and ZeRO-1 Optimizer
=========================================================================================

In this section, we showcase how to pretrain a GPT-NeoX 20B model by using the sequence parallel optimization of tensor parallelism in the ``neuronx-distributed`` package. Please refer to the `Neuron Samples repository `__ to view the files in this tutorial.

This GPT-NeoX 20B tutorial differs from the :ref:`GPT-NeoX 6.9B tutorial` in the following ways:

* sequence parallel optimization has been applied
* parallel cross entropy has been applied
* the model size has been increased from 6.9B to 20B
* the TP degree has been increased from 8 to 32

Setting up the environment is the same as in the :ref:`GPT-NeoX 6.9B tutorial`.

**Let’s download the scripts for pretraining:**

.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_20b.sh
   :language: shell
   :lines: 4-8

Next, let’s download and pre-process the dataset:

.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_20b.sh
   :language: shell
   :lines: 10

At this point, you are all set to start training.
**Running training**

We first pre-compile the graphs using ``neuron_parallel_compile``. Let’s run the command below:

.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_20b.sh
   :language: shell
   :lines: 14-17

This script uses a tensor-parallel size of 32. This will automatically set the zero-1 sharding degree to 4 (4 * 32 workers / tensor_parallel_size).
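To make the arithmetic concrete, here is a quick sketch of the computation, assuming 32 workers per trn1.32xlarge node as in this tutorial:

.. code-block:: bash

   # 4 nodes x 32 workers = 128 workers; 128 / TP degree 32 = zero-1 sharding degree 4
   NODES=4; WORKERS_PER_NODE=32; TP_DEGREE=32
   echo $(( NODES * WORKERS_PER_NODE / TP_DEGREE ))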
Once the graphs are compiled, we can run training and observe the loss going down. To run the training, we run the above command, but without ``neuron_parallel_compile``.

.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_20b.sh
   :language: shell
   :lines: 19-22

**Sequence Parallel**

We made the following model-level modifications to enable sequence parallelism:

* turn on ``sequence_parallel_enabled`` of ``ColumnParallelLinear`` and ``RowParallelLinear`` in ``GPTNeoXAttention`` and ``GPTNeoXMLP``;
* replace torch ``LayerNorm`` in ``GPTNeoXLayer`` and ``GPTNeoXModel`` with neuronx-distributed ``LayerNorm`` with ``sequence_parallel_enabled`` turned on;
* dimension transposition of intermediate states in the forward function of ``GPTNeoXAttention``;
* dimension transposition and collective communication of intermediate states in the forward function of ``GPTNeoXModel``.

At the training script level, we enable:

* all-reduce of sequence parallel gradients at the gradient accumulation boundary.

Please check `modeling_gpt_neox_nxd.py `__ and `tp_dp_gpt_neox_20b_hf_pretrain.py `__ for details.

**Parallel Cross Entropy**

To enable parallel cross entropy, we made the following model-level modifications:

* replace the ``CrossEntropyLoss`` with neuronx-distributed ``parallel_cross_entropy`` in the forward function of ``GPTNeoXForCausalLM``;
* use ``ColumnParallelLinear`` for the ``embed_out`` layer in ``GPTNeoXForCausalLM``.

Please check ``modeling_gpt_neox_nxd.py`` for details.

================================================
FILE: archive/tutorials/training-gpt-neox.rst
================================================

.. _gpt_neox_tp_zero1_tutorial:

.. meta::
   :noindex:
   :nofollow:
   :description: This documentation for the AWS Neuron SDK is currently unsupported and not maintained. It is provided for reference only.

Training GPT-NeoX 6.9B with Tensor Parallelism and ZeRO-1 Optimizer
=========================================================================================

In this section, we showcase how to pretrain a GPT-NeoX 6.9B model by using tensor parallelism and the zero-1 optimizer in the ``neuronx-distributed`` package. Please refer to the `Neuron Samples repository `__ to view the files in this tutorial.

**Setting up environment:**

For this experiment, we will use a ParallelCluster with at least four trn1-32xl compute nodes. `Train your model on ParallelCluster `__ introduces how to set up and use a ParallelCluster. We first need to create and activate a Python virtual env on the head node of the ParallelCluster.

Next follow the instructions mentioned here: :ref:`Install PyTorch Neuron on Trn1 ` to install neuron python packages.

We also need to install and clone the ``neuronx-distributed`` package using the following command:

.. code:: ipython3

   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com
   git clone git@github.com:aws-neuron/neuronx-distributed.git

Let’s download the scripts for pretraining.

.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_6_9b.sh
   :language: shell
   :lines: 4-10

Next, let’s download and pre-process the dataset:

.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_6_9b.sh
   :language: shell
   :lines: 12

At this point, you are all set to start training.

**Running training**

We first pre-compile the graphs using ``neuron_parallel_compile``. Let’s run the command below:

.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_6_9b.sh
   :language: shell
   :lines: 16-18

This script uses a tensor-parallel size of 8. This will automatically set the zero-1 sharding degree to 16 (4 * 32 workers / tensor_parallel_size). Once the graphs are compiled, we can run training and observe the loss going down. To run the training, we run the above command, but without ``neuron_parallel_compile``.

.. literalinclude:: nxd-source-code/gpt_neox_tp_zero1/gpt_neox_6_9b.sh
   :language: shell
   :lines: 20-22

**ZeRO-1 Optimizer**

The training script uses the ZeRO-1 optimizer, where the optimizer states are partitioned across the ranks so that each rank updates only its own partition. The code snippet below shows how the ZeRO-1 optimizer is used in the training script:

.. code:: ipython3

   from neuronx_distributed.optimizer import NeuronZero1Optimizer

   optimizer = NeuronZero1Optimizer(
       optimizer_grouped_parameters,
       AdamW_FP32OptimParams,
       lr=flags.lr,
       pin_layout=False,
       sharding_groups=parallel_state.get_data_parallel_group(as_list=True),
       grad_norm_groups=parallel_state.get_tensor_model_parallel_group(as_list=True),
   )

================================================
FILE: archive/tutorials/training_codegen25_7b.rst
================================================

.. _codegen25_7b_tp_zero1_tutorial:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.

Training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer
==============================================================================================

In this tutorial, we showcase how to pretrain a CodeGen2.5 7B model for program synthesis. Since CodeGen2.5's architecture is identical to that of Llama2, you may want to take a look at our `Llama2 tutorial `__ first. After setting up the environment and installing ``neuronx-distributed``, we need to download a data set containing source code (in this case Java code) and then preprocess and tokenize it to match the code-infill format (more about this below). Use the following commands to download the required files. Note that we reuse our Llama2 training files.
================================================
FILE: archive/tutorials/training_codegen25_7b.rst
================================================
.. _codegen25_7b_tp_zero1_tutorial:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.

Training CodeGen2.5 7B with Tensor Parallelism and ZeRO-1 Optimizer
==============================================================================================

In this tutorial, we showcase how to pretrain a CodeGen2.5 7B model for program synthesis. Since CodeGen2.5's architecture is identical to that of Llama2, you may want to take a look at our `Llama2 tutorial `__ first. After setting up the environment and installing ``neuronx-distributed``, we need to download a data set containing source code (in this case Java code) and then preprocess and tokenize it to match the code-infill format (more about this below). Use the following commands to download the required files. Note that we reuse our Llama2 training files.

.. code:: bash

   mkdir -p ~/examples/tp_zero1_codegen25_7b_hf_pretrain
   cd ~/examples/tp_zero1_codegen25_7b_hf_pretrain
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/modeling_llama_nxd.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_zero1_llama_hf_pretrain/tp_zero1_llama_hf_pretrain.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/llama/tp_zero1_llama_hf_pretrain/logger.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/codegen25/tp_zero1_codegen25_7b_hf_pretrain.sh
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/codegen25/get_dataset_infill.py
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/codegen25/get_dataset_infill.sh
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/codegen25/requirements.txt
   chmod +x tp_zero1_codegen25_7b_hf_pretrain.sh
   chmod +x get_dataset_infill.sh
   python3 -m pip install -r requirements.txt

Data Preprocessing and Tokenization
------------------------------------

To tokenize the data, we will use the CodeGen2.5 tokenizer from the HuggingFace repository. Download it by cloning the repository.

.. code:: bash

   cd ~/examples
   git clone https://huggingface.co/Salesforce/codegen25-7b-mono
   cd codegen25-7b-mono
   rm config.json # Need to use our config.json for some Trainium-specific settings
   wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed/master/examples/training/codegen25/config.json
   cd ..

This tutorial makes use of a clean Java subset of The Stack corpus, which we preprocess to fit the infill format. The infill format samples a random number of spans and formats the input in the following way:

.. code:: Python

   def count_words(filename: str) -> Dict[str, int]:
       """Count the number of occurrences of each word in the file."""
       with open(filename, 'r') as f:
           word_counts = {}
           for line in f:
               for word in line.split():
                   if word in word_counts:
                       word_counts[word] += 1
                   else:
                       word_counts[word] = 1
       return word_counts

becomes

.. code:: Python

   def count_words(filename: str) -> Dict[str, int]:
       """Count the number of occurrences of each word in the file."""
       with open(filename, 'r') as f:
           <mask_1> in word_counts:
                       word_counts[word] += 1
                   else:
                       word_counts[word] = 1
       return word_counts<|endoftext|><mask_1> word_counts = {}
           for line in f:
               for word in line.split():
                   if word<eom>

For each span, we introduce two ``<mask_1>`` tokens: one signals the model that a span is missing at this position, and one (at the end of the code) is followed by the original code span. Lastly, each span is suffixed with an end-of-mask (``<eom>``) token. You can preprocess and tokenize the dataset by running:

.. code:: bash

   cd ~/examples/tp_zero1_codegen25_7b_hf_pretrain
   ./get_dataset_infill.sh

This will preprocess and store the data in your home directory at ``~/example_datasets/bigcode-stack-java_tokenized_infill``.
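To make the transformation concrete, here is a hedged, single-span sketch of the masking step described above; the actual ``get_dataset_infill.py`` logic (span sampling, number of spans) may differ:

.. code-block:: python

   import random

   def to_infill_format(code: str) -> str:
       """Cut one random span out of `code` and append it after the EOS token."""
       start = random.randrange(0, max(1, len(code) // 2))
       end = random.randrange(start + 1, len(code) + 1)
       prefix, span, suffix = code[:start], code[start:end], code[end:]
       # prefix <mask_1> suffix <|endoftext|> <mask_1> span <eom>
       return f"{prefix}<mask_1>{suffix}<|endoftext|><mask_1>{span}<eom>"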
Starting Training
-----------------

At this point, you are all set to start training. By default, we use a tensor parallel degree of 8, a global batch size of 256, and train for 10k steps. Feel free to change these settings in the ``tp_zero1_codegen25_7b_hf_pretrain.sh`` script. We first pre-compile the graphs using the ``neuron_parallel_compile`` tool. Let’s run the command below:

.. code:: bash

   sbatch --exclusive \
       --nodes 1 \
       --wrap="srun neuron_parallel_compile bash $(pwd)/tp_zero1_codegen25_7b_hf_pretrain.sh"

Once the graphs are compiled, we can run training and observe our loss going down. To do so, we run the same command, omitting ``neuron_parallel_compile``.

.. code:: bash

   sbatch --exclusive \
       --nodes 1 \
       --wrap="srun bash $(pwd)/tp_zero1_codegen25_7b_hf_pretrain.sh"

Happy training!

================================================
FILE: archive/tutorials/training_llama2_tp_pp_ptl.rst
================================================
.. _llama2_tp_pp_ptl_tutorial:

.. meta::
   :noindex:
   :nofollow:
   :description: This tutorial for the AWS Neuron SDK is currently archived and not maintained. It is provided for reference only.

Training Llama-2-7B/13B/70B using Tensor Parallelism and Pipeline Parallelism with Neuron PyTorch-Lightning
============================================================================================================

In this section, we showcase how to pretrain Llama2 7B/13B/70B with tensor parallelism and pipeline parallelism using the Neuron PyTorch-Lightning APIs. Please refer to the Llama2 13B/70B Tutorial and the Neuron PT-Lightning Developer Guide for more context.

Setting up environment:
^^^^^^^^^^^^^^^^^^^^^^^

For this experiment, we will use AWS ParallelCluster with at least four trn1.32xlarge compute nodes (at least 32 nodes are needed for the 13B/70B model sizes). `Train your model on ParallelCluster `__ introduces how to set up and use a ParallelCluster. To set up the packages on the headnode of the ParallelCluster, follow the instructions mentioned here: :ref:`Install PyTorch Neuron on Trn1 `. We also need to install the ``neuronx-distributed`` package inside the virtual env using the following command:

.. code:: ipython3

   python -m pip install neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com
   git clone git@github.com:aws-neuron/neuronx-distributed.git

Let’s download the scripts for pretraining:

1. Navigate to a directory to hold our experiments

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_tp_pp_ptl_setup.sh
   :language: shell
   :lines: 4

2. Link the training scripts for our experiments

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_tp_pp_ptl_setup.sh
   :language: shell
   :lines: 5-10

If you want to pre-train Llama 7B, you would need to run the following steps -

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_7b.sh
   :language: shell
   :lines: 5-8

If you want to pre-train Llama 13B, you would need to run the following steps -

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_13b.sh
   :language: shell
   :lines: 5-8

If you want to pre-train Llama 70B, you would need to run the following steps -

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_70b.sh
   :language: shell
   :lines: 5-8

3. Install the additional requirements and give the right permissions to our shell script

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_tp_pp_ptl_setup.sh
   :language: shell
   :lines: 12-13

Next, we tokenize our dataset. ``Note``: To tokenize the data, we must request the tokenizer from `HuggingFace` and `Meta` by following the instructions at the following link: `HuggingFace Llama 2 7B Model `__ . Use of the Llama 2 model is governed by the Meta license. In order to download the model weights and tokenizer, please visit the above website and accept their License before requesting access.
After access has been granted, you may use the download scripts provided by Meta to download the model weights and tokenizer to your cluster. Once you have downloaded the tokenizer and model weights, you can copy the ``tokenizer.model`` to the ``~/examples/llama2_lightning`` directory.

Next, let’s download and pre-process the dataset:

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_7b.sh
   :language: shell
   :lines: 13

``Note``: In case you see an error of the following form when downloading data: ``huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ubuntu/examples/llama2_lightning'. Use `repo_type` argument if needed.`` This could be because of a stale cache. Try deleting the cache using:

.. code:: ipython3

   sudo rm -rf /home/ubuntu/.cache/

At this point, you are all set to start training.

Training Llama2-7B with Tensor Parallelism
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By this step, the ParallelCluster is all set up for running experiments. Before we run training, we first pre-compile the graphs using the :ref:`neuron_parallel_compile `. Let’s run the command below:

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_7b.sh
   :language: shell
   :lines: 17-20

This script uses a tensor-parallel size of 8. This will automatically set the zero-1 sharding degree to 16 (4 * 32 workers / tensor_parallel_size).

``Note``: You can use any number of nodes in this case; you would just need to adjust the number of nodes in the above slurm command accordingly. Also, the number of nodes used in the parallel_compile command should be the same as in the actual training run. This is because, as the number of nodes changes, the data-parallel degree changes too. This would result in more workers participating in operations like `gradient all-reduce`, which would result in new graphs getting created.

Once the graphs are compiled, we can run training and observe our loss going down. To run the training, we just run the above command but without ``neuron_parallel_compile``.

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_7b.sh
   :language: shell
   :lines: 22-25

Training Llama2-13B/70B with Tensor Parallelism and Pipeline Parallelism
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we use ``Llama70B`` as an example. To run 13B, simply change the script from ``run_llama_70b_tp_pp.sh`` to ``run_llama_13B_tp_pp.sh``. Before we run training, we first pre-compile the graphs using the :ref:`neuron_parallel_compile `. Let’s run the command below:

Pre-compiling:

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_70b.sh
   :language: shell
   :lines: 17-20

This script uses a tensor-parallel size of 8 and a pipeline-parallel size of 8.
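To make the degree arithmetic above concrete, here is a small illustrative calculation (the helper name is hypothetical; it assumes one worker per NeuronCore, i.e. 32 workers per trn1.32xlarge node):

.. code-block:: python

   WORKERS_PER_NODE = 32  # trn1.32xlarge

   def data_parallel_degree(nodes: int, tp: int, pp: int = 1) -> int:
       """Data-parallel (and hence ZeRO-1 sharding) degree left over
       after tensor and pipeline parallelism are carved out."""
       world_size = nodes * WORKERS_PER_NODE
       return world_size // (tp * pp)

   print(data_parallel_degree(nodes=4, tp=8))         # 7B, TP only -> 16
   print(data_parallel_degree(nodes=32, tp=8, pp=8))  # 70B, TP + PP -> 16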
To run the training, we just use the above command but without ``neuron_parallel_compile``.

.. literalinclude:: nxd-source-code/llama_tp_pp_ptl/llama_2_70b.sh
   :language: shell
   :lines: 22-25

Checkpointing:
^^^^^^^^^^^^^^

To enable checkpoint saving, add the following flags to ``run_llama_7b_tp_ptl.sh`` / ``run_llama_13b_tp_pp.sh`` / ``run_llama_70B_tp_pp.sh``:

* ``--save_checkpoint`` Add this flag to enable checkpoint saving
* ``--checkpoint_freq`` Number of steps between checkpoint saves
* ``--checkpoint_dir`` Directory to save the checkpoint to
* ``--num_kept_checkpoint`` Number of checkpoints to keep; older checkpoints will be deleted automatically. Set to -1 to keep all saved checkpoints.
* ``--save_load_xser`` Save/load with torch-xla serialization. It's recommended to enable xser for significantly faster save/load. Note that a checkpoint saved with xser can only be loaded with xser, and vice versa.

To enable checkpoint loading, add the following flags to ``run_llama_7b_tp_ptl.sh`` / ``run_llama_13b_tp_pp.sh`` / ``run_llama_70B_tp_pp.sh``:

* ``--resume_ckpt``
* ``--load_step`` Step to retrieve the checkpoint from
* ``--checkpoint_dir`` Directory to load the checkpoint from
* ``--save_load_xser`` Save/load with torch-xla serialization. It's recommended to enable xser for significantly faster save/load. Note that a checkpoint saved with xser can only be loaded with xser, and vice versa.
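For illustration, a checkpoint-enabled run might append flags like the following to the training invocation inside the shell script (the values here are examples only, not taken from the tutorial):

.. code-block:: bash

   # Example only: save a checkpoint every 100 steps, keep the 5 most
   # recent, and use torch-xla serialization for faster save/load.
   --save_checkpoint \
   --checkpoint_freq 100 \
   --checkpoint_dir ~/llama_checkpoints \
   --num_kept_checkpoint 5 \
   --save_load_xser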
================================================
FILE: archive/tutorials/tutorial_source_code/t5_finetuning/t5_finetuning_32_worker_training_code.sh
================================================
#!/bin/bash
set -eExuo pipefail
cd ~/transformers/examples/pytorch/summarization
# Create run 32 worker script
tee run_32w.sh > /dev/null < /dev/null < /dev/null < /dev/null <> temp_run_summarization.py
mv temp_run_summarization.py run_summarization.py
chmod +x run_summarization.py
# Run run_summarization to predict without generate
NEURON_NUM_DEVICES=0 python3 ./run_summarization.py \
    --model_name_or_path \
    --dataset_name cnn_dailymail \
    --dataset_config "3.0.0" \
    --do_predict \
    --predict_with_generate \
    --source_prefix "summarize: " \
    --per_device_eval_batch_size 4 \
    --max_source_length 512 \
    --pad_to_max_length \
    --no_cuda \
    --output_dir /tmp/tst-summarization |& tee log_run

================================================
FILE: archive/tutorials/tutorial_source_code/t5_finetuning/t5_modify_run_summarization_code.sh
================================================
#!/bin/bash
set -eExuo pipefail
cd ~/transformers/examples/pytorch/summarization
# Insert code into run summarization to disable DDP for torchrun
tee temp_run_summarization.py > /dev/null <> temp_run_summarization.py
mv temp_run_summarization.py run_summarization.py
chmod +x run_summarization.py

================================================
FILE: audit-report.md
================================================
# Frameworks Audit Report

## Orphaned Pages

| File Path | Type | Reason | Action |
|---|---|---|---|
| frameworks/mxnet-neuron/container-sm-hosting-devflow.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/mxnet-neuron/dlc-then-ec2-devflow.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/mxnet-neuron/dlc-then-ecs-devflow.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/mxnet-neuron/env-setup.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/mxnet-neuron/refman.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/mxnet-neuron/rn.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/mxnet-neuron/setup/mxnet-install-prev.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/mxnet-neuron/setup/mxnet-update-al2.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/mxnet-neuron/setup/mxnet-update-u22.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/mxnet-neuron/tutorials/bert_mxnet/index.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/mxnet-neuron/tutorials/index.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/inference.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/container-sm-hosting-devflow.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/dlc-then-k8s-devflow.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/env-setup.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/refman.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/rn.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/setup/index.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-al2.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-update-al2.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/tf1_faq.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/tutorials/yolo_v4_demo/code.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/tutorials/yolo_v4_demo/yolo_v4_demo.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-update.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuronx/tensorflow-neuron-quickstart.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuronx/tensorflow-neuron-supported-operators.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuronx/tutorials/inference/tensorflow-neuronx-serving-tutorial.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/inference.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuron/env-setup.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuron/setup/pytorch-update-al2.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuron/tutorials/index.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/setup/note-setup-general.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.3.0-pytorch-install.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.4.0-pytorch-install.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.5.0-pytorch-install.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.6.0-pytorch-install.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/setup/pytorch-install-prev.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/setup/pytorch-update.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/setup/setup-inference.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/setup/setup-training.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/tutorials/inference/index.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/torch-neuronx/tutorials/training/index.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/torch/training.rst | .rst | Not in any toctree or cross-reference | Delete |
| frameworks/tensorflow/tensorflow-neuron/tutorials/bert_demo/uncased_L-24_H-1024_A-16.vocab.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |
| frameworks/torch/dropdown-neuron-setup.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |
| frameworks/torch/tab-inference-torch-neuronx.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |
| frameworks/torch/tab-training-torch-neuronx.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |
| frameworks/torch/torch-neuronx/api-reference-guide/inference/inference-api-guide-torch-neuronx.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |
| frameworks/torch/torch-neuronx/api-reference-guide/training/index.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |
| frameworks/torch/torch-neuronx/programming-guide/inference/index.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |
| frameworks/torch/torch-neuronx/programming-guide/training/index.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |
| frameworks/torch/torch-neuronx/setup/install-templates/pytorch-dev-install.txt | .txt (include fragment) | Not referenced by any .. include:: directive | Delete |
## Stale Pages

| File Path | Staleness Indicators | Recommendation |
|---|---|---|
| frameworks/mxnet-neuron/misc-mxnet-neuron.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/mxnet-neuron/misc-mxnet-neuron.txt | References deprecated neuron-cc compiler | Will be archived |
| frameworks/mxnet-neuron/rn.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/mxnet-neuron/setup/mxnet-install.rst | Amazon Linux 2 | Will be archived |
| frameworks/mxnet-neuron/setup/mxnet-update.rst | Amazon Linux 2 | Will be archived |
| frameworks/mxnet-neuron/tutorials/bert_mxnet/index.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/tensorflow/tensorflow-neuron/api-compilation-python-api.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/tensorflow/tensorflow-neuron/refman.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/tensorflow/tensorflow-neuron/rn.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.rst | Amazon Linux 2 | Will be archived |
| frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-update.rst | Amazon Linux 2 | Will be archived |
| frameworks/tensorflow/tensorflow-neuron/tf1_faq.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/tensorflow/tensorflow-neuron/tf2_faq.rst | Ubuntu 18.04 | Will be archived |
| frameworks/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.8.0-tensorflow-install.rst | Amazon Linux 2 | Will be archived |
| frameworks/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.9.0-tensorflow-install.rst | Amazon Linux 2 | Will be archived |
| frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install.rst | Amazon Linux 2 | Will be archived |
| frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-update.rst | Amazon Linux 2 | Will be archived |
| frameworks/torch/dropdown-neuron-setup.txt | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |
| frameworks/torch/guide-torch-neuron-vs-torch-neuronx-inference.rst | References deprecated neuron-cc compiler | Update or archive |
| frameworks/torch/inference-torch-neuron.txt | References deprecated neuron-cc compiler | Update or archive |
| frameworks/torch/torch-neuron/api-compilation-python-api.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/torch/torch-neuron/misc-inference-torch-neuron.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/torch/torch-neuron/misc-inference-torch-neuron.txt | References deprecated neuron-cc compiler | Will be archived |
| frameworks/torch/torch-neuron/setup/pytorch-install.rst | Amazon Linux 2 | Will be archived |
| frameworks/torch/torch-neuron/setup/pytorch-update.rst | Amazon Linux 2 | Will be archived |
| frameworks/torch/torch-neuron/troubleshooting-guide.rst | References deprecated neuron-cc compiler | Will be archived |
| frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-trace.rst | References deprecated neuron-cc compiler | Update or archive |
| frameworks/torch/torch-neuronx/programming-guide/inference/core-placement.rst | References deprecated neuron-cc compiler | Update or archive |
| frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.rst | Ubuntu 20.04 | Update or archive |
| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.4.0-pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |
| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.6.0-pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |
| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.7.0-pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |
| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.8.0-pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |
| frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.9.0-pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |
| frameworks/torch/torch-neuronx/setup/pytorch-install.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |
| frameworks/torch/torch-neuronx/setup/pytorch-update.rst | Amazon Linux 2; torch-neuron setup/update with unsupported OS: Amazon Linux 2 | Update or archive |
| frameworks/torch/torch-neuronx/training-troubleshooting.rst | Ubuntu 18.04; torch-neuron setup/update with unsupported OS: Ubuntu 18.04 | Update or archive |

================================================
FILE: build.sh
================================================
#!/bin/bash
# build.sh - Docker + uv workflow for Neuron docs
set -e

IMAGE_NAME="neuron-docs"

case "${1:-build}" in
  build)
    docker build -t "$IMAGE_NAME" .
    ;;
  html)
    docker run --rm -v "$(pwd):/docs" "$IMAGE_NAME" -c "sphinx-build -b html . _build/html -j auto"
    ;;
  shell)
    docker run --rm -it -v "$(pwd):/docs" "$IMAGE_NAME"
    ;;
  clean)
    rm -rf _build
    ;;
  *)
    echo "Usage: $0 {build|html|shell|clean}"
    exit 1
    ;;
esac

================================================
FILE: compiler/error-codes/EARG001.rst
================================================
.. _error-code-earg001:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EARG001.

NCC_EARG001
===========

**Error message**: This error occurs when you attempt to use a Logical Neuron Core (LNC) configuration that is not supported by the target Neuron architecture.

For example, a trn1 instance running the following code will run into this error:

.. code-block:: python

   traced_model = torch_neuronx.trace(
       model,
       input,
       compiler_args=['--lnc', '2']  # ERROR: lnc=2 not supported on trn1
   )

On trn1, only lnc=1 is supported.

Physical Neuron Core:

- Actual hardware compute unit on the chip
- Has dedicated compute resources, memory, etc.

Logical Neuron Core:

- Software abstraction grouping multiple physical cores
- Controlled via the NEURON_LOGICAL_NC_CONFIG environment variable or the --lnc flag (when using neuronx-cc directly)

For more information: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/device-memory.html#logical-neuron-cores
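As a hedged illustration of the fix, compile with an LNC value supported by the target architecture (the ``model``/``input`` names mirror the erroneous example above):

.. code-block:: python

   import torch_neuronx

   # On trn1 only lnc=1 is supported; newer architectures such as trn2
   # also accept lnc=2.
   traced_model = torch_neuronx.trace(
       model,
       input,
       compiler_args=['--lnc', '1'],  # supported on trn1
   )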
================================================
FILE: compiler/error-codes/EBIR023.rst
================================================
.. _error-code-ebir023:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EBIR023.

NCC_EBIR023
===========

**Error message**: MLP kernel intermediate size exceeds the maximum supported value of 4096.

Consider tiling large intermediate tensors in your kernel to stay within the supported limit, or increase tensor parallelism to shard the intermediate dimension across more cores.

================================================
FILE: compiler/error-codes/EBVF030.rst
================================================
.. _error-code-ebvf030:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EBVF030.

NCC_EBVF030
===========

**Error message**: The number of instructions generated exceeds the limit.

Consider applying model parallelism as partitioning the model will help break large computational graphs into smaller subgraphs.

For more information:

- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#api-guide
- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html

================================================
FILE: compiler/error-codes/EHCA005.rst
================================================
.. _error-code-ehca005:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EHCA005.

NCC_EHCA005
===========

**Error message**: The compiler encountered a custom call instruction with a target name that is not recognized.

The Neuron compiler currently recognizes the following custom call targets:

- AwsNeuronErf
- AwsNeuronGelu
- AwsNeuronGeluApprxTanh
- AwsNeuronGeluBackward
- AwsNeuronSilu
- AwsNeuronSiluBackward
- AwsNeuronRmsNorm
- AwsNeuronSoftmax
- AwsNeuronSoftmaxBackward
- AwsNeuronCollectiveMatmul
- AwsNeuronIntMatmult
- AwsNeuronArgMax
- AwsNeuronArgMin
- AwsNeuronTopK
- AwsNeuronDropoutMaskV1
- AwsNeuronCustomNativeKernel
- AwsNeuronCustomOp
- AwsNeuronDevicePrint
- ResizeNearest
- ResizeBilinear
- ResizeNearestGrad
- AwsNeuronLNCShardingConstraint
- AwsNeuronTransferWithStaticRing
- AwsNeuronModuleMarkerStart-Forward
- AwsNeuronModuleMarkerStart-Backward
- AwsNeuronModuleMarkerEnd-Forward
- AwsNeuronModuleMarkerEnd-Backward
- NeuronBoundaryMarker-Start
- NeuronBoundaryMarker-End

Erroneous code example:

.. code-block:: python

   def lowering(ctx, x_val):
       result_type = ir.RankedTensorType(x_val.type)
       # This target name will not be recognized by HandleCustomCall
       return hlo.CustomCallOp(
           [result_type],
           [x_val],
           call_target_name="UNRECOGNIZED_TARGET",
           has_side_effect=ir.BoolAttr.get(False),
       ).results

Use a supported custom call target:

.. code-block:: python

   def lowering(ctx, x_val):
       result_type = ir.RankedTensorType(x_val.type)
       return hlo.CustomCallOp(
           [result_type],
           [x_val],
           call_target_name="AwsNeuronSilu",
           has_side_effect=ir.BoolAttr.get(False),
           backend_config=ir.StringAttr.get(""),
           api_version=ir.IntegerAttr.get(ir.IntegerType.get_signless(32), 2),
       ).results

================================================
FILE: compiler/error-codes/EOOM001.rst
================================================
.. _error-code-eoom001:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EOOM001.

NCC_EOOM001
===========

**Error message**: The combined memory needed for the model tensors exceeds the high-bandwidth memory limit.
The memory usage consists of: - I/O tensors: Input and output activation tensors - Internal allocations: Scratchpad memory for intermediate computations - SBUF spills: Data that cannot fit in on-chip SBUF memory and must spill to HBM There are several ways to potentially fix this issue. 1. Simply reduce the batch/tensor size if possible 2. Utilize pipeline/tensor parallelism via neuronx-distributed Short snippet of tensor parallelism: .. code-block:: python class ParallelSelfAttention(transformers.models.bert.modeling_bert.BertSelfAttention): def __init__(self, config, position_embedding_type=None): super().__init__(config, position_embedding_type) self.query = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False) self.key = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False) self.value = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False) # Since we shard the number of attention heads across tensor parallel # ranks, each rank would have a subset of heads, hence, we update # the num_attention_heads here. tp_size = parallel_state.get_tensor_parallel_size() self.num_attention_heads = self.num_attention_heads // tp_size self.all_head_size = self.all_head_size // tp_size For more information: - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html ================================================ FILE: compiler/error-codes/EOOM002.rst ================================================ .. _error-code-eoom002: .. meta:: :description: AWS Neuron SDK Graph Compiler error code documentation for error EOOM002. NCC_EOOM002 =========== **Error message**: The combined memory needed for the model tensors exceeds the high-bandwidth memory limit. The memory usage consists of: - I/O tensors: Input and output activation tensors - Internal allocations: Scratchpad memory for intermediate computations - SBUF spills: Data that cannot fit in on-chip SBUF memory and must spill to HBM There are several ways to potentially fix this issue. 1. Simply reduce the batch/tensor size if possible 2. Utilize pipeline/tensor parallelism via neuronx-distributed Short snippet of tensor parallelism: .. code-block:: python class ParallelSelfAttention(transformers.models.bert.modeling_bert.BertSelfAttention): def __init__(self, config, position_embedding_type=None): super().__init__(config, position_embedding_type) self.query = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False) self.key = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False) self.value = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False) # Since we shard the number of attention heads across tensor parallel # ranks, each rank would have a subset of heads, hence, we update # the num_attention_heads here. 
        tp_size = parallel_state.get_tensor_parallel_size()
        self.num_attention_heads = self.num_attention_heads // tp_size
        self.all_head_size = self.all_head_size // tp_size

For more information:

- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html
- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html

================================================
FILE: compiler/error-codes/ESFH002.rst
================================================
.. _error-code-esfh002:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error ESFH002.

NCC_ESFH002
===========

**Error message**: The compiler encountered an unsigned 64-bit integer constant with a value that cannot be safely converted to 32-bit representation.

The Neuron hardware operates on 32-bit or narrower data types and attempts to convert 64-bit integers to 32-bit. 64-bit constants that exceed the 32-bit range and cannot be safely converted will fail compilation. Try to use uint32 for constants when possible and restructure code to avoid large constants.

Erroneous code example:

.. code-block:: python

   @jax.jit
   def foo():
       # direct uint64 constant in arithmetic operation
       x = jnp.array([1, 2, 3], dtype=jnp.uint64)
       # large constant that exceeds uint32 max
       large_constant = jnp.uint64(5_000_000_000)
       return x + large_constant

Use uint32 for constants when possible:

.. code-block:: python

   @jax.jit
   def foo():
       x = jnp.array([1, 2, 3], dtype=jnp.uint32)
       # constant restructured to fit within the uint32 range (max 4_294_967_295)
       large_constant = jnp.uint32(4_000_000_000)
       return x + large_constant

================================================
FILE: compiler/error-codes/ESPP004.rst
================================================
.. _error-code-espp004:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error ESPP004.

NCC_ESPP004
===========

**Error message**: The compiler encountered a data type that is not supported for code generation.

Erroneous code example:

.. code-block:: python

   import numpy as np
   import jax.numpy as jnp
   import jax
   from jax._src import dtypes
   from jax._src.lax import lax as lax_internal

   # float4_e2m1fn type not supported
   dtype = np.dtype(dtypes.float4_e2m1fn)
   val = lax_internal._convert_element_type(0, dtype, weak_type=False)

Use a supported data type:

.. code-block:: python

   import numpy as np
   import jax.numpy as jnp
   import jax
   from jax._src import dtypes
   from jax._src.lax import lax as lax_internal

   # use a supported type such as bfloat16 instead of float4_e2m1fn
   dtype = jnp.bfloat16
   val = lax_internal._convert_element_type(0, dtype, weak_type=False)

More information on supported data types: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/data-types.html

================================================
FILE: compiler/error-codes/ESPP047.rst
================================================
.. _error-code-espp047:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error ESPP047.

NCC_ESPP047
===========

**Error message**: The compiler found usage of an unsupported 8-bit floating-point data type.

Erroneous code example:
.. code-block:: python

   class Model(nn.Module):
       def __init__(self):
           super().__init__()
           self.linear1 = nn.Linear(10, 20)
           self.linear2 = nn.Linear(20, 10)

       def forward(self, x):
           x = self.linear1(x)
           x = torch.relu(x)
           x = self.linear2(x)
           return x

   # Unsupported 8-bit floating-point data type being used here
   input_tensor = torch.randn(1, 10).to(torch.float8_e4m3fn)

To fix this error:

.. code-block:: python

   class Model(nn.Module):
       def __init__(self):
           super().__init__()
           self.linear1 = nn.Linear(10, 20)
           self.linear2 = nn.Linear(20, 10)

       def forward(self, x):
           x = self.linear1(x)
           x = torch.relu(x)
           x = self.linear2(x)
           return x

   input_tensor = torch.randn(1, 10).to(torch.float8_e4m3fn)
   # Convert to a supported type
   input_tensor = input_tensor.to(torch.float16)

================================================
FILE: compiler/error-codes/EUOC002.rst
================================================
.. _error-code-euoc002:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EUOC002.

NCC_EUOC002
===========

**Error message**: An unsupported operator was used.

Try using alternative operators from the full list of supported operators via `neuronx-cc list-operators --framework XLA` to work around the limitation.

Before:

.. code-block:: python

   class Model(torch.nn.Module):
       def forward(self, A, b):
           return torch.triangular_solve(b, A)

Possible workaround:

.. code-block:: python

   class Model(torch.nn.Module):
       def forward(self, A, b):
           # Although slower than triangular_solve, this is mathematically equivalent
           A_inv = torch.inverse(A)
           return A_inv @ b

================================================
FILE: compiler/error-codes/EVRF001.rst
================================================
.. _error-code-evrf001:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF001.

NCC_EVRF001
===========

**Error message**: An unsupported operator was used.

Try using alternative operators from the full list of supported operators via `neuronx-cc list-operators --framework XLA` to work around the limitation.

Before:

.. code-block:: python

   class Model(torch.nn.Module):
       def forward(self, A, b):
           return torch.triangular_solve(b, A)

Possible workaround:

.. code-block:: python

   class Model(torch.nn.Module):
       def forward(self, A, b):
           # Although slower than triangular_solve, this is mathematically equivalent
           A_inv = torch.inverse(A)
           return A_inv @ b

================================================
FILE: compiler/error-codes/EVRF004.rst
================================================
.. _error-code-evrf004:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF004.

NCC_EVRF004
===========

**Error message**: Complex data types are not supported on the Neuron device.

You cannot use complex data types (such as ``complex64``, ``complex128``, and others) on the Neuron device directly. One fix is to offload complex operations to CPU, like so:

.. code-block:: python

   x = torch.tensor([1+2j, 3+4j], dtype=torch.complex64).to('cpu')

.. note:: Since data transfer between CPU and device is expensive, this is best used when complex operations are rare.

You can also address this error by manually emulating complex tensors using real and imaginary parts:

.. code-block:: python

   # Split two complex tensors a and b into real/imaginary components
   a_real, a_imag = a.real, a.imag
   b_real, b_imag = b.real, b.imag
   # (a + bi) * (c + di) = (ac - bd) + (ad + bc)i
   real_out = a_real * b_real - a_imag * b_imag
   imag_out = a_real * b_imag + a_imag * b_real
================================================
FILE: compiler/error-codes/EVRF005.rst
================================================
.. _error-code-evrf005:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF005.

NCC_EVRF005
===========

**Error message**: The compiler found usage of F8E4M3FNUZ, F8E4M3B11FNUZ, or F8E5M2FNUZ data type which is not supported.

Erroneous code example:

.. code-block:: python

   class Model(nn.Module):
       def __init__(self):
           super().__init__()
           self.linear1 = nn.Linear(10, 20)
           self.linear2 = nn.Linear(20, 10)

       def forward(self, x):
           x = self.linear1(x)
           x = torch.relu(x)
           x = self.linear2(x)
           return x

   input_tensor = torch.randn(1, 10).to(torch.float8_e4m3fnuz)

To fix this error:

.. code-block:: python

   class Model(nn.Module):
       def __init__(self):
           super().__init__()
           self.linear1 = nn.Linear(10, 20)
           self.linear2 = nn.Linear(20, 10)

       def forward(self, x):
           x = self.linear1(x)
           x = torch.relu(x)
           x = self.linear2(x)
           return x

   input_tensor = torch.randn(1, 10).to(torch.float8_e4m3fnuz)
   # Convert to a supported type
   input_tensor = input_tensor.to(torch.float16)

More information on supported data types: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/data-types.html

================================================
FILE: compiler/error-codes/EVRF006.rst
================================================
.. _error-code-evrf006:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF006.

NCC_EVRF006
===========

The compiler encountered an RNGBitGenerator operation using a random number generation algorithm other than RNG_DEFAULT.
------------------------------------------------------------------------------------------------------------------------

Ensure that you are using standard JAX/PyTorch random APIs and not explicitly specifying an RNG algorithm.

================================================
FILE: compiler/error-codes/EVRF007.rst
================================================
.. _error-code-evrf007:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF007.

NCC_EVRF007
===========

**Error message**: The number of instructions generated exceeds the limit.

Consider applying model parallelism as partitioning the model will help break large computational graphs into smaller subgraphs.

For more information:

- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#api-guide
- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html
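As an illustrative (not prescriptive) way to reduce the instruction count of a single graph, a model can also be traced in smaller pieces; ``first_half``/``second_half`` below are hypothetical submodules of the original model:

.. code-block:: python

   import torch_neuronx

   # Trace two smaller subgraphs instead of one very large graph, then
   # chain them at inference time.
   first_traced = torch_neuronx.trace(first_half, example_input)
   intermediate = first_half(example_input)
   second_traced = torch_neuronx.trace(second_half, intermediate)

   def forward(x):
       return second_traced(first_traced(x))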
================================================
FILE: compiler/error-codes/EVRF009.rst
================================================
.. _error-code-evrf009:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF009.

NCC_EVRF009
===========

**Error message**: The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit.

There are several ways to potentially fix this issue.

1. Simply reduce the batch/tensor size if possible
2. Utilize pipeline/tensor parallelism via neuronx-distributed

Short snippet of tensor parallelism:

.. code-block:: python

   class ParallelSelfAttention(transformers.models.bert.modeling_bert.BertSelfAttention):
       def __init__(self, config, position_embedding_type=None):
           super().__init__(config, position_embedding_type)
           self.query = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False)
           self.key = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False)
           self.value = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False)

           # Since we shard the number of attention heads across tensor parallel
           # ranks, each rank would have a subset of heads, hence, we update
           # the num_attention_heads here.
           tp_size = parallel_state.get_tensor_parallel_size()
           self.num_attention_heads = self.num_attention_heads // tp_size
           self.all_head_size = self.all_head_size // tp_size

For more information:

- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html
- https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html

================================================
FILE: compiler/error-codes/EVRF010.rst
================================================
.. _error-code-evrf010:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF010.

NCC_EVRF010
===========

**Error message**: The compiler encountered simultaneous use of input and kernel dilation, which is not supported.

Erroneous code example:

.. code-block:: python

   x = jnp.ones((1, 4, 4, 1), dtype=jnp.float32)
   kernel = jnp.ones((3, 3, 1, 1), dtype=jnp.float32)

   result = lax.conv_general_dilated(
       x, kernel,
       window_strides=(1, 1),
       padding=((2, 2), (2, 2)),
       lhs_dilation=(2, 2),  # input dilation
       rhs_dilation=(2, 2),  # kernel dilation
       dimension_numbers=('NHWC', 'HWIO', 'NHWC')
   )

If possible, use only input or kernel dilation:

.. code-block:: python

   x = jnp.ones((1, 4, 4, 1), dtype=jnp.float32)
   kernel = jnp.ones((3, 3, 1, 1), dtype=jnp.float32)

   result = lax.conv_general_dilated(
       x, kernel,
       window_strides=(1, 1),
       padding=((2, 2), (2, 2)),
       lhs_dilation=(1, 1),  # no input dilation
       rhs_dilation=(2, 2),
       dimension_numbers=('NHWC', 'HWIO', 'NHWC')
   )

Or apply dilation manually and apply convolution to the remainder.

================================================
FILE: compiler/error-codes/EVRF011.rst
================================================
.. _error-code-evrf011:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF011.

NCC_EVRF011
===========

**Error message**: The compiler encountered strided convolution combined with dilated input, which is not supported.

Erroneous code example:

.. code-block:: python

   x = jnp.ones((1, 4, 4, 1), dtype=jnp.float32)
   kernel = jnp.ones((3, 3, 1, 1), dtype=jnp.float32)

   result = lax.conv_general_dilated(
       x, kernel,
       window_strides=(2, 2),  # strided convolution
       padding=((2, 2), (2, 2)),
       lhs_dilation=(2, 2),    # and dilated input
       rhs_dilation=(1, 1),
       dimension_numbers=('NHWC', 'HWIO', 'NHWC')
   )

If possible, remove stride or input dilation:

.. code-block:: python

   x = jnp.ones((1, 4, 4, 1), dtype=jnp.float32)
   kernel = jnp.ones((3, 3, 1, 1), dtype=jnp.float32)

   result = lax.conv_general_dilated(
       x, kernel,
       window_strides=(2, 2),
       padding=((2, 2), (2, 2)),
       lhs_dilation=(1, 1),  # remove input dilation
       rhs_dilation=(1, 1),
       dimension_numbers=('NHWC', 'HWIO', 'NHWC')
   )

Or apply upsampling and downsampling separately.
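A hedged sketch of the "separately" approach, reusing the tensors from the example above: run the input-dilated convolution with stride 1, then subsample the output, which is mathematically equivalent to the strided dilated convolution:

.. code-block:: python

   # Step 1: convolution with input dilation only (stride 1)
   full = lax.conv_general_dilated(
       x, kernel,
       window_strides=(1, 1),
       padding=((2, 2), (2, 2)),
       lhs_dilation=(2, 2),
       rhs_dilation=(1, 1),
       dimension_numbers=('NHWC', 'HWIO', 'NHWC')
   )

   # Step 2: downsample separately; keeping every 2nd output position
   # reproduces the effect of window_strides=(2, 2)
   result = full[:, ::2, ::2, :]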
================================================
FILE: compiler/error-codes/EVRF013.rst
================================================
.. _error-code-evrf013:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF013.

NCC_EVRF013
===========

**Error message**: TopK does not support int32 or int64 input tensors.

Erroneous code example:

.. code-block:: python

   def forward(self, x):
       # assume x is an integer tensor
       # error: cannot call TopK on integer dtypes
       k = 5
       values, indices = torch.topk(x, k=k, dim=-1)
       return values, indices

To fix this error, you can cast your tensor to a supported floating point dtype.

.. code-block:: python

   def forward(self, x):
       x = x.float()
       k = 5
       values, indices = torch.topk(x, k=k, dim=-1)
       return values, indices

================================================
FILE: compiler/error-codes/EVRF015.rst
================================================
.. _error-code-evrf015:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF015.

NCC_EVRF015
===========

**Error message**: The compiler encountered a custom call instruction with a target name that is not recognized.

The Neuron compiler currently recognizes the following custom call targets:

- AwsNeuronErf
- AwsNeuronGelu
- AwsNeuronGeluApprxTanh
- AwsNeuronGeluBackward
- AwsNeuronSilu
- AwsNeuronSiluBackward
- AwsNeuronRmsNorm
- AwsNeuronSoftmax
- AwsNeuronSoftmaxBackward
- AwsNeuronCollectiveMatmul
- AwsNeuronIntMatmult
- AwsNeuronArgMax
- AwsNeuronArgMin
- AwsNeuronTopK
- AwsNeuronDropoutMaskV1
- AwsNeuronCustomNativeKernel
- AwsNeuronCustomOp
- AwsNeuronDevicePrint
- ResizeNearest
- ResizeBilinear
- ResizeNearestGrad
- AwsNeuronLNCShardingConstraint
- AwsNeuronTransferWithStaticRing
- AwsNeuronModuleMarkerStart-Forward
- AwsNeuronModuleMarkerStart-Backward
- AwsNeuronModuleMarkerEnd-Forward
- AwsNeuronModuleMarkerEnd-Backward
- NeuronBoundaryMarker-Start
- NeuronBoundaryMarker-End

Erroneous code example:

.. code-block:: python

   def lowering(ctx, x_val):
       result_type = ir.RankedTensorType(x_val.type)
       # This target name will not be recognized by HandleCustomCall
       return hlo.CustomCallOp(
           [result_type],
           [x_val],
           call_target_name="UNRECOGNIZED_TARGET",
           has_side_effect=ir.BoolAttr.get(False),
       ).results

Use a supported custom call target:

.. code-block:: python

   def lowering(ctx, x_val):
       result_type = ir.RankedTensorType(x_val.type)
       return hlo.CustomCallOp(
           [result_type],
           [x_val],
           call_target_name="AwsNeuronSilu",
           has_side_effect=ir.BoolAttr.get(False),
           backend_config=ir.StringAttr.get(""),
           api_version=ir.IntegerAttr.get(ir.IntegerType.get_signless(32), 2),
       ).results

================================================
FILE: compiler/error-codes/EVRF016.rst
================================================
.. _error-code-evrf016:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF016.

NCC_EVRF016
===========

The NCC_EVRF016 error is raised when the Neuron compiler detects that you are trying to use an integer or boolean type with one of the restricted reduction functions.

**Error message**: The scatter-reduce operation cannot perform reduction logic if the data being scattered or the destination tensor is using an integer or boolean data type.

The hardware instructions used on the Neuron device for these specific scatter-and-reduce functions are optimized for and limited to floating-point arithmetic.
When the compiler detects that you are trying to use an integer or boolean type with one of the restricted reduction functions, it stops the compilation process to prevent a hardware crash or incorrect calculation. **Example of the error** The following example shows the **NCC\_EVRF016** error because the :code:`input_tensor` is defined using an integer data type (:code:`torch.int32`) while being used with a reduction function (:code:`reduce='sum'`) in the :code:`scatter_reduce_` operation. .. code-block:: python def forward(self, input_tensor, indices_tensor, src_tensor): output = input_tensor.clone() output.scatter_reduce_( dim=1, index=indices_tensor, src=src_tensor, reduce='sum', ) return output # ERROR: using integer dtype with scatter-reduce input_tensor = torch.zeros(BATCH_SIZE, DIM_SIZE, dtype=torch.int32) ... **How to fix** To fix this error, you must cast your input and source tensors to a floating-point data type (e.g., torch.float32 or torch.bfloat16). .. code-block:: python def forward(self, input_tensor, indices_tensor, src_tensor): output = input_tensor.clone() output.scatter_reduce_( dim=1, index=indices_tensor, src=src_tensor, reduce='sum', ) return output # FIXED: changed to float32 # now works with scatter-reduce input_tensor = torch.zeros(BATCH_SIZE, DIM_SIZE, dtype=torch.float32) ... ================================================ FILE: compiler/error-codes/EVRF017.rst ================================================ .. _error-code-evrf017: .. meta:: :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF017. NCC_EVRF017 =========== **Error message**: The compiler encountered a reduce-window operation with base dilation (input dilation) greater than 1, which is not supported. Erroneous code example: .. code-block:: python result = lax.reduce_window( x, -jnp.inf, lax.max, window_dimensions=(1, 1, 1, 1), window_strides=(1, 1, 1, 1), padding='VALID', base_dilation=(1, 2, 1, 1) # ERROR: applying base dilation of 2 in dimension 1 ) If possible, change base dilation to be all 1s: .. code-block:: python result = lax.reduce_window( x, -jnp.inf, lax.max, window_dimensions=(1, 1, 1, 1), window_strides=(1, 1, 1, 1), padding='VALID', base_dilation=(1, 1, 1, 1) # FIXED: all values are 1 (no dilation) ) Or consider manual dilation if necessary. ================================================ FILE: compiler/error-codes/EVRF018.rst ================================================ .. _error-code-evrf018: .. meta:: :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF018. NCC_EVRF018 =========== **Error message**: The compiler encountered a reduce-window operation with window dilation greater than 1, which is not supported. Erroneous code example: .. code-block:: python result = lax.reduce_window( jnp.ones((1, 4, 4, 1)), -jnp.inf, lax.max, window_dimensions=(1, 2, 2, 1), window_strides=(1, 1, 1, 1), padding='VALID', window_dilation=(1, 2, 2, 1) # 2 is greater than 1 ) If possible, remove window_dilation or change values to be all 1s: .. code-block:: python result = lax.reduce_window( jnp.ones((1, 4, 4, 1)), -jnp.inf, lax.max, window_dimensions=(1, 2, 2, 1), window_strides=(1, 1, 1, 1), padding='VALID', window_dilation=(1, 1, 1, 1) ) Or consider manual dilation if necessary. ================================================ FILE: compiler/error-codes/EVRF019.rst ================================================ .. _error-code-evrf019: .. meta:: :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF019. 
NCC_EVRF019
===========

**Error message**: The compiler encountered a reduce-window operation with more or fewer than 2 operands.

Support for reduce_window is available for exactly one input tensor and one initial value for reduction.

Erroneous code example:

.. code-block:: python

   # reduce-window operation with more or fewer than 2 operands is not supported
   # 4 operands are being provided instead of 2
   lax.reduce_window(
       (x, x),               # ERROR: a tuple of two input tensors
       (-jnp.inf, jnp.inf),  # ERROR: a tuple of two initial values
       lambda a, b: (jnp.maximum(a[0], b[0]), jnp.minimum(a[1], b[1])),
       window_dimensions=(1, 2, 2, 1),
       window_strides=(1, 2, 2, 1),
       padding='VALID'
   )

If possible, replace a multi-operand reduce_window with multiple single-operand reduce_window operations.

.. code-block:: python

   # For max pooling
   # 2 operands are correctly being provided
   max_pool = lax.reduce_window(
       x,         # FIXED: a single input tensor
       -jnp.inf,  # FIXED: a single initial value
       lax.max,
       window_dimensions=(1, 2, 2, 1),
       window_strides=(1, 2, 2, 1),
       padding='VALID'
   )

   # For min pooling
   # 2 operands are correctly being provided
   min_pool = lax.reduce_window(
       x,        # FIXED: a single input tensor
       jnp.inf,  # FIXED: a single initial value
       lax.min,
       window_dimensions=(1, 2, 2, 1),
       window_strides=(1, 2, 2, 1),
       padding='VALID'
   )

================================================
FILE: compiler/error-codes/EVRF022.rst
================================================
.. _error-code-evrf022:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF022.

NCC_EVRF022
===========

**Error message**: Shift-right-arithmetic operation on non 32-bit inputs is not supported. Cast the first argument's data type to be S32, U32, or F32.

Erroneous code example:

.. code-block:: python

   def forward(self, input, other):
       return torch.bitwise_right_shift(input, other)

   # This will be the first argument and must be 32-bit
   input = torch.tensor([16, 32, 64], dtype=torch.int16)
   # The second argument can be non 32-bit
   other = torch.tensor([1, 2, 3], dtype=torch.int16)

To fix this error:

.. code-block:: python

   def forward(self, input, other):
       return torch.bitwise_right_shift(input, other)

   # Correctly setting the first argument to be 32-bit
   input = torch.tensor([16, 32, 64], dtype=torch.int32)
   other = torch.tensor([1, 2, 3], dtype=torch.int16)

================================================
FILE: compiler/error-codes/EVRF031.rst
================================================
.. _error-code-evrf031:

.. meta::
   :description: AWS Neuron SDK Graph Compiler error code documentation for error EVRF031.

NCC_EVRF031
===========

**Error message**: The compiler encountered a scatter out-of-bounds error. The indices created via the iota instruction contain values that are beyond the size of the operand dimension.

Erroneous code example:

.. code-block:: python

   # size 3 in dimension 0
   operand = jnp.zeros((3, 4), dtype=jnp.float32)
   # iota generates indices [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
   indices = lax.iota(jnp.int32, 10)  # ERROR: size 10 > operand dimension 3
   indices = indices.reshape(10, 1)
   updates = jnp.ones((10, 4), dtype=jnp.float32)  # ERROR: 10 updates but operand only has 3 rows

   result = lax.scatter(
       operand,
       indices,  # ERROR: index values in [0, 10) but operand dimension only allows indices in [0, 3)
       updates,
       lax.ScatterDimensionNumbers(
           update_window_dims=(1,),
           inserted_window_dims=(0,),
           scatter_dims_to_operand_dims=(0,)
       )
   )

Ensure that the iota size matches the operand dimension size:

..
code-block:: python N = 3 D = 4 operand = jnp.zeros((N, D), dtype=jnp.float32) # FIXED: match iota size to operand dimension indices = lax.iota(jnp.int32, N) # size N is same as operand dimension indices = indices.reshape(N, 1) # FIXED: updates size matches operand dimension updates = jnp.ones((N, D), dtype=jnp.float32) result = lax.scatter( operand, indices, # FIXED: indices now in valid range [0, 3) updates, lax.ScatterDimensionNumbers( update_window_dims=(1,), inserted_window_dims=(0,), scatter_dims_to_operand_dims=(0,) ) ) ================================================ FILE: compiler/error-codes/EXSP001.rst ================================================ .. _error-code-exsp001: .. meta:: :description: AWS Neuron SDK Graph Compiler error code documentation for error EXSP001. NCC_EXSP001 =========== The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit. ------------------------------------------------------------------------------------------------------ There are several ways to potentially fix this issue. 1. Simply reduce the batch/tensor size if possible 2. Utilize pipeline/tensor parallelism via neuronx-distributed Short snippet of tensor parallelism: .. code-block:: python class ParallelSelfAttention(transformers.models.bert.modeling_bert.BertSelfAttention): def __init__(self, config, position_embedding_type=None): super().__init__(config, position_embedding_type) self.query = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False) self.key = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False) self.value = ColumnParallelLinear(config.hidden_size, self.all_head_size, gather_output=False) # Since we shard the number of attention heads across tensor parallel # ranks, each rank would have a subset of heads, hence, we update # the num_attention_heads here. tp_size = parallel_state.get_tensor_parallel_size() self.num_attention_heads = self.num_attention_heads // tp_size self.all_head_size = self.all_head_size // tp_size For more information: - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html ================================================ FILE: compiler/error-codes/EXTP004.rst ================================================ .. _error-code-extp004: .. meta:: :description: AWS Neuron SDK Graph Compiler error code documentation for error EXTP004. NCC_EXTP004 =========== **Error message**: The number of instructions generated exceeds the limit. Consider applying model parallelism as partitioning the model will help break large computational graphs into smaller subgraphs. For more information: - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html#api-guide - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html ================================================ FILE: compiler/error-codes/index.rst ================================================ .. meta:: :description: "Neuron Compiler error code documentation home." :date-modified: 12/02/2025 .. _ncc-errors-home: Neuron Compiler Error Codes ============================ This page lists the error codes you can encounter while developing with the Neuron Compiler. 
For more details on any individual error, click the link for that error code in the table below.

.. list-table::
   :header-rows: 1

   * - Error Code
     - Error Message
     - Recommendation
   * - :ref:`NCC_EARG001 `
     - Unsupported Logical Neuron Core (LNC) configuration.
     - You attempted to use a Logical Neuron Core configuration that is not supported by the target Neuron architecture.
   * - :ref:`NCC_EBIR023 `
     - MLP kernel intermediate size exceeds the maximum supported value of 4096.
     - Consider tiling large intermediate tensors in your kernel to stay within the supported limit, or increase tensor parallelism to shard the intermediate dimension across more cores.
   * - :ref:`NCC_EBVF030 `
     - The number of instructions generated exceeds the limit.
     - Consider applying model parallelism, as partitioning the model will help break large computational graphs into smaller subgraphs.
   * - :ref:`NCC_EHCA005 `
     - The compiler encountered a custom call instruction with a target name that is not recognized.
     - Use a supported custom call target from the list of recognized targets.
   * - :ref:`NCC_EOOM001 `
     - The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit.
     - You may need to reduce batch/tensor size or utilize pipeline/tensor parallelism via neuronx-distributed.
   * - :ref:`NCC_EOOM002 `
     - The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit.
     - You may need to reduce batch/tensor size or utilize pipeline/tensor parallelism via neuronx-distributed.
   * - :ref:`NCC_ESFH002 `
     - The compiler encountered an unsigned 64-bit integer constant with a value that cannot be safely converted to 32-bit representation.
     - Try to use uint32 for constants when possible and restructure code to avoid large constants.
   * - :ref:`NCC_ESPP004 `
     - The compiler encountered a data type that is not supported for code generation.
     - Use a supported data type as listed in the Neuron documentation.
   * - :ref:`NCC_ESPP047 `
     - Unsupported 8-bit floating-point data type.
     - The compiler found usage of an unsupported 8-bit floating-point data type. Convert to a supported type like torch.float16.
   * - :ref:`NCC_EUOC002 `
     - An unsupported operator was used.
     - Try using alternative operators from the full list of supported operators via ``neuronx-cc list-operators --framework XLA`` to work around the limitation.
   * - :ref:`NCC_EVRF001 `
     - An unsupported operator was used.
     - Try using alternative operators from the full list of supported operators to work around the limitation.
   * - :ref:`NCC_EVRF004 `
     - Complex data types are not supported on the Neuron device.
     - You cannot use complex data types (such as ``complex64``, ``complex128``, and others) on the Neuron device directly.
   * - :ref:`NCC_EVRF005 `
     - Unsupported F8E4M3FNUZ, F8E4M3B11FNUZ, or F8E5M2FNUZ data type.
     - The compiler found usage of unsupported 8-bit floating-point data types. Convert to a supported type like torch.float16.
   * - :ref:`NCC_EVRF006 `
     - The compiler encountered a RNGBitGenerator operation using a random number generation algorithm other than RNG_DEFAULT.
     - Ensure that you are using standard JAX/PyTorch random APIs and not explicitly specifying an RNG algorithm.
   * - :ref:`NCC_EVRF007 `
     - The number of instructions generated exceeds the limit.
     - Consider applying model parallelism, as partitioning the model will help break large computational graphs into smaller subgraphs.
   * - :ref:`NCC_EVRF009 `
     - The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit.
     - You may need to reduce batch/tensor size or utilize pipeline/tensor parallelism via neuronx-distributed.
   * - :ref:`NCC_EVRF010 `
     - The compiler encountered simultaneous use of input and kernel dilation, which is not supported.
     - If possible, use only input or kernel dilation, not both simultaneously.
   * - :ref:`NCC_EVRF011 `
     - The compiler encountered strided convolution combined with dilated input, which is not supported.
     - If possible, remove stride or input dilation, or apply upsampling and downsampling separately.
   * - :ref:`NCC_EVRF013 `
     - TopK does not support integer input tensors (int32, int64).
     - The TopK operation cannot be performed on integer data types.
   * - :ref:`NCC_EVRF015 `
     - The compiler encountered a custom call instruction with a target name that is not recognized.
     - Use a supported custom call target from the list of recognized targets.
   * - :ref:`NCC_EVRF016 `
     - The scatter-reduce operation cannot perform reduction logic if the data being scattered or the destination tensor is using an integer or boolean data type.
     - Cast your input and source tensors to a floating-point data type (e.g., torch.float32 or torch.bfloat16).
   * - :ref:`NCC_EVRF017 `
     - Reduce-window operation with base dilation greater than 1 is not supported.
     - Change base dilation to be all 1s or consider manual dilation if necessary.
   * - :ref:`NCC_EVRF018 `
     - Reduce-window operation with window dilation greater than 1 is not supported.
     - Remove window_dilation or change values to be all 1s, or consider manual dilation if necessary.
   * - :ref:`NCC_EVRF019 `
     - The compiler encountered a reduce-window operation with more or less than 2 operands.
     - If possible, split a multi-operand reduce_window into multiple single-operand reduce_window operations.
   * - :ref:`NCC_EVRF022 `
     - Shift-right-arithmetic operation on non 32-bit inputs is not supported. Cast the first argument's data type to be S32, U32, or F32.
     - You need to use 32-bit data types for shift operations. Cast inputs to int32, uint32, or float32.
   * - :ref:`NCC_EVRF031 `
     - The compiler encountered a scatter out-of-bounds error.
     - Ensure that the iota size matches the operand dimension size.
   * - :ref:`NCC_EXSP001 `
     - The combined memory needed for the model's activation tensors exceeds the high-bandwidth memory limit.
     - You may need to reduce batch/tensor size or utilize pipeline/tensor parallelism via neuronx-distributed.
   * - :ref:`NCC_EXTP004 `
     - The number of instructions generated exceeds the limit.
     - Consider applying model parallelism, as partitioning the model will help break large computational graphs into smaller subgraphs.

.. toctree::
   :hidden:
   :maxdepth: 1

   EARG001
   EBIR023
   EBVF030
   EHCA005
   EOOM001
   EOOM002
   ESFH002
   ESPP004
   ESPP047
   EUOC002
   EVRF001
   EVRF004
   EVRF005
   EVRF006
   EVRF007
   EVRF009
   EVRF010
   EVRF011
   EVRF013
   EVRF015
   EVRF016
   EVRF017
   EVRF018
   EVRF019
   EVRF022
   EVRF031
   EXSP001
   EXTP004

================================================
FILE: compiler/index.rst
================================================

.. _neuron_cc:

Neuron Graph Compiler
======================

The Neuron Graph Compiler is a sophisticated compilation system that transforms machine learning models from various frameworks (TensorFlow, MXNet, PyTorch, XLA HLO) into highly optimized code for AWS Neuron accelerators. It performs deep analysis of model structure, applies hardware-specific optimizations, and generates executable code tailored for maximum performance on Neuron hardware.
The Neuron compiler is available in two versions to support different AWS ML accelerator architectures:

* **neuronx-cc**: The newer XLA-based compiler supporting NeuronCores v2 architecture (Trn1, Inf2, Trn1n, Trn2). This compiler leverages the XLA (Accelerated Linear Algebra) framework to provide advanced optimizations for modern ML workloads.
* **neuron-cc**: The TVM-based compiler supporting NeuronCores v1 architecture (Inf1). This compiler uses the TVM (Tensor Virtual Machine) framework as its foundation.

Key capabilities of the Neuron Graph Compiler include:

* **Performance optimization**: Intelligently converts FP32 operations to more efficient formats (BF16/FP16/TF32/FP8) with configurable precision-performance tradeoffs. By default, the compiler automatically casts FP32 matrix multiplication operations to BF16 for optimal performance while maintaining accuracy.
* **Model-specific optimizations**: Provides specialized optimizations for different model architectures:

  * **Generic**: Applies general optimizations suitable for all model types
  * **Transformer**: Implements specific optimizations for transformer-based architectures like BERT, GPT, and other attention-based models
  * **U-Net**: Applies specialized memory optimizations for U-Net architectures to prevent performance-impacting data transfers

* **Distributed training support**: Enables efficient large language model (LLM) training through distribution strategies that shard parameters, gradients, and optimizer states across data-parallel workers.
* **Advanced memory management**: Optimizes memory usage for large models through techniques like model sharding across multiple NeuronCores, with configurable logical NeuronCore settings to control sharding degree.
* **Optimization levels**: Provides multiple optimization levels (1-3) to balance compilation time against runtime performance, allowing users to choose the appropriate tradeoff for their workflow.
* **Mixed precision support**: Offers fine-grained control over precision and performance through auto-casting options, supporting multiple numeric formats (FP32, TF32, FP16, BF16, FP8) with different strengths in dynamic range and numeric precision.

The compilation process is typically transparent to users, as the compiler is invoked automatically within ML frameworks through Neuron Framework plugins. Models are analyzed, optimized, and compiled into a NEFF file (Neuron Executable File Format), which is then loaded by the :doc:`Neuron Runtime ` for execution on Neuron devices.

.. grid:: 1
   :gutter: 3

   .. grid-item-card:: Neuron Graph Compiler Component Release Notes
      :link: /release-notes/components/compiler
      :link-type: doc

      Review the Neuron Graph Compiler release notes for all versions of the Neuron SDK.

.. tab-set::

   .. tab-item:: Neuron Graph Compiler (neuronx-cc) for Trn1 & Inf2

      .. grid:: 1
         :gutter: 3

         .. grid-item-card:: CLI Reference Guide
            :link: neuron-compiler-cli-reference-guide
            :link-type: ref

            Neuron Compiler CLI Reference Guide

         .. grid-item-card:: Graph Compiler Developer Guide
            :link: neuronx-cc-training-mixed-precision
            :link-type: ref

            Mixed precision training guide

         .. grid-item-card:: Graph Compiler Error Code Reference
            :link: ncc-errors-home
            :link-type: ref

            Error code reference

         .. grid-item-card:: How to Use Convolution Kernels in UNet Training Models
            :link: implement-convolution-kernels-unet
            :link-type: ref

            Learn how to modify UNet training models to use convolution kernels with the AWS Neuron SDK.

..
grid-item-card:: Graph Compiler FAQ :link: neuronx_compiler_faq :link-type: ref Frequently asked questions .. tab-item:: Neuron Graph Compiler (neuron-cc) for Inf1 .. grid:: 1 :gutter: 3 .. grid-item-card:: Graph Compiler API Reference Guide :link: neuron-compiler-cli-reference :link-type: ref Neuron Compiler CLI Reference .. grid-item-card:: Graph Compiler Developer Guide :link: neuron-cc-training-mixed-precision :link-type: ref Mixed precision training guide .. grid-item-card:: Graph Compiler FAQ :link: neuron_compiler_faq :link-type: ref Frequently asked questions .. toctree:: :maxdepth: 2 :hidden: /compiler/neuronx-cc /compiler/neuron-cc Error Codes
Release Notes

================================================
FILE: compiler/neuron-cc/api-reference-guide.rst
================================================

API Reference Guide
===================

.. toctree::
   :maxdepth: 1

   /compiler/neuron-cc/command-line-reference

================================================
FILE: compiler/neuron-cc/command-line-reference.rst
================================================

.. _neuron-compiler-cli-reference:

Neuron compiler CLI Reference Guide (``neuron-cc``)
===================================================

This document describes the command line interface of the Neuron compiler. This reference is not relevant for applications that run neuron-cc from within a machine learning framework (TensorFlow-Neuron for example) since these options are passed from the framework directly to neuron-cc.

Using neuron-cc on the command line may be desirable for applications that do not use a framework, or customize existing frameworks. It is also possible to supply CLI commands to the framework as options to be passed through to the compiler.

Usage
--------

Optional parameters are shown in square brackets. See the individual framework guides for the correct syntax.

.. _neuron_cli:

.. rubric:: Neuron Compiler CLI

.. program:: neuron-cc

.. option:: neuron-cc [options] [parameters]

Common options for the Neuron CLI:

- :option:`--verbose` (string) default=“WARN”: Valid values:

  - :option:`DEBUG`
  - :option:`INFO`
  - :option:`WARN`
  - :option:`ERROR`

Use :option:`neuron-cc --help` for information on a specific command.

Available Commands:
~~~~~~~~~~~~~~~~~~~

- :option:`compile`
- :option:`list-operators`

.. option:: neuron-cc compile [parameters]

Compile a model for use on the AWS Inferentia Machine Learning Accelerator.

.. code-block::

   neuron-cc compile --framework --io-config [--neuroncore-pipeline-cores ] [--enable-saturate-infinity] [--enable-fast-loading-neuron-binaries] [--enable-fast-context-switch] [--fp32-cast cast-method] [--fast-math cast-method] [--output ]

**Compile Parameters:**

- :option:``: Input containing model specification. The number of arguments required varies between frameworks:

  - **TENSORFLOW**: A local filename or URI of a TensorFlow Frozen GraphDef (.pb); or the name of a local directory containing a TensorFlow SavedModel. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/graph.proto for the associated .proto schema for TensorFlow Frozen GraphDefs. See https://www.tensorflow.org/guide/saved_model for more information on the SavedModel format.
  - **MXNET**: List of local filenames or URIs where the input architecture .json file and parameter .param file are stored. These contain information related to the architecture of your graph and its associated parameters, respectively.

- :option:`--framework` (string): Framework in which the model was trained. Valid values:

  - :option:`TENSORFLOW`
  - :option:`MXNET`
  - :option:`XLA`

- :option:`--neuroncore-pipeline-cores` (int) (default=1): Number of NeuronCores to be used in "NeuronCore Pipeline" mode. This is different from data parallel deployment (same model on multiple NeuronCores). Refer to the Runtime/Framework documentation for data parallel deployment options. Compile for the given number of NeuronCores so as to leverage NeuronCore Pipeline mode.

  .. note:: This is not used to define the number of NeuronCores to be used in a data parallel deployment (i.e., the same model on multiple NeuronCores). That is a runtime/framework configuration choice.
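As a quick illustration of pipeline mode, these compiler flags can also be forwarded from a framework rather than invoked on the command line. The following is a minimal sketch, not part of the CLI itself, assuming a torch-neuron (Inf1) environment where ``torch.neuron.trace`` accepts a ``compiler_args`` list; the model and input are hypothetical placeholders:

.. code-block:: python

   import torch
   import torch.nn as nn
   import torch_neuron  # noqa: F401  -- registers the torch.neuron namespace (assumed installed)

   class TinyNet(nn.Module):
       """Hypothetical stand-in for a real model."""
       def __init__(self):
           super().__init__()
           self.fc = nn.Linear(128, 64)

       def forward(self, x):
           return torch.relu(self.fc(x))

   model = TinyNet().eval()
   example = torch.rand(1, 128)

   # Forward --neuroncore-pipeline-cores to neuron-cc at trace time so the
   # compiled model is partitioned across 4 NeuronCores in pipeline mode.
   model_neuron = torch.neuron.trace(
       model,
       example_inputs=[example],
       compiler_args=['--neuroncore-pipeline-cores', '4'],
   )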
- :option:`--output` (string) (default=“out.neff”): Filename where compilation output (NEFF archive) will be recorded.

- :option:`--io-config` (string): Configuration containing the names and shapes of input and output tensors. The io-config can be specified as a local filename, a URI, or a string containing the io-config itself. The io-config must be formatted as a JSON object with two members, “inputs” and “outputs”. “inputs” is an object mapping input tensor names to an array of shape and data type. “outputs” is an array of output tensor names. Consider the following example:

  .. code-block:: json

     {
       "inputs": {
         "input0:0": [[1,100,100,3], "float16"],
         "input1:0": [[1,100,100,3], "float16"]
       },
       "outputs": ["output:0"]
     }

- :option:`--enable-saturate-infinity`: Convert +/- infinity values to MAX/MIN_FLOAT for certain computations that have a high risk of generating Not-a-Number (NaN) values. There is a potential performance impact during model execution when this conversion is enabled.

- :option:`--enable-fast-loading-neuron-binaries`: Write the compilation output (NEFF archive) in uncompressed format, which results in faster loading of the archive during inference.

- :option:`--enable-fast-context-switch`: Optimize for faster model switching rather than inference latency. This results in overall faster system performance when your application switches between models frequently on the same NeuronCore (or set of cores). For example, the optimization triggered by this option defers loading some weight constants until the start of inference.

- :option:`--fast-math`: Controls the tradeoff between performance and accuracy for fp32 operators. See more suggestions on how to use this option and its arguments in :ref:`neuron-cc-training-mixed-precision`.

  - ``all`` (Default): Enables all optimizations that improve performance. This option can potentially lower precision/accuracy.
  - ``none``: Disables all optimizations that improve performance. This option will provide the best precision/accuracy.
  - Tensor transpose options

    - ``fast-relayout``: Only enables the fast relayout optimization to improve performance by using the matrix multiplier for tensor transpose. The data type used for the transpose is either FP16 or BF16, which is controlled by the ``fp32-cast-xxx`` keyword.
    - ``no-fast-relayout``: Disables the fast relayout optimization, which ensures that tensor transpose is bit-accurate (lossless) but slightly slower.

  - Casting options

    - ``fp32-cast-all`` (Default): Cast all FP32 operators to BF16 to achieve the highest performance and preserve dynamic range. Same as setting ``--fp32-cast all``.
    - ``fp32-cast-all-fp16``: Cast all FP32 operators to FP16 to achieve speed up and increase precision versus BF16. Same as setting ``--fp32-cast all-fp16``.
    - ``fp32-cast-matmult``: Only cast FP32 operators that use the Neuron Matmult engine to BF16 while using FP16 for matmult-based transpose to get better accuracy. Same as setting ``--fp32-cast matmult``.
    - ``fp32-cast-matmult-bf16``: Cast only FP32 operators that use the Neuron Matmult engine (including matmult-based transpose) to BF16 to preserve dynamic range. Same as setting ``--fp32-cast matmult-bf16``.
    - ``fp32-cast-matmult-fp16``: Cast only FP32 operators that use the Neuron Matmult engine (including matmult-based transpose) to FP16 to better preserve precision. Same as setting ``--fp32-cast matmult-fp16``.
.. important::

   * ``all`` and ``none`` are mutually exclusive
   * ``all`` is equivalent to using ``fp32-cast-all fast-relayout`` (best performance)
   * ``none`` is equivalent to using ``fp32-cast-matmult-bf16 no-fast-relayout`` (best accuracy)
   * ``fp32-cast-*`` options are mutually exclusive
   * ``fast-relayout`` and ``no-fast-relayout`` are mutually exclusive
   * The ``fp32-cast-*`` and ``*-fast-relayout`` options will overwrite the default behavior in ``all`` and ``none``.
   * For backward compatibility, the ``--fp32-cast`` option has higher priority over ``--fast-math``. It will overwrite the FP32 casting options in any of the ``--fast-math`` options if the ``--fp32-cast`` option is explicitly present.

- :option:`--fp32-cast`: Refine the automatic casting of fp32 tensors. It is being replaced by the newer ``--fast-math`` option.

  .. important::

     * The ``--fp32-cast`` option is being deprecated and ``--fast-math`` will replace it in future releases.
     * ``--fast-math`` introduces the ``no-fast-relayout`` option to enable lossless transpose operations.

  ``--fp32-cast`` is an interface for controlling the performance and accuracy tradeoffs. Many of the ``--fast-math`` values invoke (override) it.

  - ``all`` (default): Cast all FP32 operators to BF16 to achieve speed up and preserve dynamic range.
  - ``matmult``: Cast only FP32 operators that use the Neuron Matmult engine to BF16 while using FP16 for matmult-based transpose to get better accuracy.
  - ``matmult-fp16``: Cast only FP32 operators that use the Neuron Matmult engine (including matmult-based transpose) to FP16 to better preserve precision.
  - ``matmult-bf16``: Cast only FP32 operators that use the Neuron Matmult engine (including matmult-based transpose) to BF16 to preserve dynamic range.
  - ``all-fp16``: Cast all FP32 operators to FP16 to achieve speed up and better preserve precision.

**Log Levels:**

Logs at levels “trace”, “debug”, and “info” will be written to STDOUT. Logs at levels “warn”, “error”, and “fatal” will be written to STDERR.

**Exit Status**

**0** - Compilation succeeded

**>0** - An error occurred during compilation.

**Examples**

Compiling a saved TensorFlow model:

.. code-block:: shell

   neuron-cc compile test_graph_tfmatmul.pb --framework TENSORFLOW --io-config test_graph_tfmatmul.config

Compiling an MXNet model:

.. code-block:: shell

   neuron-cc compile lenet-symbol.json lenet-0001.params --framework MXNET --neuroncore-pipeline-cores 2 --output file.neff

Compiling an XLA HLO:

.. code-block:: shell

   neuron-cc compile bert-model.hlo --framework XLA --output file.neff

.. _neuron-cc-list-operators:

.. option:: neuron-cc list-operators [parameters]

.. _description-1:

Returns a newline ('\n') separated list of operators supported by the NeuronCore.

- **TENSORFLOW**: Operators will be formatted according to the value passed to the associated REGISTER_OP(“OperatorName”) macro. See https://www.tensorflow.org/guide/create_op#define_the_op_interface for more information regarding operator registration in TensorFlow.
- **MXNET**: Operator names will be formatted according to the value passed to the associated NNVM_REGISTER_OP(operator_name) macro.
- **XLA**: Operator names will be formatted according to the value used by the XLA compiler in XlaBuilder. See https://www.tensorflow.org/xla/operation_semantics for more information regarding XLA operator semantics in the XLA interface.

.. code-block:: shell

   neuron-cc list-operators --framework

.. _options-1:

- :option:`--framework` (string): Framework in which the operators were registered.
  Valid values:

  - :option:`TENSORFLOW`
  - :option:`MXNET`
  - :option:`XLA`

**Exit Status**

**0** - Call succeeded

**>0** - An error occurred

**Example**

.. code-block:: shell

   $ neuron-cc list-operators --framework TENSORFLOW
   AddN
   AdjustContrastv2
   CheckNumbers
   ...

================================================
FILE: compiler/neuron-cc/developer-guide.rst
================================================

Developer Guide
===================

.. toctree::
   :maxdepth: 1

   /about-neuron/appnotes/neuron-cc/mixed-precision

================================================
FILE: compiler/neuron-cc/faq.rst
================================================

.. _neuron_compiler_faq:

Neuron Compiler FAQ (``neuron-cc``)
===================================

.. contents:: Table of contents
   :local:
   :depth: 1

Where can I compile to Neuron?
---------------------------------

The one-time compilation step from the standard framework-level model to the NEFF binary may be performed on any EC2 instance or even on-premises. We recommend using a high-performance compute server of choice (C5 or z1d instance types) for the fastest compile times and ease of use with a prebuilt `DLAMI `__. Developers can also install Neuron in their own environments; this approach may work well, for example, when building a large fleet for inference, allowing the model creation, training and compilation to be done in the training fleet, with the NEFF files being distributed by a configuration management application to the inference fleet.

My current neural network is based on FP32, how can I use it with Neuron?
-------------------------------------------------------------------------

Developers who want to train their models in FP32 for best accuracy can compile and deploy them with Neuron. The Neuron compiler automatically converts FP32 to internally supported datatypes, such as FP16 or BF16. You can find more details about FP32 data type support and performance and accuracy tuning in :ref:`neuron-cc-training-mixed-precision`. The Neuron compiler preserves the application interface - FP32 inputs and outputs. Transferring such large tensors may become a bottleneck for your application. Therefore, you can improve execution time by casting the inputs and outputs to FP16 or BF16 in the ML framework prior to compilation for Inferentia.

What are some of the important compiler defaults I should be aware of?
-----------------------------------------------------------------------

The compiler compiles the input graph for a single NeuronCore by default. Using the :option:`--neuroncore-pipeline-cores` option directs the compiler to partition so as to run on a specified number of NeuronCores. This number can be less than the total available NeuronCores on an instance. See :ref:`inferentia-arch` for more information on NeuronCores.

Which operators does Neuron support?
---------------------------------------

See :ref:`neuron-supported-operators`. You can also use the ``neuron-cc list-operators`` command on the CLI to list the operators. See :ref:`neuron-cc-list-operators`.

If your model contains operators missing from the above list, and you can't reach your performance goals, please post a message on the Neuron developer forum or open a GitHub issue to let us know.

Any operators that Neuron doesn't support?
---------------------------------------------

Models with control-flow and dynamic shapes are not supported. You will need to partition the model using the framework prior to compilation. See the :ref:`neuron-cc`. One common partitioning pattern is sketched below.
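The following is a minimal sketch of that pattern, keeping data-dependent control flow in Python on the host and compiling only the static-shape subgraph. It assumes a torch-neuron (Inf1) environment and a hypothetical two-branch classifier; it illustrates the idea rather than a prescribed API flow:

.. code-block:: python

   import torch
   import torch.nn as nn
   import torch_neuron  # noqa: F401  -- registers the torch.neuron namespace (assumed installed)

   class Backbone(nn.Module):
       """Static-shape subgraph: this part is compiled for Neuron."""
       def __init__(self):
           super().__init__()
           self.fc = nn.Linear(128, 2)

       def forward(self, x):
           return self.fc(x)

   backbone = Backbone().eval()
   example = torch.rand(1, 128)
   backbone_neuron = torch.neuron.trace(backbone, example_inputs=[example])

   def predict(x):
       # Data-dependent control flow stays on the host, outside the compiled graph.
       logits = backbone_neuron(x)
       if int(logits.argmax(dim=-1)) == 0:
           return "class-0 path"
       return "class-1 path"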
Will I need to recompile again if I updated runtime/driver version?
----------------------------------------------------------------------

The compiler and runtime are committed to maintaining compatibility for major version releases with each other. The versioning is defined as major.minor, with compatibility for all versions with the same major number. If the versions mismatch, an error notification is logged and the load will fail. This will then require the model to be recompiled.

I have a NEFF binary, how can I tell which compiler version generated it?
--------------------------------------------------------------------------

We will bring a utility out to help with this soon.

How long does it take to compile?
------------------------------------

It depends on the model and its size and complexity, but this generally takes a few minutes.

================================================
FILE: compiler/neuron-cc.rst
================================================

.. _neuron-cc-index:

Neuron Compiler for Inf1
========================

.. toctree::
   :maxdepth: 1

   API Reference Guide
   CLI Reference
   Developer Guide
   FAQ

================================================
FILE: compiler/neuronx-cc/api-reference-guide/index.rst
================================================

.. _neuron-compiler-cli-reference-guide:

Neuron Compiler CLI Reference Guide (``neuronx-cc``)
====================================================

This document describes the command line interface of the Neuron Compiler. This reference is not relevant for applications that run the Neuron Compiler from within a machine learning framework (:ref:`PyTorch-Neuron ` for example) since these options are passed from the framework directly to the compiler.

Using the compiler command line may be desirable for applications that do not use a framework or customize existing frameworks. It is also possible to specify compiler options within the framework, which will forward these options to the compiler using :ref:`NEURON_CC_FLAGS `.

.. contents:: Table of Contents
   :local:
   :depth: 3

Usage
-----

*Optional parameters are shown in square brackets.*

.. _neuron_cli:

.. rubric:: Neuron Compiler Command-Line Interface

.. program:: neuronx-cc

.. option:: neuronx-cc [parameters]

Available Commands
------------------

- ``compile``
- ``list-operators``

Common parameters for the Neuron CLI:

- ``--help``: Display a usage message of compiler options.

Use ``neuronx-cc --help`` for information on a specific command.

.. _neuronx-cc-compile:

'compile' Command
-----------------

.. option:: neuronx-cc compile [parameters]

.. _description-1:

Compile a model for use on the AWS Machine Learning Accelerator.

.. code-block:: shell

   neuronx-cc compile --framework --target [--model-type ] [--auto-cast ] [--auto-cast-type ] [--distribution-strategy ] [--logical-nc-config ], or [-lnc ] [--optlevel ], or [-O ] [--enable-mixed-precision-accumulation] [--enable-saturate-infinity] [--enable-fast-context-switch] [--enable-fast-loading-neuron-binaries] [--logfile ] [--output ] [--verbose ]

Parameters
~~~~~~~~~~

- ````: Input containing model specification. The number of arguments required varies between frameworks:

  - **XLA**: A local filename of an HLO file (hlo.pb) generated via XLA. See `hlo.proto `_ for the .proto description and `inspect-compiled-programs `_ for more information on how to generate such files.

- ``--framework ``: Framework used to generate the training model. Valid values:

  - ``XLA``

- ``--target ``: Name of the Neuron instance family on which the compiled model will be run.
  Valid values:

  - ``inf2``
  - ``trn1``
  - ``trn1n``
  - ``trn2``

- ``--model-type ``: Permit the compiler to attempt model-specific optimizations based upon the type of model being compiled. (Default: ``generic``)

  Valid values:

  - ``generic``: Perform optimizations applicable to all types of inference and training models.
  - ``transformer``: Perform optimizations specific to `Transformer `_ models.
  - ``unet-inference``: Perform optimizations specific to certain `U-Net `_ model architectures when performing inference. U-Net models often have certain structures that result in excessive performance-impacting data transfers; this option allows the compiler to apply additional memory optimizations to prevent these data transfers and also allows the compiler to map larger normalization operators which would otherwise not successfully execute.

- ``--auto-cast ``: Controls how the compiler makes tradeoffs between performance and accuracy for FP32 operations. (Default: ``none``)

  Valid values:

  - ``none``: (default) Leave all data types as defined in the model. Do not apply auto-casting data type optimizations.
  - ``matmult``: Only cast FP32 operations that use the Neuron matrix-multiplication engine.
  - ``all``: Cast all FP32 operations to achieve the highest performance. This option can potentially lower precision/accuracy.

  A more complete discussion on how to use this option and its arguments is in :ref:`Mixed Precision and Performance-accuracy Tuning for Training `.

  .. note:: If the ``--auto-cast`` option is specified, the ``--auto-cast-type`` compiler flag can be optionally set to define which lower-precision data type the compiler should use.

- ``--auto-cast-type ``: When auto-cast mode is enabled, cast the FP32 operators to the lower-precision data type specified by this option. (Default: ``bf16``)

  Valid values:

  - ``bf16``: Cast the FP32 operations selected via the ``--auto-cast`` option to BF16 to achieve the highest performance and preserve dynamic range.
  - ``fp16``: Cast the FP32 operations selected via the ``--auto-cast`` option to FP16 to achieve improved performance relative to FP32 and increased precision relative to BF16.
  - ``tf32``: Cast the FP32 operations selected via the ``--auto-cast`` option to TensorFloat-32.
  - ``fp8_e4m3``: Cast the FP32 operations selected via the ``--auto-cast`` option to a signed 8-bit floating point represented as a 4-bit exponent and 3-bit mantissa.

  .. note:: If multiple competing options are specified then the option right-most on the command line will supersede previous options.

- ``--distribution-strategy ``: Permit the compiler to attempt optimizations specific to the distribution strategy used to train the model.

  Valid values:

  - ``llm-training``: Enable the compiler to perform optimizations applicable to large language model (LLM) training runs that shard parameters, gradients, and optimizer states across data-parallel workers. This is equivalent to the previously documented option argument value of ``NEMO``, which will be deprecated in a future release.

- ``--logical-nc-config ``: Instructs the compiler to shard the input graph across physical NeuronCore accelerators. Possible numeric values are {1, 2}. (Only available on trn2; Default: ``2``)

  Valid values:

  - ``1``: instructs the compiler to shard the input graph across 1 physical NeuronCore, i.e., do not perform any input graph sharding.
  - ``2``: [default on trn2] instructs the compiler to shard the input graph across 2 physical NeuronCores.
- ``--optlevel ``: Specify the level of optimization the compiler should perform. Possible numeric values are {1, 2, 3}. (Default: ``2``)

  Valid values:

  - ``1``: enables the core performance optimizations in the compiler, while also minimizing compile time.
  - ``2``: [default] provides the best balance between model performance and compile time.
  - ``3``: may provide additional model execution performance but may incur longer compile times and higher host memory usage during model compilation.

  .. note:: This option supersedes, and deprecates, the ``--enable-experimental-O1`` option introduced in an earlier release.

- ``--enable-mixed-precision-accumulation``: Enabled (set to ``true``) by default. Perform intermediate calculations of accumulation operators (such as softmax and layernorm) in FP32 and cast the result to the model-designated datatype. This improves the operator's resulting accuracy.

- ``--disable-mixed-precision-accumulation``: Disables mixed precision accumulation, which is enabled by default. Disabling it may improve performance at the cost of reduced accuracy for certain operators.

- ``--enable-saturate-infinity``: Convert +/- infinity values to MAX/MIN_FLOAT for compiler-introduced matrix-multiply transpose computations that have a high risk of generating Not-a-Number (NaN) values. There is a potential performance impact during model execution when this conversion is enabled. (Only needed on trn1; while the trn2 compiler will accept this flag for compatibility reasons, it has no effect on the compilation.)

- ``--enable-fast-context-switch``: Optimize for faster model switching rather than execution latency. This option will defer loading some weight constants until the start of model execution. This results in overall faster system performance when your application switches between models frequently on the same NeuronCore (or set of cores).

- ``--enable-fast-loading-neuron-binaries``: Save the compilation output file in an uncompressed format. This creates executable files which are larger in size but faster for the Neuron Runtime to load into memory during model execution.

- ``--logfile ``: Filename where the compiler writes log messages. (Default: “log-neuron-cc.txt”)

- ``--output ``: Filename where compilation output (NEFF archive) will be recorded. (Default: “file.neff”)

- ``--verbose ``: Specify the level of output produced by the compiler. (Default: ``warning``)

  Valid values:

  - ``info``: Informational messages regarding the progress of model compilation (written to stdout).
  - ``warning``: Diagnostic messages that report model code that is not inherently erroneous but may be risky or suggest there may have been an error (written to stderr).
  - ``error``: The compiler detected a condition causing it to not complete the compilation successfully (written to stderr).
  - ``critical``: The compiler encountered an unrecoverable error and terminates immediately (written to stderr).
  - ``debug``: Extensive information regarding the compiler's internal execution phases (written to stdout).

*Example*:

Compiling an XLA HLO:

.. code-block:: shell

   neuronx-cc compile bert-model.hlo --framework XLA --target trn1 --model-type transformer --output bert.neff

.. _neuronx-cc-list-operators:

'list-operators' Command
------------------------

.. option:: neuronx-cc list-operators [parameters]

.. _description-1:

Returns a newline ('\n') separated list of operators supported by the Neuron Compiler.
.. code-block:: shell

   neuronx-cc list-operators --framework

Parameters
~~~~~~~~~~

- ``--framework ``: Framework in which the operators were registered.

  Valid values:

  - ``XLA``: Operator names will be formatted according to the value used by the XLA compiler in XlaBuilder.

*Example*:

.. code-block:: shell

   neuronx-cc list-operators --framework XLA
   ...

Compiler Exit Statuses
----------------------

- **0**: Compilation succeeded
- **>0**: An error occurred during compilation.

================================================
FILE: compiler/neuronx-cc/developer-guide.rst
================================================

.. meta::
   :description: Developer guides for the Neuron Compiler (neuronx-cc), including mixed precision training, performance tuning, and custom kernel implementation for AWS Trainium and Inferentia.
   :keywords: neuronx-cc, Neuron Compiler, mixed precision, BF16, FP16, TF32, auto-cast, convolution kernels, UNet, performance optimization, Trainium, Inferentia

Developer Guide
===================

Learn how to optimize your models with the Neuron Compiler (neuronx-cc). These guides cover mixed precision training, performance-accuracy tuning, and custom kernel implementations for AWS Trainium and Inferentia instances.

.. grid:: 1 1 2 2
   :gutter: 3

   .. grid-item-card:: Mixed Precision and Performance-Accuracy Tuning
      :link: /about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision
      :link-type: doc

      Learn how to use FP32, TF32, FP16, and BF16 data types with the Neuron Compiler's auto-cast options to balance performance and accuracy. Understand the tradeoffs between different data types and how to configure compiler settings for optimal model execution.

   .. grid-item-card:: How to Use Convolution Kernels in UNet Training Models
      :link: /compiler/neuronx-cc/how-to-convolution-in-unet
      :link-type: doc

      Modify UNet training models to use custom convolution kernels with NKI (Neuron Kernel Interface). This implementation helps avoid out-of-memory errors when training convolution-heavy models on Trainium instances.

.. toctree::
   :hidden:
   :maxdepth: 1

   /about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision
   /compiler/neuronx-cc/how-to-convolution-in-unet

================================================
FILE: compiler/neuronx-cc/faq.rst
================================================

.. _neuronx_compiler_faq:

Neuron Compiler FAQ (``neuronx-cc``)
====================================

.. contents:: Table of contents
   :local:
   :depth: 1

Where can I compile to Neuron?
---------------------------------

The one-time compilation step from the standard framework-level model to the NEFF binary may be performed on any EC2 instance or even on-premises. We recommend using a high-performance compute server of choice (C5 or z1d instance types) for the fastest compile times and ease of use with a prebuilt `DLAMI `__. Developers can also install Neuron in their own environments; this approach may work well, for example, when building a large fleet for inference, allowing the model creation, training and compilation to be done in the training fleet, with the NEFF files being distributed by a configuration management application to the inference fleet.

.. _neuron-vs-neuronx:

What is the difference between ``neuron-cc`` and ``neuronx-cc``?
----------------------------------------------------------------

* ``neuron-cc`` is the Neuron Compiler with a TVM front-end; it supports only :ref:`neuroncores-v1-arch`.
* ``neuronx-cc`` is the Neuron Compiler with an XLA front-end; it currently supports :ref:`neuroncores-v2-arch`. ``neuronx-cc`` support of :ref:`neuroncores-v1-arch` is currently a :ref:`Roadmap Item `.

Should I use ``neuron-cc`` or ``neuronx-cc``?
---------------------------------------------

See :ref:`neuron-vs-neuronx`.

My current neural network is based on FP32, how can I use it with Neuron?
-------------------------------------------------------------------------

Developers who want to train their models in FP32 for best accuracy can compile and deploy them with Neuron. The Neuron compiler automatically converts FP32 to internally supported datatypes, such as FP16 or BF16. You can find more details about FP32 data type support and performance and accuracy tuning in :ref:`neuronx-cc-training-mixed-precision` or :ref:`neuron-cc-training-mixed-precision`. The Neuron compiler preserves the application interface - FP32 inputs and outputs. Transferring such large tensors may become a bottleneck for your application. Therefore, you can improve execution time by casting the inputs and outputs to FP16 or BF16 in the ML framework prior to compilation.

Which operators does Neuron support?
---------------------------------------

You can use the ``neuronx-cc list-operators`` command on the CLI to list the operators. See :ref:`neuron-compiler-cli-reference-guide`.

To request support for new operators, open an issue on our `GitHub forum `_.

Any operators that Neuron Compiler doesn't support?
---------------------------------------------------

Models with control-flow and dynamic shapes are currently not supported. You will need to partition the model using the framework prior to compilation.

.. note:: Starting with :ref:`neuroncores-v2-arch` Neuron supports control-flow and dynamic shapes. Stay tuned and follow the :ref:`Neuron Roadmap `.

Will I need to recompile again if I updated runtime/driver version?
----------------------------------------------------------------------

The compiler and runtime are committed to maintaining compatibility for major version releases with each other. The versioning is defined as major.minor, with compatibility for all versions with the same major number. If the versions mismatch, an error notification is logged and the load will fail. This will then require the model to be recompiled.

I have a NEFF binary, how can I tell which compiler version generated it?
-------------------------------------------------------------------------

We will bring a utility out to help with this soon.

How long does it take to compile?
------------------------------------

It depends on the model and its size and complexity, but this generally takes a few minutes.

Why is my model producing different results compared to CPU/GPU?
----------------------------------------------------------------

:ref:`neuroncores-v2-arch` supports multiple casting modes for floating point numbers, each with associated implications for performance and accuracy. The default casting mode is a pragmatic balance between performance and accuracy; however, on some models it may result in a loss of precision. See the :option:`--auto-cast` and :option:`--auto-cast-type` options in :ref:`neuron-compiler-cli-reference-guide` for details on how to adjust the casting mode.

Do you support model **?
-------------------------------------------

``neuronx-cc`` has explicit support for select model families using the :option:`--model-type` option, though many other model types are supported. (A sketch of passing this flag from a framework appears below.)
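For instance, when compiling through a framework instead of invoking ``neuronx-cc`` directly, flags such as :option:`--model-type` are forwarded via the ``NEURON_CC_FLAGS`` environment variable. A minimal sketch, assuming a torch-neuronx (Trn1/Inf2) environment and a hypothetical transformer-style module:

.. code-block:: python

   import os
   import torch
   import torch.nn as nn
   import torch_neuronx

   # Forward compiler flags through the framework plugin (set before tracing).
   os.environ["NEURON_CC_FLAGS"] = "--model-type transformer"

   class TinyAttentionBlock(nn.Module):
       """Hypothetical stand-in for a transformer layer."""
       def __init__(self):
           super().__init__()
           self.attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

       def forward(self, x):
           out, _ = self.attn(x, x, x)
           return out

   model = TinyAttentionBlock().eval()
   example = torch.rand(1, 16, 64)

   # torch_neuronx.trace invokes neuronx-cc under the hood, picking up NEURON_CC_FLAGS.
   model_neuron = torch_neuronx.trace(model, example)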
You can also inspect supported operators using the :option:`list-operators` sub-command. See the :ref:`neuron-compiler-cli-reference-guide` for details.

More generally, support for new operators and models is continually being added. See our :ref:`neuron_roadmap` for details.

================================================
FILE: compiler/neuronx-cc/how-to-convolution-in-unet.rst
================================================

.. meta::
   :description: Learn how to modify UNet training models to use convolution kernels with AWS Neuron SDK
   :date_updated: 2025-09-09

.. _implement-convolution-kernels-unet:

=======================================================
How to Use Convolution Kernels in UNet Training Models
=======================================================

Task overview
-------------

This topic discusses how to modify UNet training models to use convolution kernels with the AWS Neuron SDK. This implementation helps avoid out-of-memory errors seen when performing training on the convolution-heavy UNet model.

Prerequisites
-------------

- AWS Neuron SDK 2.26 or later: Required for kernel implementation support
- trn1.32xlarge instance: Needed for model training
- Existing UNet implementation: Base model to be modified
- PyTorch-Neuron environment: Required for neural network operations

Instructions
------------

**1: Import required dependencies**

.. code-block:: python

   import torch
   import torch.nn as nn
   import torch.nn.functional as F
   from torch.autograd import Function
   import neuronxcc.nki as nki
   import neuronxcc.nki.language as nl
   from neuronxcc.nki._private_kernels.conv import conv2d_dw_fb01_io01_01bf_rep_nhwc_Pcinh

**2: Create the convolution wrapper function**

.. code-block:: python

   @nki.jit
   def conv_wrap(img_ref, filter_ref, out_shape):
       out_arr = nl.ndarray(shape=out_shape, dtype=img_ref.dtype, buffer=nl.hbm)
       conv2d_dw_fb01_io01_01bf_rep_nhwc_Pcinh(img_ref, filter_ref, out_arr, **{
           'input': img_ref.shape,
           'filter': filter_ref.shape,
           'output': out_shape,
           'in_perm': [0, 1, 2, 3],
           'kern_perm': [0, 1, 2, 3],
           'out_perm': [0, 1, 2, 3],
           'stride': (1, 1),
           'padding': ((1, 1), (1, 1))})
       return out_arr

**3: Implement the custom Conv2d module**

.. code-block:: python

   class BwdConv2dWithKernel(nn.Module):
       def __init__(self, in_channels, out_channels, kernel_size, padding, bias):
           super().__init__()
           assert padding == 1
           assert bias == False
           self.in_channels = in_channels
           self.out_channels = out_channels
           self.kernel_size = kernel_size
           self.weight = nn.Parameter(torch.randn(out_channels, in_channels, kernel_size, kernel_size))
           nn.init.kaiming_uniform_(self.weight, a=0.0, mode='fan_in', nonlinearity='leaky_relu')

**4: Replace standard convolutions in the UNet model**

.. code-block:: python

   class DoubleConvWithKernel(nn.Module):
       def __init__(self, in_channels, out_channels, mid_channels=None):
           super().__init__()
           if not mid_channels:
               mid_channels = out_channels
           self.double_conv = nn.Sequential(
               BwdConv2dWithKernel(in_channels, mid_channels, kernel_size=3, padding=1, bias=False),
               nn.BatchNorm2d(mid_channels),
               nn.ReLU(inplace=True),
               BwdConv2dWithKernel(mid_channels, out_channels, kernel_size=3, padding=1, bias=False),
               nn.BatchNorm2d(out_channels),
               nn.ReLU(inplace=True)
           )

**5: Update the UNet model initialization**

.. code-block:: python

   def __init__(self, n_channels, n_classes, bilinear=False):
       super().__init__()
       self.n_channels = n_channels
       self.n_classes = n_classes
       self.bilinear = bilinear
       self.inc = (DoubleConvWithKernel(n_channels, 64))
       # ... rest of initialization
Confirm your work
-----------------

To confirm successful implementation, verify the following:

.. code-block:: bash

   # Expected training output
   Training Device=xla:0 Epoch=1 Step=20 Loss=0.30803
   Training Device=xla:0 Epoch=2 Step=560 Loss=0.01826

Check for:

- No out-of-memory errors during execution
- Decreasing loss values across epochs

Common issues
-------------

.. rubric:: Memory Errors

- Solution: Verify all standard convolutions are replaced with BwdConv2dWithKernel implementations

.. rubric:: Compilation Errors

- Solution: Confirm Neuron SDK version is 2.26 or later

.. rubric:: Kernel Errors

- Solution: Use the kernel for supported configurations. The kernel will error out in unsupported scenarios.

Related information
-------------------

- `UNet training sample `_ - Sample UNet training implementation

================================================
FILE: compiler/neuronx-cc.rst
================================================

.. _neuronx-cc-index:

NeuronX Compiler for Trn1 & Inf2
=================================

.. toctree::
   :maxdepth: 1

   API Reference Guide
   How-to: Convolution
   Developer Guide
   FAQ

================================================
FILE: conf.py
================================================

# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.

import datetime
import os
import sys

sys.path.append(os.path.abspath("./_ext"))
sys.path.append(os.path.abspath("./nki/api"))
sys.path.append(os.path.abspath("./nki/_ext"))
sys.path.append(os.path.abspath("./frameworks/torch/torch-neuron/"))
sys.path.append(os.path.abspath("./_static"))

# get environment variables
def get_env_vars_from_gh():
    project_name = os.environ.get("GIT_PROJECT_NAME", "aws-neuron-sdk")
    branch_name = os.environ.get("GIT_BRANCH_NAME", "master")
    branch_name = "master" if branch_name == "latest" else branch_name
    return project_name, branch_name

def get_env_vars_from_rtd():
    branch_name = os.environ.get("READTHEDOCS_VERSION_NAME", "master")
    branch_name = "master" if branch_name == "latest" else branch_name
    project_name = "aws-neuron-sdk"
    if os.environ.get("READTHEDOCS_PROJECT") == "awsdocs-neuron-staging":
        project_name = "private-aws-neuron-sdk-staging"
    return project_name, branch_name

def get_env_vars():
    """Configure project and branch names based on environment"""
    if os.environ.get("READTHEDOCS") == "True":
        return get_env_vars_from_rtd()
    return get_env_vars_from_gh()

project_name, branch_name = get_env_vars()

# -- Project information -----------------------------------------------------

project = "AWS Neuron"
copyright = "{}, Amazon.com".format(datetime.datetime.now().year)
author = "AWS"
master_doc = "index"
html_title = "AWS Neuron Documentation"

# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [ "sphinxcontrib.contentui", "nbsphinx", "sphinx.ext.extlinks", "sphinx.ext.intersphinx", "sphinx_plotly_directive", "df_tables", "sphinxcontrib.programoutput", "neuron_tag", "sphinx_design", "ablog", "sphinx.ext.viewcode", "sphinx.ext.napoleon", "sphinx.ext.autodoc", "sphinx.ext.autosummary", "local_documenter", "archive", "sphinx_copybutton", "nki_directives", "sphinxcontrib.googleanalytics", "sphinxcontrib.datatemplates", "sphinxcontrib.spelling", "sphinx_tabs.tabs", ] html_sidebars = { "**": [ "navbar-logo.html", "search-field.html", "sbt-sidebar-nav.html", ], "about-neuron/announcements/*": [ "navbar-logo.html", "search-field.html", "ablog/postcard.html", "ablog/recentposts.html", "ablog/tagcloud.html", "ablog/categories.html", "ablog/archives.html", "sbt-sidebar-nav.html", ], } # Add any paths that contain templates here, relative to this directory. templates_path = [ "_templates", "nki/_templates/", "_content-types/", "libraries/nxd-inference/_templates", ] # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This pattern also affects html_static_path and html_extra_path. exclude_patterns = ['_build', '_backup-rn', '_backup-setup', '_content-types','**.ipynb_checkpoints','.venv','_utilities', 'nki/_templates'] html_extra_path = ['static'] # remove bash/python/ipython/jupyter prompts and continuations copybutton_prompt_text = r">>> |\.\.\. |\$ |In \[\d*\]: | {2,5}\.\.\.: | {5,8}: " copybutton_prompt_is_regexp = True # nbsphinx_allow_errors = True nbsphinx_execute = "never" html_logo = "images/Site-Merch_Neuron-ML-SDK_Editorial.png" napoleon_google_docstring = True # Turn on figure/table numbering numfig = True # -- autodoc/autosummary options ------------------------------------------------- autosummary_generate = True # Turn on sphinx.ext.autosummary # -- more options ------------------------------------------------- projectblob = project_name + "/blob/" + branch_name projecttree = project_name + "/tree/" + branch_name extlinks = { "mxnet-neuron": ( "https://github.com/aws-neuron/" + projectblob + "/neuron-guide/neuron-frameworks/mxnet-neuron/%s", "", ), "pytorch-neuron": ( "https://github.com/aws-neuron/" + projectblob + "/neuron-guide/neuron-frameworks/pytorch-neuron/%s", "", ), "tensorflow-neuron": ( "https://github.com/aws-neuron/" + projectblob + "/neuron-guide/neuron-frameworks/tensorflow-neuron/%s", "", ), "neuron-deploy": ( "https://github.com/aws-neuron/" + projectblob + "/neuron-deploy/%s", "", ), "neuron-tools-tree": ( "https://github.com/aws-neuron/" + projecttree + "/neuron-guide/neuron-tools/%s", "", ), "mxnet-neuron-src": ( "https://github.com/aws-neuron/" + projectblob + "/src/examples/mxnet/%s", "", ), "pytorch-neuron-src": ( "https://github.com/aws-neuron/" + projectblob + "/src/examples/pytorch/%s", "", ), "tensorflow-neuron-src": ( "https://github.com/aws-neuron/" + projectblob + "/src/examples/tensorflow/%s", "", ), "neuron-gatherinfor-src": ( "https://github.com/aws-neuron/" + projectblob + "/src/examples/neuron-gatherinfo/%s", "", ), "neuron-monitor-src": ( "https://github.com/aws-neuron/" + projectblob + "/src/examples/neuron-monitor/%s", "", ), "compile-pt": ( "https://github.com/aws-neuron/" + projectblob + "/archive/src/benchmark/pytorch/%s_compile.py", "", ), "benchmark-pt": ( "https://github.com/aws-neuron/" + projectblob + "/archive/src/benchmark/pytorch/%s_benchmark.py", "", ), "llama-sample": ( 
"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/%s.ipynb", "", ), 'github':(f'https://github.com/aws-neuron/{project_name}/blob/{branch_name}/%s', '') } intersphinx_mapping = { "python": ("https://docs.python.org/3", None), "numpy": ("https://numpy.org/doc/stable/", None), "torch": ("https://pytorch.org/docs/master/", None), "transformers": ("https://huggingface.co/docs/transformers/master/en/", None), } # -- Options for Theme ------------------------------------------------- top_banner_message = "Neuron 2.29.0 is released! Check the What's New and Release Notes for more details." html_theme = "sphinx_book_theme" html_theme_options = { "repository_url": "https://github.com/aws-neuron/" + project_name, "use_issues_button": True, "use_repository_button": True, "use_download_button": True, "use_fullscreen_button": True, "use_edit_page_button": True, "home_page_in_toc": False, "repository_branch": branch_name, "announcement": top_banner_message, # "navbar_persistent": [], } html_additional_pages = { "search-google": "search-google.html", } html_context = { # ... "default_mode": "light" } # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. # # html_theme = 'sphinx_rtd_theme' # html_theme_options = { # # 'navigation_depth': 3 # } # html_theme = "pydata_sphinx_theme" # html_theme_options = { # "use_edit_page_button": True, # } # html_context = { # "github_url": "https://github.com", # "github_user": "aws-neuron", # "github_repo": "private-aws-neuron-sdk-staging", # "github_version": "master", # "doc_path": "/", # } # -- Options for HTML output ------------------------------------------------- html_css_files = ["css/custom.css", "styles/sphinx-book-theme.css"] # def setup(app): # app.add_css_file('css/custom.css') # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ["_static"] plotly_include_source = False plotly_html_show_source_link = False plotly_html_show_formats = False plotly_include_directive_source = False # -- ABlog config ------------------------------------------------- blog_path = "about-neuron/announcements/index" blog_post_pattern = "about-neuron/appnotes/*.rst" blog_feed_length = 5 fontawesome_included = True post_show_prev_next = False post_auto_image = 1 post_auto_excerpt = 2 execution_show_tb = "READTHEDOCS" in os.environ # --- Google Analytics Sphinx extension --- googleanalytics_id = "G-2Q13EGB80H" # --- for neuron-tag directive --- rst_prolog = """ .. neuron-tag:: """ rst_epilog = """ .. neuron-tag:: """ # Exclude private github from linkcheck. Readthedocs only exposes the ssh-agent to the 'checkout' build step, which is too early for the linkchecker to run. 
linkcheck_ignore = [
    r"http://localhost:\d+/",
    r"https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/dlami-pytorch-introduce.html",
    r"https://github\.com/aws-neuron/private-aws-neuron-sdk-staging/",
    r"https://awsdocs-neuron-staging.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install.html#install-tensorflow-neuronx",
    r"https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx#inference",
    r"https://github.com/aws-neuron/aws-neuron-samples/tree/master/torch-neuronx#training",
    r"https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers",
    r"https://github.com/aws-neuron/aws-neuron-sagemaker-samples/tree/master/inference/inf2-bert-on-sagemaker",
    r"https://github.com/awslabs/multi-model-server/blob/master/docs/management_api.md",
    r"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/dp_bert_hf_pretrain/run_dp_bert_large_hf_pretrain_bf16_s128.sh",
    r"https://github.com/pytorch/xla/blob/v1.10.0/TROUBLESHOOTING.md",
    r"https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/saved_model.md",
    r"https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/g3doc/index.md",
    r"https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist.py",
    r"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/meta-llama-2-13b-sampling.ipynb",
    r"https://github.com/aws-neuron/aws-neuron-sdk/blob/master/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.ipynb",
    r"https://github.com/aws-neuron/aws-neuron-parallelcluster-samples/blob/master/examples/jobs/neuronx-nemo-megatron-llamav2-job.md",
    r"https://github.com/pytorch/PiPPy/blob/main/pippy/IR.py#L697",
    r"https://github.com/pytorch/pytorch/blob/main/torch/fx/_symbolic_trace.py#L241",
    r"https://github.com/pytorch/xla/blob/master/torch_xla/utils/checkpoint.py#L129",
    r"https://github.com/aws-neuron/neuronx-distributed/blob/main/src/neuronx_distributed/parallel_layers/layer_norm.py#L32",
    r"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain.py#L273C1-L289C55",
    r"https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install.html#pytorch-neuronx-install",
    r"https://github.com/google-research/bert#user-content-pre-trained-models",
    r"https://github.com/google-research/bert#user-content-sentence-and-sentence-pair-classification-tasks",
    r"https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-retirement.html",
    r"https://repost.aws/knowledge-center/eventbridge-notification-scheduled-events",
    r"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/modeling_gpt_neox_nxd.py",
    r"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/training/tp_dp_gpt_neox_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain/tp_dp_gpt_neox_20b_hf_pretrain.py",
    r"https://github.com/aws-neuron/aws-neuron-samples/blob/master/torch-neuronx/transformers-neuronx/inference/llama-3-8b-32k-sampling.ipynb",
]

linkcheck_exclude_documents = [
    r"src/examples/.*",
"about-neuron/announcements/neuron1.x/announcements", r"release-notes/.*", r"containers/.*", ] nitpicky = False ================================================ FILE: containers/container-deployment-flows.rst ================================================ .. _container-deployment-flows: Container Deployment Flows ========================== You can also choose one of the following combinations for running the neuron container: .. toctree:: :maxdepth: 1 dlc-then-ec2-devflow dlc-then-ecs-devflow dlc-then-eks-devflow container-sm-hosting-devflow ================================================ FILE: containers/container-sm-hosting-devflow.rst ================================================ .. _containers-byoc-hosting-devflow: .. include:: /devflows/inference/byoc-hosting-devflow.rst ================================================ FILE: containers/developerflows.rst ================================================ Containers - Developer Flows ============================ .. toctree:: :maxdepth: 1 :hidden: /containers/dlc-then-ec2-devflow /containers/dlc-then-ecs-devflow /containers/dlc-then-eks-devflow /containers/container-sm-hosting-devflow /containers/dlc-then-customize-devflow .. include:: /containers/developerflows.txt ================================================ FILE: containers/developerflows.txt ================================================ .. tab-set:: .. tab-item:: Inference * :ref:`containers-dlc-then-ec2-devflow` * :ref:`containers-dlc-then-ecs-devflow` * :ref:`containers-dlc-then-eks-devflow` * :ref:`containers-byoc-hosting-devflow` * :ref:`containers-dlc-then-customize-devflow` ================================================ FILE: containers/dlc-then-customize-devflow.rst ================================================ .. _containers-dlc-then-customize-devflow: .. include:: /devflows/dlc-then-customize-devflow.rst ================================================ FILE: containers/dlc-then-ec2-devflow.rst ================================================ .. _containers-dlc-then-ec2-devflow: .. include:: /devflows/inference/dlc-then-ec2-devflow.rst ================================================ FILE: containers/dlc-then-ecs-devflow.rst ================================================ .. _containers-dlc-then-ecs-devflow: .. include:: /devflows/inference/dlc-then-ecs-devflow.rst ================================================ FILE: containers/dlc-then-eks-devflow.rst ================================================ .. _containers-dlc-then-eks-devflow: .. include:: /devflows/inference/dlc-then-eks-devflow.rst ================================================ FILE: containers/dlc-then-k8s-devflow.rst ================================================ .. _containers-dlc-then-k8s-devflow: .. 
include:: /devflows/inference/dlc-then-k8s-devflow.rst

================================================
FILE: containers/docker-example/Dockerfile.device-plugin
================================================
FROM amazonlinux:2

RUN echo $'[neuron] \n\
name=Neuron YUM Repository \n\
baseurl=https://yum.repos.neuron.amazonaws.com \n\
enabled=1' > /etc/yum.repos.d/neuron.repo

RUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

RUN yum install -y aws-neuron-k8-plugin
RUN yum install -y tar gzip

ENV PATH="/opt/aws/neuron/bin/k8s-neuron-device-plugin:${PATH}"

CMD k8s-neuron-device-plugin

================================================
FILE: containers/docker-example/index.rst
================================================
Example: Run a containerized Neuron application
===============================================

Introduction
------------

This example shows how to run a Neuron application using Docker containers.

Prerequisites
-------------

- Ensure the steps from the guide on :ref:`tensorflow-serving` were completed successfully before continuing.

Steps
-----

Step 1: Start the neuron-rtd container
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You may use the prebuilt neuron-rtd image
[790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:latest], or build your
own image as shown in :ref:`neuron-runtime-dockerfile`.

Run the neuron-rtd container as shown below. A volume must be mounted at
``/sock`` inside the container, where neuron-rtd opens a UDS socket; the
application interacts with the runtime through this socket.

.. code:: bash

   aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 790709498068.dkr.ecr.us-east-1.amazonaws.com
   docker pull 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:1.1.1402.0
   docker tag 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:1.1.1402.0 neuron-rtd

   mkdir /tmp/neuron_rtd_sock
   chmod o+rwx /tmp/neuron_rtd_sock
   docker run --device=/dev/neuron0 --cap-add IPC_LOCK -v /tmp/neuron_rtd_sock/:/sock -it neuron-rtd

If using an older version of neuron-rtd (below 1.1):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: bash

   docker pull 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:1.0.9592.0
   docker tag 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:1.0.9592.0 neuron-rtd

   mkdir /tmp/neuron_rtd_sock
   chmod o+rwx /tmp/neuron_rtd_sock
   docker run --env AWS_NEURON_VISIBLE_DEVICES="0" --cap-add SYS_ADMIN --cap-add IPC_LOCK -v /tmp/neuron_rtd_sock/:/sock -it neuron-rtd

Step 2: Start the application (TensorFlow Serving) container
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Build the tensorflow-model-server-neuron image using the provided example
Dockerfile: :ref:`tensorflow-model-server-neuron-dockerfile`.

Run the container, assuming a compiled saved model is stored in s3:///my_model/:

.. code:: bash

   # Note: the neuron-rtd socket directory must be mounted and pointed at using an environment variable.
   # TensorFlow Serving will use that socket to talk to neuron-rtd.
   docker run --env NEURON_RTD_ADDRESS=unix:/sock/neuron.sock \
       -v /tmp/neuron_rtd_sock/:/sock \
       -p 8501:8501 \
       -p 8500:8500 \
       --env MODEL_BASE_PATH=s3:///my_model/ \
       --env MODEL_NAME=my_model tensorflow-model-server-neuron

Step 3: Verify by running an inference!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ As shown in :ref:`tensorflow-serving` ================================================ FILE: containers/docker-example/inference/Dockerfile-inference ================================================ # Example pytorch neuron container # To build: # docker build . -f Dockerfile.pt -t neuron-container:pytorch # To run on EC2 Inf1 instances with AWS DLAMI: # docker run -it --device=/dev/neuron0 neuron-container:pytorch FROM ubuntu:24.04 LABEL maintainer=" " RUN apt-get update -y \ && apt-get install -y --no-install-recommends \ gnupg2 \ wget \ python3-pip \ python3-setuptools \ && cd /usr/local/bin \ && pip3 --no-cache-dir install --upgrade pip \ && rm -rf /var/lib/apt/lists/* \ && apt-get clean RUN echo "deb https://apt.repos.neuron.amazonaws.com bionic main" > /etc/apt/sources.list.d/neuron.list RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add - # Installing Neuron Tools RUN apt-get update -y && apt-get install -y \ aws-neuronx-tools # Sets up Path for Neuron tools ENV PATH="/opt/bin/:/opt/aws/neuron/bin:${PATH}" # Include framework tensorflow-neuron or torch-neuronx and compiler (compiler not needed for inference) RUN pip3 install \ torch-neuronx \ --extra-index-url=https://pip.repos.neuron.amazonaws.com # Include your APP dependencies here. # RUN ... # Define the entrypoint script that has some application code (if needed) and executes the docker run command # For example you can use something like below # COPY dockerd-libmode-entrypoint.sh /opt/bin/dockerd-entrypoint.sh # RUN chmod +x /opt/bin/dockerd-entrypoint.sh # ENTRYPOINT ["/opt/bin/dockerd-entrypoint.sh"] CMD ["neuron-top"] ================================================ FILE: containers/docker-example/inference/Dockerfile-inference-dlc ================================================ FROM ubuntu:24.04 #SDK 1.17.1 has version 1. We skipped 1.18.0. 
LABEL dlc_major_version="2" LABEL maintainer="Amazon AI" LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true ARG PYTHON=python3.7 ARG PYTHON_VERSION=3.7.10 ARG TS_VERSION=0.5.2 ARG MAMBA_VERSION=4.12.0-0 # See http://bugs.python.org/issue19846 ENV LANG C.UTF-8 ENV LD_LIBRARY_PATH /lib/x86_64-linux-gnu:/opt/conda/lib/:$LD_LIBRARY_PATH ENV PATH /opt/conda/bin:$PATH ENV SAGEMAKER_SERVING_MODULE sagemaker_pytorch_serving_container.serving:main ENV TEMP=/home/model-server/tmp RUN apt-get update \ && apt-get install -y --no-install-recommends software-properties-common \ && add-apt-repository ppa:openjdk-r/ppa \ && apt-get update \ && apt-get install -y --no-install-recommends \ build-essential \ apt-transport-https \ ca-certificates \ cmake \ curl \ emacs \ git \ jq \ libgl1-mesa-glx \ libglib2.0-0 \ libsm6 \ libxext6 \ libxrender-dev \ openjdk-11-jdk \ vim \ wget \ unzip \ zlib1g-dev \ libcap-dev \ gpg-agent \ && rm -rf /var/lib/apt/lists/* \ && rm -rf /tmp/tmp* \ && apt-get clean RUN echo "deb https://apt.repos.neuron.amazonaws.com bionic main" > /etc/apt/sources.list.d/neuron.list RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add - RUN apt-get update \ && apt-get install -y \ aws-neuron-tools \ && rm -rf /var/lib/apt/lists/* \ && rm -rf /tmp/tmp* \ && apt-get clean # https://github.com/docker-library/openjdk/issues/261 https://github.com/docker-library/openjdk/pull/263/files RUN keytool -importkeystore -srckeystore /etc/ssl/certs/java/cacerts -destkeystore /etc/ssl/certs/java/cacerts.jks -deststoretype JKS -srcstorepass changeit -deststorepass changeit -noprompt; \ mv /etc/ssl/certs/java/cacerts.jks /etc/ssl/certs/java/cacerts; \ /var/lib/dpkg/info/ca-certificates-java.postinst configure; RUN curl -L -o ~/mambaforge.sh https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-x86_64.sh \ && chmod +x ~/mambaforge.sh \ && ~/mambaforge.sh -b -p /opt/conda \ && rm ~/mambaforge.sh \ && /opt/conda/bin/conda update conda \ && /opt/conda/bin/conda install -c conda-forge -y \ python=$PYTHON_VERSION \ cython \ mkl-include \ mkl \ parso \ scipy \ typing \ # Below 2 are included in miniconda base, but not mamba so need to install conda-content-trust \ charset-normalizer \ && /opt/conda/bin/conda clean -ya RUN conda install -c conda-forge \ opencv \ scikit-learn \ pandas \ h5py \ requests \ && conda clean -ya \ && pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pythonhosted.org \ && ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \ && pip install packaging==20.4 \ enum-compat==0.0.3 \ numpy==1.20.3 \ ipython \ # pyOpenSSL requires cryptography>=2.3, but all versions <3.3 have vulnerabilities "cryptography>=3.3.2" RUN pip install --no-cache-dir -U \ scipy \ six \ # install PyYAML>=5.4 to avoid conflict with latest awscli "pyYAML>=5.4,<5.5" \ "pillow>=8.3" \ "awscli<2" \ boto3 RUN pip install neuron-cc[tensorflow] --extra-index-url https://pip.repos.neuron.amazonaws.com \ && pip install "torch-neuron>=1.10.2,<1.10.3" --extra-index-url https://pip.repos.neuron.amazonaws.com \ && pip install torchserve==$TS_VERSION \ && pip install --no-deps --no-cache-dir -U torchvision==0.11.3 \ # Install TF 1.15.5 to override neuron-cc[tensorflow]'s installation of tensorflow==1.15.0 && pip install -U tensorflow==1.15.5 \ && pip install torch-model-archiver==$TS_VERSION RUN useradd -m model-server \ && mkdir -p /home/model-server/tmp /opt/ml/model \ && chown -R model-server 
/home/model-server /opt/ml/model COPY torchserve-neuron.sh /usr/local/bin/entrypoint.sh COPY config.properties /home/model-server RUN chmod +x /usr/local/bin/dockerd-entrypoint.py \ && chmod +x /usr/local/bin/neuron-monitor.sh \ && chmod +x /usr/local/bin/entrypoint.sh ADD https://raw.githubusercontent.com/aws/deep-learning-containers/master/src/deep_learning_container.py /usr/local/bin/deep_learning_container.py RUN chmod +x /usr/local/bin/deep_learning_container.py RUN pip install --no-cache-dir "sagemaker-pytorch-inference==2.0.8" RUN HOME_DIR=/root \ && curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \ && unzip ${HOME_DIR}/oss_compliance.zip -d ${HOME_DIR}/ \ && cp ${HOME_DIR}/oss_compliance/test/testOSSCompliance /usr/local/bin/testOSSCompliance \ && chmod +x /usr/local/bin/testOSSCompliance \ && chmod +x ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh \ && ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \ && rm -rf ${HOME_DIR}/oss_compliance* RUN curl https://aws-dlc-licenses.s3.amazonaws.com/pytorch-1.10/license.txt -o /license.txt EXPOSE 8080 8081 CMD ["/usr/local/bin/entrypoint.sh"] ================================================ FILE: containers/docker-example/inference/Dockerfile-inference-dlc.rst ================================================ .. _inference-dlc-dockerfile: DLC sample Dockerfile for Application Container ============================================== .. literalinclude:: Dockerfile-inference-dlc :linenos: ================================================ FILE: containers/docker-example/inference/Dockerfile-libmode ================================================ # Example pytorch neuron container # To build: # docker build . -f Dockerfile.pt -t neuron-container:pytorch # To run on EC2 Inf1 instances with AWS DLAMI: # docker run -it --device=/dev/neuron0 neuron-container:pytorch FROM ubuntu:24.04 LABEL maintainer=" " RUN apt-get update -y \ && apt-get install -y --no-install-recommends \ gnupg2 \ wget \ python3-pip \ python3-setuptools \ && cd /usr/local/bin \ && pip3 --no-cache-dir install --upgrade pip \ && rm -rf /var/lib/apt/lists/* \ && apt-get clean RUN echo "deb https://apt.repos.neuron.amazonaws.com bionic main" > /etc/apt/sources.list.d/neuron.list RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add - # Installing Neuron Tools RUN apt-get update -y && apt-get install -y \ aws-neuron-tools # Sets up Path for Neuron tools ENV PATH="/opt/bin/:/opt/aws/neuron/bin:${PATH}" # Include framework tensorflow-neuron or torch-neuron and compiler (compiler not needed for inference) RUN pip3 install \ torch-neuron \ --extra-index-url=https://pip.repos.neuron.amazonaws.com # Include your APP dependencies here. # RUN ... # Define the entrypoint script that has some application code (if needed) and executes the docker run command # For example you can use something like below # COPY dockerd-libmode-entrypoint.sh /opt/bin/dockerd-entrypoint.sh # RUN chmod +x /opt/bin/dockerd-entrypoint.sh # ENTRYPOINT ["/opt/bin/dockerd-entrypoint.sh"] CMD ["neuron-top"] ================================================ FILE: containers/docker-example/inference/Dockerfile-libmode.rst ================================================ .. _libmode-dockerfile: Dockerfile for Application Container ==================================== .. 
literalinclude:: Dockerfile-inference :linenos: ================================================ FILE: containers/docker-example/inference/Dockerfile-tf-serving.rst ================================================ .. _tensorflow-model-server-neuron-dockerfile: tensorflow-model-server-neuron Dockerfile ========================================= .. literalinclude:: Dockerfile.tf-serving :linenos: ================================================ FILE: containers/docker-example/inference/Dockerfile.mxnet-serving ================================================ # To build: # docker build . -f Dockerfile.mxnet-serving -t mxnet-model-server-neuron FROM amazonlinux:2 ENV PYTHONUNBUFFERED TRUE RUN dnf install -y gcc-c++ RUN dnf install -y python3-devel RUN dnf install -y java-1.8.0-openjdk RUN dnf install -y curl RUN cd /tmp \ && curl -O https://bootstrap.pypa.io/get-pip.py \ && python3 get-pip.py RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1 RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1 RUN pip install mxnet-neuron --index-url=https://pip.repos.neuron.amazonaws.com RUN pip install multi-model-server RUN useradd -m model-server \ && mkdir -p /home/model-server/tmp COPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh RUN mkdir -p /home/model-server/tmp/models/ #copy your model COPY mxnet_model/resnet-50_compiled.mar /home/model-server/tmp/models/ RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \ && chown -R model-server /home/model-server EXPOSE 8080 8081 USER model-server WORKDIR /home/model-server ENV TEMP=/home/model-server/tmp ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"] CMD ["serve"] ================================================ FILE: containers/docker-example/inference/Dockerfile.tf-serving ================================================ # Example tensorflow-model-server-neuron dockerfile. # Note: tensorflow_model_server_neuron must be pointed at the model location and name using MODEL_BASE_PATH and # MODEL_NAME env variables. MODEL_BASE_PATH may be an s3 location. # To build: # docker build . -f Dockerfile.tf-serving -t tensorflow-model-server-neuron FROM amazonlinux:2 # Expose ports for gRPC and REST EXPOSE 8500 8501 ENV MODEL_BASE_PATH=/models \ MODEL_NAME=model RUN echo $'[neuron] \n\ name=Neuron YUM Repository \n\ baseurl=https://yum.repos.neuron.amazonaws.com \n\ enabled=1' > /etc/yum.repos.d/neuron.repo RUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB RUN dnf install -y tensorflow-model-server-neuron RUN mkdir -p /root/models/ #copy your model COPY tf_model/ /root/models/ RUN ls -la /root/models/* CMD ["/bin/sh", "-c", "/usr/local/bin/tensorflow_model_server_neuron --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=/root/models/${MODEL_NAME}"] ================================================ FILE: containers/docker-example/inference/config-properties.rst ================================================ .. _torchserve-config-properties: Torchserve config.properties example ==================================== .. 
literalinclude:: config.properties
   :linenos:

================================================
FILE: containers/docker-example/inference/config.properties
================================================
vmargs=-XX:+UseContainerSupport -XX:InitialRAMPercentage=8.0 -XX:MaxRAMPercentage=10.0 -XX:-UseLargePages -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
model_store=/opt/ml/model
load_models=ALL
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
# management_address=unix:/tmp/management.sock
# number_of_netty_threads=0
# netty_client_threads=0
# default_response_timeout=120
# default_workers_per_model=0
# job_queue_size=100
# async_logging=false
# number_of_gpu=1
# cors_allowed_origin
# cors_allowed_methods
# cors_allowed_headers
# keystore=src/test/resources/keystore.p12
# keystore_pass=changeit
# keystore_type=PKCS12
# private_key_file=src/test/resources/key.pem
# certificate_file=src/test/resources/certs.pem
# max_response_size=6553500
# max_request_size=6553500
# blacklist_env_vars=
# decode_input_request=false
# enable_envvars_config=false

================================================
FILE: containers/docker-example/inference/dockerd-libmode-entrypoint.rst
================================================
.. _dockerd-libmode-entrypoint:

Docker Entrypoint Example - Application container
=================================================

.. literalinclude:: dockerd-libmode-entrypoint.sh
   :linenos:

================================================
FILE: containers/docker-example/inference/dockerd-libmode-entrypoint.sh
================================================
#!/bin/bash

if [[ "$1" = "serve" ]]; then
    # Start your application here!
    # e.g: 'python my_server_app.py'
    :  # no-op placeholder so the branch is valid bash until an application command is added
else
    eval "$@"
fi

# prevent docker exit
tail -f /dev/null

================================================
FILE: containers/docker-example/inference/torchserve-neuron.rst
================================================
.. _torchserve-neuron:

Torchserve Example
==================

.. literalinclude:: torchserve-neuron.sh
   :linenos:

================================================
FILE: containers/docker-example/inference/torchserve-neuron.sh
================================================
#!/bin/bash

MODEL_STORE=/opt/ml/model
TS_CONFIG=/home/model-server/config.properties
MODEL_PATH=""

while getopts ":m:t:" opt; do
    case $opt in
        m) MODEL_PATH="$OPTARG" ;;
        t) TS_CONFIG="$OPTARG" ;;
        \?) echo "Invalid option -$OPTARG" >&2 ;;
    esac
done

printf "Model path: %s\n" "$MODEL_PATH"
printf "TS_CONFIG: %s\n" "$TS_CONFIG"

# Start the Model Server
if [[ -z "$MODEL_PATH" ]]; then
    torchserve --start --ts-config /home/model-server/config.properties --model-store /opt/ml/model &
else
    torchserve --start --ts-config $TS_CONFIG --models $MODEL_PATH &
fi
status=$?
if [ $status -ne 0 ]; then
    echo "Failed to start TorchServe: $status"
    exit $status
fi

================================================
FILE: containers/docker-example/training/Dockerfile-training-dlc
================================================
# Example pytorch neuron container
# To build:
# docker build .
-f Dockerfile.pt -t neuron-container:pytorch # To run on EC2 Inf1 instances with AWS DLAMI: # docker run -it --net=host --device=/dev/neuron0 neuron-container:pytorch # You can find the latest Pytorch Training Image here - https://gallery.ecr.aws/neuron/pytorch-training-neuronx FROM public.ecr.aws/neuron/pytorch-training-neuronx:2.9.0-neuronx-py310-sdk2.27.0-ubuntu24.04 RUN mkdir -p /opt/ml COPY model.py /opt/ml/model.py COPY mlp_train.py /opt/ml/mlp_train.py ================================================ FILE: containers/docker-example/training/Dockerfile-trainium-dlc.rst ================================================ .. _trainium-dlc-dockerfile: Dockerfile for Application Container ==================================== .. literalinclude:: Dockerfile-training-dlc :linenos: ================================================ FILE: containers/docker-example/training/mlp.rst ================================================ .. _mlp-train: Simple MLP train script ======================== Save the following contents as mlp_train.py .. literalinclude:: mlp_train.py :linenos: Save the following contents as model.py .. literalinclude:: model.py :linenos: ================================================ FILE: containers/docker-example/training/mlp_train.py ================================================ import os import time import torch from model import MLP from torchvision.datasets import mnist from torch.utils.data import DataLoader from torchvision.transforms import ToTensor # XLA imports import torch_xla.core.xla_model as xm # Global constants EPOCHS = 4 WARMUP_STEPS = 2 BATCH_SIZE = 32 # Load MNIST train dataset train_dataset = mnist.MNIST(root='./MNIST_DATA_train', train=True, download=True, transform=ToTensor()) def main(): # Prepare data loader train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE) # Fix the random number generator seeds for reproducibility torch.manual_seed(0) # XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance) device = 'xla' # Move model to device and declare optimizer and loss function model = MLP().to(device) optimizer = torch.optim.SGD(model.parameters(), lr=0.01) loss_fn = torch.nn.NLLLoss() # Run the training loop print('----------Training ---------------') model.train() for epoch in range(EPOCHS): start = time.time() for idx, (train_x, train_label) in enumerate(train_loader): optimizer.zero_grad() train_x = train_x.view(train_x.size(0), -1) train_x = train_x.to(device) train_label = train_label.to(device) output = model(train_x) loss = loss_fn(output, train_label) loss.backward() optimizer.step() xm.mark_step() # XLA: collect ops and run them in XLA runtime if idx < WARMUP_STEPS: # skip warmup iterations start = time.time() # Compute statistics for the last epoch interval = idx - WARMUP_STEPS # skip warmup iterations throughput = interval / (time.time() - start) print("Train throughput (iter/sec): {}".format(throughput)) print("Final loss is {:0.4f}".format(loss.detach().to('cpu'))) # Save checkpoint for evaluation os.makedirs("checkpoints", exist_ok=True) checkpoint = {'state_dict': model.state_dict()} # XLA: use xm.save instead of torch.save to ensure states are moved back to cpu # This can prevent "XRT memory handle not found" at end of test.py execution xm.save(checkpoint,'checkpoints/checkpoint.pt') print('----------End Training ---------------') if __name__ == '__main__': main() ================================================ FILE: containers/docker-example/training/model.py ================================================ import 
torch.nn as nn import torch.nn.functional as F # Declare 3-layer MLP for MNIST dataset class MLP(nn.Module): def __init__(self, input_size = 28 * 28, output_size = 10, layers = [120, 84]): super(MLP, self).__init__() self.fc1 = nn.Linear(input_size, layers[0]) self.fc2 = nn.Linear(layers[0], layers[1]) self.fc3 = nn.Linear(layers[1], output_size) def forward(self, x): x = F.relu(self.fc1(x)) x = F.relu(self.fc2(x)) x = self.fc3(x) return F.log_softmax(x, dim=1) ================================================ FILE: containers/docker-example/v1/inference/Dockerfile-app-rt-diff.rst ================================================ .. _app-rt-diff-dockerfile: Dockerfile with Application and Runtime in different Container ============================================================== .. literalinclude:: Dockerfile.app-rt-diff :linenos: ================================================ FILE: containers/docker-example/v1/inference/Dockerfile-app-rt-same.rst ================================================ .. _app-rt-same-dockerfile: Dockerfile with Application and Runtime in same Container ========================================================= .. literalinclude:: Dockerfile.torch-neuron :linenos: ================================================ FILE: containers/docker-example/v1/inference/Dockerfile-neuron-rtd.rst ================================================ .. _neuron-runtime-dockerfile: Neuron Runtime Dockerfile ========================= .. literalinclude:: Dockerfile.neuron-rtd :linenos: ================================================ FILE: containers/docker-example/v1/inference/Dockerfile-torch-neuron.rst ================================================ .. _torch-neuron-dockerfile: torch-neuron Dockerfile ======================= .. literalinclude:: Dockerfile.torch-neuron :linenos: ================================================ FILE: containers/docker-example/v1/inference/Dockerfile.app-rt-diff ================================================ # Example pytorch neuron container # To build: # docker build . -f Dockerfile.pt -t neuron-container:pytorch # To run on EC2 Inf1 instances with AWS DLAMI: # sudo service neuron-rtd stop # docker run -it --device=/dev/neuron0 -v /run/:/run --cap-add IPC_LOCK neuron-container:pytorch FROM ubuntu:18.04 LABEL maintainer=" " RUN apt-get update -y \ && apt-get install -y --no-install-recommends \ wget \ gnupg2 \ python3-pip \ python3-setuptools \ && cd /usr/local/bin \ && pip3 --no-cache-dir install --upgrade pip \ && rm -rf /var/lib/apt/lists/* \ && apt-get clean RUN echo "deb https://apt.repos.neuron.amazonaws.com bionic main" > /etc/apt/sources.list.d/neuron.list RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add - # Include framework tensorflow-neuron or torch-neuron and compiler (compiler not needed for inference) RUN pip3 install \ torch-neuron \ --extra-index-url=https://pip.repos.neuron.amazonaws.com # Include your APP dependencies here. # RUN/ENTRYPOINT/CMD ... ================================================ FILE: containers/docker-example/v1/inference/Dockerfile.neuron-rtd ================================================ # Example neuron-rtd dockerfile. # To build: # docker build . 
-f Dockerfile.neuron-rtd -t neuron-rtd
# Note: the container must start with the CAP_IPC_LOCK capability
# To run on EC2 Inf1 instances with AWS DLAMI:
#    sudo service neuron-rtd stop
#    docker run --env AWS_NEURON_VISIBLE_DEVICES="0" --cap-add IPC_LOCK -v /tmp/neuron_rtd_sock/:/sock neuron-rtd
FROM amazonlinux:2

RUN echo $'[neuron] \n\
name=Neuron YUM Repository \n\
baseurl=https://yum.repos.neuron.amazonaws.com \n\
enabled=1' > /etc/yum.repos.d/neuron.repo

RUN rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

RUN yum install -y aws-neuron-tools
RUN yum install -y aws-neuron-runtime
RUN yum install -y tar gzip

ENV PATH="/opt/aws/neuron/bin:${PATH}"

CMD neuron-rtd -g unix:/sock/neuron.sock --log-console

================================================
FILE: containers/docker-example/v1/inference/Dockerfile.torch-neuron
================================================
# Example pytorch neuron container
# Note: a dockerd_entrypoint.sh script is required to successfully build this image. Place the script in the same folder as the Dockerfile.
# To build:
#    docker build . -f Dockerfile.pt -t neuron-container:pytorch
# To run on EC2 Inf1 instances with AWS DLAMI:
#    sudo service neuron-rtd stop
#    docker run -it --device=/dev/neuron0 --cap-add IPC_LOCK neuron-container:pytorch
FROM ubuntu:18.04

LABEL maintainer=" "

RUN apt-get update -y \
    && apt-get install -y --no-install-recommends \
    gnupg2 \
    wget \
    python3-pip \
    python3-setuptools \
    libcap-dev \
    && cd /usr/local/bin \
    && pip3 --no-cache-dir install --upgrade pip \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

RUN echo "deb https://apt.repos.neuron.amazonaws.com bionic main" > /etc/apt/sources.list.d/neuron.list
RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -

# Installing Neuron Runtime and Tools
RUN apt-get update -y && apt-get install -y \
    aws-neuron-runtime \
    aws-neuron-tools

# Sets up Path for Neuron tools
ENV PATH="/opt/bin/:/opt/aws/neuron/bin:${PATH}"

# Include framework tensorflow-neuron or torch-neuron and compiler (compiler not needed for inference)
RUN pip3 install \
    torch-neuron \
    --extra-index-url=https://pip.repos.neuron.amazonaws.com

# Include your APP dependencies here.
# RUN ...

# Define the entrypoint script that starts the runtime and executes the docker run command
COPY dockerd-entrypoint.sh /opt/bin/dockerd-entrypoint.sh
RUN chmod +x /opt/bin/dockerd-entrypoint.sh
ENTRYPOINT ["/opt/bin/dockerd-entrypoint.sh"]
CMD ["neuron-top"]

================================================
FILE: containers/docker-example/v1/inference/dockerd-entrypoint-app-rt-same.rst
================================================
.. _dockerd-entrypoint-app-rt-same:

Docker Entrypoint Example - Application and Runtime in same Container
=====================================================================

.. literalinclude:: dockerd-entrypoint.sh
   :linenos:

================================================
FILE: containers/docker-example/v1/inference/dockerd-entrypoint.sh
================================================
#!/bin/bash
set -e

wait_for_nrtd() {
    nrtd_sock="/run/neuron.sock"
    SOCKET_TIMEOUT=300
    is_wait=true
    wait_time=0
    i=1
    sp="/-\|"
    echo -n "Waiting for neuron-rtd "
    pid=$1
    while $is_wait; do
        if [ -S "$nrtd_sock" ]; then
            echo "$nrtd_sock Exist..."
            is_wait=false
        else
            sleep 1
            wait_time=$((wait_time + 1))
            if [ "$wait_time" -gt "$SOCKET_TIMEOUT" ]; then
                echo "neuron-rtd failed to start, exiting"
                cat /tmp/nrtd.log
                exit 1
            fi
            printf "\b${sp:i++%${#sp}:1}"
        fi
    done
    cat /tmp/nrtd.log
}

# Start neuron-rtd
/opt/aws/neuron/bin/neuron-rtd -g unix:/run/neuron.sock --log-console >> /tmp/nrtd.log 2>&1 &
nrtd_pid=$!
echo "NRTD PID: "$nrtd_pid""

# wait for nrtd to be up (5 minutes timeout)
wait_for_nrtd $nrtd_pid
export NEURON_RTD_ADDRESS=unix:/run/neuron.sock
nrtd_present=1

if [[ "$1" = "serve" ]]; then
    # Start your application here!
    # e.g: 'python my_server_app.py'
    :  # no-op placeholder so the branch is valid bash until an application command is added
else
    eval "$@"
fi

# prevent docker exit
tail -f /dev/null
================================================
FILE: containers/ec2-then-ec2-devflow.rst
================================================
.. _containers-ec2-then-ec2-devflow:

.. include:: /devflows/inference/ec2-then-ec2-devflow.rst

================================================
FILE: containers/ec2.rst
================================================
.. _ec2-instance:

EC2 Instance
============

Introduction
------------

Using Neuron in containers on EC2 is straightforward; follow these steps:

- :ref:`tutorial-docker-env-setup-for-neuron`
- More details on EC2 setup `can be found at `_

DLC Images
----------

- The location for DLC images for Neuron can be obtained from `here `_
- To get the list of images for Neuron, the following commands can be used.

  ``aws ecr list-images --registry-id 763104351884 --repository-name tensorflow-inference-neuron``

  ``aws ecr list-images --registry-id 763104351884 --repository-name pytorch-inference-neuron``

Setup recommendations
---------------------

- The EC2 Inf1 instance needs to have the aws-neuron-runtime-base and aws-neuron-dkms packages installed.
- The DLC inference container runs the framework server (like tensorflow-model-server or TorchServe) and also the Neuron runtime, which interacts with the Neuron driver running on the host.
- For more details on setting up the container, check the `tensorflow `_ or `pytorch `_. Make sure the appropriate framework container image is used.

Debug Hints
-----------

- Use the docker logs command to get the neuron-rtd logs from the container.

  ``docker logs ``

- Look for errors like the following - if you see *nrtd[8]: [TDRV:tdrv_init_mla_phase1] Could not open the device index:0*, it means either that some other container is using that device or that the host is running the neuron-rtd process.
- Check that the host is not running neuron-rtd:

  ``sudo systemctl status neuron-rtd``

================================================
FILE: containers/faq-troubleshooting-releasenote.rst
================================================
Containers - FAQ, Troubleshooting & Release Notes
=================================================

.. toctree::
   :maxdepth: 1
   :hidden:

   FAQ
   troubleshooting
   /release-notes/components/containers

* :ref:`container-faq`
* :ref:`container-troubleshooting`
* :ref:`containers_rn`

================================================
FILE: containers/faq.rst
================================================
.. _container-faq:

Neuron Containers FAQ
=====================

.. contents:: Table of Contents
   :local:
   :depth: 1

Where can I find DLC images
---------------------------

* The Inference/Training DLC images can be found `here `_.
* In the `DLC release page `_, search for "neuron" to get the ECR repository location of a specific Neuron DLC release.
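As a quick illustration, here is one way to log in to the DLC registry and pull an image. This is a sketch: the account ID and repository name come from the ``aws ecr list-images`` commands in the EC2 section above, while the region and ``<tag>`` are placeholders you should take from the DLC release page.

.. code-block:: bash

   # Authenticate Docker against the DLC registry (us-east-1 assumed)
   aws ecr get-login-password --region us-east-1 \
       | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com

   # List available tags, then pull the one you need
   aws ecr list-images --registry-id 763104351884 --repository-name pytorch-inference-neuron
   docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuron:<tag>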
What is the OCI Neuron hook and do we need it
---------------------------------------------

Neuron devices are exposed to containers using the ``--device`` option of the docker run command. The Docker runtime (runc) does not yet support an ALL option to expose all Neuron devices to the container. The OCI Neuron hook adds support for exposing ALL devices to a container through the environment variable ``AWS_NEURON_VISIBLE_DEVICES=ALL``. For more details, please refer to the :ref:`oci neuron hook `

In Kubernetes, the OCI Neuron hook is needed when using device plugin version 1.7 or below. With device plugin version 1.8 or later, the OCI Neuron hook is not needed.

What container runtimes are supported
-------------------------------------

Neuron containers have been tested to work with the docker, containerd, and cri-o runtimes without any changes. If the OCI Neuron hook is used, it needs to be enabled in the runtime config. For more details, please refer to the :ref:`oci neuron hook `

How to expose Neuron Devices to Container
-----------------------------------------

A Neuron device represents one Inferentia/Trainium chip in the instance. Refer to :ref:`Container Devices ` for more details.

How to expose Neuron Cores to Container
---------------------------------------

A Neuron core is one NeuronCore in the instance; each Inferentia1 device has 4 Neuron Cores, and each Inferentia2 and Trainium1 device has 2 Neuron Cores. Refer to :ref:`Container Cores ` for more details. When devices are exposed to a container, all the cores in those devices are available for use in the container. Please refer to :ref:`nrt-configuration` to see how the environment variables NEURON_RT_VISIBLE_CORES and NEURON_RT_NUM_CORES can be used to assign cores to containers.

Can Neuron Devices be shared by different Containers running in the same Host
------------------------------------------------------------------------------

Yes, except in a Kubernetes environment, where the devices cannot be shared.

Can Neuron Cores be shared by different Containers running in the same Host
---------------------------------------------------------------------------

No

When would you use Neuron K8 Scheduler Extension
-------------------------------------------------

The Neuron cores/devices that are exposed to a container need to be contiguous. The Kubernetes device plugin does not guarantee that assigned devices are contiguous. The K8 Neuron Scheduler Extension takes care of assigning contiguous devices to the containers.

How to add EFA devices to the container
---------------------------------------

The EFA devices are exposed to the container using the --device option

::

   --device /dev/infiniband/uverbs0

In a Kubernetes environment, the EFA device plugin is used to detect and advertise the available EFA interfaces. The EFA device plugin can be installed using the `Helm chart provided by Amazon EKS `_

::

   helm repo add eks https://aws.github.io/eks-charts
   helm install aws-efa-k8s-device-plugin --namespace kube-system eks/aws-efa-k8s-device-plugin

Once the plugin is deployed, applications can use the resource type vpc.amazonaws.com/efa in a pod request spec

::

   resources:
      limits:
         vpc.amazonaws.com/efa: 4

Can distributed training jobs be run without EFA devices in the container
--------------------------------------------------------------------------

No.
For distributed training jobs on Trainium, all EFA interfaces provided by trn1.32xlarge need to be attached to the container ================================================ FILE: containers/files/index-dra.rst ================================================ .. meta:: :description: Templates supporting AWS Neuron Dynamic Resource Allocation (DRA) on Kubernetes. :keywords: AWS Neuron, Neuron DRA, Dynamic Resource Allocation, Kubernetes, K8s, Device Plugin :date-modified: 02/05/2026 AWS Neuron Dynamic Resource Allocation (DRA) on Kubernetes: Support files ========================================================================= This page provides templates supporting AWS Neuron Dynamic Resource Allocation (DRA) on Kubernetes. You can view and download these files from the links below. Resource Claim Specifications ----------------------------- Example resource claim templates and pod specifications demonstrating different Neuron device allocation patterns for various workload requirements. .. list-table:: :header-rows: 1 :widths: 30 55 15 * - File Name - Description - Download * - 1x4-connected-devices.yaml - Resource claim template for allocating 4 connected Neuron devices with topology constraints for optimal performance. - :download:`Download ` * - 2-node-inference-us.yaml - Multi-node inference configuration for distributed workloads across 2 Trainium nodes. - :download:`Download ` * - 4-node-inference-us.yaml - Large-scale inference setup for distributed workloads spanning 4 Trainium nodes. - :download:`Download ` * - all-devices.yaml - Resource claim template that allocates all available Neuron devices on a trn2.48xlarge instance. - :download:`Download ` * - lnc-setting-trn2.yaml - Logical NeuronCore configuration template optimized for Trainium2 instances. - :download:`Download ` * - specific-driver-version.yaml - Example configuration for requesting specific Neuron driver versions in resource claims. - :download:`Download ` * - us-and-lnc-config.yaml - Example configuration for requesting UltraServer node with Logical NeuronCore configuration. 
- :download:`Download ` ================================================ FILE: containers/files/manifests/clusterrole.yaml ================================================ apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: neuron-dra-driver-clusterrole rules: # Required for DRA device plugin to manage ResourceSlices - apiGroups: ["resource.k8s.io"] resources: ["resourceslices"] verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] # Required for DRA device plugin to read ResourceClaims - apiGroups: ["resource.k8s.io"] resources: ["resourceclaims"] verbs: ["get", "list", "watch"] # Required for DRA device plugin to read DeviceClasses - apiGroups: ["resource.k8s.io"] resources: ["deviceclasses"] verbs: ["get", "list", "watch"] # Required to read and modify node information - apiGroups: [""] resources: ["nodes"] verbs: ["get", "list", "watch", "patch", "update"] # Required to modify node status - apiGroups: [""] resources: ["nodes/status"] verbs: ["patch"] ================================================ FILE: containers/files/manifests/clusterrolebinding.yaml ================================================ apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: neuron-dra-driver-binding roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: neuron-dra-driver-clusterrole subjects: - kind: ServiceAccount name: neuron-dra-driver-sa namespace: neuron-dra-driver ================================================ FILE: containers/files/manifests/daemonset.yaml ================================================ apiVersion: apps/v1 kind: DaemonSet metadata: name: neuron-dra-driver-kubelet-plugin namespace: neuron-dra-driver labels: app: neuron-dra-driver-kubelet-plugin spec: updateStrategy: type: RollingUpdate rollingUpdate: maxUnavailable: 0 maxSurge: 1 selector: matchLabels: app: neuron-dra-driver-kubelet-plugin template: metadata: labels: app: neuron-dra-driver-kubelet-plugin spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: node.kubernetes.io/instance-type operator: In values: - trn1.2xlarge - trn1.32xlarge - trn1n.32xlarge - trn2.3xlarge - trn2.48xlarge - trn2n.48xlarge - key: eks.amazonaws.com/compute-type operator: NotIn values: - fargate - hybrid - auto serviceAccountName: neuron-dra-driver-sa hostNetwork: true containers: - name: neuron-dra-driver image: NEURON_DRA_IMAGE imagePullPolicy: Always command: ["k8s-neuron-dra-driver"] # args: # - --v=6 env: - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName - name: POD_UID valueFrom: fieldRef: fieldPath: metadata.uid - name: CDI_ROOT value: "/var/run/cdi" - name: KUBELET_REGISTRAR_DIRECTORY_PATH value: "/var/lib/kubelet/plugins_registry" - name: KUBELET_PLUGINS_DIRECTORY_PATH value: "/var/lib/kubelet/plugins" - name: HEALTHCHECK_PORT value: "51515" - name: NEURON_DRA_DRIVER_EMULATION_MODE value: "trn2u" resources: limits: cpu: 20m memory: 256Mi requests: cpu: 10m memory: 128Mi securityContext: privileged: true volumeMounts: - name: kubelet-plugins-dir mountPath: /var/lib/kubelet/plugins - name: kubelet-registry-dir mountPath: /var/lib/kubelet/plugins_registry - name: cdi-dir mountPath: /var/run/cdi livenessProbe: grpc: port: 51515 service: liveness failureThreshold: 3 periodSeconds: 10 initialDelaySeconds: 30 timeoutSeconds: 5 volumes: - name: kubelet-plugins-dir hostPath: path: /var/lib/kubelet/plugins - name: kubelet-registry-dir hostPath: path: /var/lib/kubelet/plugins_registry - 
name: cdi-dir hostPath: path: /var/run/cdi tolerations: - key: CriticalAddonsOnly operator: Exists - key: aws.amazon.com/neuron operator: Exists effect: NoSchedule - key: sagemaker.amazonaws.com/node-health-status operator: Equal value: Unschedulable effect: NoSchedule # - key: "kwok.x-k8s.io/node" # operator: "Exists" # effect: "NoSchedule" ================================================ FILE: containers/files/manifests/deviceclass.yaml ================================================ apiVersion: resource.k8s.io/v1beta1 kind: DeviceClass metadata: name: neuron.aws.com spec: selectors: - cel: expression: device.driver == "neuron.aws.com" ================================================ FILE: containers/files/manifests/namespace.yaml ================================================ apiVersion: v1 kind: Namespace metadata: name: neuron-dra-driver labels: name: neuron-dra-driver ================================================ FILE: containers/files/manifests/serviceaccount.yaml ================================================ apiVersion: v1 kind: ServiceAccount metadata: name: neuron-dra-driver-sa namespace: neuron-dra-driver ================================================ FILE: containers/files/scripts/install-dra-driver.sh ================================================ #!/bin/bash # Deploy Neuron DRA Driver set -e echo "🚀 Deploying Neuron DRA Driver..." # Check argument if [ $# -ne 1 ]; then echo "Usage: $0 " echo "Example: $0 123456789.dkr.ecr.us-west-2.amazonaws.com/neuron-dra-driver:v1.0" exit 1 fi # Get the script directory and set the manifests path SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" MANIFESTS_DIR="$SCRIPT_DIR/../../manifests" DRA_IMAGE="$1" # Apply all manifests in order echo "📝 Creating namespace..." kubectl apply -f "$MANIFESTS_DIR/namespace.yaml" echo "🔐 Creating ServiceAccount and RBAC..." kubectl apply -f "$MANIFESTS_DIR/serviceaccount.yaml" kubectl apply -f "$MANIFESTS_DIR/clusterrole.yaml" kubectl apply -f "$MANIFESTS_DIR/clusterrolebinding.yaml" echo "📱 Creating DeviceClass..." kubectl apply -f "$MANIFESTS_DIR/deviceclass.yaml" echo "🔧 Deploying DRA DaemonSet..." # Check if DaemonSet already exists before applying DAEMONSET_EXISTS=false if kubectl get daemonset neuron-dra-driver-kubelet-plugin -n neuron-dra-driver >/dev/null 2>&1; then DAEMONSET_EXISTS=true echo "📋 DaemonSet already exists, will restart after applying..." fi echo "🏷️ Using custom image: $DRA_IMAGE" sed "s|NEURON_DRA_IMAGE|$DRA_IMAGE|g" "$MANIFESTS_DIR/daemonset.yaml" | kubectl apply -f - # If DaemonSet was already running, restart it to pull latest image if [ "$DAEMONSET_EXISTS" = true ]; then echo "🔄 Restarting DaemonSet to pull latest image..." kubectl rollout restart daemonset/neuron-dra-driver-kubelet-plugin -n neuron-dra-driver echo "⏳ Waiting for rollout to complete..." kubectl rollout status daemonset/neuron-dra-driver-kubelet-plugin -n neuron-dra-driver --timeout=300s else echo "⏳ Waiting until pods are in a running state..." kubectl wait --for=condition=ready pod -l app=neuron-dra-driver-kubelet-plugin -n neuron-dra-driver --timeout=300s fi echo "✅ Deployment complete!" 
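# Optional post-install sanity checks (a sketch; these assume a cluster with the
# DRA feature enabled so the resource.k8s.io API groups are served):
#   kubectl get pods -n neuron-dra-driver          # plugin pods should be Running
#   kubectl get deviceclasses neuron.aws.com       # DeviceClass should be present
#   kubectl get resourceslices                     # per-node Neuron device inventory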
echo "" echo "📊 Recent logs from dra driver:" kubectl logs -n neuron-dra-driver -l app=neuron-dra-driver-kubelet-plugin --tail=10 echo "" ================================================ FILE: containers/files/specs/1x4-connected-devices.yaml ================================================ apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: 1x4-connected-neurons spec: spec: devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com allocationMode: ExactCount count: 4 selectors: - cel: expression: "device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'" constraints: - requests: ["neurons"] matchAttribute: "resource.aws.com/devicegroup4_id" --- apiVersion: v1 kind: Pod metadata: name: pod0 labels: app: pod spec: containers: - name: ctr0 image: public.ecr.aws/ubuntu/ubuntu:22.04 command: ["bash", "-c"] args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"] resources: claims: - name: neurons resourceClaims: - name: neurons resourceClaimTemplateName: 1x4-connected-neurons ================================================ FILE: containers/files/specs/2-node-inference-us.yaml ================================================ apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: us-2-node-config spec: spec: devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com selectors: - cel: expression: "device.attributes['neuron.aws.com'].resourceType == 'neuron_node'" allocationMode: ExactCount count: 1 config: - requests: ["neurons"] opaque: driver: neuron.aws.com parameters: apiVersion: neuron.aws.com/v1 kind: UltraServerConfig ultraserverMode: 2 --- apiVersion: leaderworkerset.x-k8s.io/v1 kind: LeaderWorkerSet metadata: name: vllm annotations: leaderworkerset.sigs.k8s.io/exclusive-topology: neuron.amazonaws.com/ultraserver-server-id-2 spec: rolloutStrategy: type: RollingUpdate rollingUpdateConfiguration: maxUnavailable: 1 maxSurge: 1 # Two replica groups of 2 nodes each replicas: 2 leaderWorkerTemplate: size: 2 restartPolicy: RecreateGroupOnPodRestart leaderTemplate: metadata: labels: role: leader spec: containers: - name: vllm-leader image: public.ecr.aws/ubuntu/ubuntu:22.04 command: - sh - -c - "sleep infinity" resources: claims: - name: one-node-from-ultraserver resourceClaims: - name: one-node-from-ultraserver resourceClaimTemplateName: us-2-node-config workerTemplate: metadata: labels: role: worker spec: containers: - name: vllm-worker image: public.ecr.aws/ubuntu/ubuntu:22.04 command: - sh - -c - "sleep infinity" resources: claims: - name: one-node-from-ultraserver resourceClaims: - name: one-node-from-ultraserver resourceClaimTemplateName: us-2-node-config ================================================ FILE: containers/files/specs/4-node-inference-us.yaml ================================================ apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: us-4-node-config spec: spec: devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com selectors: - cel: expression: "device.attributes['neuron.aws.com'].resourceType == 'neuron_node'" allocationMode: ExactCount count: 1 config: - requests: ["neurons"] opaque: driver: neuron.aws.com parameters: apiVersion: neuron.aws.com/v1 kind: UltraServerConfig ultraserverMode: 4 --- apiVersion: leaderworkerset.x-k8s.io/v1 kind: LeaderWorkerSet metadata: name: vllm annotations: leaderworkerset.sigs.k8s.io/exclusive-topology: neuron.amazonaws.com/ultraserver-server-id-4 spec: rolloutStrategy: type: RollingUpdate 
rollingUpdateConfiguration: maxUnavailable: 1 maxSurge: 1 # Two replica groups of 4 nodes each, i.e. two ultraservers replicas: 2 leaderWorkerTemplate: size: 4 restartPolicy: RecreateGroupOnPodRestart leaderTemplate: metadata: labels: role: leader spec: containers: - name: vllm-leader image: public.ecr.aws/ubuntu/ubuntu:22.04 command: - sh - -c - "sleep infinity" resources: claims: - name: one-node-from-ultraserver resourceClaims: - name: one-node-from-ultraserver resourceClaimTemplateName: us-4-node-config workerTemplate: metadata: labels: role: worker spec: containers: - name: vllm-worker image: public.ecr.aws/ubuntu/ubuntu:22.04 command: - sh - -c - "sleep infinity" resources: claims: - name: one-node-from-ultraserver resourceClaims: - name: one-node-from-ultraserver resourceClaimTemplateName: us-4-node-config ================================================ FILE: containers/files/specs/all-devices.yaml ================================================ apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: all-neurons spec: spec: devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com selectors: - cel: expression: "device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'" allocationMode: All --- apiVersion: v1 kind: Pod metadata: name: pod0 labels: app: pod spec: containers: - name: ctr0 image: public.ecr.aws/ubuntu/ubuntu:22.04 command: ["bash", "-c"] args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"] resources: claims: - name: neurons resourceClaims: - name: neurons resourceClaimTemplateName: all-neurons ================================================ FILE: containers/files/specs/lnc-setting-trn2.yaml ================================================ apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: all-neurons-lnc-1 spec: spec: devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com selectors: - cel: expression: "device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'" allocationMode: All config: - requests: ["neurons"] opaque: driver: neuron.aws.com parameters: apiVersion: neuron.aws.com/v1 kind: NeuronConfig logicalNeuronCore: 1 --- apiVersion: v1 kind: Pod metadata: name: pod0 labels: app: pod spec: containers: - name: ctr0 image: public.ecr.aws/ubuntu/ubuntu:22.04 command: ["bash", "-c"] args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"] resources: claims: - name: neurons resourceClaims: - name: neurons resourceClaimTemplateName: all-neurons-lnc-1 ================================================ FILE: containers/files/specs/specific-driver-version.yaml ================================================ apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: driver-version-neuron spec: spec: devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com selectors: - cel: expression: "device.attributes['neuron.aws.com'].neuronDriverVersion == '2.25.4.0'" allocationMode: All --- apiVersion: apps/v1 kind: Deployment metadata: name: my-app spec: replicas: 2 selector: matchLabels: app: my-app template: metadata: labels: app: my-app spec: containers: - name: nginx image: public.ecr.aws/docker/library/nginx:alpine resourceClaims: - name: neurons resourceClaimTemplateName: driver-version-neuron ================================================ FILE: containers/files/specs/us-and-lnc-config.yaml ================================================ apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: us-and-lnc-config spec: spec: 
devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com selectors: - cel: expression: "device.attributes['neuron.aws.com'].resourceType == 'neuron_node'" allocationMode: ExactCount count: 1 config: - requests: ["neurons"] opaque: driver: neuron.aws.com parameters: apiVersion: neuron.aws.com/v1 kind: UltraServerConfig ultraserverMode: 2 - requests: ["neurons"] opaque: driver: neuron.aws.com parameters: apiVersion: neuron.aws.com/v1 kind: NeuronConfig logicalNeuronCore: 1 --- apiVersion: leaderworkerset.x-k8s.io/v1 kind: LeaderWorkerSet metadata: name: vllm annotations: leaderworkerset.sigs.k8s.io/exclusive-topology: neuron.amazonaws.com/ultraserver-server-id-2 spec: rolloutStrategy: type: RollingUpdate rollingUpdateConfiguration: maxUnavailable: 1 maxSurge: 1 # Two replica groups of 2 nodes each replicas: 2 leaderWorkerTemplate: size: 2 restartPolicy: RecreateGroupOnPodRestart leaderTemplate: metadata: labels: role: leader spec: containers: - name: vllm-leader image: public.ecr.aws/ubuntu/ubuntu:22.04 command: - sh - -c - "sleep infinity" resources: claims: - name: one-node-from-ultraserver resourceClaims: - name: one-node-from-ultraserver resourceClaimTemplateName: us-and-lnc-config workerTemplate: metadata: labels: role: worker spec: containers: - name: vllm-worker image: public.ecr.aws/ubuntu/ubuntu:22.04 command: - sh - -c - "sleep infinity" resources: claims: - name: one-node-from-ultraserver resourceClaims: - name: one-node-from-ultraserver resourceClaimTemplateName: us-and-lnc-config

================================================
FILE: containers/get-started/quickstart-configure-deploy-dlc.rst
================================================
.. meta::
   :description: Learn how to deploy a vLLM server using a preconfigured Neuron Deep Learning Container on Trainium and Inferentia instances.
   :date_updated: 01/26/2026

.. _quickstart_vllm_dlc_deploy:

Quickstart: Configure and deploy a vLLM server using Neuron Deep Learning Container (DLC)
==========================================================================================

This topic guides you through deploying a vLLM server on Trainium and Inferentia instances using a Deep Learning Container preconfigured with AWS Neuron SDK artifacts. When you complete this tutorial, you will be able to run a vLLM inference server on AWS Trainium and Inferentia instances.

Overview
--------

In this quickstart, you will pull a vLLM Docker image, configure it for Neuron devices, and start an inference server running vLLM. This process lets you deploy large language models on AWS ML accelerators for high-performance inference workloads.

Before you start
----------------

This tutorial assumes that you have experience in the following areas:

* Docker container management
* AWS EC2 instance administration
* Command-line interface operations

Prerequisites
-------------

Before you begin, ensure you have:

* AWS Trainium or Inferentia instance access
* Docker installed on your instance. You can set up your Docker environment according to :ref:`tutorial-docker-env-setup`
* SSH access to your instance

Prepare your environment
------------------------

Launch an AWS Trainium or Inferentia instance with sufficient resources for your model requirements. We recommend using one of the base DLAMIs to launch your instance - `Neuron Base DLAMI <#>`.

Step 1: Pull the vLLM Docker image
-----------------------------------

In this step, you will download the vLLM Docker image from AWS ECR.
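Optionally, before pulling the image, confirm that the host exposes its Neuron devices. This quick sanity check assumes you launched from a Neuron DLAMI, so the driver and tools are already installed on the host:

.. code-block:: bash

   ls /dev/neuron*   # device files created by the Neuron driver
   neuron-ls         # device summary (requires aws-neuronx-tools on the host)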
Get the latest vLLM Docker image from Neuron's ECR public gallery `pytorch-inference-vllm-neuronx <https://gallery.ecr.aws/neuron/pytorch-inference-vllm-neuronx>`_ repository, then find the latest published image tag and use it in the command below:

.. code-block:: bash

   docker pull public.ecr.aws/neuron/pytorch-inference-vllm-neuronx:<image_tag>

For example, replace ``<image_tag>`` with an SDK 2.28.0 released DLC image tag such as ``0.13.0-neuronx-py312-sdk2.28.0-ubuntu24.04``

Step 2: Start the Docker container
-----------------------------------

In this step, you will run the container with access to Neuron devices. For this tutorial, we are using a trn1.32xlarge instance.

Run the container interactively with access to Neuron devices:

.. code-block:: bash

   docker run -it \
     --device=/dev/neuron0 \
     --device=/dev/neuron1 \
     --device=/dev/neuron2 \
     --device=/dev/neuron3 \
     --device=/dev/neuron4 \
     --device=/dev/neuron5 \
     --device=/dev/neuron6 \
     --device=/dev/neuron7 \
     --device=/dev/neuron8 \
     --device=/dev/neuron9 \
     --device=/dev/neuron10 \
     --device=/dev/neuron11 \
     --device=/dev/neuron12 \
     --device=/dev/neuron13 \
     --device=/dev/neuron14 \
     --device=/dev/neuron15 \
     --cap-add SYS_ADMIN \
     --cap-add IPC_LOCK \
     -p 8080:8080 \
     --name <container_name> \
     <image_uri> \
     bash

.. note::

   The trn1.32xlarge instance provides 16 Neuron devices. Adjust the number of Neuron devices (``--device=/dev/neuronX``) based on your instance type and requirements.

Step 3: Start the vLLM server
------------------------------

In this step, you will launch the vLLM inference server inside the container.

Inside the container, start the vLLM inference server:

.. code-block:: bash

   vllm serve \
     --model='TinyLlama/TinyLlama-1.1B-Chat-v1.0' \
     --max-num-seqs=4 \
     --max-model-len=128 \
     --tensor-parallel-size=2 \
     --block-size=32 \
     --num-gpu-blocks-override=16 \
     --port=8080 \
     --additional-config='{"override_neuron_config":{"enable_bucketing":false}}'

.. note::

   **Version compatibility**: The command above is compatible with vLLM version 0.11.0 and later. If you are using an older version (such as 0.9.1), you must:

   * Replace ``--additional-config='{"override_neuron_config":{"enable_bucketing":false}}'`` with ``--override-neuron-config '{"enable_bucketing":false}'``

.. important::

   * Choose the appropriate model for your use case
   * Set ``--tensor-parallel-size`` to be less than or equal to the total number of NeuronCores (or TP ranks) available from your devices, accounting for cores per device and logical core configuration
   * Server startup typically takes 5-10 minutes

Step 4: Verify server status
-----------------------------

In this step, you will confirm the server starts successfully.

Wait for the server to fully initialize. You will see output showing available API routes:

.. code-block:: text

   INFO 08-12 00:04:47 [launcher.py:28] Available routes are:
   INFO 08-12 00:04:47 [launcher.py:36] Route: /health, Methods: GET
   INFO 08-12 00:04:47 [launcher.py:36] Route: /v1/chat/completions, Methods: POST
   INFO 08-12 00:04:47 [launcher.py:36] Route: /v1/completions, Methods: POST

.. note::

   During startup, you may see warning logs similar to the following, which can be safely ignored:

   .. code-block:: text

      No module named 'vllm._version'
        from .version import __version__, __version_tuple__  # isort:skip
      WARNING [__init__.py:25] The vLLM package was not found, so its version could not be inspected. This may cause platform detection to fail.
      INFO [__init__.py:243] Automatically detected platform neuron.
      WARNING [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")

All complete! Now, let's confirm everything works.
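As a quick first check, you can probe the ``/health`` route listed above. A minimal sketch, assuming the port mapping from Step 2 (the healthy response is an HTTP 200 status; the response body may vary by vLLM version):

.. code-block:: bash

   # Query the health endpoint from a second terminal on the instance.
   # -i prints the HTTP status line; a 200 status means the server is up.
   curl -i http://localhost:8080/health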
Step 5: Inference service confirmation --------------------------------------- Test the API to confirm your setup works correctly. Open a separate terminal and make an API call: .. code-block:: bash curl http://localhost:8080/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "messages": [ { "role": "user", "content": "What is the capital of Italy?" } ] }' You should receive a response similar to: .. code-block:: json { "id": "chatcmpl-ac7551dd2f2a4be3bd2c1aabffa79b4c", "object": "chat.completion", "created": 1754958455, "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "The capital of Italy is Rome...", "tool_calls": [] }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 23, "total_tokens": 106, "completion_tokens": 83 } } Congratulations! You have successfully deployed a vLLM inference server using a preconfigured Neuron DLC. If you encountered any issues, see the **Common issues** section below. Available API endpoints ----------------------- The server provides various endpoints for different use cases: * **Health Check**: ``GET /health`` * **Chat Completions**: ``POST /v1/chat/completions`` * **Text Completions**: ``POST /v1/completions`` * **Models Info**: ``GET /v1/models`` * **API Documentation**: ``GET /docs`` Common issues ------------- Did you encounter an error while working through this tutorial? Here are common issues and solutions: - **Server won't start**: Check that you have sufficient Neuron devices allocated - **Connection refused**: Verify the container is running and port 8080 is properly mapped - **Slow performance**: Ensure your ``tensor-parallel-size`` matches your available Neuron devices - **Memory issues**: Consider using a larger instance type or reducing model size For additional help, refer to the complete vLLM User Guide for NxD Inference documentation. Clean up -------- To clean up resources after completing this tutorial: 1. Stop the Docker container: .. code-block:: bash docker stop 2. Remove the container: .. code-block:: bash docker rm 3. Terminate your EC2 instance if no longer needed. Next steps ---------- Now that you've completed this tutorial, explore these related topics: * Learn more about vLLM configuration options in the vLLM User Guide for NxD Inference * Explore model optimization techniques for better performance * Set up production deployment with load balancing and monitoring Further reading --------------- - `vLLM User Guide for NxD Inference <#>`_ - Complete documentation for vLLM on Neuron - `AWS Neuron SDK Documentation `_ - Full Neuron SDK reference ================================================ FILE: containers/get-started/quickstart-pytorch-inference-dlc.rst ================================================ .. meta:: :description: Learn how to run PyTorch inference using preconfigured Neuron Deep Learning Container with Llama-2-7b on Trainium instances. :date_updated: 02/17/2026 .. _quickstart_pytorch_inference_dlc: Quickstart: Run PyTorch inference using Neuron Deep Learning Container (DLC) ============================================================================= This topic guides you through running PyTorch inference on Trainium instances using a Deep Learning Container preconfigured with AWS Neuron SDK artifacts. When you complete this tutorial, you will be able to run inference with the Llama-2-7b model on AWS Trainium instances. 
Overview -------- In this quickstart, you will pull a PyTorch inference Docker image, download the Llama-2-7b model from S3, and run an inference demo that compiles, validates, and benchmarks the model. This process lets you deploy large language models on AWS ML accelerators for high-performance inference workloads. Before you start ---------------- This tutorial assumes that you have experience in the following areas: * Docker container management * AWS EC2 instance administration * Command-line interface operations * AWS S3 operations Prerequisites ------------- Before you begin, ensure you have: * AWS Trainium instance access (trn2.48xlarge recommended) * Docker installed on your instance. You can set up docker environment according to :ref:`tutorial-docker-env-setup` * SSH access to your instance * AWS credentials configured with access to the model S3 bucket Prepare your environment ------------------------ Launch an AWS Trainium instance with sufficient resources for your model requirements. We recommend using one of the base DLAMIs to launch your instance - `Neuron Base DLAMI <#>`. Step 1: Pull the PyTorch inference Docker image ------------------------------------------------ In this step, you will download the PyTorch inference Docker image from AWS ECR. Get the latest PyTorch inference Docker image from Neuron's ECR public gallery `pytorch-inference-neuronx `_ repository, and then get the latest published image tag and use it in the command below: .. code-block:: bash docker pull public.ecr.aws/neuron/pytorch-inference-neuronx: For example, replace ```` with an SDK 2.28.0 released DLC image tag such as ``2.9.0-neuronx-py312-sdk2.28.0-ubuntu24.04`` Step 2: Download the Llama-2-7b model -------------------------------------- In this step, you will download the Llama-2-7b model from HuggingFace to an S3 bucket, then copy it to your instance. First, download the model from HuggingFace and upload to your S3 bucket: .. code-block:: bash # Install HuggingFace CLI if not already installed pip install huggingface-hub # Login to HuggingFace (you'll need to accept the Llama-2 license first) hf auth login # Download the model hf download meta-llama/Llama-2-7b --local-dir ./Llama-2-7b # Upload to your S3 bucket aws s3 cp --recursive ./Llama-2-7b s3://your-bucket-name/models/Llama-2-7b/ Then, on your Trainium instance, download the model from S3: .. note:: Change ``/home/ec2-user`` to ``/home/ubuntu`` if you're using an Ubuntu AMI. .. code-block:: bash # Create directory for the model mkdir -p /home/ec2-user/model_hf/Llama-2-7b # Download from S3 aws s3 cp --recursive s3://your-bucket-name/models/Llama-2-7b/ /home/ec2-user/model_hf/Llama-2-7b/ # Verify the model downloaded successfully ls /home/ec2-user/model_hf/Llama-2-7b/config.json .. note:: You must accept the Llama-2 license on HuggingFace before you can download the model. Visit https://huggingface.co/meta-llama/Llama-2-7b to request access. Step 3: Start the Docker container ----------------------------------- In this step, you will run the container with access to Neuron devices and mount the model directory. For this tutorial, we are using a trn2.48xlarge instance. Run the container interactively with access to all Neuron devices: .. 
code-block:: bash docker run -it \ --device=/dev/neuron0 \ --device=/dev/neuron1 \ --device=/dev/neuron2 \ --device=/dev/neuron3 \ --device=/dev/neuron4 \ --device=/dev/neuron5 \ --device=/dev/neuron6 \ --device=/dev/neuron7 \ --device=/dev/neuron8 \ --device=/dev/neuron9 \ --device=/dev/neuron10 \ --device=/dev/neuron11 \ -v /home/ec2-user/model_hf/Llama-2-7b:/root/model_hf/Llama-2-7b \ --cap-add SYS_ADMIN \ --cap-add IPC_LOCK \ --name pytorch-inference-demo \ public.ecr.aws/neuron/pytorch-inference-neuronx: \ bash .. note:: The trn2.48xlarge instance provides 12 Neuron devices. Adjust the number of Neuron devices (``--device=/dev/neuronX``) based on your instance type and requirements. Step 4: Run the inference demo ------------------------------- In this step, you will run the inference demo script that compiles the model, checks accuracy, and benchmarks performance. Inside the container, run the inference demo: .. code-block:: bash inference_demo \ --model-type llama \ --task-type causal-lm \ run \ --model-path /root/model_hf/Llama-2-7b/ \ --compiled-model-path /root/traced_model/Llama-2-7b-demo/ \ --torch-dtype bfloat16 \ --tp-degree 96 \ --batch-size 2 \ --max-context-length 32 \ --seq-len 64 \ --on-device-sampling \ --enable-bucketing \ --top-k 1 \ --do-sample \ --pad-token-id 2 \ --prompt 'I believe the meaning of life is' \ --prompt 'The color of the sky is' \ --check-accuracy-mode token-matching \ --benchmark .. important:: * The inference demo takes approximately 20 minutes to complete on a trn2.48xlarge instance * The script will compile the model, validate accuracy, and run benchmarks * Set ``--tp-degree`` to match the number of NeuronCores you want to use (96 for trn2.48xlarge) Step 5: Verify the results --------------------------- In this step, you will confirm the inference demo completed successfully and review the benchmark results. Wait for the demo to complete. You will see output showing benchmark results: .. code-block:: text Benchmark completed and its result is as following { "e2e_model": { "latency_ms_p50": 8539.34, "latency_ms_p90": 8627.43, "latency_ms_p95": 8646.97, "latency_ms_p99": 8652.62, "latency_ms_p100": 8654.03, "latency_ms_avg": 8533.13, "throughput": 480.01 }, "context_encoding_model": { "latency_ms_p50": 132.42, "latency_ms_p90": 133.47, "latency_ms_p95": 133.59, "latency_ms_p99": 133.81, "latency_ms_p100": 133.86, "latency_ms_avg": 132.52, "throughput": 30908.75 }, "token_generation_model": { "latency_ms_p50": 7.84, "latency_ms_p90": 8.39, "latency_ms_p95": 8.47, "latency_ms_p99": 8.63, "latency_ms_p100": 28.96, "latency_ms_avg": 7.87, "throughput": 520434.73 } } Completed saving result to benchmark_report.json .. note:: You may see several red ``ERROR NRT:nrt_tensor_free`` errors at the end of the script output. These can be safely ignored - the actual benchmark results appear above these error messages. All complete! The benchmark results are saved to ``benchmark_report.json`` in the container. 
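To review the report from the host, you can copy it out of the container. A minimal sketch, assuming the demo wrote ``benchmark_report.json`` to the container's root working directory (adjust the source path if it was written elsewhere):

.. code-block:: bash

   # Copy the benchmark report from the running container to the host;
   # the source path is an assumption -- adjust it as needed.
   docker cp pytorch-inference-demo:/benchmark_report.json .

   # Pretty-print the JSON for review.
   python3 -m json.tool benchmark_report.json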
Understanding the results ------------------------- The benchmark output provides three key metrics: * **e2e_model**: End-to-end model performance including context encoding and token generation * **context_encoding_model**: Performance of processing the input prompt * **token_generation_model**: Performance of generating output tokens Each metric includes: * Latency percentiles (p50, p90, p95, p99, p100) in milliseconds * Average latency in milliseconds * Throughput in tokens per second Common issues ------------- Did you encounter an error while working through this tutorial? Here are common issues and solutions: - **Model download fails**: Verify you have accepted the Llama-2 license on HuggingFace and have valid AWS credentials - **Container won't start**: Check that you have sufficient Neuron devices allocated - **Compilation fails**: Ensure you have enough memory and the correct PyTorch version - **Slow performance**: Verify your ``tp-degree`` matches your available Neuron devices - **Memory issues**: Consider using a larger instance type or reducing batch size For additional help, refer to the complete NeuronX Distributed Inference documentation. Clean up -------- To clean up resources after completing this tutorial: 1. Exit the container: .. code-block:: bash exit 2. Stop and remove the container: .. code-block:: bash docker stop pytorch-inference-demo docker rm pytorch-inference-demo 3. Remove the model files if no longer needed: .. code-block:: bash rm -rf /home/ec2-user/model_hf/Llama-2-7b 4. Terminate your EC2 instance if no longer needed. Next steps ---------- Now that you've completed this tutorial, explore these related topics: * Learn more about NeuronX Distributed Inference configuration options * Explore different model architectures and optimization techniques * Set up production deployment with monitoring and logging Further reading --------------- - `NeuronX Distributed Inference Documentation <#>`_ - Complete documentation for inference on Neuron - `AWS Neuron SDK Documentation `_ - Full Neuron SDK reference - `Llama-2 Model Card `_ - Model details and license information ================================================ FILE: containers/getting-started.rst ================================================ .. _containers-getting-started: Getting started with Neuron DLC using Docker ============================================ .. tab-set:: .. tab-item:: Training .. dropdown:: Launch Trn1 Instance :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. include:: /setup/install-templates/launch-instance.txt .. dropdown:: Install Drivers :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. code:: bash # Configure Linux for Neuron repository updates sudo tee /etc/yum.repos.d/neuron.repo > /dev/null < /dev/null <= 2.26.26.0 (:ref:`tutorials/k8s-neuron-device-plugin`) * MPI operator installed on the cluster * An MPI job spec Instructions ------------ UltraServer Init Script ~~~~~~~~~~~~~~~~~~~~~~~ Download the UltraServer init script :download:`k8s-ultraserver-init-script.sh ` To use the script, either: - add it to your MPI job Dockerfile and build the image OR - create a new Dockerfile and build a new image from your MPI job image Example: .. 
code-block:: dockerfile FROM 123456789012.dkr.ecr.us-west-2.amazonaws.com/ultraserver:mpijob COPY ultraserver-init-script.sh /tmp/ RUN chmod +x /tmp/ultraserver-init-script.sh ENTRYPOINT ["/tmp/ultraserver-init-script.sh"] Then add the 2 required init containers to the launcher pod. The first init container should utilize the /etc/mpi/discover_hosts.sh script to ensure that all worker pods are ready before continuing on to the UltraServer init script. The second init container should use the image containing ultraserver-init-script.sh. You can specify a value for NEURON_ULTRASERVER_NODE_CONFIG, which determines what UltraServer node config your MPI job will use, i.e. how many UltraServer nodes to use. Possible values are 4, 2, and 1, and the default value is 4. Example: .. code-block:: yaml apiVersion: kubeflow.org/v2beta1 kind: MPIJob metadata: name: &job_name namespace: default spec: mpiReplicaSpecs: Launcher: replicas: 1 template: spec: containers: - name: mpitest image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/ultraserver:mpijob ... initContainers: - name: wait-hostfilename image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/ultraserver:mpijob command: - bash - -cx - | if [[ $(cat /etc/mpi/discover_hosts.sh | wc -l) != 1 ]]; then date echo "Ready" cat /etc/mpi/discover_hosts.sh else date echo "not ready ..." sleep 10 exit 1 fi while read host; do while ! ssh $host echo $host; do date echo "Pod $host is not up ..." sleep 10 done date echo "Pod $host is ready" done <<< "$(/etc/mpi/discover_hosts.sh)" resources: {} volumeMounts: - mountPath: /etc/mpi name: mpi-job-config - mountPath: /root/.ssh name: ssh-auth - name: ultraserver-init-container image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/ultraserver:init-container env: - name: NEURON_ULTRASERVER_NODE_CONFIG value: <"4", "2", OR "1"> volumeMounts: - mountPath: /etc/mpi name: mpi-job-config - mountPath: /root/.ssh name: ssh-auth - mountPath: /root/ultraserver_init name: ultraserver-init ... volumes: - name: ultraserver-init emptyDir: {} MPI Worker Pod Affinity ~~~~~~~~~~~~~~~~~~~~~~~ Single-node Job ^^^^^^^^^^^^^^^ 2-node job .. code-block:: yaml apiVersion: kubeflow.org/v2beta1 kind: MPIJob metadata: name: &job_name namespace: default ... spec: mpiReplicaSpecs: Launcher: ... Worker: replicas: 2 template: spec: nodeSelector: node.kubernetes.io/instance-type: trn2u.48xlarge affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: training.kubeflow.org/job-name operator: NotIn values: - *job_name matchLabels: training.kubeflow.org/job-role: worker topologyKey: neuron.amazonaws.com/ultraserver-server-id-2 podAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchLabels: training.kubeflow.org/job-role: worker training.kubeflow.org/job-name: *job_name topologyKey: neuron.amazonaws.com/ultraserver-server-id-2 ... 4-node job .. code-block:: yaml apiVersion: kubeflow.org/v2beta1 kind: MPIJob metadata: name: &job_name namespace: default ... spec: mpiReplicaSpecs: Launcher: ... 
Worker: replicas: 4 template: spec: nodeSelector: node.kubernetes.io/instance-type: trn2u.48xlarge affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: training.kubeflow.org/job-name operator: NotIn values: - *job_name matchLabels: training.kubeflow.org/job-role: worker topologyKey: neuron.amazonaws.com/ultraserver-server-id-4 podAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchLabels: training.kubeflow.org/job-role: worker training.kubeflow.org/job-name: *job_name topologyKey: neuron.amazonaws.com/ultraserver-server-id-4 ... Multi-node job ^^^^^^^^^^^^^^ .. code-block:: yaml apiVersion: kubeflow.org/v2beta1 kind: MPIJob metadata: name: &job_name namespace: default ... spec: mpiReplicaSpecs: Launcher: ... Worker: replicas: 16 template: spec: nodeSelector: node.kubernetes.io/instance-type: trn2u.48xlarge affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: training.kubeflow.org/job-name operator: NotIn values: - *job_name matchLabels: training.kubeflow.org/job-role: worker topologyKey: neuron.amazonaws.com/ultraserver-server-id-4 podAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchLabels: training.kubeflow.org/job-role: worker training.kubeflow.org/job-name: *job_name topologyKey: neuron.amazonaws.com/ultraserver-server-id-4 ... To use the affinity configuration, replace with your MPI job name and add it to your workload yaml spec. Confirm your work ----------------- To validate that the init container is working: .. code-block:: # Find the worker pods associated with your MPI job kubectl get pods # Get the logs of the init container kubectl logs -c ultraserver-init-container You should see logs under the init container. Example: .. code-block:: $ kubectl get pods NAME READY STATUS RESTARTS AGE demo-launcher-42lh9 0/1 Init:0/2 0 4s demo-worker-0 1/1 Running 0 4s demo-worker-1 1/1 Running 0 4s demo-worker-2 1/1 Running 0 4s demo-worker-3 1/1 Running 0 4s $ kubectl logs demo-launcher-42lh9 -c ultraserver-init-container Using 4-node config ... To validate that the affinity configuration is working: .. code-block:: # Find the worker pods and the nodes they are scheduled to kubectl get pods -o=custom-columns='POD_NAME:metadata.name,NODE_NAME:spec.nodeName' # Compare the labels of the nodes to the kubectl get nodes \ -l neuron.amazonaws.com/ultraserver-mode \ -o=custom-columns='NAME:metadata.name,MODE:metadata.labels.neuron\.amazonaws\.com/ultraserver-mode,ULTRASERVER_SERVER_ID_2:metadata.labels.neuron\.amazonaws\.com/ultraserver-server-id-2,ULTRASERVER_NODE_ID_2:metadata.labels.neuron\.amazonaws\.com/ultraserver-node-id-2,ULTRASERVER_SERVER_ID_4:metadata.labels.neuron\.amazonaws\.com/ultraserver-server-id-4,ULTRASERVER_NODE_ID_4:metadata.labels.neuron\.amazonaws\.com/ultraserver-node-id-4' | awk 'NR==1{print;next}{print | "sort -k3,3 -k4,4"}' When looking at the nodes used by the worker pods, they should share the same ULTRASERVER_SERVER_ID_2 or ULTRASERVER_SERVER_ID_4 label based on which config you chose. Example when choosing a 4-node config: .. 
code-block:: $ kubectl get pods -o=custom-columns='POD_NAME:metadata.name,NODE_NAME:spec.nodeName' POD_NAME NODE_NAME demo-launcher-42lh9 ip-172-32-5-227.ap-southeast-4.compute.internal demo-worker-0 ip-172-32-5-227.ap-southeast-4.compute.internal demo-worker-1 ip-172-32-11-17.ap-southeast-4.compute.internal demo-worker-2 ip-172-32-13-57.ap-southeast-4.compute.internal demo-worker-3 ip-172-32-9-4.ap-southeast-4.compute.internal $ kubectl get nodes \ -l neuron.amazonaws.com/ultraserver-mode \ -o=custom-columns='NAME:metadata.name,MODE:metadata.labels.neuron\.amazonaws\.com/ultraserver-mode,ULTRASERVER_SERVER_ID_2:metadata.labels.neuron\.amazonaws\.com/ultraserver-server-id-2,ULTRASERVER_NODE_ID_2:metadata.labels.neuron\.amazonaws\.com/ultraserver-node-id-2,ULTRASERVER_SERVER_ID_4:metadata.labels.neuron\.amazonaws\.com/ultraserver-server-id-4,ULTRASERVER_NODE_ID_4:metadata.labels.neuron\.amazonaws\.com/ultraserver-node-id-4' | awk 'NR==1{print;next}{print | "sort -k3,3 -k4,4"}' NAME MODE ULTRASERVER_SERVER_ID_2 ULTRASERVER_NODE_ID_2 ULTRASERVER_SERVER_ID_4 ULTRASERVER_NODE_ID_4 ip-172-32-11-17.ap-southeast-4.compute.internal 1_2_4 u5wy80u0o2saugxy 0 bog79p1y8tetj5uu 0 ip-172-32-13-57.ap-southeast-4.compute.internal 1_2_4 u5wy80u0o2saugxy 1 bog79p1y8tetj5uu 1 ip-172-32-5-227.ap-southeast-4.compute.internal 1_2_4 ygml2651y0lwdd46 0 bog79p1y8tetj5uu 2 ip-172-32-9-4.ap-southeast-4.compute.internal 1_2_4 ygml2651y0lwdd46 1 bog79p1y8tetj5uu 3 Common issues ------------- Init script fails to start ~~~~~~~~~~~~~~~~~~~~~~~~~~ If at least one of the worker pods isn't scheduled to a node, the init script will fail to start. Example: .. code-block:: $ kubectl get pods -o=custom-columns='POD_NAME:metadata.name,NODE_NAME:spec.nodeName' POD_NAME NODE_NAME demo-launcher-96xsl ip-172-32-9-4.ap-southeast-4.compute.internal demo-worker-0 demo-worker-1 demo-worker-2 demo-worker-3 $ kubectl logs demo-launcher-96xsl -c ultraserver-init-container Error from server (BadRequest): container "ultraserver-init-container" in pod "demo-launcher-96xsl" is waiting to start: PodInitializing Possible solution: Check your pods for affinity/scheduling issues. .. code-block:: $ kubectl describe pod demo-worker-0 Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedScheduling 3m13s default-scheduler 0/4 nodes are available: 4 node(s) didn't match pod affinity rules. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling. Related Information ------------------- - :ref:`kubernetes-getting-started` - Information about how to use Neuron on EKS - :ref:`tutorials/k8s-neuron-device-plugin` - Information about Neuron Device Plugin - :ref:`aws-trn2-arch` - Information about trn2 UltraServer architecture - :ref:`general-troubleshooting` - Information about general troubleshooting for Neuron - `MPI Operator `_ - Information about MPI Operator - `MPI User Guide `_ - Information about MPI jobs - `Kubernetes Pod Affinity `_ - Information about pod affinity rules - `YAML anchors `_ - Information about YAML anchors ================================================ FILE: containers/index.rst ================================================ .. meta:: :description: AWS Neuron Deep Learning Containers (DLCs) are pre-configured Docker images for training and serving models on AWS Trainium and Inferentia instances with the Neuron SDK. 
:keywords: Neuron Containers, Deep Learning Containers, DLC, Docker, Kubernetes, EKS, ECS, AWS Neuron, Trainium, Inferentia, vLLM, Container Deployment :date-modified: 01/22/2026 .. _neuron_containers: Neuron Containers ================= This section contains the technical documentation for using AWS Neuron Deep Learning Containers (DLCs) and containerized deployments on Inferentia and Trainium instances. .. toctree:: :maxdepth: 1 :hidden: Getting Started Locate Neuron DLC Images Customize DLC Neuron Plugins Tutorials How-To Guides FAQ DRA Release Notes What are Neuron Deep Learning Containers? ------------------------------------------ AWS Neuron Deep Learning Containers (DLCs) are a set of pre-configured Docker images for training and serving models on AWS Trainium and Inferentia instances using the AWS Neuron SDK. Each DLC is optimized for specific ML frameworks and comes with all Neuron components pre-installed, enabling you to quickly deploy containerized workloads without manual setup. With Neuron DLCs, developers can: * Deploy production-ready containers with pre-installed Neuron SDK and ML frameworks * Use containers across multiple deployment platforms including EC2, EKS, ECS, and SageMaker * Customize DLCs to fit specific project requirements * Leverage Neuron plugins for better observability and fault tolerance * Run distributed training and inference workloads with vLLM integration * Schedule MPI jobs on Trn2 UltraServers for improved performance Neuron DLCs support popular ML frameworks including PyTorch, TensorFlow, and JAX, and are available for both training and inference workloads on Inf1, Inf2, Trn1, Trn1n, and Trn2 instances. .. admonition:: Neuron DRA for Kubernetes Neuron has released support for Dynamic Resource Allocation (DRA) with Kubernetes. :doc:`Read more about it here `. Quickstarts ----------- .. grid:: 1 1 2 2 :gutter: 3 .. grid-item-card:: Quickstart: Deploy a DLC with vLLM :link: quickstart_vllm_dlc_deploy :link-type: ref :class-card: sd-rounded-3 Get started by configuring and deploying a Deep Learning Container with vLLM for inference. Time to complete: ~30 minutes. .. grid-item-card:: Quickstart: Build a Custom Neuron Container :link: containers-getting-started :link-type: ref :class-card: sd-rounded-3 Learn how to build a custom Neuron container using Docker for training or inference workloads. Neuron Containers Documentation -------------------------------- .. grid:: 1 1 2 2 :gutter: 3 .. grid-item-card:: Getting Started :link: containers-getting-started :link-type: ref :class-card: sd-rounded-3 Step-by-step guide for building Neuron containers using Docker, including driver installation and container setup. .. grid-item-card:: Locate Neuron DLC Images :link: locate-neuron-dlc-image :link-type: ref :class-card: sd-rounded-3 Find the right pre-configured Deep Learning Container image for your ML framework and instance type. .. grid-item-card:: Customize Neuron DLC :link: containers-dlc-then-customize-devflow :link-type: ref :class-card: sd-rounded-3 Learn how to customize Neuron Deep Learning Containers to fit your specific project requirements. .. grid-item-card:: Neuron Plugins :link: neuron-container-plugins :link-type: ref :class-card: sd-rounded-3 Explore Neuron plugins for containerized environments, providing better observability and fault tolerance. .. 
grid-item-card:: Tutorials :link: /containers/tutorials :link-type: doc :class-card: sd-rounded-3 Hands-on tutorials for deploying containers on EC2, EKS, ECS, and other platforms with various configurations. .. grid-item-card:: How-To: Schedule MPI Jobs on UltraServers :link: containers-how-to-ultraserver :link-type: ref :class-card: sd-rounded-3 Learn how to schedule MPI jobs to run on Neuron UltraServers in EKS for improved performance. .. grid-item-card:: FAQ & Troubleshooting :link: container-faq :link-type: ref :class-card: sd-rounded-3 Frequently asked questions and solutions for common issues with Neuron containers. .. grid-item-card:: Neuron Containers Release Notes :link: /release-notes/components/containers :link-type: doc :class-card: sd-rounded-3 Review the latest updates, new DLC images, and improvements in Neuron container releases. ================================================ FILE: containers/k8.rst ================================================ .. _self-managed-kubernetes-service: Self Managed Kubernetes Service =============================== Introduction ------------ Use of Neuron in containers on a Kubernetes cluster can be simple to achieve by following :ref:`tutorial-k8s-env-setup-for-neuron` Known Limitations ----------------- Scheduling on k8s cluster requires contiguous neuron device-ids. Neuron provides a scheduler extension to solve this problem for self-managed k8 clusters. Read more about it here: :ref:`neuron-k8-scheduler-ext`. ================================================ FILE: containers/kubernetes-getting-started.rst ================================================ .. _kubernetes-getting-started: Using Neuron with Amazon EKS ============================= .. contents:: Table of Contents :local: :depth: 2 .. _tutorial-k8s-env-setup-for-neuron: EKS Setup for Neuron -------------------- Customers that use Kubernetes can conveniently integrate Inf/Trn instances into their workflows. This section provides step-by-step instructions for setting up an EKS cluster with Neuron support. Prerequisites ~~~~~~~~~~~~~ .. include:: /containers/tutorials/k8s-prerequisite.rst Neuron Helm Chart ~~~~~~~~~~~~~~~~~ .. include:: /containers/tutorials/k8s-neuron-helm-chart.rst .. _k8s-neuron-device-plugin: Neuron Device Plugin ~~~~~~~~~~~~~~~~~~~~ .. include:: /containers/tutorials/k8s-neuron-device-plugin.rst .. _neuron_scheduler: Neuron Scheduler Extension ~~~~~~~~~~~~~~~~~~~~~~~~~~ .. include:: /containers/tutorials/k8s-neuron-scheduler.rst Neuron Node Problem Detector and Recovery ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. include:: /containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa.rst .. include:: /containers/tutorials/k8s-neuron-problem-detector-and-recovery.rst Neuron Monitor Daemonset ~~~~~~~~~~~~~~~~~~~~~~~~~ .. include:: /containers/tutorials/k8s-neuron-monitor.rst ================================================ FILE: containers/locate-neuron-dlc-image.rst ================================================ .. _locate-neuron-dlc-image: Neuron Deep Learning Containers =============================== .. contents:: Table of Contents :local: :depth: 2 Overview -------- AWS Deep Learning Containers (DLCs) provide a set of Docker images that are pre-installed with deep learning frameworks. The containers are optimized for performance and available in Amazon Elastic Container Registry (Amazon ECR). 
DLCs make it straightforward to deploy custom ML environments in a containerized manner, while taking advantage of the portability and reproducibility benefits of containers. AWS Neuron DLCs are a set of Docker images for training and serving models on AWS Trainium and Inferentia instances using AWS Neuron SDK. The sections below list all of the AWS Neuron DLCs, as well as the AWS DLCs that come pre-installed with the Neuron SDK. Inference Containers -------------------- .. list-table:: :widths: auto :header-rows: 1 :align: left :class: table-smaller-font-size * - DLC Name - DLC Link(s) - Tutorial(s) * - Neuron Inference Containers - | `Neuron PyTorch Inference Containers `_ | `Neuronx PyTorch Inference Containers `_ | `Neuronx PyTorch vLLM Inference Containers `_ - | :ref:`tutorial-infer` | :ref:`torchserve-neuron` | :ref:`quickstart_vllm_dlc_deploy` * - Large Model Inference (LMI)/Deep Java Library (DJL) Containers - `LMI Containers `_ - * - HuggingFace Inference Containers - | `HuggingFace Neuron Inference Containers `_ | `HuggingFace Neuron vLLM Containers `_ | `HuggingFace Text Generation Inference (TGI) Containers `_ - * - Triton Inference Containers - `NVIDIA Triton Inference Containers `_ - Training Containers ------------------- .. list-table:: :widths: auto :header-rows: 1 :align: left :class: table-smaller-font-size * - DLC Name - DLC Link(s) - Tutorial(s) * - Neuron Training Containers - | `Neuronx PyTorch Training Containers `_ | `Neuronx Jax Training Containers `_ - :ref:`tutorial-training` * - HuggingFace Training Containers - `HuggingFace Neuron Training Containers `_ - .. note:: Latest HuggingFace Neuron containers are also available on the `HuggingFace Optimum website `_. Getting started with Neuron DLC using Docker ---------------------------------------------- :ref:`containers-getting-started` Using containers on AWS services ---------------------------------- :ref:`Amazon EKS` ^^^^^^^^^^^^^^^^^^^^^^^^^^^ :ref:`Amazon ECS` ^^^^^^^^^^^^^^^^^^^^^^^^^^^ :ref:`Amazon SageMaker` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :ref:`AWS Batch` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Customizing Neuron Deep Learning Containers ------------------------------------------- Deep Learning Containers can be customized to fit your specific project needs. To read more, visit :ref:`containers-dlc-then-customize-devflow`. ================================================ FILE: containers/neo-then-hosting-devflow.rst ================================================ .. include:: /devflows/inference/neo-then-hosting-devflow.rst ================================================ FILE: containers/neuron-dra.rst ================================================ .. meta:: :description: AWS Neuron Dynamic Resource Allocation (DRA) for Kubernetes :keywords: AWS, Neuron, DRA, Kubernetes, Dynamic Resource Allocation .. _neuron-dra: ================================================= AWS Neuron Dynamic Resource Allocation (DRA) ================================================= What is DRA? ------------ Prior to Kubernetes 1.33, Kubernetes used device plugins for resource management. The Neuron device plugin implements the device plugin interface to allow Kubernetes scheduler to manage Neuron resources. However, the device plugin framework only tracks device count—the scheduler cannot see device attributes. Due to this limitation, the framework cannot natively facilitate attribute-based filtering during device selection. 
For example, the default Kubernetes scheduler prior to DRA cannot support allocation of connected devices without additional mechanisms such as a scheduler extension. Dynamic Resource Allocation (DRA) is a new framework for advanced resource management that addresses this limitation. DRA enables the scheduler to see the device attributes, allowing workloads to select devices based on specific attributes and achieve topology aware allocation. Hardware vendors determine which attributes are published for their hardware. The AWS Neuron DRA driver implements the kubelet plugin for DRA for AWS Trainium instances. For more information on DRA, refer to `Kubernetes Dynamic Resource Allocation `_. Where can I get the Neuron DRA driver and resource templates? ------------------------------------------------------------------- To review and download the individual resource claim templates, visit this page: * :doc:`/containers/files/index-dra`. What are the benefits of using DRA over device plugin? ------------------------------------------------------- **Reduced developer complexity** Device plugin-based workloads use node labels along with request and limits to allocate right resources. Example: .. code-block:: yaml Worker: replicas: 4 template: spec: containers: - image: .dkr.ecr.us-west-2.amazonaws.com/neuronx_nemo:latest name: mpitest imagePullPolicy: Always resources: limits: aws.amazon.com/neuron: "16" vpc.amazonaws.com/efa: "16" requests: aws.amazon.com/neuron: "16" vpc.amazonaws.com/efa: "16" volumeMounts: - name: dshm mountPath: /dev/shm volumes: - name: dshm emptyDir: medium: Memory DRA introduces ``ResourceClaim`` and ``ResourceClaimTemplates`` which provide abstraction: .. code-block:: yaml Worker: replicas: 4 template: spec: containers: - image: .dkr.ecr.us-west-2.amazonaws.com/neuronx_nemo:latest name: mpitest imagePullPolicy: Always resources: claims: - name: neurons volumeMounts: - name: dshm mountPath: /dev/shm volumes: - name: dshm emptyDir: medium: Memory resourceClaims: - name: neurons resourceClaimTemplateName: efa-neurons-4-devices The ``ResourceClaimTemplate`` name is a given name and can be defined by the ML infra operators to be friendly to their developers. The RCT definition translates the name into the underlying allocation details - these are abstracted away from ML developers. **Rich interface for resource requests** With DRA, resource requests can specify attribute-based selection. For example, RCT can follow requests, which was not possible to do with device plugins without additional node labeling and extensions. This interface allows us to facilitate topology-aware scheduling. * Allocate connected neuron devices from trn2 instance type and the devices in the set need to be running specified Neuron driver version. * Allocate a specific set of neuron devices for my pod - I want the pod to use devices in row 1 of the topology. **Dynamic configuration** DRA allows end users to specify additional configuration for the device via RCT. The Neuron DRA driver leverages this capability to allow ResourceClaimTemplates to specify LNC size to be used for the allocation. An example is shown below. The end user need not configure LNC via launch template while using Neuron devices with Neuron DRA driver. .. 
code-block:: yaml #Template will be vended by Neuron via documentation/code repo apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: namespace: neuron-test7 name: lnc-neurons spec: spec: devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com selectors: - cel: expression: device.attributes['neuron.aws.com'].instanceType == "trn2.48xlarge" allocationMode: all config: - opaque: driver: neuron.aws.com parameters: apiVersion: neuron.aws.com/v1 kind: NeuronConfig logicalNeuronCore: 1 requests: ["neurons"] Prerequisites ----------------------------- * **Kubernetes version** - Please use K8s control plane 1.34+ * **Instance type** - Trn2.48xlarge launched with K8s version 1.34.2+ For instructions on how to setup an EKS cluster, please refer to :ref:`prerequisites`. Installation via Helm --------------------- Connect to your cluster from local box. The cluster should have at least one trn2.48xlarge node. Do not install the Neuron device plugin on the cluster! Please confirm the cluster being used via: .. code-block:: bash kubectl config current-context Then install the DRA driver: .. code-block:: bash helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \ --set "devicePlugin.enabled=false" --set "npd.enabled=false" --set "draDriver.enabled=true" Example 1 – Connected Neuron Devices -------------------------------------- This section will demonstrate how to run a workload that needs to request a subset of connected Neuron Devices from a trn2.48xlarge instance. Before DRA, this use case required using Neuron Scheduler Extension. With DRA, this allocation is enabled natively. * [:download:`Download example YAML file `] The supported subsets include set of 1, 4, 8 or 16. Specifically, these are ``resource.aws.com/devicegroup1_id``, ``resource.aws.com/devicegroup4_id``, ``resource.aws.com/devicegroup8_id``, ``resource.aws.com/devicegroup16_id`` respectively. The sets of 4 and 8 are selected as shown in diagram below: .. image:: /containers/images/neuron-dra-connected-devices.jpeg :alt: Connected Neuron Devices :width: 600px To enable a workload to consume a connected subset of Neuron Devices, first create a ``ResourceClaimTemplate`` that requests a connected set of Neuron devices. From the package run: .. code-block:: bash kubectl apply -f specs/1x4-connected-devices.yaml This workload definition (which includes the ``ResourceClaimTemplate``) is shown below for quick reference: .. code-block:: yaml apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: 1x4-connected-neurons spec: spec: devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com allocationMode: ExactCount count: 4 selectors: - cel: expression: "device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'" constraints: - requests: ["neurons"] matchAttribute: "resource.aws.com/devicegroup4_id" Next step is to reference the ``ResourceClaimTemplate`` in a pod definition as shown below: .. code-block:: yaml --- apiVersion: v1 kind: Pod metadata: name: pod0 labels: app: pod spec: containers: - name: ctr0 image: public.ecr.aws/ubuntu/ubuntu:22.04 command: ["bash", "-c"] args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"] resources: claims: - name: neurons resourceClaims: - name: neurons resourceClaimTemplateName: 1x4-connected-neurons Deploy the above workload using ``kubectl apply``. When the pod is running, examine the related ``ResourceClaim`` using: .. 
code-block:: bash kubectl get resourceclaim -o yaml The ``resourceclaim`` output will show the 4 Neuron Devices that were allocated to the pod. An example is shown below. These will be connected Neuron Devices. .. code-block:: bash [devbox]$ kubectl get pod NAME READY STATUS RESTARTS AGE --------------------------------------- pod0 1/1 Running 0 3s [devbox]$ kubectl get resourceclaim NAME STATE AGE --------------------------------------------- pod0-neurons-zdk76 allocated,reserved 9s [devbox]$ kubectl get resourceclaim pod0-neurons-zdk76 -o yaml Status shown below: .. code-block:: yaml status: allocation: devices: results: - adminAccess: null device: neurondevice2 driver: neuron.aws.com pool: ip-1-1-1-1.region.compute.internal request: neurons - adminAccess: null device: neurondevice3 driver: neuron.aws.com pool: ip-1-1-1-1.region.compute.internal request: neurons - adminAccess: null device: neurondevice1 driver: neuron.aws.com pool: ip-1-1-1-1.region.compute.internal request: neurons - adminAccess: null device: neurondevice0 driver: neuron.aws.com pool: ip-1-1-1-1.region.compute.internal request: neurons .. note:: The RCT name can be simplified to communicate the intent of the allocation and abstract the allocation details away from ML developers. **Example RCT1 - "xl" - Allocate All 16 devices** .. code-block:: yaml apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: xl-trn2 spec: spec: devices: requests: - name: neurons exactly: allocationMode: ExactCount count: 16 deviceClassName: neuron.aws.com selectors: - cel: expression: device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge' **Example RCT2 - large - Allocate 8 devices** .. code-block:: yaml apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: l-trn2 spec: spec: devices: constraints: - matchAttribute: resource.aws.com/devicegroup8_id requests: - neurons requests: - name: neurons exactly: allocationMode: ExactCount count: 8 deviceClassName: neuron.aws.com selectors: - cel: expression: device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge' **Example RCT2 - 2.27-driver – Allocate 8 devices with driver version at the driver published by Neuron SDK 2.27** `Neuron 2.27.0 Runtime `_ .. code-block:: yaml apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: 2.27-driver-trn2 spec: spec: devices: constraints: - matchAttribute: resource.aws.com/devicegroup8_id requests: - neurons requests: - name: neurons exactly: allocationMode: ExactCount count: 8 deviceClassName: neuron.aws.com selectors: - cel: expression: device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge' && device.attributes['neuron.aws.com'].neuronDriverVersion == '2.25.4.0' Example 2 - Dynamic LNC config ------------------------------ This example shows how to set LNC per workload. Earlier, overriding LNC on a Node required a node template. With DRA, workloads can override default LNC via ``ResourceClaim.`` * [:download:`Download example YAML file `] Apply the following workload definition: .. code-block:: bash kubectl apply -f specs/lnc-setting-trn2.yaml This workload definition (which includes the ``ResourceClaimTemplate``) is shown below for quick reference: .. 
code-block:: yaml apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: all-neurons-lnc-1 spec: spec: devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com selectors: - cel: expression: "device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge'" allocationMode: All config: - requests: ["neurons"] opaque: driver: neuron.aws.com parameters: apiVersion: neuron.aws.com/v1 kind: NeuronConfig logicalNeuronCore: 1 Then deploy a pod that references the above ``ResourceClaimTemplate`` as shown below: .. code-block:: yaml apiVersion: v1 kind: Pod metadata: name: pod0 labels: app: pod spec: containers: - name: ctr0 image: public.ecr.aws/ubuntu/ubuntu:22.04 command: ["bash", "-c"] args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"] resources: claims: - name: neurons resourceClaims: - name: neurons resourceClaimTemplateName: all-neurons-lnc-1 Example 3 – Four Node Inference on trn2u.48xlarge -------------------------------------------------- A trn2u.48xlarge Trn2 UltraServer has 4 Trn2 nodes interconnected by Neuron Links. trn2u.48xlarge instances can be allocated in set of 1, 2, or 4. The Neuron DRA driver can utilize 1 or more ``ResourceClaimTemplate`` definitions to convey the desired size of the set. The ``ResourceClaimTemplate`` allows end users to specify "UltraServerConfig" to declare their intent to use all 4 nodes of the UltraServer. This configuration value is passed by the Neuron DRA driver to the Neuron runtime and collectives inside the container. * [:download:`Download example YAML file `] Example yaml for 4-node inference on trn2u.48xlarge: .. code-block:: yaml apiVersion: resource.k8s.io/v1 kind: ResourceClaimTemplate metadata: name: us-4-node-config spec: spec: devices: requests: - name: neurons exactly: deviceClassName: neuron.aws.com selectors: - cel: expression: "device.attributes['neuron.aws.com'].resourceType == 'neuron_node'" allocationMode: ExactCount count: 1 config: - requests: ["neurons"] opaque: driver: neuron.aws.com parameters: apiVersion: neuron.aws.com/v1 kind: UltraServerConfig ultraserverMode: 4 --- apiVersion: leaderworkerset.x-k8s.io/v1 kind: LeaderWorkerSet metadata: name: vllm annotations: leaderworkerset.sigs.k8s.io/exclusive-topology: neuron.amazonaws.com/ultraserver-server-id-4 spec: rolloutStrategy: type: RollingUpdate rollingUpdateConfiguration: maxUnavailable: 1 maxSurge: 1 # Two replica groups of 4 nodes each, i.e. two ultraservers replicas: 2 leaderWorkerTemplate: size: 4 restartPolicy: RecreateGroupOnPodRestart leaderTemplate: metadata: labels: role: leader spec: containers: - name: vllm-leader image: public.ecr.aws/ubuntu/ubuntu:22.04 command: - sh - -c - "sleep infinity" resources: claims: - name: one-node-from-ultraserver resourceClaims: - name: one-node-from-ultraserver resourceClaimTemplateName: us-4-node-config workerTemplate: metadata: labels: role: worker spec: containers: - name: vllm-worker image: public.ecr.aws/ubuntu/ubuntu:22.04 command: - sh - -c - "sleep infinity" resources: claims: - name: one-node-from-ultraserver resourceClaims: - name: one-node-from-ultraserver resourceClaimTemplateName: us-4-node-config Neuron DRA Driver Attributes Reference --------------------------------------- The Neuron DRA driver publishes the following attributes in resource slices. These attributes can be used in ``ResourceClaimTemplate`` CEL expressions to filter and select specific devices for allocation. 
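As a concrete illustration, here is a hedged sketch that combines two of these attributes: a CEL selector pins the instance type and driver version (the version string is only an example), while a ``matchAttribute`` constraint keeps all four allocated devices in the same topology group of 4:

.. code-block:: yaml

   apiVersion: resource.k8s.io/v1
   kind: ResourceClaimTemplate
   metadata:
     name: 1x4-connected-pinned-driver   # illustrative name
   spec:
     spec:
       devices:
         requests:
           - name: neurons
             exactly:
               deviceClassName: neuron.aws.com
               allocationMode: ExactCount
               count: 4
               selectors:
                 - cel:
                     # Filter on two published attributes at once.
                     expression: >-
                       device.attributes['neuron.aws.com'].instanceType == 'trn2.48xlarge' &&
                       device.attributes['neuron.aws.com'].neuronDriverVersion == '2.25.4.0'
         constraints:
           # All four devices must share the same devicegroup4_id hash.
           - requests: ["neurons"]
             matchAttribute: "resource.aws.com/devicegroup4_id"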
Common Attributes ^^^^^^^^^^^^^^^^^ These attributes are common to all Neuron instances and their devices: * ``deviceId`` - An integer value representing the ID of the Neuron device. Used to identify which device is chosen from allocation. * ``instanceType`` - A string value representing the EC2 instance type of the Neuron device. Used to specify devices of which instance(s) to choose for allocation. * ``neuronDriverVersion`` - A string value representing the Neuron driver version running on the instance. Used to claim instances with the same driver version for allocation. * ``draDriverVersion`` - A version value of the Neuron DRA driver version. Provides visibility on which Neuron DRA driver version published the resource slice. * ``resourceType`` - A string value to distinguish between devices and UltraServer nodes. For devices, this value is ``neuron_device``. For UltraServers, this value is ``neuron_node``. * ``networkNodeLayer1`` - A string value representing network node layer 1. Can be used during topology-aware scheduling to minimize network latency and optimize instance placement. See `EC2 Instance Topology `_. * ``networkNodeLayer2`` - A string value representing network node layer 2. Can be used to allocate workloads to nodes on the same spine. See `EC2 Instance Topology `_. * ``networkNodeLayer3`` - A string value representing network node layer 3. Can be used during topology-aware scheduling to minimize network latency and optimize instance placement. See `EC2 Instance Topology `_. Trn Non-UltraServer Attributes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ These attributes are only populated for Neuron instances that have grid topology (trn) and are not UltraServers: * ``topology_x`` - An integer value representing the row of the device in a grid topology. Only populated when the number of devices in the instance is greater than 1. Can be used to select a specific device or devices that belong to the same row. * ``topology_y`` - An integer value representing the column of the device in a grid topology. Only populated when the number of devices in the instance is greater than 1. Can be used to select a specific device or devices that belong to the same column. * ``topology4_id`` - An integer value representing the row of the device in a grid topology. Only populated when the number of devices in the instance is greater than 1. Can be used to select devices that belong to the same row. * ``topology8_id`` - An integer value representing the row of the device in a grid topology. Only populated when the number of devices in the instance is greater than or equal to 8. Can be used to select devices that belong to the same two rows. Trn UltraServer Attributes ^^^^^^^^^^^^^^^^^^^^^^^^^^^ These attributes are only populated for Neuron instances that have grid topology (trn) and are UltraServers: * ``capacityBlockId`` - A string value representing the ID of the capacity block that the UltraServer instance is in. See `Instance Topology API `_. EFA-Enabled Instance Attributes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ These attributes are only populated for Neuron instances that are EFA-enabled: * ``resource.aws.com/devicegroup1_id`` - A string value representing the EFA Bus:Device:Function (BDF) corresponding to that device. * ``resource.aws.com/devicegroup4_id`` - A string value representing a hash, ensuring Neuron devices in the same topology group of 4 get the same group ID. 
* ``resource.aws.com/devicegroup8_id`` - A string value representing a hash, ensuring Neuron devices in the same topology group of 8 get the same group ID. * ``resource.aws.com/devicegroup16_id`` - A string value representing a hash, ensuring Neuron devices in the same topology group of 16 get the same group ID. FAQs ---- Can DRA plugin co-exist with other device plugins? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Device plugins and the DRA plugin can coexist in the same cluster, but **not** for the same node. As of now, the two mechanisms act independently. Neuron is preparing an upcoming feature that will allow device plugin based allocations to work with DRA, but the feature is still in alpha and not enabled on EKS. Ref: `Extended Resource `_. Is DRA replacing Neuron Device Plugin and Scheduler Extension? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We will continue to support the Neuron Device Plugin and Scheduler Extension as long as: 1. Upstream Kubernetes continues to support device plugins. 2. EKS continues to support Kubernetes versions below 1.34 (which do not support DRA). What Kubernetes versions are supported? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Kubernetes control plane must be on 1.34. For Node AMI, we support 1.34.2+. We do not support Node AMI for 1.34.0 or 1.34.1 since it had a regression in DRA. Upstream issue: `Kubernetes Issue #133920 `_ Where can I learn more about how to put together RCT using CEL expressions? ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To learn more about RCTs, please visit `Kubernetes Dynamic Resource Allocation `_. To learn more about CEL expressions, please visit `CEL Language `_. Send us feedback and let us know which additional RCT examples you would like us to provide in the source code. .. toctree:: :maxdepth: 1 :hidden: Support Files ================================================ FILE: containers/neuron-plugins.rst ================================================ .. _neuron-container-plugins: Neuron Plugins for Containerized Environments ============================================= This section provides an overview of the Neuron infrastructure components for containerized environments. For detailed setup instructions, see :ref:`tutorial-k8s-env-setup-for-neuron`. Neuron Device Plugin -------------------- Exposes Neuron hardware resources to Kubernetes as schedulable resources (``aws.amazon.com/neuron`` and ``aws.amazon.com/neuroncore``). The device plugin discovers Neuron devices on each node, advertises them to the scheduler, and manages allocation to Pods with exclusive access. Neuron Scheduler Extension --------------------------- Provides topology-aware scheduling for optimal Neuron device allocation. It considers device connectivity and placement to ensure efficient utilization. This component is optional and most beneficial for workloads requesting specific subsets of Neuron devices or cores. Neuron Node Problem Detector and Recovery ------------------------------------------ Monitors Neuron device health and detects hardware and software errors. When unrecoverable issues occur, it can mark nodes as unhealthy and trigger node replacement. It also publishes CloudWatch metrics under the ``NeuronHealthCheck`` namespace for monitoring. For ECS environments, see :ref:`ecs-neuron-problem-detector-and-recovery`. Neuron Monitor -------------- Collects and exposes metrics from Neuron devices including hardware utilization, performance counters, memory usage, and device health. 
Supports integration with observability platforms like Prometheus for monitoring and alerting.

Neuron Dynamic Resource Allocation (DRA) Driver
-----------------------------------------------

Manages Neuron hardware resources in a Kubernetes environment. It integrates with the Kubernetes Dynamic Resource Allocation (DRA) framework to advertise Neuron devices and their attributes. This feature cannot be used alongside the Neuron device plugin on the same nodes of a cluster. For more information on the Neuron DRA driver, refer to :ref:`neuron-dra`

================================================
FILE: containers/neuron_dlc_images.csv
================================================

Framework,Neuron Package,Job Type,Supported EC2 Instance Types,Python Version Options,ECR Public Repo URL,Image Details,Other Packages
PyTorch 2.1.2,"aws-neuronx-tools, neuronx_distributed, torch-neuronx, transformers-neuronx",inference,trn1 and inf2,3.10 (py310),https://gallery.ecr.aws/neuron/pytorch-inference-neuronx,https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuronx,torchserve
PyTorch 2.1.2,"aws-neuronx-tools, neuronx_distributed, torch-neuronx",training,trn1 and inf2,3.10 (py310),https://gallery.ecr.aws/neuron/pytorch-training-neuronx,https://github.com/aws-neuron/deep-learning-containers#pytorch-training-neuronx,
PyTorch 1.13.1,"aws-neuronx-tools, torch-neuron",inference,inf1,3.10 (py310),https://gallery.ecr.aws/neuron/pytorch-inference-neuron,https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuron,torchserve
PyTorch 1.13.1,"aws-neuronx-tools, neuronx_distributed, torch-neuronx, transformers-neuronx",inference,trn1 and inf2,3.10 (py310),https://gallery.ecr.aws/neuron/pytorch-inference-neuronx,https://github.com/aws-neuron/deep-learning-containers#pytorch-inference-neuronx,torchserve
PyTorch 1.13.1,"aws-neuronx-tools, neuronx_distributed, torch-neuronx",training,trn1 and inf2,3.10 (py310),https://gallery.ecr.aws/neuron/pytorch-training-neuronx,https://github.com/aws-neuron/deep-learning-containers#pytorch-training-neuronx,

================================================
FILE: containers/troubleshooting.rst
================================================

.. _container-troubleshooting:

Troubleshooting Neuron Containers
=================================

This document aims to provide more information on how to fix issues you might encounter while using the Neuron Containers. For each issue, we provide an explanation of what happened and what can potentially correct the issue.

If your issue is not listed below or you have a more nuanced problem, contact us via `issues `__ posted to this repo, the `AWS Neuron developer forum `__, or through AWS support.

The Neuron Container includes the following Neuron components. For issues relating to these components inside the container, refer to the individual component troubleshooting guides: :ref:`general-troubleshooting`

* Neuron Runtime/Driver
* PyTorch/TensorFlow/MXNet frameworks
* Libfabric/EFA

The following are container-specific issues.

Neuron Device Not Found
-----------------------

The Neuron container expects the Neuron devices to be exposed to the container as referenced in :ref:`container-devices`.
Please look at the container logs for messages like the one below:

::

   2022-Sep-08 17:55:23.0768 19:19 ERROR TDRV:tdrv_get_dev_info No neuron device available

If the above message is seen, the devices are not exposed to the container.

Solution
''''''''

* Refer to :ref:`container-devices` and make sure the devices are exposed to the container.
* If specific cores are being used, refer to :ref:`container-cores` and make sure the cores are exposed to the container.
* In a Kubernetes environment, refer to :ref:`k8s-specify-devices` or :ref:`k8s-specify-cores` to make sure Neuron devices/cores are present in the pod's container spec.

Contiguous Device IDs
---------------------

The Neuron runtime expects the Inferentia/Trainium device IDs to be contiguous. If the device IDs are not contiguous, you might see error messages like the ones below:

::

   2022-Sep-08 21:52:11.0307 7:7 ERROR TDRV:tdrv_init_mla_phase1 Could not open the nd1

::

   2022-Sep-08 23:00:05.0667 8:8 ERROR NRT:nrt_allocate_neuron_cores Neuron cores are not contiguous

Solution
''''''''

* In the docker run command, make sure the devices specified using ``--device`` are all contiguous (see the example below).
* If the OCI neuron hook is used with the environment variable ``AWS_NEURON_VISIBLE_DEVICES``, make sure the devices specified are all contiguous.
* In a Kubernetes environment with just the Neuron device plugin running, there is no guarantee that the devices allocated will be contiguous. Make sure to run the Neuron scheduler extension as specified in :ref:`neuron-k8-scheduler-ext`.
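To make the contiguity requirement concrete, the sketch below contrasts a valid and an invalid device set when passing devices to ``docker run``. The image name is a placeholder for any image built as described in the tutorials that follow.

.. code:: bash

   # Contiguous device IDs (1 and 2): the runtime can allocate across both devices.
   docker run --device=/dev/neuron1 --device=/dev/neuron2 my-neuron-image neuron-ls

   # Non-contiguous device IDs (1 and 3): the runtime may fail with the
   # "Neuron cores are not contiguous" error shown above.
   # docker run --device=/dev/neuron1 --device=/dev/neuron3 my-neuron-image neuron-ls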
================================================
FILE: containers/tutorial-docker-runtime1.0.rst
================================================

.. _tutorial-docker-environment-setup-for-neuron-runtime-10:

Tutorial: Docker environment setup for Neuron Runtime 1.x
=========================================================

Introduction
------------

A Neuron application can be deployed using docker containers. This tutorial describes how to configure docker to expose Inferentia devices to containers.

Once the environment is set up, a container can be started with the *AWS_NEURON_VISIBLE_DEVICES* environment variable to specify the desired set of Inferentia devices to be exposed to the container. AWS_NEURON_VISIBLE_DEVICES is a set of contiguous, comma-separated Inferentia logical IDs. To find out the available logical IDs on your instance, run the neuron-ls tool. For example, on an inf1.6xlarge instance with 4 Inferentia devices, you may set AWS_NEURON_VISIBLE_DEVICES="2,3" to expose the last two devices to a container. When running neuron-ls inside a container, you will only see the set of exposed Inferentia devices. For example:

.. code:: bash

   docker run --env AWS_NEURON_VISIBLE_DEVICES="0" neuron-test neuron-ls

Would produce the following output:

::

   +--------------+---------+--------+-----------+-----------+------+------+
   | PCI BDF      | LOGICAL | NEURON | MEMORY    | MEMORY    | EAST | WEST |
   |              | ID      | CORES  | CHANNEL 0 | CHANNEL 1 |      |      |
   +--------------+---------+--------+-----------+-----------+------+------+
   | 0000:00:1f.0 | 0       | 4      | 4096 MB   | 4096 MB   | 0    | 0    |
   +--------------+---------+--------+-----------+-----------+------+------+

Steps:
------

This tutorial starts from a fresh Ubuntu Server 16.04 LTS AMI "ami-08bc77a2c7eb2b1da".

Step 1: install the aws-neuron-runtime-base package
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Follow the :ref:`install-guide-index` to set up access to the Neuron repos. Then, install the aws-neuron-runtime-base package.

.. code:: bash

   sudo apt-get install aws-neuron-runtime-base

Step 2: Make sure that the neuron-rtd service is not running
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If neuron-rtd is running on the host, stop the neuron-rtd service before starting the containerized neuron-rtd. This is needed to allow assignment of devices to containers:

.. code:: bash

   sudo service neuron-rtd stop

Step 3: install the oci-add-hooks dependency
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`oci-add-hooks `__ is an OCI runtime whose sole purpose is injecting OCI prestart, poststart, and poststop hooks into a container config.json before passing it along to an OCI compatible runtime. oci-add-hooks is used to inject a hook that exposes Inferentia devices to the container.

.. code:: bash

   sudo apt install -y golang && \
       export GOPATH=$HOME/go && \
       go get github.com/joeshaw/json-lossless && \
       cd /tmp/ && \
       git clone https://github.com/awslabs/oci-add-hooks && \
       cd /tmp/oci-add-hooks && \
       make build && \
       sudo cp /tmp/oci-add-hooks/oci-add-hooks /usr/local/bin/

.. _step-4-setup-docker-to-use-oci-neuron-oci-runtime:

Step 4: set up Docker to use the oci-neuron OCI runtime
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

oci-neuron is a script representing an OCI compatible runtime. It wraps oci-add-hooks, which wraps runc. In this step, we configure docker to point at the oci-neuron OCI runtime. Install docker.io:

.. code:: bash

   sudo apt install -y docker.io
   sudo usermod -aG docker $USER

Log out and log back in to refresh group membership. Place the daemon.json Docker configuration file supplied by the Neuron SDK in the default location. This file specifies oci-neuron as the default docker runtime:

.. code:: bash

   sudo cp /opt/aws/neuron/share/docker-daemon.json /etc/docker/daemon.json
   sudo service docker restart

If the docker restart command fails, check whether the docker systemd service is masked. More information on this can be found here: https://stackoverflow.com/a/37640824

Verify docker:

.. code:: bash

   docker run hello-world

Expected result:

::

   Hello from Docker!
   This message shows that your installation appears to be working correctly.

   To generate this message, Docker took the following steps:
   1. The Docker client contacted the Docker daemon.
   2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64)
   3. The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
   4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.

   To try something more ambitious, you can run an Ubuntu container with:
   $ docker run -it ubuntu bash

   Share images, automate workflows, and more with a free Docker ID:
   https://hub.docker.com/

   For more examples and ideas, visit:
   https://docs.docker.com/get-started/

Build a docker image using the provided dockerfile :ref:`neuron-runtime-dockerfile`, and use it to verify device whitelisting:

.. code:: bash

   docker build . -f Dockerfile.neuron-rtd -t neuron-test

Then run:
.. code:: bash

   docker run --env AWS_NEURON_VISIBLE_DEVICES="0" neuron-test neuron-ls

Expected result:

::

   +--------------+---------+--------+-----------+-----------+------+------+
   | PCI BDF      | LOGICAL | NEURON | MEMORY    | MEMORY    | EAST | WEST |
   |              | ID      | CORES  | CHANNEL 0 | CHANNEL 1 |      |      |
   +--------------+---------+--------+-----------+-----------+------+------+
   | 0000:00:1f.0 | 0       | 4      | 4096 MB   | 4096 MB   | 0    | 0    |
   +--------------+---------+--------+-----------+-----------+------+------+

================================================
FILE: containers/tutorials/build-run-neuron-container.rst
================================================

.. _how-to-build-neuron-container:

Tutorial: How to Build and Run a Neuron Container
=================================================

Introduction
------------

This document explains how to build a Neuron container using an existing Dockerfile.

Pre-requisites
--------------

#. Docker version 18 or newer, configured according to :ref:`tutorial-docker-env-setup`
#. An Inf1/Trn1 instance with available :ref:`Neuron Devices`
#. If running a serving application such as tensorflow-model-server, torchserve, or multi-model-server, make sure the appropriate ports that the server listens on are exposed, using EXPOSE in the Dockerfile or the argument ``-p 80:8080`` on the ``docker run`` command.

.. _running-application-container:

Build and Run the Application Container
---------------------------------------

Follow the steps below to create Neuron application containers.

- Build a docker image using the provided dockerfile: :ref:`libmode-dockerfile` for Inf1, or :ref:`trainium-dlc-dockerfile` for Trn1 (for Trn1, the dockerfile also needs the mlp train script found at :ref:`mlp-train`).

.. code:: bash

   docker build . -f Dockerfile.pt -t neuron-container:pytorch

- Run the container locally:

.. code:: bash

   docker run -it --name pt17 --device=/dev/neuron0 neuron-container:pytorch neuron-ls

Expected result for Inf1:

::

   +--------------+---------+--------+-----------+-----------+------+------+
   | PCI BDF      | LOGICAL | NEURON | MEMORY    | MEMORY    | EAST | WEST |
   |              | ID      | CORES  | CHANNEL 0 | CHANNEL 1 |      |      |
   +--------------+---------+--------+-----------+-----------+------+------+
   | 0000:00:1f.0 | 0       | 4      | 4096 MB   | 4096 MB   | 0    | 0    |
   +--------------+---------+--------+-----------+-----------+------+------+

Expected result for Trn1:

::

   +--------+--------+--------+-----------+---------+
   | NEURON | NEURON | NEURON | CONNECTED | PCI     |
   | DEVICE | CORES  | MEMORY | DEVICES   | BDF     |
   +--------+--------+--------+-----------+---------+
   | 0      | 4      | 8 GB   | 1         | 00:1f.0 |
   +--------+--------+--------+-----------+---------+

.. note::

   If, instead of the --device option above, the env variable AWS_NEURON_VISIBLE_DEVICES is to be used, then the OCI hook needs to be installed by following the instructions in :ref:`tutorial-oci-hook`.

Important to know
-----------------

.. _container-devices:

Devices
^^^^^^^

- The docker native way is to use --device /dev/neuron# for each of the Neuron devices intended to be passed. When using the --device option, ALL/all is not supported.

.. code:: bash

   docker run --device=/dev/neuron0 --device=/dev/neuron1

- If you install the aws-neuronx-oci-hook package, you will have an OCI hook that also supports use of a container environment variable, AWS_NEURON_VISIBLE_DEVICES=, which is intended to make things easier for multi-device scenarios. Following are some examples. For setting up the OCI hook, please refer to :ref:`oci neuron hook `.
.. code:: bash

   docker run -e "AWS_NEURON_VISIBLE_DEVICES=0,1"
   docker run -e "AWS_NEURON_VISIBLE_DEVICES=ALL"

- In a Kubernetes environment, the Neuron device plugin is used for exposing Neuron devices to the containers in the pod. The number of devices can be adjusted using the *aws.amazon.com/neuron* resource in the pod specification. Refer to :ref:`K8s setup ` for more details.

.. code:: bash

   resources:
     limits:
       aws.amazon.com/neuron: 1

.. note::

   Only the number of devices can be specified. When only the Neuron device plugin is running, the devices allocated are not guaranteed to be contiguous. Make sure to run the Neuron scheduler extension (:ref:`neuron-k8-scheduler-ext`), which ensures that contiguous devices are allocated to the containers.

- Multiple container applications running on the same host can share the devices, but the cores cannot be shared. This is similar to running multiple applications on the host.
- In the Kubernetes environment, the devices cannot be shared by multiple containers in the pod.

.. _container-cores:

Cores
^^^^^

Each Neuron device has multiple cores. The cores allocated to a process/container can be controlled by the environment variables NEURON_RT_VISIBLE_CORES and NEURON_RT_NUM_CORES. Please refer to :ref:`nrt-configuration` for more details.

- The docker native way is to use --device /dev/neuron# for each of the Neuron devices intended to be passed. Add --env NEURON_RT_VISIBLE_CORES=1,2 to let this container use cores 1 and 2. For example, on an inf1.24xlarge with 64 cores, if we want to use cores 51 and 52, the appropriate devices and NEURON_RT_VISIBLE_CORES need to be used. With 4 cores in each device, core 51 is in device 12 and core 52 is in device 13, as the sketch after this list illustrates.

.. code:: bash

   docker run --device=/dev/neuron12 --device=/dev/neuron13 --env NEURON_RT_VISIBLE_CORES=51,52

- In a Kubernetes environment, the Neuron device plugin is used for exposing Neuron cores to the containers in the pod. The number of cores can be adjusted using the *aws.amazon.com/neuroncore* resource in the pod specification. Refer to :ref:`K8s setup ` for more details.

.. code:: bash

   resources:
     limits:
       aws.amazon.com/neuroncore: 1

.. note::

   Only the number of cores can be specified. When only the Neuron device plugin is running, the cores allocated are not guaranteed to be contiguous. Make sure to run the Neuron scheduler extension (:ref:`neuron-k8-scheduler-ext`), which ensures that contiguous cores are allocated to the containers.

- Multiple container applications running on the same host cannot share the cores. This is similar to running multiple applications on the host.
- In the Kubernetes environment, the cores cannot be shared by multiple containers in the pod.
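The core-to-device mapping in the example above is simple integer division. A minimal sketch, assuming 4 cores per device as on inf1.24xlarge:

.. code:: bash

   # Map NeuronCore indices to the Neuron device that owns them.
   CORES_PER_DEVICE=4
   for core in 51 52; do
       echo "core $core -> /dev/neuron$((core / CORES_PER_DEVICE))"
   done
   # Prints:
   #   core 51 -> /dev/neuron12
   #   core 52 -> /dev/neuron13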
================================================
FILE: containers/tutorials/inference/index.rst
================================================

Containers -- Inference Tutorials
=================================

.. toctree::
   :maxdepth: 1
   :hidden:

   /containers/tutorials/inference/tutorial-infer
   /containers/tutorials/inference/k8s_rn50_demo

.. include:: /containers/tutorials/inference/index.txt

================================================
FILE: containers/tutorials/inference/index.txt
================================================

* :ref:`tutorial-infer`
* :ref:`example-deploy-rn50-as-k8s-service`

================================================
FILE: containers/tutorials/inference/k8s_rn50_demo.rst
================================================

.. _example-deploy-rn50-as-k8s-service:

Deploy a TensorFlow ResNet50 model as a Kubernetes service
----------------------------------------------------------

This tutorial uses the ResNet50 model as a teaching example of how to deploy an inference application using Kubernetes on Inf1 instances.

Prerequisites:
^^^^^^^^^^^^^^

- Please follow the instructions at :ref:`tutorial-k8s-env-setup-for-neuron` to set up k8s support on your cluster.
- Inf1 instances as worker nodes with attached roles allowing:

  - ECR read access policy to retrieve container images from ECR: **arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly**
  - S3 access to retrieve the saved_model from within the TensorFlow Serving container.

Deploy a TensorFlow Serving application image
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A trained model must be compiled to an Inferentia target before it can be deployed on Inferentia instances. To continue, you will need a Neuron-optimized TensorFlow model saved in Amazon S3. If you don't already have a SavedModel, please follow the tutorial for `creating a Neuron compatible ResNet50 model `_ and upload the resulting SavedModel to S3. ResNet-50 is a popular machine learning model used for image classification tasks. For more information about compiling Neuron models, see `The AWS Inferentia Chip With DLAMI `_ in the AWS Deep Learning AMI Developer Guide.

The sample deployment manifest manages a pre-built inference serving container for TensorFlow provided by AWS Deep Learning Containers. Inside the container are the AWS Neuron Runtime and the TensorFlow Serving application. A complete list of pre-built Deep Learning Containers optimized for Neuron is maintained on GitHub under `Available Images `_. At start-up, the DLC will fetch your model from Amazon S3, launch Neuron TensorFlow Serving with the saved model, and wait for prediction requests. The number of Neuron devices allocated to your serving application can be adjusted by changing the `aws.amazon.com/neuron` resource in the deployment yaml. Please note that communication between TensorFlow Serving and the Neuron runtime happens over GRPC, which requires passing the `IPC_LOCK` capability to the container.

1. Create a file named `rn50_deployment.yaml` with the contents below. Update the region-code and model path to match your desired settings. The model name is for identification purposes when a client makes a request to the TensorFlow server. This example uses a model name to match a sample ResNet50 client script that will be used in a later step for sending prediction requests.

.. note::

   1. Replace the s3 bucket name in the model_base_path arg in the file with the S3 location where the saved model was stored.
   2. In the image: add the appropriate location of the DLC tensorflow image.

::

   kind: Deployment
   apiVersion: apps/v1
   metadata:
     name: k8s-neuron-test
     labels:
       app: k8s-neuron-test
       role: master
   spec:
     replicas: 2
     selector:
       matchLabels:
         app: k8s-neuron-test
         role: master
     template:
       metadata:
         labels:
           app: k8s-neuron-test
           role: master
       spec:
         containers:
           - name: k8s-neuron-test
             image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference-neuron:1.15.4-neuron-py37-ubuntu18.04
             command:
               - /usr/local/bin/entrypoint.sh
             args:
               - --port=8500
               - --rest_api_port=9000
               - --model_name=resnet50_neuron
               - --model_base_path=s3://${your-bucket-of-models}/resnet50_neuron/
             ports:
               - containerPort: 8500
               - containerPort: 9000
             imagePullPolicy: IfNotPresent
             env:
               - name: AWS_REGION
                 value: "us-east-1"
               - name: S3_USE_HTTPS
                 value: "1"
               - name: S3_VERIFY_SSL
                 value: "0"
               - name: S3_ENDPOINT
                 value: s3.us-east-1.amazonaws.com
               - name: AWS_LOG_LEVEL
                 value: "3"
             resources:
               limits:
                 cpu: 4
                 memory: 4Gi
                 aws.amazon.com/neuron: 1
               requests:
                 cpu: "1"
                 memory: 1Gi
             securityContext:
               capabilities:
                 add:
                   - IPC_LOCK

2. Deploy the model.

::

   kubectl apply -f rn50_deployment.yaml

3. Create a file named `rn50_service.yaml` with the following contents. The gRPC and HTTP ports are opened for accepting prediction requests.

::

   kind: Service
   apiVersion: v1
   metadata:
     name: k8s-neuron-test
     labels:
       app: k8s-neuron-test
   spec:
     type: ClusterIP
     ports:
       - name: grpc-tf-serving
         port: 8500
         targetPort: 8500
       - name: http-tf-serving
         port: 9000
         targetPort: 9000
     selector:
       app: k8s-neuron-test
       role: master

4. Create a Kubernetes service for your TensorFlow model Serving application.

::

   kubectl apply -f rn50_service.yaml

Make predictions against your TensorFlow Serving service
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. To test locally, forward the gRPC port to the `k8s-neuron-test` service.

::

   kubectl port-forward service/k8s-neuron-test 8500:8500 &

2. Create a Python script called `tensorflow-model-server-infer.py` with the following content. This script runs inference via gRPC.

::

   import numpy as np
   import grpc
   import tensorflow as tf
   from tensorflow.keras.preprocessing import image
   from tensorflow.keras.applications.resnet50 import preprocess_input
   from tensorflow_serving.apis import predict_pb2
   from tensorflow_serving.apis import prediction_service_pb2_grpc
   from tensorflow.keras.applications.resnet50 import decode_predictions

   if __name__ == '__main__':
       channel = grpc.insecure_channel('localhost:8500')
       stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
       img_file = tf.keras.utils.get_file(
           "./kitten_small.jpg",
           "https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg")
       img = image.load_img(img_file, target_size=(224, 224))
       img_array = preprocess_input(image.img_to_array(img)[None, ...])
       request = predict_pb2.PredictRequest()
       # The model name must match --model_name in the deployment above.
       request.model_spec.name = 'resnet50_neuron'
       request.inputs['input'].CopyFrom(
           tf.make_tensor_proto(img_array, shape=img_array.shape))
       result = stub.Predict(request)
       prediction = tf.make_ndarray(result.outputs['output'])
       print(decode_predictions(prediction))

3. Run the script to submit predictions to your service.
::

   python3 tensorflow-model-server-infer.py

Your output should look like the following:

::

   [[(u'n02123045', u'tabby', 0.68817204), (u'n02127052', u'lynx', 0.12701613), (u'n02123159', u'tiger_cat', 0.08736559), (u'n02124075', u'Egyptian_cat', 0.063844085), (u'n02128757', u'snow_leopard', 0.009240591)]]

================================================
FILE: containers/tutorials/inference/tutorial-infer.rst
================================================

.. _tutorial-infer:

Run Inference in a PyTorch Neuron Container
===========================================

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
--------

This tutorial demonstrates how to run a PyTorch DLC on an Inferentia instance. By the end of this tutorial, you will be able to run inference using the container.

You will use an inf1.2xlarge to test your Docker configuration for Inferentia. To find out the available Neuron devices on your instance, use the command ``ls /dev/neuron*``.

Setup Environment
-----------------

1. Launch an Inf1 Instance
2. Set up the docker environment according to :ref:`tutorial-docker-env-setup`
3. Clone the `aws-neuron/deep-learning-containers `_ GitHub repository and use one of the PyTorch inference Dockerfiles found in the folders of the repo:

.. code:: bash

   git clone https://github.com/aws-neuron/deep-learning-containers.git
   cd deep-learning-containers/docker/pytorch/inference/2.9.0

For additional prerequisites and setup requirements, see the `docker build prerequisites `_. This tutorial requires the `torchserve entrypoint `_ and `torchserve config.properties `_, which are copied over to the same parent folder as part of the prerequisites.

With the files in a local directory, build the image with the following command:

.. code:: bash

   docker build . -f Dockerfile.neuronx -t neuron-container:pytorch

Run the following command to start the container:

.. code:: bash

   docker run -itd --name pt-cont -p 80:8080 -p 8081:8081 --device=/dev/neuron0 neuron-container:pytorch /usr/local/bin/entrypoint.sh -m 'pytorch-resnet-neuron=https://aws-dlc-sample-models.s3.amazonaws.com/pytorch/Resnet50-neuron.mar' -t /home/model-server/config.properties
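Once the container is up, you can send a quick smoke test from the host. This is a minimal sketch assuming the standard TorchServe REST endpoints and the ``pytorch-resnet-neuron`` model name registered by the command above:

.. code:: bash

   # Health check against the mapped inference port (8080 in the container, 80 on the host).
   curl http://localhost:80/ping

   # Download a test image and request a prediction from the registered model.
   curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg
   curl -X POST http://localhost:80/predictions/pytorch-resnet-neuron -T kitten_small.jpg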
================================================
FILE: containers/tutorials/k8s-default-scheduler.rst
================================================

This approach integrates the Neuron Scheduler Extension directly with the Kubernetes default scheduler. This method requires access to modify the default scheduler configuration.

**Prerequisites**

Ensure that the Neuron Device Plugin is running.

**Step 1: Configure kube-scheduler**

Enable the kube-scheduler to use a ConfigMap for scheduler policy. In your ``cluster.yml``, update the spec section with the following:

.. code:: yaml

   spec:
     kubeScheduler:
       usePolicyConfigMap: true

**Step 2: Launch the Cluster**

Create and launch the cluster:

.. code:: bash

   kops create -f cluster.yml
   kops create secret --name neuron-test-1.k8s.local sshpublickey admin -i ~/.ssh/id_rsa.pub
   kops update cluster --name neuron-test-1.k8s.local --yes

**Step 3: Install Neuron Scheduler Extension**

Install the Neuron Scheduler Extension and register it with kube-scheduler:

.. code:: bash

   helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \
       --set "scheduler.enabled=true" \
       --set "scheduler.customScheduler.enabled=false" \
       --set "scheduler.defaultScheduler.enabled=true" \
       --set "npd.enabled=false"

================================================
FILE: containers/tutorials/k8s-multiple-scheduler.rst
================================================

This approach deploys a separate scheduler alongside the default Kubernetes scheduler. This is useful in environments where you don't have access to modify the default scheduler configuration, such as Amazon EKS. In this setup, a new scheduler (``my-scheduler``) is deployed with the Neuron Scheduler Extension integrated. Pods that need to run Neuron workloads specify this custom scheduler in their configuration.

.. note::

   Amazon EKS does not natively support modifying the default scheduler, so this multiple scheduler approach is required for EKS environments.

**Prerequisites**

Ensure that the Neuron Device Plugin is running.

**Step 1: Install Neuron Scheduler Extension**

Install the Neuron Scheduler Extension as a custom scheduler:

.. code:: bash

   helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \
       --set "scheduler.enabled=true" \
       --set "npd.enabled=false"

**Step 2: Verify Installation**

Check that there are no errors in the ``my-scheduler`` pod logs and that the ``k8s-neuron-scheduler`` pod is bound to a node:

.. code:: bash

   kubectl logs -n kube-system my-scheduler-79bd4cb788-hq2sq

**Expected output:**

.. code:: bash

   I1012 15:30:21.629611 1 scheduler.go:604] "Successfully bound pod to node" pod="kube-system/k8s-neuron-scheduler-5d9d9d7988-xcpqm" node="ip-192-168-2-25.ec2.internal" evaluatedNodes=1 feasibleNodes=1

**Step 3: Configure Pods to Use Custom Scheduler**

When creating Pods that need to use the Neuron Scheduler Extension, specify ``my-scheduler`` as the scheduler name. Here's a sample Pod specification:

.. code:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: 
   spec:
     restartPolicy: Never
     schedulerName: my-scheduler
     containers:
       - name: 
         command: [""]
         image: 
         resources:
           limits:
             cpu: "4"
             memory: 4Gi
             aws.amazon.com/neuroncore: 9
           requests:
             cpu: "1"
             memory: 1Gi

**Step 4: Verify Scheduling**

After running a Neuron workload Pod, verify that the Neuron Scheduler successfully processed the filter and bind requests:

.. code:: bash

   kubectl logs -n kube-system k8s-neuron-scheduler-5d9d9d7988-xcpqm

**Expected output for filter request:**

.. code:: bash

   2022/10/12 15:41:16 POD nrt-test-5038 fits in Node:ip-192-168-2-25.ec2.internal
   2022/10/12 15:41:16 Filtered nodes: [ip-192-168-2-25.ec2.internal]
   2022/10/12 15:41:16 Failed nodes: map[]
   2022/10/12 15:41:16 Finished Processing Filter Request...

**Expected output for bind request:**

.. code:: bash

   2022/10/12 15:41:16 Executing Bind Request!
   2022/10/12 15:41:16 Determine if the pod %v is NeuronDevice podnrt-test-5038
   2022/10/12 15:41:16 Updating POD Annotation with alloc devices!
   2022/10/12 15:41:16 Return aws.amazon.com/neuroncore
   2022/10/12 15:41:16 neuronDevUsageMap for resource:aws.amazon.com/neuroncore in node: ip-192-168-2-25.ec2.internal is [false false false false false false false false false false false false false false false false]
   2022/10/12 15:41:16 Allocated ids for POD nrt-test-5038 are: 0,1,2,3,4,5,6,7,8
   2022/10/12 15:41:16 Try to bind pod nrt-test-5038 in default namespace to node ip-192-168-2-25.ec2.internal with &Binding{ObjectMeta:{nrt-test-5038 8da590b1-30bc-4335-b7e7-fe574f4f5538 0 0001-01-01 00:00:00 +0000 UTC map[] map[] [] [] []},Target:ObjectReference{Kind:Node,Namespace:,Name:ip-192-168-2-25.ec2.internal,UID:,APIVersion:,ResourceVersion:,FieldPath:,},}
   2022/10/12 15:41:16 Updating the DevUsageMap since the bind is successful!
   2022/10/12 15:41:16 Return aws.amazon.com/neuroncore
   2022/10/12 15:41:16 neuronDevUsageMap for resource:aws.amazon.com/neuroncore in node: ip-192-168-2-25.ec2.internal is [false false false false false false false false false false false false false false false false]
   2022/10/12 15:41:16 neuronDevUsageMap for resource:aws.amazon.com/neurondevice in node: ip-192-168-2-25.ec2.internal is [false false false false]
   2022/10/12 15:41:16 Allocated devices list 0,1,2,3,4,5,6,7,8 for resource aws.amazon.com/neuroncore
   2022/10/12 15:41:16 Allocated devices list [0] for other resource aws.amazon.com/neurondevice
   2022/10/12 15:41:16 Allocated devices list [0] for other resource aws.amazon.com/neurondevice
   2022/10/12 15:41:16 Allocated devices list [0] for other resource aws.amazon.com/neurondevice
   2022/10/12 15:41:16 Allocated devices list [0] for other resource aws.amazon.com/neurondevice
   2022/10/12 15:41:16 Allocated devices list [1] for other resource aws.amazon.com/neurondevice
   2022/10/12 15:41:16 Allocated devices list [1] for other resource aws.amazon.com/neurondevice
   2022/10/12 15:41:16 Allocated devices list [1] for other resource aws.amazon.com/neurondevice
   2022/10/12 15:41:16 Allocated devices list [1] for other resource aws.amazon.com/neurondevice
   2022/10/12 15:41:16 Allocated devices list [2] for other resource aws.amazon.com/neurondevice
   2022/10/12 15:41:16 Return aws.amazon.com/neuroncore
   2022/10/12 15:41:16 Succesfully updated the DevUsageMap [true true true true true true true true true false false false false false false false] and otherDevUsageMap [true true true false] after alloc for node ip-192-168-2-25.ec2.internal
   2022/10/12 15:41:16 Finished executing Bind Request...

================================================
FILE: containers/tutorials/k8s-neuron-device-plugin.rst
================================================

The Neuron Device Plugin is a Kubernetes device plugin that exposes Neuron hardware resources to the cluster's scheduler. It discovers available Neuron devices on each node, advertises them as allocatable resources, and manages their lifecycle. When Pods request Neuron resources, the device plugin handles the allocation and ensures exclusive access to the assigned devices. This integration enables Kubernetes to treat Neuron accelerators as first-class schedulable resources, similar to GPUs or other specialized hardware.
The device plugin registers two resource types with Kubernetes:

* ``aws.amazon.com/neuroncore`` - Used for allocating individual Neuron cores to containers
* ``aws.amazon.com/neuron`` - Used for allocating entire Neuron devices to containers (all cores belonging to the device)

**Deploy Neuron Device Plugin**

**Prerequisites**

Ensure that all :ref:`prerequisites` are satisfied before proceeding.

**Installation**

Apply the Neuron Device Plugin as a DaemonSet on the cluster:

.. code:: bash

   helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \
       --set "npd.enabled=false"

**Verify Installation**

Verify that the Neuron Device Plugin is running:

.. code:: bash

   kubectl get ds neuron-device-plugin -n kube-system

Expected output (example with 2 nodes in cluster):

.. code:: bash

   NAME                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
   neuron-device-plugin   2         2         2       2            2                           18h

**Verify Allocatable Resources**

Verify that nodes have allocatable Neuron cores:

.. code:: bash

   kubectl get nodes "-o=custom-columns=NAME:.metadata.name,NeuronCore:.status.allocatable.aws\.amazon\.com/neuroncore"

Expected output:

.. code:: bash

   NAME                                          NeuronCore
   ip-192-168-65-41.us-west-2.compute.internal   32
   ip-192-168-87-81.us-west-2.compute.internal   32

Verify that nodes have allocatable Neuron devices:

.. code:: bash

   kubectl get nodes "-o=custom-columns=NAME:.metadata.name,NeuronDevice:.status.allocatable.aws\.amazon\.com/neuron"

Expected output:

.. code:: bash

   NAME                                          NeuronDevice
   ip-192-168-65-41.us-west-2.compute.internal   16
   ip-192-168-87-81.us-west-2.compute.internal   16
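With the plugin running, a workload consumes these resources through standard resource limits. A minimal sketch (the pod name and image are placeholders, not part of the official examples):

.. code:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: neuroncore-demo                # hypothetical name
   spec:
     restartPolicy: Never
     containers:
       - name: app
         image: <your-neuron-image>       # placeholder: an image with the Neuron SDK installed
         command: ["neuron-ls"]           # prints the devices allocated to this container
         resources:
           limits:
             aws.amazon.com/neuroncore: 1 # request one NeuronCore from the device plugin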
================================================
FILE: containers/tutorials/k8s-neuron-helm-chart.rst
================================================

.. _k8s-neuron-helm-chart:

The Neuron Helm Chart simplifies the deployment and management of Neuron infrastructure components on Kubernetes clusters. It provides a unified installation method for all essential Neuron components, streamlining the setup process and ensuring consistent configuration across your cluster.

Components Included
^^^^^^^^^^^^^^^^^^^

The Neuron Helm Chart includes the following components:

* Neuron Device Plugin
* Neuron Scheduler Extension
* :ref:`Neuron Node Problem Detector and Recovery `
* Neuron DRA (Dynamic Resource Allocation) Driver. Refer to :ref:`neuron-dra`.

Installation
^^^^^^^^^^^^

To install the Neuron Helm Chart:

.. code:: bash

   helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart

For detailed information on configuration options, advanced deployment scenarios, and troubleshooting, please refer to the official Neuron Helm Charts repository: https://github.com/aws-neuron/neuron-helm-charts/

================================================
FILE: containers/tutorials/k8s-neuron-monitor.rst
================================================

.. _k8s-neuron-monitor:

Neuron Monitor is a monitoring solution that collects and exposes metrics from Neuron devices and the Neuron runtime. It provides visibility into hardware utilization, performance counters, memory usage, and device health status. The monitor can export metrics in formats compatible with popular observability platforms like Prometheus, enabling integration with existing monitoring and alerting infrastructure. This allows operators to track Neuron device performance, identify bottlenecks, and troubleshoot issues in production environments.

For detailed information about Neuron Monitor, see the `Neuron Monitor User Guide `_.

.. note::

   Neuron Monitor does not currently support environments using the Neuron DRA (Dynamic Resource Allocation) Driver.

Deploy Neuron Monitor DaemonSet
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Step 1: Download the Configuration**

Download the Neuron Monitor YAML file: :download:`k8s-neuron-monitor-daemonset.yml `

**Step 2: Apply the Configuration**

Apply the Neuron Monitor YAML to create a DaemonSet on the cluster:

.. code:: bash

   kubectl apply -f k8s-neuron-monitor-daemonset.yml

**Step 3: Verify Installation**

Verify that the Neuron Monitor DaemonSet is running:

.. code:: bash

   kubectl get ds neuron-monitor --namespace neuron-monitor

Expected output (example with 2 nodes in cluster):

.. code:: bash

   NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
   neuron-monitor   2         2         2       2            2                           27h

**Step 4: Get Pod Names**

Retrieve the Neuron Monitor pod names:

.. code:: bash

   kubectl get pods --namespace neuron-monitor

Expected output:

.. code:: bash

   NAME                   READY   STATUS    RESTARTS   AGE
   neuron-monitor-slsxf   1/1     Running   0          17m
   neuron-monitor-wc4f5   1/1     Running   0          17m

**Step 5: Verify Prometheus Endpoint**

Verify that the Prometheus metrics endpoint is available:

.. code:: bash

   kubectl exec neuron-monitor-wc4f5 --namespace neuron-monitor -- wget -q --output-document - http://127.0.0.1:8000

Expected output (sample metrics):

.. code:: bash

   # HELP python_gc_objects_collected_total Objects collected during gc
   # TYPE python_gc_objects_collected_total counter
   python_gc_objects_collected_total{generation="0"} 362.0
   python_gc_objects_collected_total{generation="1"} 0.0
   python_gc_objects_collected_total{generation="2"} 0.0
   # HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
   # TYPE python_gc_objects_uncollectable_total counter
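To scrape these metrics with Prometheus, point a scrape job at port 8000 of each monitor pod. A minimal static sketch (the job name and target are assumptions; in a real cluster you would typically use Kubernetes service discovery instead of a hard-coded target):

.. code:: yaml

   scrape_configs:
     - job_name: 'neuron-monitor'               # hypothetical job name
       static_configs:
         - targets: ['<monitor-pod-ip>:8000']   # port 8000 serves the metrics shown above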
================================================
FILE: containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa.rst
================================================

.. _k8s-neuron-problem-detector-and-recovery-irsa:

Permissions for Neuron Node Problem Detector and Recovery
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Neuron Node Problem Detector and Recovery requires IAM roles for service accounts (IRSA) for authorization. For more information, see `IAM roles for service accounts `__ in the Amazon EKS User Guide. This section shows how to configure an IAM role for service accounts using the ``eksctl`` command-line tool.

**Step 1: Install eksctl**

Install the ``eksctl`` CLI using the instructions at https://eksctl.io/installation/.

**Step 2: Create IAM Policy**

Create an IAM policy that grants the necessary permissions for the Neuron Node Problem Detector.

.. code:: json

   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Action": [
                   "autoscaling:SetInstanceHealth",
                   "autoscaling:DescribeAutoScalingInstances"
               ],
               "Effect": "Allow",
               "Resource": ""
           },
           {
               "Action": [
                   "ec2:DescribeInstances"
               ],
               "Effect": "Allow",
               "Resource": "*",
               "Condition": {
                   "ForAllValues:StringEquals": {
                       "ec2:ResourceTag/aws:autoscaling:groupName": ""
                   }
               }
           },
           {
               "Action": [
                   "cloudwatch:PutMetricData"
               ],
               "Effect": "Allow",
               "Resource": "*",
               "Condition": {
                   "StringEquals": {
                       "cloudwatch:Namespace": "NeuronHealthCheck"
                   }
               }
           }
       ]
   }

Save the policy template above to a file named ``npd-policy.json`` (replacing the placeholder values), then run:

.. code:: bash

   aws iam create-policy \
       --policy-name NeuronProblemDetectorPolicy \
       --policy-document file://npd-policy.json

**Step 3: Create Namespace and Service Account**

Create a dedicated namespace for the Neuron Node Problem Detector:

.. code:: bash

   kubectl create ns neuron-healthcheck-system

**Step 4: Associate IAM Role with Service Account**

Use the following script to create the service account and associate it with the IAM role:

.. code:: bash

   #!/bin/bash
   CLUSTER_NAME=
   REGION_CODE=$(aws configure get region)
   POLICY_ARN=

   eksctl create iamserviceaccount \
       --name node-problem-detector \
       --namespace neuron-healthcheck-system \
       --cluster $CLUSTER_NAME \
       --attach-policy-arn $POLICY_ARN \
       --approve \
       --role-name neuron-problem-detector-role-$CLUSTER_NAME \
       --region $REGION_CODE \
       --override-existing-serviceaccounts

**Step 5: Verify Service Account Configuration**

Verify that the service account is annotated correctly with the IAM role:

.. code:: bash

   kubectl describe sa node-problem-detector -n neuron-healthcheck-system

Expected output:

.. code:: bash

   Name:                node-problem-detector
   Namespace:           neuron-healthcheck-system
   Labels:              app.kubernetes.io/managed-by=eksctl
   Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/neuron-problem-detector-role-cluster1
   Image pull secrets:  
   Mountable secrets:   
   Tokens:              
   Events:              

**Cleanup**

To remove the service account and associated IAM role, use the following command:

.. code:: bash

   #!/bin/bash
   CLUSTER_NAME=
   REGION_CODE=$(aws configure get region)

   eksctl delete iamserviceaccount \
       --name node-problem-detector \
       --namespace neuron-healthcheck-system \
       --cluster $CLUSTER_NAME \
       --approve \
       --region $REGION_CODE

================================================
FILE: containers/tutorials/k8s-neuron-problem-detector-and-recovery.rst
================================================

.. _k8s-neuron-problem-detector-and-recovery:

Deploy Neuron Node Problem Detector and Recovery
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Neuron Node Problem Detector and Recovery is a critical resiliency component that continuously monitors the health of Neuron devices on each Kubernetes node by detecting hardware and software errors such as device failures, driver problems, and runtime errors. It integrates with the Kubernetes Node Problem Detector framework to report Neuron-specific conditions. When unrecoverable issues are detected, it can automatically remediate problems by marking nodes as unhealthy and triggering node replacement to prevent workload scheduling on faulty hardware. The component can also publish CloudWatch metrics under the ``NeuronHealthCheck`` namespace for monitoring and alerting purposes.

**Requirements**

Before deploying the Neuron Node Problem Detector and Recovery, ensure the following requirements are met:

* **Neuron Driver:** Version 2.15 or later
* **Neuron Runtime:** SDK 2.18 or later
* **Prerequisites:** All prerequisites for Kubernetes containers and the Neuron Node Problem Detector must be satisfied

**Installation**

Install the Neuron Node Problem Detector and Recovery as a DaemonSet using Helm:

.. note::

   The installation pulls the container image from the upstream Node Problem Detector repository at ``registry.k8s.io/node-problem-detector``.

.. code:: bash

   helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart

**Enable Node Recovery**

By default, the Neuron Node Problem Detector runs in **monitor-only mode**. To enable automatic node recovery functionality:

.. code:: bash

   helm upgrade --install neuron-helm-chart oci://public.ecr.aws/neuron/neuron-helm-chart \
       --set "npd.nodeRecovery.enabled=true"
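Once health metrics start flowing, you can confirm they are reaching CloudWatch from the CLI. A minimal sketch (assumes the IRSA permissions above and that at least one health event has been published):

.. code:: bash

   # List metrics published by the detector under the NeuronHealthCheck namespace.
   aws cloudwatch list-metrics --namespace NeuronHealthCheck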
**Verify Installation**

Verify that the Node Problem Detector pods are running:

.. code:: bash

   kubectl get pod -n neuron-healthcheck-system

Expected output (example with 4 nodes in cluster):

.. code:: bash

   NAME                          READY   STATUS    RESTARTS   AGE
   node-problem-detector-7qcrj   1/1     Running   0          59s
   node-problem-detector-j45t5   1/1     Running   0          59s
   node-problem-detector-mr2cl   1/1     Running   0          59s
   node-problem-detector-vpjtk   1/1     Running   0          59s

**Monitoring and Metrics**

When an unrecoverable error occurs, the Neuron Node Problem Detector:

* Publishes metrics to CloudWatch under the ``NeuronHealthCheck`` namespace
* Updates the node's ``NodeCondition``, which can be viewed using:

.. code:: bash

   kubectl describe node

================================================
FILE: containers/tutorials/k8s-neuron-scheduler-flow.rst
================================================

.. _k8s-neuron-scheduler-flow:

Neuron Scheduler Extension Flow Diagram
---------------------------------------

::

   [ASCII flow diagram: a POD manifest requesting aws.amazon.com/neuroncore: 2
    flows through kube-scheduler and the neuron-scheduler-ext, which annotates
    the POD (e.g. NEURON_CORES: 2,3); the kubelet and neuron-device-plugin on
    the Inf1/Trn1 node then pass --device=/dev/neuron1 and
    NEURON_RT_VISIBLE_CORES=2,3 to the container runtime. The numbered steps
    below describe the flow.]

1. neuron-device-plugin returns the list of Neuron cores/devices to the kubelet
2. The kubelet advertises the core/device list to the K8s API server (and in turn to kube-scheduler)
3. A POD requests Neuron cores/devices [kube-scheduler picks up the POD creation request]
4. kube-scheduler calls the neuron-scheduler-extn filter function with the list of nodes and the POD specification
5. neuron-scheduler-extn scans through the nodes, filters out nodes with non-contiguous cores/devices, and returns the nodes that are capable of supporting the given POD specification
6. kube-scheduler calls the neuron-scheduler-extn bind function with the pod and node
7. neuron-scheduler-extn updates the POD annotation with the allocated Neuron core/device IDs (contiguous)
8. neuron-scheduler-extn sends the bind request to the kubelet of the selected node
9. The kubelet calls the Alloc function of the neuron-device-plugin
10. neuron-device-plugin queries the POD annotation for the allocated core/device IDs
11. neuron-device-plugin exports the devices and visible cores to the container runtime

================================================
FILE: containers/tutorials/k8s-neuron-scheduler.rst
================================================

The Neuron Scheduler Extension is a Kubernetes scheduler plugin that provides intelligent, topology-aware scheduling for Neuron workloads. While the device plugin handles basic resource allocation, the scheduler extension optimizes Pod placement by considering Neuron core topology, NeuronCore-to-NeuronCore connectivity, and workload requirements. It ensures efficient utilization of Neuron devices by placing Pods on nodes where the requested Neuron cores are optimally configured.
This component is optional and primarily beneficial for workloads that require specific subsets of Neuron devices or cores rather than consuming all available resources on a node. The scheduler extension is required for scheduling Pods that request more than one Neuron core or device resource. It finds sets of directly connected devices with minimal communication latency when scheduling containers, ensuring optimal performance for multi-device workloads.

For a graphical depiction of how the Neuron Scheduler Extension works, see :ref:`k8s-neuron-scheduler-flow`.

**Device Allocation by Instance Type**

The Neuron Scheduler Extension applies topology-aware scheduling rules based on instance type to ensure consistent and high performance regardless of which cores and devices are assigned to containers.

**Inf1 and Inf2 Instances (Ring Topology)**

Devices are connected through a ring topology, with no restrictions on the number of devices requested (as long as it is fewer than the total devices on a node). When N devices are requested, the scheduler finds a node where N contiguous devices are available to minimize communication latency. It will never allocate non-contiguous devices to the same container. For example, when a container requests 3 Neuron devices, the scheduler might assign devices 0, 1, 2 if available, but never devices 0, 2, 4, because those devices are not directly connected.

The figure below shows examples of device sets on an inf2.48xlarge node that could be assigned to a container requesting 2 devices:

|eks-inf2-device-set|

**Trn1.32xlarge and Trn1n.32xlarge Instances (2D Torus Topology)**

Devices are connected via a 2D torus topology. The scheduler enforces that containers request 1, 4, 8, or all 16 devices. If your container requires a different number of devices (such as 2 or 5), we recommend using an Inf2 instance instead to benefit from more flexible topology support.

If you request an invalid number of devices (such as 7), your Pod will not be scheduled and you will receive a warning: ``Instance type trn1.32xlarge does not support requests for device: 7. Please request a different number of devices.``

When requesting 4 devices, your container will be allocated one of the following device sets, if available:

|eks-trn1-device-set4|

When requesting 8 devices, your container will be allocated one of the following device sets, if available:

|eks-trn1-device-set8|

.. note::

   For all instance types, requesting one or all Neuron cores or devices is always valid.

**Deploy Neuron Scheduler Extension**

.. tab-set::

   .. tab-item:: Multiple Scheduler Approach

      .. include:: /containers/tutorials/k8s-multiple-scheduler.rst

   .. tab-item:: Default Scheduler Approach

      .. include:: /containers/tutorials/k8s-default-scheduler.rst

.. |eks-inf2-device-set| image:: /images/eks-inf2-device-set.png
.. |eks-trn1-device-set4| image:: /images/eks-trn1-device-set4.png
.. |eks-trn1-device-set8| image:: /images/eks-trn1-device-set8.png
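Tying the Trn1 allocation rules above to a manifest, the sketch below requests a device count that is valid on trn1.32xlarge. The pod name and image are placeholders, and ``schedulerName`` applies only when using the multiple-scheduler approach:

.. code:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: trn1-topology-demo            # hypothetical name
   spec:
     restartPolicy: Never
     schedulerName: my-scheduler         # omit when using the default-scheduler approach
     containers:
       - name: app
         image: <your-neuron-image>      # placeholder
         resources:
           limits:
             aws.amazon.com/neuron: 4    # valid on trn1.32xlarge: 1, 4, 8, or 16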
================================================
FILE: containers/tutorials/k8s-prerequisite.rst
================================================

.. _k8s-prerequisite:

.. meta::
   :description: Learn how to create an Amazon EKS cluster with AWS Trainium instances (Trn1, Trn2) for machine learning workloads using AWS Neuron SDK. Step-by-step guide with eksctl and CloudFormation templates.
   :keywords: EKS, Kubernetes, Trainium, Trn1, Trn2, Neuron, AWS, machine learning, distributed training, eksctl, CloudFormation, EFA, node group

Before setting up Neuron components on your EKS cluster, you must create an EKS cluster and add Neuron-enabled nodes. This section guides you through creating an Amazon Elastic Kubernetes Service (EKS) cluster with AWS Trainium-enabled nodes (Trn1 or Trn2 instances) using CloudFormation templates and the eksctl command-line tool. You'll configure optimized networking with Elastic Fabric Adapter (EFA) support and pre-configured Neuron components for distributed training and inference workloads.

For detailed information, refer to:

* `EKS Cluster Creation Guide `_
* `EKS Compute Resources Guide `_
* `eksctl Getting Started `_

**Step 1: Download Node Group Template**

Download the node group CloudFormation template for your instance type.

.. tab-set::

   .. tab-item:: Trn1

      .. code-block:: bash

         wget https://raw.githubusercontent.com/aws-neuron/aws-neuron-eks-samples/master/dp_bert_hf_pretrain/cfn/eks_trn1_ng_stack.yaml

   .. tab-item:: Trn2

      .. code-block:: bash

         wget https://raw.githubusercontent.com/aws-neuron/aws-neuron-eks-samples/master/dp_bert_hf_pretrain/cfn/eks_trn2_ng_stack_al2023.yaml

**Important template configuration information**

* **Placement Group:** Optimizes network speed between nodes
* **EFA Driver:** Installed automatically (ensure the ``libfabric`` version matches between the AMI and workload containers)
* **AMI:** Uses the `EKS optimized accelerated AMI `_ with Neuron components pre-installed
* **Instance Type:** Configured for trn1.32xlarge or trn2.48xlarge (update to your desired instance type)
* **Kubernetes Version:** Trn1 templates use Kubernetes 1.25+, Trn2 templates use Kubernetes 1.34+ (update as needed)

Trn2 LNC configuration (Optional): Trn2 instances use a default Logical NeuronCore Configuration (LNC) of ``2``. To change it to ``1``, update the ``UserData`` section of the launch template:

.. code-block:: bash

   --==BOUNDARY==
   Content-Type: text/x-shellscript; charset="us-ascii"

   #!/bin/bash
   set -ex
   config_dir=/opt/aws/neuron
   config_file=${config_dir}/logical_nc_config
   [ -d "$config_dir" ] || mkdir -p "$config_dir"
   [ -f "$config_file" ] || touch "$config_file"
   if ! grep -q "^NEURON_LOGICAL_NC_CONFIG=1$" "$config_file" 2>/dev/null; then
       printf "NEURON_LOGICAL_NC_CONFIG=1" >> "$config_file"
   fi
   --==BOUNDARY==--
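As a quick sanity check once a node has booted with this UserData, you can confirm the override landed in the file the script writes (path taken from the script above):

.. code-block:: bash

   # On the node, after boot:
   cat /opt/aws/neuron/logical_nc_config
   # Expected: NEURON_LOGICAL_NC_CONFIG=1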
**Step 2: Create Cluster Parameter Script**

Create a bash script to capture the parameters needed for the node template:

.. tab-set::

   .. tab-item:: Trn1

      .. code-block:: bash

         #!/bin/bash
         CLUSTER_NAME=$1
         CLUSTER_SG=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r ".[0].ResourcesVpcConfig.ClusterSecurityGroupId")
         VPC_ID=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r ".[0].ResourcesVpcConfig.VpcId")

         cat <<EOF > cfn_params.json
         [
           { "ParameterKey": "ClusterName", "ParameterValue": "$CLUSTER_NAME" },
           { "ParameterKey": "ClusterControlPlaneSecurityGroup", "ParameterValue": "$CLUSTER_SG" },
           { "ParameterKey": "VpcId", "ParameterValue": "$VPC_ID" }
         ]
         EOF

   .. tab-item:: Trn2

      .. code-block:: bash

         #!/bin/bash
         CLUSTER_NAME=$1
         CLUSTER_SG=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r ".[0].ResourcesVpcConfig.ClusterSecurityGroupId")
         VPC_ID=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r ".[0].ResourcesVpcConfig.VpcId")
         CLUSTER_ENDPOINT=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r ".[0].Endpoint")
         CLUSTER_SERVICE_CIDR=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r ".[0].KubernetesNetworkConfig.ServiceIpv4Cidr")
         CLUSTER_CA=$(eksctl get cluster $CLUSTER_NAME -o json | jq -r ".[0].CertificateAuthority.Data")

         cat <<EOF > cfn_params.json
         [
           { "ParameterKey": "ClusterName", "ParameterValue": "$CLUSTER_NAME" },
           { "ParameterKey": "ClusterControlPlaneSecurityGroup", "ParameterValue": "$CLUSTER_SG" },
           { "ParameterKey": "VpcId", "ParameterValue": "$VPC_ID" },
           { "ParameterKey": "ClusterEndpoint", "ParameterValue": "$CLUSTER_ENDPOINT" },
           { "ParameterKey": "ClusterServiceCidr", "ParameterValue": "$CLUSTER_SERVICE_CIDR" },
           { "ParameterKey": "ClusterCertificateAuthority", "ParameterValue": "$CLUSTER_CA" }
         ]
         EOF

This script captures the cluster name, the security group for control plane connectivity, and the VPC ID.

**Step 3: Create CloudFormation Stack**

Create the CloudFormation stack for the node group.

.. tab-set::

   .. tab-item:: Trn1

      .. code-block:: bash

         aws cloudformation create-stack \
             --stack-name eks-trn1-ng-stack \
             --template-body file://eks_trn1_ng_stack.yaml \
             --parameters file://cfn_params.json \
             --capabilities CAPABILITY_IAM

   .. tab-item:: Trn2

      .. code-block:: bash

         aws cloudformation create-stack \
             --stack-name eks-trn2-ng-stack \
             --template-body file://eks_trn2_ng_stack_al2023.yaml \
             --parameters file://cfn_params.json \
             --capabilities CAPABILITY_IAM

Wait for the stack creation to complete before proceeding. You can monitor the progress in the AWS CloudFormation console.

**Step 4: Determine Availability Zones**

Identify the availability zones for your cluster:

.. code-block:: bash

   aws ec2 describe-availability-zones \
       --region $REGION_CODE \
       --filters "Name=zone-id,Values=$1" \
       --query "AvailabilityZones[].ZoneName" \
       --output text

**Step 5: Generate Node Group Configuration**

Create a script named ``create_ng_yaml.sh`` to generate the node group YAML configuration. The script requires: region, availability zones, cluster name, and CloudFormation stack name.

.. tab-set::

   .. tab-item:: Trn1

      .. code-block:: bash

         #!/bin/bash
         REGION_CODE=$1
         EKSAZ1=$2
         EKSAZ2=$3
         CLUSTER_NAME=$4
         STACKNAME=$5

         LT_ID_TRN1=$(aws cloudformation describe-stacks --stack-name $STACKNAME \
             --query "Stacks[0].Outputs[?OutputKey=='LaunchTemplateIdTrn1'].OutputValue" \
             --output text)

         cat <<EOF > trn1_nodegroup.yaml
         apiVersion: eksctl.io/v1alpha5
         kind: ClusterConfig
         metadata:
           name: $CLUSTER_NAME
           region: $REGION_CODE
           version: "1.28"
         iam:
           withOIDC: true
         availabilityZones: ["$EKSAZ1","$EKSAZ2"]
         managedNodeGroups:
           - name: trn1-32xl-ng1
             launchTemplate:
               id: $LT_ID_TRN1
             minSize: 1
             desiredCapacity: 1
             maxSize: 1
             availabilityZones: ["$EKSAZ1"]
             privateNetworking: true
             efaEnabled: true
         EOF

   .. tab-item:: Trn2

      .. code-block:: bash

         #!/bin/bash
         REGION_CODE=$1
         EKSAZ1=$2
         EKSAZ2=$3
         CLUSTER_NAME=$4
         STACKNAME=$5

         LT_ID_TRN2=$(aws cloudformation describe-stacks --stack-name $STACKNAME \
             --query "Stacks[0].Outputs[?OutputKey=='LaunchTemplateIdTrn2'].OutputValue" \
             --output text)

         cat <<EOF > trn2_nodegroup.yaml
         apiVersion: eksctl.io/v1alpha5
         kind: ClusterConfig
         metadata:
           name: $CLUSTER_NAME
           region: $REGION_CODE
           version: "1.34"
         iam:
           withOIDC: true
         availabilityZones: ["$EKSAZ1","$EKSAZ2"]
         managedNodeGroups:
           - name: trn2-48xl-ng1
             launchTemplate:
               id: $LT_ID_TRN2
             minSize: 1
             desiredCapacity: 1
             maxSize: 1
             availabilityZones: ["$EKSAZ1"]
             privateNetworking: true
             efaEnabled: true
         EOF
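A hypothetical invocation of the script, using the region, availability zones, cluster name, and stack name from the earlier steps (the values here are examples only):

.. code-block:: bash

   chmod +x create_ng_yaml.sh
   # args: region, AZ1, AZ2, cluster name, CloudFormation stack name
   ./create_ng_yaml.sh us-west-2 us-west-2d us-west-2c nemo2 eks-trn1-ng-stack   # use eks-trn2-ng-stack for Trn2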
Run the script to generate the configuration file. Update the Kubernetes version as needed for your environment. Example output:

.. tab-set::

   .. tab-item:: Trn1

      .. code-block:: yaml

         apiVersion: eksctl.io/v1alpha5
         kind: ClusterConfig
         metadata:
           name: nemo2
           region: us-west-2
           version: "1.28"
         iam:
           withOIDC: true
         availabilityZones: ["us-west-2d","us-west-2c"]
         managedNodeGroups:
           - name: trn1-32xl-ng1
             launchTemplate:
               id: lt-093c222b35ea89009
             minSize: 1
             desiredCapacity: 1
             maxSize: 1
             availabilityZones: ["us-west-2d"]
             privateNetworking: true
             efaEnabled: true

   .. tab-item:: Trn2

      .. code-block:: yaml

         apiVersion: eksctl.io/v1alpha5
         kind: ClusterConfig
         metadata:
           name: nemo2
           region: us-west-2
           version: "1.34"
         iam:
           withOIDC: true
         availabilityZones: ["us-west-2d","us-west-2c"]
         managedNodeGroups:
           - name: trn2-48xl-ng1
             launchTemplate:
               id: lt-093c222b35ea89010
             minSize: 1
             desiredCapacity: 1
             maxSize: 1
             availabilityZones: ["us-west-2d"]
             privateNetworking: true
             efaEnabled: true

**Step 6: Create Node Group**

Create the node group using the generated configuration.

.. tab-set::

   .. tab-item:: Trn1

      .. code-block:: bash

         eksctl create nodegroup -f trn1_nodegroup.yaml

   .. tab-item:: Trn2

      .. code-block:: bash

         eksctl create nodegroup -f trn2_nodegroup.yaml

Wait for the nodes to reach the ``Ready`` state. Verify using:

.. code-block:: bash

   kubectl get nodes

**Step 7: Install EFA Device Plugin (Optional)**

If you plan to run distributed training or inference jobs, install the EFA device plugin following the instructions at the `EFA device plugin repository `_.
================================================
FILE: containers/tutorials/k8s-setup.rst
================================================

.. _tutorial-k8s-env-setup-for-neuron-to-remove:

Kubernetes environment setup for Neuron
=======================================

Introduction
------------

Customers that use Kubernetes can conveniently integrate Inf1/Trn1 instances into their workflows. This tutorial goes through deploying the Neuron device plugin daemonset and allocating Neuron cores or devices to application pods.

.. dropdown:: Prerequisite
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. include:: /containers/tutorials/k8s-prerequisite.rst

.. dropdown:: Deploy Neuron Device Plugin
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. include:: /containers/tutorials/k8s-neuron-device-plugin.rst

.. dropdown:: Deploy Neuron Scheduler Extension
   :class-title: sphinx-design-class-title-small
   :class-body: sphinx-design-class-body-small
   :animate: fade-in

   .. include:: /containers/tutorials/k8s-neuron-scheduler.rst

================================================
FILE: containers/tutorials/training/index.rst
================================================

Containers -- Training Tutorials
================================

.. toctree::
   :maxdepth: 1
   :hidden:

   /containers/tutorials/training/tutorial-training
   /containers/tutorials/training/k8s_mlp_train_demo

.. include:: /containers/tutorials/training/index.txt

================================================
FILE: containers/tutorials/training/index.txt
================================================

* :ref:`tutorial-training`
* :ref:`example-deploy-mlp-train-pod`

================================================
FILE: containers/tutorials/training/k8s_mlp_train_demo.rst
================================================

.. _example-deploy-mlp-train-pod:

Deploy a simple MLP training script as a Kubernetes job
-------------------------------------------------------

This tutorial uses MLP training as a teaching example of how to deploy a training application using Kubernetes on Trn1 instances. For a more advanced example, please refer to `Tutorial: Launch a Multi-Node PyTorch Neuron Training Job on Trainium Using TorchX and EKS `__

Prerequisites:
^^^^^^^^^^^^^^

- :ref:`tutorial-k8s-env-setup-for-neuron`: to set up k8s support on your cluster.
- Trn1 instances as worker nodes with attached roles allowing:

  - ECR read access policy to retrieve container images from ECR: **arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly**

- A container image that is built using :ref:`tutorial-training`

Deploy an MLP training image
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Create a file named `mlp_train.yaml` with the contents below.

.. note::

   In the image: add the appropriate location of the image

::

   apiVersion: v1
   kind: Pod
   metadata:
     name: trn1-mlp
   spec:
     restartPolicy: Never
     schedulerName: default-scheduler
     hostNetwork: true
     nodeSelector:
       beta.kubernetes.io/instance-type: trn1.32xlarge
     containers:
       - name: trn1-mlp
         command: ["/usr/local/bin/python3"]
         args: ["/opt/ml/mlp_train.py"]
         image: 647554078242.dkr.ecr.us-east-1.amazonaws.com/sunda-pt:k8s_mlp_0907
         imagePullPolicy: IfNotPresent
         env:
           - name: NEURON_RT_LOG_LEVEL
             value: "INFO"
         resources:
           limits:
             aws.amazon.com/neuron: 2
           requests:
             aws.amazon.com/neuron: 2

2. Deploy the pod.

::

   kubectl apply -f mlp_train.yaml

3. Check the logs to make sure training completed.

::

   kubectl logs 

Your log should contain the following:

::

   Final loss is 0.1977
   ----------End Training ---------------

================================================
FILE: containers/tutorials/training/tutorial-training.rst
================================================

.. _tutorial-training:

Run Training in a PyTorch Neuron Container
==========================================

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
--------

This tutorial demonstrates how to run a PyTorch container on a Trainium instance. By the end of this tutorial, you will be able to run simple MLP training using the container.

You will use a trn1.2xlarge to test your Docker configuration for Trainium. To find out the available Neuron devices on your instance, use the command ``ls /dev/neuron*``.

Setup Environment
-----------------

1. Launch a Trn1 Instance

   .. include:: /setup/install-templates/launch-instance.txt

2. Set up the docker environment according to :ref:`tutorial-docker-env-setup`
================================================ FILE: containers/tutorials/tutorial-docker-env-setup.rst ================================================ .. _tutorial-docker-env-setup: Tutorial Docker environment setup ================================= Introduction ------------ A Neuron application can be deployed using Docker containers. This tutorial describes how to configure Docker on Amazon Linux 2023 to expose Inferentia/Trainium devices to containers. .. tab-set:: .. tab-item:: Training .. dropdown:: Install Drivers :class-title: sphinx-design-class-title-small :class-body: sphinx-design-class-body-small :animate: fade-in .. code:: bash # Configure Linux for Neuron repository updates sudo tee /etc/yum.repos.d/neuron.repo > /dev/null < /dev/null < `oci-add-hooks `__ is an OCI runtime whose sole purpose is injecting OCI prestart, poststart, and poststop hooks into a container's config.json before passing it along to an OCI-compatible runtime. oci-add-hooks is used to inject a hook that exposes Inferentia devices to the container. .. code:: bash

   sudo apt install -y golang && \
   export GOPATH=$HOME/go && \
   go get github.com/joeshaw/json-lossless && \
   cd /tmp/ && \
   git clone https://github.com/awslabs/oci-add-hooks && \
   cd /tmp/oci-add-hooks && \
   make build && \
   sudo cp /tmp/oci-add-hooks/oci-add-hooks /usr/local/bin/

Install the package that provides the OCI hook software ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. important:: This step should run on the Linux host and not inside the container. For Inf1, install the following package .. code:: bash

   sudo apt-get install aws-neuron-runtime-base -y

For Trn1, install the following package .. code:: bash

   sudo apt-get install aws-neuronx-oci-hook -y

For the Docker runtime, set up Docker to use the oci-neuron OCI runtime. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ oci-neuron is a script representing an OCI-compatible runtime. It wraps oci-add-hooks, which wraps runc. In this step, we configure Docker to point at the oci-neuron OCI runtime: .. code:: bash

   sudo cp /opt/aws/neuron/share/docker-daemon.json /etc/docker/daemon.json
   sudo service docker restart

If the docker restart command fails, make sure the docker systemd service is not masked. More information on this can be found here: https://stackoverflow.com/a/37640824 For the containerd runtime, set up containerd to use the oci-neuron OCI runtime. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Update the following fields in ``/etc/containerd/config.toml`` to configure containerd to use the Neuron OCI hook: .. code:: bash

   default_runtime_name = "neuron"
   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.neuron]
   [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.neuron.options]
   BinaryName = "/opt/aws/neuron/bin/oci_neuron_hook_wrapper.sh"

Then restart the containerd daemon: .. code:: bash

   sudo systemctl restart containerd
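As an optional sanity check (a hedged sketch, assuming containerd v1.6 or later), you can confirm that the neuron runtime is present in the merged containerd configuration before scheduling workloads:

.. code-block:: bash

   # Print the merged containerd config and look for the neuron runtime entry.
   sudo containerd config dump | grep -A 2 'runtimes.neuron'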
For the cri-o runtime, set up cri-o to use the oci-neuron OCI runtime. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Update the following fields in ``/etc/crio/crio.conf`` to configure cri-o to use the Neuron OCI hook: .. code:: bash

   default_runtime_name = "neuron"
   [crio.runtime.runtimes.neuron]
   runtime_path = "/opt/aws/neuron/bin/oci_neuron_hook_wrapper.sh"

Then restart the cri-o daemon: .. code:: bash

   sudo systemctl restart cri-o

.. _oci-hook-workarounds: OCI hook workarounds ^^^^^^^^^^^^^^^^^^^^ **ECS (EC2)** Add the following to your ECS task definition: .. code:: json

   "linuxParameters": {
     "devices": [
       {
         "containerPath": "/dev/neuron0",
         "hostPath": "/dev/neuron0",
         "permissions": ["read", "write"]
       },
       {
         "containerPath": "/dev/neuron1",
         "hostPath": "/dev/neuron1",
         "permissions": ["read", "write"]
       },
       ...
     ]
   },

The ``linuxParameters`` parameter can be found under ``containerDefinition``. More information can be found here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_linuxparameters. Expose as many Neuron devices as needed, up to the maximum number of devices for the specified instance. For example, the trn1.32xlarge instance type contains 16 Neuron devices, so the devices that can be exposed are /dev/neuron0, /dev/neuron1, up to /dev/neuron15. To see an example of an ECS task definition exposing Neuron devices, see https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-inference-task-def.html. ================================================ FILE: containers/tutorials.rst ================================================ .. meta:: :description: Comprehensive tutorials for deploying AWS Neuron SDK in containers with Docker and Kubernetes. Learn to build Neuron containers, configure EKS clusters, deploy device plugins, and set up monitoring for Trainium and Inferentia instances. :keywords: Neuron containers, Docker, Kubernetes, EKS, Trainium, Inferentia, device plugin, scheduler, monitoring, tutorials, AWS, machine learning Containers - Tutorials ======================= Learn how to deploy and manage AWS Neuron workloads in containerized environments. These tutorials cover everything from building Docker containers with Neuron support to deploying production-ready Kubernetes clusters with device plugins, schedulers, and monitoring solutions. Whether you're running inference or training workloads on AWS Trainium or Inferentia instances, these step-by-step guides will help you configure your container infrastructure for optimal performance and reliability. .. toctree:: :maxdepth: 1 :hidden: Inference Training /containers/tutorials/tutorial-docker-env-setup /containers/tutorials/build-run-neuron-container /containers/tutorials/tutorial-oci-hook /containers/tutorials/k8s-setup /containers/tutorials/k8s-neuron-helm-chart /containers/tutorials/k8s-neuron-scheduler-flow /containers/tutorials/k8s-neuron-monitor /containers/tutorials/k8s-neuron-problem-detector-and-recovery /containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa General Container Tutorials ---------------------------- .. grid:: 1 1 2 2 :gutter: 3 .. grid-item-card:: Docker Environment Setup :link: /containers/tutorials/tutorial-docker-env-setup :link-type: doc Configure Docker on Amazon Linux 2023 to expose Inferentia and Trainium devices to containers. Install Neuron drivers, runtime, and configure the Docker daemon for Neuron device access. ..
grid-item-card:: Build and Run Neuron Containers :link: /containers/tutorials/build-run-neuron-container :link-type: doc Learn how to build Docker images with Neuron support using provided Dockerfiles and run containerized applications on Inf1 and Trn1 instances with proper device exposure. .. grid-item-card:: Docker Neuron OCI Hook Setup :link: /containers/tutorials/tutorial-oci-hook :link-type: doc Install and configure the Neuron OCI hook to enable the AWS_NEURON_VISIBLE_DEVICES environment variable for exposing all Neuron devices to containers without explicit device flags. Kubernetes Setup and Configuration ----------------------------------- .. grid:: 1 1 2 2 :gutter: 3 .. grid-item-card:: Kubernetes Environment Setup :link: /containers/tutorials/k8s-setup :link-type: doc Complete guide to setting up Kubernetes for Neuron, including EKS cluster creation with Trainium nodes, device plugin installation, scheduler extension setup, and resource allocation configuration. .. grid-item-card:: Neuron Helm Chart :link: /containers/tutorials/k8s-neuron-helm-chart :link-type: doc Simplify Neuron infrastructure deployment with the unified Helm chart that installs device plugins, scheduler extensions, node problem detector, and DRA driver in a single command. Kubernetes Device Management ----------------------------- .. grid:: 1 1 2 2 :gutter: 3 .. grid-item-card:: Scheduler Flow Diagram :link: /containers/tutorials/k8s-neuron-scheduler-flow :link-type: doc Visual diagram showing how the Neuron Scheduler Extension integrates with Kubernetes components to schedule Pods with Neuron resource requests. Kubernetes Monitoring and Recovery ----------------------------------- .. grid:: 1 1 2 2 :gutter: 3 .. grid-item-card:: Neuron Monitor :link: /containers/tutorials/k8s-neuron-monitor :link-type: doc Deploy Neuron Monitor to collect and expose metrics from Neuron devices and runtime. Integrate with Prometheus for observability, performance tracking, and troubleshooting. .. grid-item-card:: Node Problem Detector and Recovery :link: /containers/tutorials/k8s-neuron-problem-detector-and-recovery :link-type: doc Monitor Neuron device health and automatically remediate issues by detecting hardware failures, driver problems, and runtime errors. Enable automatic node replacement for faulty hardware. .. grid-item-card:: NPD Permissions (IRSA) :link: /containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa :link-type: doc Configure IAM roles for service accounts (IRSA) to grant the Neuron Node Problem Detector necessary permissions for Auto Scaling group operations and CloudWatch metrics. Training and Inference Container Tutorials ------------------------------------------ .. tab-set:: .. tab-item:: Training .. include:: /containers/tutorials/training/index.txt .. tab-set:: .. tab-item:: Inference .. include:: /containers/tutorials/inference/index.txt ================================================ FILE: devflows/aws-batch-flows.rst ================================================ .. _aws_batch_flow: AWS Batch ========= .. toctree:: :maxdepth: 1 /devflows/training/batch/batch-training ================================================ FILE: devflows/aws-batch-flows.txt ================================================ .. tab-set:: .. tab-item:: Inference .. include:: /devflows/inference/aws-batch-flows.txt .. tab-set:: .. tab-item:: Training .. 
include:: /devflows/training/aws-batch-flows.txt ================================================ FILE: devflows/dlc-then-customize-devflow.rst ================================================ .. _dlc-then-customize-devflow: Customize Neuron DLC ============================== .. contents:: Table of Contents :local: :depth: 2 Description ----------- This guide covers how to customize and extend the Neuron Deep Learning Container (DLC) to fit your specific project needs. You can customize the DLC either by using the DLC as a base image in your Dockerfile or by modifying published Dockerfiles on GitHub. Method 1: Using DLC as a Base Image ----------------------------------- 1. Create a new Dockerfile. In your Dockerfile, specify the Neuron DLC as your base image using the FROM directive. 2. Complete the Dockerfile. You can add additional packages, change the base environment, or make any other modifications that suit your project. `AWS Batch Training `_ is a good example that customizes the Neuron DLC by using it as the base image (see the sketch at the end of this section). Its `Dockerfile `_ shows that the customized container copies ``llama_batch_training.sh`` into the image and runs it. 3. Navigate to the directory containing your Dockerfile and build your custom container. Method 2: Modifying Published Dockerfiles ----------------------------------------- 1. Visit the `Neuron DLC Github repo `_ and locate the Dockerfile for the container you wish to customize. 2. Modify the Dockerfile as needed. You can add additional packages, change the base environment, or make any other modifications that suit your project. For example, if you do not need to use Neuron tools in your scenario and want to make the container smaller, you can remove aws-neuronx-tools at this `line `_. 3. Navigate to the directory containing your Dockerfile and build your custom container.
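To make Method 1 concrete, here is a hedged sketch of such a Dockerfile. The base image URI is a placeholder (substitute a published image tag from the Neuron DLC GitHub repo), and the training script name follows the AWS Batch example above:

.. code-block:: bash

   # Sketch only: <neuron-dlc-image-uri> is a placeholder for a published Neuron DLC image.
   cat << 'EOF' > Dockerfile
   FROM <neuron-dlc-image-uri>
   COPY llama_batch_training.sh /opt/ml/
   CMD ["/bin/bash", "/opt/ml/llama_batch_training.sh"]
   EOF
   docker build -t my-neuron-dlc:custom .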
================================================ FILE: devflows/ec2-flows.rst ================================================ .. _amazon-ec2: Amazon EC2 ========== .. toctree:: :maxdepth: 1 :hidden: Inference Training .. include:: /devflows/ec2-flows.txt ================================================ FILE: devflows/ec2-flows.txt ================================================ .. tab-set:: .. tab-item:: Inference .. include:: /devflows/inference/ec2-flows.txt .. tab-set:: .. tab-item:: Training .. include:: /devflows/training/ec2-flows.txt ================================================ FILE: devflows/ecs-flows.rst ================================================ .. _ecs_flow: Amazon ECS ========== .. toctree:: :maxdepth: 1 /devflows/plugins/npd-ecs-flows /devflows/inference/dlc-then-ecs-devflow /devflows/training/dlc-then-ecs-devflow In this section, you'll find resources to help you use Neuron with an ECS cluster, deploying inference and training workloads on Inferentia and Trainium ECS clusters. Using Neuron Node Problem Detector Plugin with ECS -------------------------------------------------- The Neuron node problem detector and recovery plugin enhances resiliency by detecting and remediating errors. To get started with the Neuron node problem detector and recovery plugin on an ECS cluster, please refer to :ref:`ecs-neuron-problem-detector-and-recovery`. Running Inference workload -------------------------- This guide walks you through the end-to-end process of building and running a Docker container with your model and deploying it on an ECS cluster with Inferentia instances. For running machine learning inference workloads on Amazon ECS using AWS Deep Learning Containers, please refer to :ref:`inference-dlc-then-ecs-devflow`. Running Training workload ------------------------- This guide walks you through the end-to-end process of building and running a Docker container with your model and deploying it on an ECS cluster with Trainium instances. For running machine learning training workloads on Amazon ECS using AWS Deep Learning Containers, please refer to :ref:`training-dlc-then-ecs-devflow`. ================================================ FILE: devflows/eks-flows.rst ================================================ .. _eks_flow: Amazon EKS ========== .. toctree:: :maxdepth: 1 /containers/kubernetes-getting-started /devflows/inference/dlc-then-eks-devflow /containers/tutorials/training/k8s_mlp_train_demo In this section, you'll find resources to help you use Neuron with an EKS cluster, deploying inference and training workloads on Inferentia and Trainium EKS clusters. EKS Setup ------------ This guide covers setting up the Neuron device plugin, scheduler extension, node problem detector, and monitoring plugins. These components enable efficient resource utilization, monitoring, and resilience when using Inferentia and Trainium instances for inference and training workloads on Kubernetes clusters. To get started with using AWS Neuron and setting up the required plugins on an EKS cluster, please refer to :ref:`tutorial-k8s-env-setup-for-neuron`. Running Inference workload -------------------------- This guide walks you through the end-to-end process of building and running a Docker container with your model and deploying it on an EKS cluster with Inferentia instances. For running machine learning inference workloads on Amazon EKS using AWS Deep Learning Containers, please refer to :ref:`dlc-then-eks-devflow`. Running Training workload ------------------------- This guide walks you through the end-to-end process of building and running a Docker container with your model and deploying it on an EKS cluster with Trainium instances. For running machine learning training workloads on Amazon EKS using AWS Deep Learning Containers, please refer to :ref:`example-deploy-mlp-train-pod`. ================================================ FILE: devflows/index.rst ================================================ .. _neuron-devflows: .. meta:: :description: :date-modified: AWS Workload Orchestration ========================== AWS Neuron integrates seamlessly with various AWS compute and orchestration services to accelerate deep learning workloads. This section provides deployment patterns and best practices for running Neuron-powered applications across different AWS services, from container orchestration to high-performance computing clusters. .. grid:: 2 :gutter: 2 .. grid-item-card:: Amazon EKS :link: /devflows/eks-flows :link-type: doc :class-body: sphinx-design-class-title-small Deploy Neuron workloads on Kubernetes with Amazon Elastic Kubernetes Service .. grid-item-card:: Amazon ECS :link: /devflows/ecs-flows :link-type: doc :class-body: sphinx-design-class-title-small Run containerized Neuron applications using Amazon Elastic Container Service .. grid-item-card:: AWS ParallelCluster :link: /devflows/parallelcluster-flows :link-type: doc :class-body: sphinx-design-class-title-small Set up HPC clusters for distributed training and inference workloads ..
grid-item-card:: AWS Batch :link: /devflows/aws-batch-flows :link-type: doc :class-body: sphinx-design-class-title-small Execute batch ML jobs with automatic scaling and resource management .. toctree:: :maxdepth: 1 :hidden: /devflows/eks-flows /devflows/ecs-flows /devflows/parallelcluster-flows /devflows/aws-batch-flows Amazon SageMaker Third-party Solutions ================================================ FILE: devflows/inference/aws-batch-flows.rst ================================================ AWS Batch Flows - Inference =========================== .. include:: /devflows/inference/aws-batch-flows.txt ================================================ FILE: devflows/inference/aws-batch-flows.txt ================================================ .. note:: AWS Batch supports Inf1. An example of how to deploy a model with Neuron using Batch is coming soon. ================================================ FILE: devflows/inference/byoc-hosting-devflow-inf2.rst ================================================ .. _byoc-hosting-devflow-inf2: Bring Your Own Neuron Container to Sagemaker Hosting (inf2 or trn1) ==================================================================== .. contents:: Table of Contents :local: :depth: 2 Description ----------- |image| .. |image| image:: /images/byoc-then-hosting-dev-flow.png :width: 850 :alt: Neuron developer flow on SageMaker Neo :align: middle You can use a SageMaker Notebook or an EC2 instance to compile models and build your own containers for deployment on SageMaker Hosting using ml.inf2 instances. In this developer flow, you provision a SageMaker Notebook or an EC2 instance to train and compile your model for Inferentia. Then you deploy your model to SageMaker Hosting using the `SageMaker Python SDK `_. You may not need to create a container to bring your own **code** to Amazon SageMaker. When you are using a framework such as TensorFlow or PyTorch that has direct support in SageMaker, you can simply supply the Python code that implements your algorithm using the SDK entry points for that framework. Follow the steps below to set up your environment. Once your environment is set up, you'll be able to follow the `Compiling and Deploying HuggingFace Pretrained BERT on Inf2 on Amazon SageMaker Sample `_. .. _byoc-hosting-setenv: Setup Environment ----------------- 1. Create a Compilation Instance: If using an **EC2 instance for compilation only**, you can use any instance type to compile a model. It is recommended that you start with a c5.4xlarge instance. If using an **EC2 instance to compile and test a model**, you can use an Inf2 instance. Follow these steps to launch an Inf2 instance: .. include:: /setup/install-templates/inf2/launch-inf2-dlami.rst If using a **SageMaker Notebook for compilation**, follow the instructions in `Get Started with Notebook Instances `_ to provision the environment. It is recommended that you start with an ml.c5.4xlarge instance for the compilation. Also, increase the volume size of your SageMaker notebook instance to accommodate the models and containers built locally. A volume of 10GB is sufficient. .. note:: To compile the model in the SageMaker Notebook instance, you'll need to install the Neuron Compiler and Neuron Framework Extensions. Follow the `Compiling and Deploying HuggingFace Pretrained BERT on Inf2 on Amazon SageMaker Sample `_ to install the environments. 2. Set up the environment to compile a model, build your own container, and deploy: To compile your model on EC2 or a SageMaker Notebook, follow the *Set up a development environment* section in the EC2 :ref:`ec2-then-ec2-setenv` documentation. Refer to the `Adapting Your Own Inference Container `_ documentation for information on how to bring your own containers to SageMaker Hosting. Make sure to attach the **AmazonEC2ContainerRegistryPowerUser** policy to your IAM role, so you're able to build and push containers from your SageMaker Notebook instance. .. note:: The container image can be created using :ref:`how-to-build-neuron-container`.
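Building and pushing the container from the notebook or EC2 instance follows the standard ECR flow. A hedged sketch only; the region and repository name are placeholders, and it assumes the repository already exists:

.. code-block:: bash

   # Log in to ECR, then build and push your inference container (names are placeholders).
   REGION=us-west-2
   ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
   REPO=${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/my-neuron-inference
   aws ecr get-login-password --region ${REGION} | \
     docker login --username AWS --password-stdin ${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com
   docker build -t ${REPO}:latest .
   docker push ${REPO}:latest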
================================================ FILE: devflows/inference/byoc-hosting-devflow.rst ================================================ .. _byoc-hosting-devflow: Bring Your Own Neuron Container to Sagemaker Hosting (inf1) ============================================================ .. contents:: Table of Contents :local: :depth: 2 Description ----------- |image| .. |image| image:: /images/byoc-then-hosting-dev-flow.png :width: 850 :alt: Neuron developer flow on SageMaker Neo :align: middle You can use a SageMaker Notebook or an EC2 instance to compile models and build your own containers for deployment on SageMaker Hosting using ml.inf1 instances. In this developer flow, you provision a SageMaker Notebook or an EC2 instance to train and compile your model for Inferentia. Then you deploy your model to SageMaker Hosting using the SageMaker Python SDK. Follow the steps below to set up your environment. Once your environment is set up, you'll be able to follow the :ref:`BYOC HuggingFace pretrained BERT container to Sagemaker Tutorial `. .. _byoc-hosting-setenv: Setup Environment ----------------- 1. Create a Compilation Instance: If using an **EC2 instance for compilation**, you can use an Inf1 instance to compile and test a model. Follow these steps to launch an Inf1 instance: .. include:: /setup/install-templates/inf1/launch-inf1-ami.rst If using a **SageMaker Notebook for compilation**, follow the instructions in `Get Started with Notebook Instances `_ to provision the environment. It is recommended that you start with an ml.c5.4xlarge instance for the compilation. Also, increase the volume size of your SageMaker notebook instance to accommodate the models and containers built locally. A volume of 10GB is sufficient. .. note:: To compile the model in the SageMaker Notebook instance, you'll need to update the conda environments to include the Neuron Compiler and Neuron Framework Extensions. Follow the installation guide in the section :ref:`how-to-update-to-latest-Neuron-Conda-Env` to update the environments. 2. Set up the environment to compile a model, build your own container, and deploy: To compile your model on EC2 or a SageMaker Notebook, follow the *Set up a development environment* section in the EC2 :ref:`ec2-then-ec2-setenv` documentation. Refer to the `Adapting Your Own Inference Container `_ documentation for information on how to bring your own containers to SageMaker Hosting. Make sure to attach the **AmazonEC2ContainerRegistryPowerUser** policy to your IAM role, so you're able to build and push containers from your SageMaker Notebook instance. .. note:: The container image can be created using :ref:`how-to-build-neuron-container`. ================================================ FILE: devflows/inference/container-sm-hosting-devflow.rst ================================================ ..
_container-sm-hosting-devflow: Deploy on Sagemaker Hosting =========================== .. contents:: Table of Contents :local: :depth: 2 Description ----------- You can use a `Sagemaker Hosted Endpoint `_ to run inference on Inf1 instances. ================================================ FILE: devflows/inference/dev-flows.rst ================================================ .. _neuron1-devflows: .. _compilation-flow-target: .. _deploym-flow-target: Developer Flows Introduction ============================ |image| .. |image| image:: /images/neuron-devflow.jpg :width: 500 :alt: Neuron developer flow A typical Neuron developer flow includes a compilation phase followed by deployment (inference) on one or more Inf1 instances. You can develop on Neuron using one of the following combinations of developer flows: .. toctree:: :maxdepth: 1 ec2-then-ec2-devflow ec2-then-ec2-devflow-inf2 neo-then-hosting-devflow byoc-hosting-devflow dlc-then-ec2-devflow dlc-then-ecs-devflow dlc-then-eks-devflow ================================================ FILE: devflows/inference/dlc-then-ec2-devflow.rst ================================================ .. _dlc-then-ec2-devflow: Deploy Neuron Container on EC2 ============================== .. contents:: Table of Contents :local: :depth: 2 Description ----------- |image| .. |image| image:: /images/dlc-on-ec2-dev-flow.png :width: 500 :alt: Neuron developer flow for DLC on EC2 :align: middle You can use the Neuron version of the `AWS Deep Learning Containers `_ to run inference on Inf1 instances. In this developer flow, you provision an EC2 Inf1 instance using a Deep Learning AMI (DLAMI), pull the container image with the Neuron version of the desired framework, and run the container as a server for the already compiled model (a run sketch follows below). This developer flow assumes the model has already been compiled through a :ref:`compilation developer flow `. .. _dlc-then-ec2-setenv: Setup Environment ----------------- 1. Launch an Inf1 Instance .. include:: /setup/install-templates/inf1/launch-inf1-ami.rst 2. Once you have your EC2 environment set up according to :ref:`tutorial-docker-env-setup`, you can build and run a Neuron container using the :ref:`how-to-build-neuron-container` section above. .. [DLC specific flow, uncomment when DLC available] Follow the `Getting Started with Deep Learning Containers for Inference on EC2 `_ and use the appropriate DLC container. .. note:: **Prior to running the container**, make sure that the Neuron runtime on the instance is turned off, by running the command: .. code:: bash

   sudo service neuron-rtd stop
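With neuron-rtd stopped, you can then start the container as a server for the compiled model. This is a hedged sketch only; the image name and serving port are placeholders that depend on the framework DLC you pulled:

.. code-block:: bash

   # Illustrative: run the container as a detached inference server,
   # exposing one Inferentia device. Image and port are placeholders.
   docker run -d --name neuron-serve \
     --device=/dev/neuron0 \
     -p 8500:8500 \
     <your-neuron-dlc-image>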
================================================ FILE: devflows/inference/dlc-then-ecs-devflow.rst ================================================ .. _inference-dlc-then-ecs-devflow: Deploy Neuron Container on Elastic Container Service (ECS) for Inference ======================================================================== .. contents:: Table of Contents :local: :depth: 2 Description ----------- |image| .. |image| image:: /images/dlc-on-ecs-dev-flow.png :width: 750 :alt: Neuron developer flow for DLC on ECS :align: middle You can use the Neuron version of the `AWS Deep Learning Containers `_ to run inference on Amazon Elastic Container Service (ECS). In this developer flow, you set up an ECS cluster with Inf1/Inf2 instances, create a task description for your inference service, and deploy it to your cluster. This developer flow assumes: 1. The model has already been compiled through :ref:`Compilation with Framework API on EC2 instance ` or through :ref:`Compilation with Sagemaker Neo `. 2. You have already set up your container to retrieve it from storage. .. _inference-dlc-then-ecs-setenv: Setup Environment ----------------- 1. Set up an Amazon ECS cluster: Follow the instructions on `Setting up Amazon ECS for Deep Learning Containers `_ 2. Define an Inference Task: Use the instructions in the `DLC Inference on ECS Tutorial `_ to define a task and create a service for the appropriate framework. When creating tasks for Inferentia instances on ECS, be aware of the considerations and requirements listed in `Working with inference workloads on Amazon ECS `_. 3. Use the container image created using :ref:`how-to-build-neuron-container` as the ``image`` in your task definition. .. _inference-push_to_ecr_note: .. note:: Before deploying your task definition to your ECS cluster, make sure to push the image to ECR. Refer to `Pushing a Docker image `_ for more information. ================================================ FILE: devflows/inference/dlc-then-eks-devflow.rst ================================================ .. _dlc-then-eks-devflow: Deploy Neuron Container on Elastic Kubernetes Service (EKS) for Inference ========================================================================== .. contents:: Table of Contents :local: :depth: 2 Description ----------- |image| .. |image| image:: /images/dlc-on-eks-dev-flow.png :width: 750 :alt: Neuron developer flow for DLC on EKS :align: middle You can use the Neuron version of the `AWS Deep Learning Containers `_ to run inference on Amazon Elastic Kubernetes Service (EKS). In this developer flow, you set up an EKS cluster with Inf1 instances, create a Kubernetes manifest for your inference service, and deploy it to your cluster. This developer flow assumes: 1. The model has already been compiled through :ref:`Compilation with Framework API on EC2 instance ` or through :ref:`Compilation with Sagemaker Neo `. 2. You have already set up your container to retrieve it from storage. .. _dlc-then-eks-setenv: Setup Environment ----------------- Add Inferentia nodes using the instructions at :ref:`tutorial-k8s-env-setup-for-neuron`. Using the YML deployment manifest shown `in the EKS documentation for inferentia `_, replace the `image` in the `containers` specification with the one you built using :ref:`how-to-build-neuron-container` (see the sketch below). .. note:: Before deploying the yaml to your EKS cluster, make sure to push the image to ECR. Refer to `Pushing a Docker image `_ for more information. Inference Example ----------------- Please refer to :ref:`example-deploy-rn50-as-k8s-service` to run a simple inference example. Note that the container image referenced in the YML manifest is created using :ref:`how-to-build-neuron-container`.
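The manifest edit described above can also be scripted. A hedged sketch; the manifest file name and image URI are placeholders:

.. code-block:: bash

   # Point the deployment manifest at the image you pushed to ECR, then deploy it.
   IMAGE=<account>.dkr.ecr.<region>.amazonaws.com/neuron-rn50:latest   # placeholder URI
   sed -i "s|image: .*|image: ${IMAGE}|" rn50-deployment.yaml          # placeholder file
   kubectl apply -f rn50-deployment.yaml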
================================================ FILE: devflows/inference/dlc-then-k8s-devflow.rst ================================================ .. _dlc-then-k8s-devflow: Deploy Neuron Container on Kubernetes ====================================== .. contents:: Table of Contents :local: :depth: 2 Description ----------- Using Neuron in containers on a Kubernetes cluster is straightforward; follow :ref:`tutorial-k8s-env-setup-for-neuron`. Known Limitations ----------------- Scheduling on a k8s cluster requires contiguous Neuron device IDs. ================================================ FILE: devflows/inference/ec2-flows.rst ================================================ EC2 Flows - Inference ===================== .. toctree:: :maxdepth: 1 :hidden: /devflows/inference/ec2-then-ec2-devflow /devflows/inference/ec2-then-ec2-devflow-inf2 .. include:: /devflows/inference/ec2-flows.txt ================================================ FILE: devflows/inference/ec2-flows.txt ================================================ * :ref:`ec2-then-ec2-devflow` * :ref:`ec2-then-ec2-devflow-inf2` ================================================ FILE: devflows/inference/ec2-then-ec2-devflow-inf2.rst ================================================ .. _ec2-then-ec2-devflow-inf2: Compile with Framework API and Deploy on EC2 Inf2 ================================================= .. contents:: Table of Contents :local: :depth: 3 Description ----------- |image| .. |image| image:: /images/ec2-then-ec2-dev-flow-inf2.png :width: 500 :alt: Neuron developer flow on EC2 :align: middle You can use a single Inf2 instance as a development environment to compile and deploy Neuron models. In this developer flow, you provision an EC2 Inf2 instance using a Deep Learning AMI (DLAMI) and execute the two steps of the development flow on the same instance. The DLAMI comes pre-packaged with the Neuron frameworks, compiler, and required runtimes to complete the flow. Development happens through Jupyter Notebooks or using a secure shell (ssh) connection in a terminal. Follow the steps below to set up your environment. .. note:: **Model compilation can be executed on a non-inf2 instance** for later deployment. Follow the same EC2 Developer Flow Setup using other instance families and leverage `Amazon Simple Storage Service `_ (S3) to share the compiled models between different instances (see the sketch at the end of this page). .. _ec2-then-ec2-setenv: Setup Environment ----------------- 1. Launch an Inf2 Instance ^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf2/launch-inf2-dlami.rst 2. Set up a development environment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Enable PyTorch-Neuron ~~~~~~~~~~~~~~~~~~~~~ .. include :: /setup/install-templates/inf2/note-setup-libnrt-warning.rst .. include:: /setup/install-templates/inf2/dlami-enable-neuron-pytorch.rst 3. Set up Jupyter notebook ^^^^^^^^^^^^^^^^^^^^^^^^^^ To develop from a Jupyter notebook, see :ref:`setup-jupyter-notebook-steps-troubleshooting`. You can also run a Jupyter notebook as a script: first enable the ML framework Conda or Python environment of your choice, then see :ref:`running-jupyter-notebook-as-script` for instructions.
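As noted above, compiled models can be shared between instances through S3. A minimal sketch, assuming a bucket you own and an illustrative artifact name:

.. code-block:: bash

   # On the compilation instance: upload the compiled model artifact.
   aws s3 cp model_neuron.pt s3://<your-bucket>/models/model_neuron.pt

   # On the deployment Inf2 instance: download it before loading the model.
   aws s3 cp s3://<your-bucket>/models/model_neuron.pt .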
================================================ FILE: devflows/inference/ec2-then-ec2-devflow.rst ================================================ .. _ec2-then-ec2-devflow: Compile with Framework API and Deploy on EC2 Inf1 ================================================== .. contents:: Table of Contents :local: :depth: 3 Description ----------- |image| .. |image| image:: /images/ec2-then-ec2-dev-flow.png :width: 500 :alt: Neuron developer flow on EC2 :align: middle You can use a single Inf1 instance as a development environment to compile and deploy Neuron models. In this developer flow, you provision an EC2 Inf1 instance using a Deep Learning AMI (DLAMI) and execute the two steps of the development flow on the same instance. The DLAMI comes pre-packaged with the Neuron frameworks, compiler, and required runtimes to complete the flow. Development happens through Jupyter Notebooks or using a secure shell (ssh) connection in a terminal. Follow the steps below to set up your environment. .. note:: **Model compilation can be executed on a non-inf1 instance** for later deployment. Follow the same EC2 Developer Flow Setup using other instance families and leverage `Amazon Simple Storage Service `_ (S3) to share the compiled models between different instances. .. _ec2-then-ec2-setenv: Setup Environment ----------------- 1. Launch an Inf1 Instance ^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/inf1/launch-inf1-dlami.rst 2. Set up a development environment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Enable PyTorch-Neuron ~~~~~~~~~~~~~~~~~~~~~ .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. include:: /setup/install-templates/inf1/dlami-enable-neuron-pytorch.rst Enable TensorFlow-Neuron ~~~~~~~~~~~~~~~~~~~~~~~~~ .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. include:: /setup/install-templates/inf1/dlami-enable-neuron-tensorflow.rst Enable Apache MXNet ~~~~~~~~~~~~~~~~~~~~ .. include :: /setup/install-templates/inf1/note-setup-libnrt-warning.rst .. include:: /setup/install-templates/inf1/dlami-enable-neuron-mxnet.rst 3. Set up Jupyter notebook ^^^^^^^^^^^^^^^^^^^^^^^^^^ To develop from a Jupyter notebook, see :ref:`setup-jupyter-notebook-steps-troubleshooting`. You can also run a Jupyter notebook as a script: first enable the ML framework Conda or Python environment of your choice, then see :ref:`running-jupyter-notebook-as-script` for instructions. ================================================ FILE: devflows/inference/env-setup-text.rst ================================================ A typical Neuron developer flow includes a compilation phase followed by deployment (inference) on one or more Inf1 instances. You can also choose one of the following combinations for compilation and deployment: ================================================ FILE: devflows/inference/neo-then-hosting-devflow.rst ================================================ .. _neo-then-hosting-devflow: Compile with Sagemaker Neo and Deploy on Sagemaker Hosting (inf1) ================================================================== .. contents:: Table of Contents :local: :depth: 2 Description ----------- |image| .. |image| image:: /images/neo-then-hosting-dev-flow.png :width: 700 :alt: Neuron developer flow on SageMaker Neo :align: middle You can use SageMaker Neo to compile models for deployment on SageMaker Hosting using ml.inf1 instances. In this developer flow, you provision a SageMaker Notebook instance to train, compile, and deploy your model using the SageMaker Python SDK. Follow the steps below to set up your environment. .. _neo-then-hosting-setenv: Setup Environment ----------------- 1. Create an Amazon SageMaker Notebook Instance: Follow the instructions in `Get Started with Notebook Instances `_ The Notebook instance created provides the required Python SDK for training, compiling, and deploying models with Amazon SageMaker. 2. Compile a model using the Amazon SageMaker SDK: Refer to `Supported Instance Types and Frameworks `_ for information on the framework versions currently supported by Amazon SageMaker Neo on AWS Inferentia.
More information about compiling and deploying models with Amazon SageMaker Neo can be found on `Use Neo to Compile a Model `_ ================================================ FILE: devflows/inference/parallelcluster-flows.rst ================================================ Parallel Cluster Flows - Inference =================================== .. include:: /devflows/inference/parallelcluster-flows.txt ================================================ FILE: devflows/inference/parallelcluster-flows.txt ================================================ .. note:: AWS ParallelCluster support is coming soon. ================================================ FILE: devflows/inference/sagemaker-flows.rst ================================================ Sagemaker Flows - Inference =========================== .. toctree:: :maxdepth: 1 :hidden: /devflows/inference/byoc-hosting-devflow-inf2 /devflows/inference/byoc-hosting-devflow /devflows/inference/neo-then-hosting-devflow .. include:: /devflows/inference/sagemaker-flows.txt ================================================ FILE: devflows/inference/sagemaker-flows.txt ================================================ * :ref:`byoc-hosting-devflow-inf2` * :ref:`byoc-hosting-devflow` * :ref:`neo-then-hosting-devflow` * `AWS Neuron Sagemaker Samples GitHub Repository `_ ================================================ FILE: devflows/parallelcluster-flows.rst ================================================ AWS ParallelCluster =================== .. toctree:: :maxdepth: 1 /devflows/training/parallelcluster-flows .. .. include:: /devflows/parallelcluster-flows.txt ================================================ FILE: devflows/parallelcluster-flows.txt ================================================ .. tab-set:: .. tab-item:: Training .. include:: /devflows/training/parallelcluster-flows.txt .. tab-set:: .. tab-item:: Inference .. note:: AWS ParallelCluster support is coming soon. ================================================ FILE: devflows/plugins/npd-ecs-flows.rst ================================================ .. _ecs-neuron-problem-detector-and-recovery: Neuron Problem Detector And Recovery ==================================== .. include:: /devflows/plugins/npd-ecs-flows.txt ================================================ FILE: devflows/plugins/npd-ecs-flows.txt ================================================ Neuron node problem detector and recovery artifact checks the health of Neuron devices on each ECS instance. After detecting an unrecoverable Neuron error, it triggers an instance replacement. In order to get started with Neuron node problem detector and recovery, make sure that the following requirements are satisfied: * The Neuron node problem detector and recovery requires Neuron driver 2.15+, and it requires the runtime to be at SDK 2.18 or later. Creating a Task Definition -------------------------- Configuration ~~~~~~~~~~~~~ The task definition includes two containers: - **npd-container**: This container is responsible for enabling Problem detection functionality in the ECS cluster. - **recovery-container**: This container handles recovery operations in case of failures detected by Neuron Problem Detector. The **recovery-container** has an environment variable called ``ENABLE_RECOVERY`` that controls whether recovery is enabled or disabled. Set the value to ``true`` to enable recovery, or ``false`` to disable it. Follow these steps to create a task definition for NPD and recovery: 1. 
Go to the `ECS console `_ and select **Task Definitions** in the navigation pane. 2. Click **Create new Task Definition** and choose **Create new Task Definition with JSON**. 3. Paste the task definition JSON provided, replacing the placeholders with your account-specific values. .. code-block:: json { "family": "neuron-npd-and-recovery", "containerDefinitions": [ { "name": "npd", "image": "registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.19", "cpu": 0, "portMappings": [ { "name": "npd-80-tcp", "containerPort": 80, "hostPort": 80, "protocol": "tcp", "appProtocol": "http" } ], "essential": true, "entryPoint": [ "/bin/sh", "-c" ], "command": [ "echo '{\"plugin\":\"kmsg\",\"logPath\":\"/dev/kmsg\",\"lookback\":\"5m\",\"bufferSize\":10,\"source\":\"kernel-monitor\",\"conditions\":[{\"type\":\"NeuronHealth\",\"reason\":\"NeuronHasNoError\",\"message\":\"Neuronhasnoerror\"}],\"rules\":[{\"type\":\"permanent\",\"condition\":\"NeuronHealth\",\"reason\":\"NeuronHasError_SRAM_UNCORRECTABLE_ERROR\",\"pattern\":\".*NEURON_HW_ERR=SRAM_UNCORRECTABLE_ERROR.*\"},{\"type\":\"permanent\",\"condition\":\"NeuronHealth\",\"reason\":\"NeuronHasError_NC_UNCORRECTABLE_ERROR\",\"pattern\":\".*NEURON_HW_ERR=NC_UNCORRECTABLE_ERROR.*\"},{\"type\":\"permanent\",\"condition\":\"NeuronHealth\",\"reason\":\"NeuronHasError_HBM_UNCORRECTABLE_ERROR\",\"pattern\":\".*NEURON_HW_ERR=HBM_UNCORRECTABLE_ERROR.*\"},{\"type\":\"permanent\",\"condition\":\"NeuronHealth\",\"reason\":\"NeuronHasError_DMA_ERROR\",\"pattern\":\".*NEURON_HW_ERR=DMA_ERROR.*\"}]}' > /config/kernel-monitor.json && /node-problem-detector --v=2 --logtostderr --enable-k8s-exporter=false --config.system-log-monitor=/config/kernel-monitor.json" ], "environment": [], "mountPoints": [], "volumesFrom": [], "linuxParameters": { "devices": [ { "hostPath": "/dev/kmsg", "containerPath": "/dev/kmsg", "permissions": [ "read", "write" ] } ] }, "privileged": true, "logConfiguration": { "logDriver": "awslogs", "options": { "awslogs-group": "/ecs/npd", "awslogs-create-group": "true", "awslogs-region": "us-west-2", "awslogs-stream-prefix": "ecs" }, "secretOptions": [] }, "systemControls": [] }, { "name": "recovery", "image": "public.ecr.aws/neuron/neuron-node-recovery:1.3.0", "cpu": 0, "portMappings": [], "essential": true, "entryPoint": [ "/bin/sh", "-c" ], "command": [ "python scripts/check-health.py" ], "environment": [ { "name": "ENABLE_RECOVERY", "value": "false" } ], "mountPoints": [], "volumesFrom": [], "readonlyRootFilesystem": true, "logConfiguration": { "logDriver": "awslogs", "options": { "awslogs-create-group": "true", "awslogs-group": "/ecs/recovery", "awslogs-region": "us-west-2", "awslogs-stream-prefix": "ecs" } }, "systemControls": [] } ], "executionRoleArn": "arn:aws:iam::012345678910:role/ecsTaskExecutionRole", "taskRoleArn": "arn:aws:iam::012345678910:role/ecsTaskExecutionRole", "networkMode": "awsvpc", "requiresCompatibilities": [ "EC2" ], "cpu": "1024", "memory": "3072", "runtimePlatform": { "cpuArchitecture": "X86_64", "operatingSystemFamily": "LINUX" } } 4. Review the task definition and click **Create**. For more details on task definitions, refer to the `AWS documentation `_. .. _deploy-service: Deploying the Service --------------------- After creating the task definition, follow these steps to deploy the service: 1. In the ECS console, select the task definition and click **Deploy** → **Create Service**. 2. Select your ECS cluster, set the launch type to **EC2**, and the service type to **Daemon**. 3. 
Click **Create** to deploy the service. For more details on deploying services, refer to the `AWS documentation `_. Permissions ~~~~~~~~~~~ Ensure the ECS task execution role and task role have permissions to: - Publish metrics to CloudWatch - Read and set health status of EC2 instances in the Auto Scaling group Refer to the `AWS documentation on IAM roles for ECS tasks `_ for more information. When any unrecoverable error occurs, Neuron node problem detector and recovery publishes a metric under the CloudWatch namespace NeuronHealthCheck. It also reflects in NodeCondition and can be seen with kubectl describe node. ================================================ FILE: devflows/sagemaker-flows.rst ================================================ .. _sagemaker_flow: Amazon SageMaker ================ Amazon SageMaker is a fully managed machine learning (ML) platform that streamlines the end-to-end ML workflow at scale. AWS Neuron integrates with Amazon SageMaker to provide optimized performance for ML workloads on AWS Inferentia and AWS Trainium chips. .. contents:: Table of contents :local: :depth: 1 SageMaker JumpStart """"""""""""""""""" Use `Amazon SageMaker JumpStart `_ to train and deploy models using Neuron. SageMaker JumpStart is an ML hub that accelerates model selection and deployment. It provides support for fine-tuning and deploying popular models such as Meta’s Llama family of models. Users can customize pre-trained models with their data and easily deploy them. SageMaker HyperPod """""""""""""""""" Use `Amazon SageMaker HyperPod `_ to streamline ML infrastructure setup and optimization with AWS Neuron. SageMaker HyperPod leverages pre-configured distributed training libraries to split workloads across numerous AI accelerators, enhancing model performance. HyperPod ensures uninterrupted training through automatic checkpointing, fault detection, and recovery. SageMaker Training """""""""""""""""" `Amazon SageMaker Model Training `_ reduces the time and cost to train and tune ML models at scale without the need to manage infrastructure. SageMaker Inference """"""""""""""""""" With `Amazon SageMaker `_ , you can start getting predictions, or inferences, from your trained ML models. SageMaker provides a broad selection of ML infrastructure and model deployment options to help meet all your ML inference needs. ================================================ FILE: devflows/setup/ecs-flows.rst ================================================ ECS Flows - Setup ================= .. toctree:: :maxdepth: 1 :hidden: /devflows/plugins/npd-ecs-flows .. include:: /devflows/setup/ecs-flows.txt ================================================ FILE: devflows/setup/ecs-flows.txt ================================================ * :ref:`ecs-neuron-problem-detector-and-recovery` ================================================ FILE: devflows/setup/eks-flows.rst ================================================ EKS - Setup ===================== .. toctree:: :maxdepth: 1 :hidden: /containers/kubernetes-getting-started .. include:: /devflows/setup/eks-flows.txt ================================================ FILE: devflows/setup/eks-flows.txt ================================================ * :ref:`kubernetes-getting-started` ================================================ FILE: devflows/third-party-solutions.rst ================================================ .. 
_third-party-devflow-solutions: Third-party solutions ===================== AWS Neuron integrates with multiple third-party partner solutions that allow you to run deep learning workloads on Amazon EC2 instances powered by AWS Trainium and AWS Inferentia chips. The following list gives an overview of third-party solutions that work with AWS Neuron. .. contents:: Table of contents :local: :depth: 1 Ray """ Ray, by Anyscale, is an open source AI compute engine at the center of many of the world's most powerful AI platforms. It orchestrates infrastructure for any distributed AI workload, such as data processing, model training, and serving, on any accelerator at any scale. Ray simplifies the complexity of distributed computing, improves efficiency, lowers costs, and accelerates developer productivity. `Ray Train documentation `_ Domino """""" Domino is an open enterprise platform for data science, machine learning, and AI research. It works with an expansive list of industry-leading tools and technologies to enrich data science research, development, and deployment processes. Domino works with a wide range of data sources, languages, IDEs, tools, libraries, and publication targets. `Domino documentation `_ ================================================ FILE: devflows/training/aws-batch-flows.rst ================================================ AWS Batch Flows - Training ========================== .. include:: /devflows/training/aws-batch-flows.txt ================================================ FILE: devflows/training/aws-batch-flows.txt ================================================ * :ref:`batch-training` ================================================ FILE: devflows/training/batch/batch-training.rst ================================================ .. _batch-training: Train your model on AWS Batch ============================= .. contents:: Table of Contents :local: :depth: 3 Description ------------ AWS Batch provides a scalable and cost-effective solution for running batch computing workloads in the AWS Cloud. Integrating Trainium with AWS Batch provides an efficient and cost-effective way of training deep learning models at scale. Once you configure your training job, AWS Batch effectively manages the orchestration, execution, and dynamic scaling of compute resources for your extensive machine learning workloads. To learn more about AWS Batch, see `the AWS Batch documentation `_. How does AWS Batch work with Trainium ------------------------------------- .. image:: /images/batch-setup.png As depicted in the illustration above, the workflow begins by building a ``Docker container image for Trainium`` and pushing it to Amazon Elastic Container Registry (ECR). Following this, you configure your AWS Batch environment with the required capabilities and subsequently submit the training job. Follow the steps below to run your training jobs on ``AWS Batch`` with ``Trainium``. #. **Before you begin, please ensure that you have the following prerequisites completed:** * ``AWS VPC`` with at least one ``Subnet`` and an ``EFA Enabled Security Group`` (learn more about EFA-enabled security groups in `the AWS EFA User Guide `_). Make sure the subnet is private and that the VPC has a NAT gateway to allow internet connectivity for the private subnet. * ``AWS ECR`` repository * ``AWS CLI`` installed and configured with permissions for the above-mentioned AWS resources * ``Docker`` * ``jq`` #.
**Setup to start working with AWS Batch** Connect to your EC2 instance (an ``x86_64``-based Linux instance) and clone the ``aws-neuron-samples`` repo. Once done, navigate to the AWS Batch scripts directory. .. code:: shell

   cd ~/
   git clone https://github.com/aws-neuron/aws-neuron-samples.git
   cd ~/aws-neuron-samples/torch-neuronx/training/aws-batch/all-reduce

#. **Configure resource requirements** Update ``build_configs_and_setup.sh`` with your environment variables. Once done, execute the bash script using the command ``./build_configs_and_setup.sh``. #. **Build the required docker image and publish it to ECR** Run ``./build_docker_image.sh`` to build a Neuron Deep Learning Container image using the latest Neuron packages and push this image to ECR. #. **Prepare the AWS infrastructure required to submit the batch job** Run ``./create_resources.sh`` to create all AWS Batch resources needed for your training workload. Below is a brief description of the AWS Batch components this script creates for you: * ``Placement Group`` enables you to influence the placement of your EC2 (Elastic Compute Cloud) instances within the AWS infrastructure. * ``Launch Template`` allows you to define a set of instance configuration parameters, including the Amazon Machine Image (AMI), instance type, key pair, security groups, and other settings, in a template format. * ``Compute Environment`` specifies the type of compute resources you want to use for your batch jobs. It includes details such as the EC2 instance types, the minimum and maximum number of instances, the VPC configuration, and other settings related to the compute environment. * ``Job Definition`` is a blueprint that specifies how a batch job should be run. It encapsulates information about the job, such as the Docker image to be used, the command to execute within the container, the CPU and memory requirements, job dependencies, and other settings. * ``Job Queue`` acts as a queueing mechanism for managing and scheduling the execution of batch computing workloads. By using job queues, AWS Batch provides a scalable and efficient way to process batch workloads, managing the allocation of resources and ensuring optimal use of compute capacity. #. **Submit the job to AWS Batch** Run ``./submit_job.sh`` to submit a basic all-reduce job in the provisioned AWS Batch environment (a sketch of the underlying CLI call appears at the end of this page). #. **Monitor the AWS Batch job** You can use Amazon CloudWatch Logs to monitor, store, and view all the logs from your AWS Batch jobs. To learn more, please see `the AWS docs on using Batch and EKS with CloudWatch `_. .. note:: * You could run a full model training job using this setup. For example, `this sample `_ runs the Llama2-7B tutorial on AWS Batch using the same setup. * You can further tailor your ``Dockerfile`` to include any additional dependencies specific to your needs. * You have the option to leverage ``trn1n.32xlarge`` instances as an alternative to ``trn1.32xlarge``. To make this transition, you only need to adjust the ``launch template`` and ``job definition`` to accommodate the use of 16 EFA (Elastic Fabric Adapter) devices, whereas the current setup for ``trn1`` employs 8 EFA devices. Please check out `this document `_ to start with ``trn1n.32xlarge`` for multi-node execution.
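For reference, here is a hedged sketch of the CLI call that ``submit_job.sh`` wraps; the queue and job definition names are placeholders created for you by ``create_resources.sh``:

.. code-block:: bash

   # Submit the all-reduce job and check its status (names are placeholders).
   aws batch submit-job \
       --job-name all-reduce-test \
       --job-queue <your-job-queue> \
       --job-definition <your-job-definition>
   aws batch describe-jobs --jobs <job-id>   # logs stream to CloudWatch Logs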
================================================ FILE: devflows/training/dlc-then-ecs-devflow.rst ================================================ .. _training-dlc-then-ecs-devflow: Deploy Neuron Container on Elastic Container Service (ECS) for Training ======================================================================== .. contents:: Table of Contents :local: :depth: 2 Description ----------- |image| .. |image| image:: /images/dlc-on-ecs-dev-flow.png :width: 750 :alt: Neuron developer flow for DLC on ECS :align: middle You can use the Neuron version of the `AWS Deep Learning Containers `_ to run training on Amazon Elastic Container Service (ECS). In this developer flow, you set up an ECS cluster with Trn1 instances, create a task description for your training container, and deploy it to your cluster. This developer flow assumes: 1. The model has already been compiled through :ref:`Compilation with Framework API on EC2 instance ` or through :ref:`Compilation with Sagemaker Neo `. 2. You have already set up your container to retrieve it from storage. .. _training-dlc-then-ecs-setenv: Setup Environment ----------------- 1. Set up an Amazon ECS cluster: Follow the instructions on `Setting up Amazon ECS for Deep Learning Containers `_ 2. Define a Training Task: Use the instructions in the `DLC Training on ECS Tutorial `_ to define a task and create a service for the appropriate framework. When creating tasks for Trn1 instances on ECS, be aware of the considerations and requirements listed in `Working with training workloads on Amazon ECS `_. 3. Use the container image created using :ref:`how-to-build-neuron-container` as the ``image`` in your task definition. .. _training_push_to_ecr_note: .. note:: Before deploying your task definition to your ECS cluster, make sure to push the image to ECR. Refer to `Pushing a Docker image `_ for more information. ================================================ FILE: devflows/training/ec2/ec2-training.rst ================================================ .. _ec2-training: Train your model on EC2 ======================= .. contents:: Table of Contents :local: :depth: 3 Description ----------- |image| .. |image| image:: /images/trn1-on-ec2-dev-flow.png :width: 500 :alt: Neuron developer flow on EC2 :align: middle You can use a single Trn1 instance as a development environment to compile and train Neuron models. In this developer flow, you provision an EC2 Trn1 instance using a Deep Learning AMI (DLAMI) and execute the two steps of the development flow on the same instance. The DLAMI comes pre-packaged with the Neuron frameworks, compiler, and required runtimes to complete the flow. Development happens through Jupyter Notebooks or using a secure shell (ssh) connection in a terminal. Follow the steps below to set up your environment. Setup Environment ----------------- 1. Launch a Trn1 Instance ^^^^^^^^^^^^^^^^^^^^^^^^^^ .. include:: /setup/install-templates/launch-trn1-dlami.rst 2. Set up a development environment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Enable PyTorch-Neuron ~~~~~~~~~~~~~~~~~~~~~ .. include:: /frameworks/torch/torch-neuronx/setup/install-templates/pytorch-dev-install.txt 3. Set up Jupyter notebook ^^^^^^^^^^^^^^^^^^^^^^^^^^ To develop from a Jupyter notebook, see :ref:`setup-jupyter-notebook-steps-troubleshooting`. You can also run a Jupyter notebook as a script: first enable the ML framework Conda or Python environment of your choice, then see :ref:`running-jupyter-notebook-as-script` for instructions.
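After launching the instance, you can sanity-check the Neuron hardware before starting development; ``neuron-ls`` ships with the Neuron tools preinstalled on the DLAMI:

.. code-block:: bash

   # Confirm the Neuron devices are visible on the Trn1 instance.
   neuron-ls
   ls /dev/neuron*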
================================================
FILE: devflows/training/ec2-flows.rst
================================================

EC2 Flows - Training
====================

.. toctree::
   :maxdepth: 1
   :hidden:

   /devflows/training/ec2/ec2-training

.. include:: /devflows/training/ec2-flows.txt

================================================
FILE: devflows/training/ec2-flows.txt
================================================

* :ref:`ec2-training`

================================================
FILE: devflows/training/parallelcluster/parallelcluster-training.rst
================================================

.. _parallelcluster-training:

Train your model on ParallelCluster
===================================

.. contents:: Table of Contents
   :local:
   :depth: 3

Description
------------

This document explains how to use AWS ParallelCluster to build an HPC compute environment that uses Trn1 compute nodes to run your distributed ML training job. Once the nodes are launched, we will run a training task to confirm that the nodes are working, and use slurm commands to check the job status. In this tutorial, we will use the AWS ``pcluster`` command with a YAML configuration file to generate the cluster. As an example, we are going to launch multiple trn1.32xlarge nodes in our cluster.

We are going to set up our ParallelCluster infrastructure as below:

.. image:: /images/vpc-setup.png

As shown in the figure above, inside a VPC there are two subnets, one public and one private. The head node resides in the public subnet, while the compute fleet (in this case, trn1 instances) resides in the private subnet. A Network Address Translation (NAT) gateway is also needed so that nodes in the private subnet can connect to clients outside the VPC. In the next section, we describe how to set up all the necessary infrastructure for a trn1 ParallelCluster.

Setup environment
-----------------

1. Install prerequisite infrastructure:

   Follow `these setup `_ instructions to install the VPC and all the necessary components for ParallelCluster.

2. Install AWS ParallelCluster in a virtual environment (recommended):

   Follow the instructions at https://docs.aws.amazon.com/parallelcluster/latest/ug/install-v3-virtual-environment.html

3. Create and launch ParallelCluster:

   Follow `these creating cluster `_ instructions to launch ParallelCluster in the VPC.

4. Launch a training job:

   Follow `these running training `_ instructions to submit a model training script as a slurm job.

================================================
FILE: devflows/training/parallelcluster-flows.rst
================================================

Parallel Cluster Flows - Training
=================================

.. toctree::
   :maxdepth: 1
   :hidden:

   /devflows/training/parallelcluster/parallelcluster-training

.. include:: /devflows/training/parallelcluster-flows.txt

================================================
FILE: devflows/training/parallelcluster-flows.txt
================================================

* :ref:`parallelcluster-training`

================================================
FILE: devflows/training/sagemaker-flows.rst
================================================

Sagemaker Flows - Training
==========================

.. toctree::
   :maxdepth: 1
   :hidden:

   /devflows/training/sm-devflow/sm-training-devflow

.. include:: /devflows/training/sagemaker-flows.txt

================================================
FILE: devflows/training/sagemaker-flows.txt
================================================

* :ref:`sm-training-devflow`
* `AWS Neuron Sagemaker Samples GitHub Repository `_

================================================
FILE: devflows/training/sm-devflow/sm-training-devflow.rst
================================================

.. _sm-training-devflow:

Train your model on SageMaker
=============================

.. contents:: Table of Contents
   :local:
   :depth: 3

Description
------------

SageMaker Training helps you manage cloud computing resources in Amazon EC2, data storage services such as S3, EFS, and FSx, and security management services such as IAM and VPC. SageMaker Training provides a complete end-to-end experience for training classical ML and state-of-the-art DL models. You can use SageMaker to train models using Trn1 instances (ml.trn1 instance types).

In this developer flow, you provision a SageMaker Notebook instance or SageMaker Studio to train your model using the `SageMaker Python SDK `_. The Amazon SageMaker Python SDK lets you launch training jobs in just a few lines of code. As shown in the diagram below, Amazon SageMaker launches Trn1 instances and copies both data and code onto the instances. It then runs the training script to generate model artifacts. The trained model artifacts are uploaded to S3, and SageMaker finally terminates the provisioned instances. To speed up the training process for successive runs, you can copy the `Neuron Persistent Cache `_ to S3 so that future training jobs can download and reuse the cached compilation artifacts. (See the `Hugging Face fine tuning BERT base model on Amazon SageMaker Tutorial `_ for an example of how to reuse the compiled cache.)

.. image:: /images/trn1-on-sm-dev-flow.png

Setup environment
-----------------

1. Create an Amazon SageMaker Notebook Instance:

   Follow the instructions in `Get Started with Notebook Instances `_ or `Use Amazon SageMaker Studio Notebooks `_. The Notebook instance provides the required Python SDK for training models with Amazon SageMaker. Please make sure the SageMaker Python SDK version is 2.116.0 or later.

2. Train a model using the Amazon SageMaker SDK:

   Follow the instructions in `Distributed Training with PyTorch Neuron on Trn1 instances `_. You'll be able to follow the `Hugging Face fine tuning BERT base model on Amazon SageMaker Tutorial `_.

.. note::

   SageMaker support for EC2 Trn1 instances is currently available only for the PyTorch Estimator. The HuggingFace Estimator will be available in a future release.
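To make the flow concrete, here is a minimal, hypothetical sketch of launching a Trn1 training job with the SageMaker Python SDK's PyTorch Estimator. The entry point, S3 input, and version strings are placeholders; follow the linked tutorial for tested values.

.. code-block:: python

   import sagemaker
   from sagemaker.pytorch import PyTorch

   session = sagemaker.Session()
   role = sagemaker.get_execution_role()  # run inside a SageMaker notebook

   # Hypothetical configuration; see the linked tutorial for tested
   # framework_version/py_version combinations on Trn1.
   estimator = PyTorch(
       entry_point="train.py",          # your training script (placeholder)
       role=role,
       instance_type="ml.trn1.32xlarge",
       instance_count=1,
       framework_version="1.13.1",      # illustrative version string
       py_version="py38",               # illustrative version string
       sagemaker_session=session,
   )

   estimator.fit("s3://my-bucket/training-data/")  # placeholder S3 input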
================================================
FILE: dlami/index.rst
================================================

.. meta::
   :description: Neuron Deep Learning AMIs (DLAMIs) are pre-configured Amazon Machine Images with the Neuron SDK for easy deployment on AWS Inferentia and Trainium instances.
   :keywords: Neuron DLAMI, Deep Learning AMI, AWS Neuron SDK, Inferentia, Trainium, PyTorch, JAX, TensorFlow, vLLM, SSM Parameters
   :date-modified: 01/22/2026

.. _neuron-dlami-overview:
.. _setup-ubuntu22-multi-framework-dlami:
.. _setup-ubuntu24-multi-framework-dlami:

Neuron DLAMI User Guide
=======================

This guide helps you select, configure, and deploy AWS Neuron Deep Learning AMIs (DLAMIs) for running machine learning workloads on AWS Inferentia and Trainium instances. Learn about the different DLAMI types available, pre-installed virtual environments for popular ML frameworks like PyTorch and JAX, and how to automate DLAMI deployment.

.. contents:: Table of Contents
   :local:
   :depth: 2

What are Neuron DLAMIs?
------------------------

Neuron Deep Learning AMIs (DLAMIs) are pre-configured Amazon Machine Images that provide the easiest way to get started with the AWS Neuron SDK.
Each DLAMI comes with Neuron drivers, frameworks, and libraries pre-installed, enabling you to quickly launch and run deep learning workloads on AWS Inferentia and Trainium instances without manual setup.

Neuron currently supports three types of DLAMIs to meet different deployment needs:

* **Multi-Framework DLAMIs**: Support multiple ML frameworks (PyTorch, JAX, vLLM) with separate virtual environments for each
* **Single Framework DLAMIs**: Optimized for a specific framework version with focused virtual environments
* **Base DLAMIs**: Include only Neuron drivers, EFA, and tools - ideal for containerized applications and custom builds

All Neuron DLAMIs support automated discovery through AWS Systems Manager (SSM) parameters, making them easy to integrate into cloud automation workflows and infrastructure-as-code deployments.

.. note::

   Starting with version 2.26.1, Neuron DLAMIs no longer support ``Inf1`` instance types due to an incompatibility with the Neuron driver. If you'd like to run ``Inf1`` workloads, use previous DLAMIs released up to SDK version 2.26.

----

Neuron Multi Framework DLAMI
----------------------------

Neuron Multi-Framework DLAMIs provide the most comprehensive environment, supporting multiple ML frameworks and libraries in isolated virtual environments. Each DLAMI is pre-installed with Neuron drivers and supports all current Neuron instance types (Inf2, Trn1, Trn1n, Trn2, Trn3). This is the recommended option for teams working with multiple frameworks or exploring different ML libraries.

.. note::

   Starting with version 2.27.1, AL2023 DLAMIs no longer support ``PyTorch 2.9+`` due to an incompatibility with the default glibc installed on AL2023. PyTorch requires glibc 2.35+, and upgrading the version within AL2023 can break other system dependencies. This is the error message: ``ImportError: /lib64/libm.so.6: version `GLIBC_2.35' not found``

   Since the latest vLLM version depends on PyTorch 2.9, we have also removed that environment from the AL2023 DLAMI. As a workaround, use the latest Ubuntu-based AMIs instead.

Multi Framework DLAMIs supported
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Operating System
     - Neuron Instances Supported
     - DLAMI Name
   * - Ubuntu 24.04
     - Inf2, Trn1, Trn1n, Trn2, Trn3
     - Deep Learning AMI Neuron (Ubuntu 24.04)

.. _neuron-dlami-multifw-venvs:

Virtual Environments pre-installed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Neuron Framework/Libraries supported
     - Virtual Environment
   * - PyTorch 2.9 Torch NeuronX, NxD Core (Ubuntu 24.04)
     - /opt/aws_neuronx_venv_pytorch_2_9
   * - PyTorch 2.9 NxD Training, Torch NeuronX (Ubuntu 24.04)
     - /opt/aws_neuronx_venv_pytorch_2_9_nxd_training
   * - PyTorch 2.9 NxD Inference, Torch NeuronX (Ubuntu 24.04)
     - /opt/aws_neuronx_venv_pytorch_2_9_nxd_inference
   * - JAX 0.7 NeuronX (Ubuntu 24.04)
     - /opt/aws_neuronx_venv_jax_0_7
   * - vLLM 0.16.0 NxD Inference, Torch NeuronX (Ubuntu 24.04)
     - /opt/aws_neuronx_venv_pytorch_inference_vllm_0_16

We have included a setup script that installs the required dependencies for the package within the PyTorch 2.9 NxD Training virtual environment. To run this script, activate the virtual environment and run ``setup_nxdt.sh``, which runs :ref:`the setup steps here `.

You can easily get started with the multi-framework DLAMI through the AWS console by following this :doc:`setup guide `.
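Because each framework lives in its own virtual environment, a common stumbling block is importing Neuron packages from the wrong interpreter. Below is a minimal, hypothetical sanity check you can run after activating a venv; the path is taken from the table above.

.. code-block:: python

   # A minimal sanity check, assuming the PyTorch 2.9 venv from the table
   # above was activated: source /opt/aws_neuronx_venv_pytorch_2_9/bin/activate
   import sys

   expected_venv = "/opt/aws_neuronx_venv_pytorch_2_9"
   if not sys.executable.startswith(expected_venv):
       raise SystemExit(f"Wrong interpreter: {sys.executable}. "
                        f"Activate {expected_venv} first.")

   import torch_neuronx  # pre-installed in this venv
   print("torch-neuronx imported successfully")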
If you are looking to use the Neuron DLAMI in your cloud automation flows, Neuron also supports :ref:`SSM parameters ` to easily retrieve the latest DLAMI ID.

----

Neuron Single Framework DLAMI
-----------------------------

Neuron Single Framework DLAMIs are optimized for specific framework versions, providing a streamlined environment when you know exactly which framework you'll be using. Each DLAMI is pre-installed with Neuron drivers and supports all Neuron instance types. These DLAMIs are ideal for production deployments where you want a focused, framework-specific environment.

Single Framework DLAMIs supported
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Framework
     - Operating System
     - Neuron Instances Supported
     - DLAMI Name
   * - PyTorch 2.9
     - Ubuntu 24.04
     - Inf2, Trn1, Trn1n, Trn2, Trn3
     - Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04)
   * - JAX 0.7
     - Amazon Linux 2023
     - Inf2, Trn1, Trn1n, Trn2, Trn3
     - Deep Learning AMI Neuron JAX 0.7 (Amazon Linux 2023)
   * - JAX 0.7
     - Ubuntu 24.04
     - Inf2, Trn1, Trn1n, Trn2, Trn3
     - Deep Learning AMI Neuron JAX 0.7 (Ubuntu 24.04)
   * - vLLM 0.16.0
     - Ubuntu 24.04
     - Inf2, Trn1, Trn1n, Trn2, Trn3
     - Deep Learning AMI Neuron PyTorch Inference vLLM 0.16 (Ubuntu 24.04)

Virtual Environments pre-installed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - DLAMI Name
     - Neuron Libraries supported
     - Virtual Environment
   * - Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04)
     - PyTorch 2.9 Torch NeuronX, NxD Core
     - /opt/aws_neuronx_venv_pytorch_2_9
   * - Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04)
     - PyTorch 2.9 NxD Training, Torch NeuronX
     - /opt/aws_neuronx_venv_pytorch_2_9_nxd_training
   * - Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04)
     - PyTorch 2.9 NxD Inference, Torch NeuronX
     - /opt/aws_neuronx_venv_pytorch_2_9_nxd_inference
   * - Deep Learning AMI Neuron JAX 0.7 (Ubuntu 24.04, Amazon Linux 2023)
     - JAX NeuronX 0.7
     - /opt/aws_neuronx_venv_jax_0_7
   * - Deep Learning AMI Neuron PyTorch Inference vLLM 0.16 (Ubuntu 24.04)
     - vLLM NeuronX 0.16.0
     - /opt/aws_neuronx_venv_pytorch_inference_vllm_0_16

Get started with the single framework DLAMI through the AWS console by following one of the corresponding setup guides. If you want to use the Neuron DLAMI in your cloud automation flows, Neuron also supports :ref:`SSM parameters ` to retrieve the latest DLAMI ID.

----

Neuron Base DLAMI
-----------------

Neuron Base DLAMIs provide a minimal foundation with only the essential components: the Neuron driver, EFA (Elastic Fabric Adapter), and Neuron tools. These DLAMIs are designed for advanced users who want to build custom environments, create containerized applications, or have specific framework version requirements not covered by the pre-configured DLAMIs.

Base DLAMIs supported
^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - Operating System
     - Neuron Instances Supported
     - DLAMI Name
   * - Amazon Linux 2023
     - Inf2, Trn1, Trn1n, Trn2, Trn3
     - Deep Learning Base Neuron AMI (Amazon Linux 2023)
   * - Ubuntu 24.04
     - Inf2, Trn1, Trn1n, Trn2, Trn3
     - Deep Learning Base Neuron AMI (Ubuntu 24.04)
   * - Ubuntu 22.04
     - Inf2, Trn1, Trn1n, Trn2, Trn3
     - Deep Learning Base Neuron AMI (Ubuntu 22.04)
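Since Base DLAMIs target containerized workloads, a common next step is running a container with the Neuron devices passed through. The following is a minimal, illustrative sketch using the Docker SDK for Python; the image name is a placeholder, and the device path assumes the driver has created ``/dev/neuron0`` on the instance.

.. code-block:: python

   import docker

   client = docker.from_env()

   # Run a container with the first Neuron device passed through.
   # "my-neuron-image:latest" is a placeholder for your own image.
   output = client.containers.run(
       image="my-neuron-image:latest",
       command="neuron-ls",
       devices=["/dev/neuron0:/dev/neuron0:rwm"],  # add more devices as needed
       remove=True,
   )
   print(output.decode())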
.. _ssm-parameter-neuron-dlami:

----

Using SSM Parameters for Cloud Automation
------------------------------------------

Neuron DLAMIs support AWS Systems Manager (SSM) parameters for automated DLAMI discovery and deployment. This enables you to always use the latest Neuron SDK release in your infrastructure-as-code templates, CI/CD pipelines, and auto-scaling configurations without hardcoding AMI IDs.

SSM parameters provide several key benefits:

* **Always up-to-date**: Automatically reference the latest DLAMI with the newest Neuron SDK release
* **Infrastructure-as-code friendly**: Use in CloudFormation, Terraform, and other IaC tools
* **Auto Scaling integration**: Update Auto Scaling groups without modifying launch templates
* **Multi-region support**: Available across all AWS regions where Neuron instances are supported

Currently, SSM parameters support finding the latest DLAMI ID for each DLAMI type. Support for finding DLAMIs for specific Neuron SDK versions will be added in future releases.

Finding a specific DLAMI image ID with the latest Neuron release
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can find the DLAMI that supports the latest Neuron SDK by using SSM ``get-parameter``:

.. code-block::

   aws ssm get-parameter \
       --region us-east-1 \
       --name <dlami-ssm-parameter-prefix>/latest/image_id \
       --query "Parameter.Value" \
       --output text

The SSM parameter prefix for each currently supported DLAMI can be seen below. To discover SSM parameters for older or end-of-life DLAMIs, you can filter by framework, version, or operating system using the path structure ``/aws/service/neuron/dlami/<framework>-<version>/<operating-system>``:

.. code-block::

   # List all Neuron DLAMI SSM parameters
   aws ssm get-parameters-by-path --region us-east-1 --path /aws/service/neuron --recursive

   # Filter by framework (e.g., all PyTorch 2.8 DLAMIs)
   aws ssm get-parameters-by-path --region us-east-1 --path /aws/service/neuron/dlami/pytorch-2.8 --recursive

   # Filter by framework and OS
   aws ssm get-parameters-by-path --region us-east-1 --path /aws/service/neuron/dlami/pytorch-2.8/ubuntu-22.04 --recursive

SSM Parameter Prefix
""""""""""""""""""""

.. list-table::
   :widths: 20 39
   :header-rows: 1
   :align: left
   :class: table-smaller-font-size

   * - AMI Name
     - SSM parameter Prefix
   * - Deep Learning AMI Neuron (Ubuntu 24.04)
     - /aws/service/neuron/dlami/multi-framework/ubuntu-24.04
   * - Deep Learning AMI Neuron PyTorch 2.9 (Ubuntu 24.04)
     - /aws/service/neuron/dlami/pytorch-2.9/ubuntu-24.04
   * - Deep Learning AMI Neuron JAX 0.7 (Amazon Linux 2023)
     - /aws/service/neuron/dlami/jax-0.7/amazon-linux-2023
   * - Deep Learning AMI Neuron JAX 0.7 (Ubuntu 24.04)
     - /aws/service/neuron/dlami/jax-0.7/ubuntu-24.04
   * - Deep Learning AMI Neuron PyTorch Inference vLLM 0.16 (Ubuntu 24.04)
     - /aws/service/neuron/dlami/pytorch-inference-vllm-0.16/ubuntu-24.04
   * - Deep Learning Base Neuron AMI (Amazon Linux 2023)
     - /aws/service/neuron/dlami/base/amazon-linux-2023
   * - Deep Learning Base Neuron AMI (Ubuntu 24.04)
     - /aws/service/neuron/dlami/base/ubuntu-24.04
   * - Deep Learning Base Neuron AMI (Ubuntu 22.04)
     - /aws/service/neuron/dlami/base/ubuntu-22.04

For example, to find the latest DLAMI ID for the Multi-Framework DLAMI (Ubuntu 24.04), you can use the following:

.. code-block::

   aws ssm get-parameter \
       --region us-east-1 \
       --name /aws/service/neuron/dlami/multi-framework/ubuntu-24.04/latest/image_id \
       --query "Parameter.Value" \
       --output text

You can find all available parameters supported in Neuron DLAMIs via the CLI:

.. code-block::

   aws ssm get-parameters-by-path \
       --region us-east-1 \
       --path /aws/service/neuron \
       --recursive

You can also view the SSM parameters supported in Neuron through the AWS Parameter Store by selecting the "Neuron" service.

Use SSM Parameter to launch instance directly via CLI
"""""""""""""""""""""""""""""""""""""""""""""""""""""

You can use the AWS CLI to resolve the latest DLAMI ID and launch an instance in a single command. This is particularly useful for scripting and automation workflows. Below is an example of launching an Inf2 instance using the Multi-Framework DLAMI (Ubuntu 24.04):

.. code-block::

   aws ec2 run-instances \
       --region us-east-1 \
       --image-id resolve:ssm:/aws/service/neuron/dlami/multi-framework/ubuntu-24.04/latest/image_id \
       --count 1 \
       --instance-type inf2.48xlarge \
       --key-name <key-pair-name> \
       --security-groups <security-group>
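The same resolution can be scripted from Python. Here is a minimal, hypothetical ``boto3`` sketch that reads the parameter and launches an instance from the resolved AMI ID; the key pair and security group names are placeholders.

.. code-block:: python

   import boto3

   region = "us-east-1"
   prefix = "/aws/service/neuron/dlami/multi-framework/ubuntu-24.04"

   # Resolve the latest DLAMI ID from the SSM parameter.
   ssm = boto3.client("ssm", region_name=region)
   image_id = ssm.get_parameter(Name=f"{prefix}/latest/image_id")["Parameter"]["Value"]

   # Launch one Inf2 instance from the resolved AMI.
   ec2 = boto3.client("ec2", region_name=region)
   ec2.run_instances(
       ImageId=image_id,
       InstanceType="inf2.48xlarge",
       MinCount=1,
       MaxCount=1,
       KeyName="my-key-pair",                 # placeholder
       SecurityGroups=["my-security-group"],  # placeholder
   )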
Use SSM alias in EC2 launch templates
"""""""""""""""""""""""""""""""""""""

SSM parameters can be used directly in EC2 launch templates, enabling your Auto Scaling groups to automatically use the latest AMI IDs without requiring updates to launch templates or creating new versions each time an AMI ID changes. This significantly simplifies AMI lifecycle management in production environments.

For more information, see: https://docs.aws.amazon.com/autoscaling/ec2/userguide/using-systems-manager-parameters.html

----

Other Resources
---------------

Learn more about AWS Deep Learning AMIs and Systems Manager:

* `AWS Deep Learning AMI Developer Guide `_
* `AWS DLAMI Release Notes `_
* `AWS Systems Manager Parameter Store `_
* :doc:`Neuron DLAMI Release Notes `

================================================
FILE: frameworks/index.rst
================================================

.. meta::
   :description: ML Framework support on AWS Neuron SDK - PyTorch and JAX integration for high-performance machine learning on AWS Inferentia and Trainium.
   :date-modified: 2026-03-12
   :keywords: AWS Neuron, machine learning

.. _frameworks-neuron-sdk:

ML framework support on AWS Neuron SDK
=======================================

AWS Neuron provides integration with popular machine learning frameworks, enabling you to accelerate your existing models on AWS Inferentia and Trainium with minimal code changes. Choose from our comprehensive framework support to optimize your inference and training workloads.

Frameworks
-----------

.. grid:: 2
   :gutter: 2

   .. grid-item-card:: PyTorch on AWS Neuron
      :link: torch/index
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      Complete PyTorch integration for both inference and training on all Neuron hardware.

      * **TorchNeuron Native** - Native PyTorch backend with eager execution and ``torch.compile``
      * **PyTorch NeuronX (torch-neuronx)** - ``Inf2``, ``Trn1``, ``Trn2`` (inference & training)
      * See: :doc:`/frameworks/torch/pytorch-native-overview`

   .. grid-item-card:: JAX on AWS Neuron
      :link: jax/index
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      **Beta release** Experimental JAX support with Neuron Kernel Interface (NKI) integration.

      * **JAX NeuronX** - Neuron hardware support
      * Research and development focus
      * **Status**: Beta - active

.. note::

   Looking for TensorFlow, MXNet, or torch-neuron (Inf1) documentation? These frameworks have been archived. See :doc:`/archive/index` for legacy framework documentation.
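To give a feel for the native PyTorch path mentioned in the card above, here is a minimal, illustrative sketch. It assumes the ``torch.device('neuron')`` style shown in the installation troubleshooting guide later in this document, and is not a tested end-to-end recipe.

.. code-block:: python

   import torch

   # With the native PyTorch backend, NeuronCores are addressed as a
   # torch device (see the migration snippet in the troubleshooting guide).
   device = torch.device("neuron")

   model = torch.nn.Linear(4, 4).to(device)
   x = torch.randn(1, 4, device=device)
   print(model(x))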
Hardware compatibility matrix
-----------------------------

.. list-table::
   :header-rows: 1
   :class: compatibility-matrix

   * - Framework
     - Inf2
     - Trn1/Trn1n
     - Trn2
     - Inference
     - Training
   * - **torch-neuronx**
     - ✅
     - ✅
     - ✅
     - ✅
     - ✅
   * - **JAX NeuronX**
     - ✅
     - ✅
     - N/A
     - ✅
     - N/A

================================================
FILE: frameworks/jax/api-reference-guide/index.rst
================================================

.. _jax-neuronx-api-reference-guide:

.. meta::
   :description: API Reference Guide for JAX Neuronx - AWS Neuron SDK documentation
   :keywords: API reference, AWS Neuron, JAX, JAX NeuronX
   :date-modified: 2026-03-13

API Reference Guide for JAX Neuronx
====================================================

.. toctree::
   :maxdepth: 1
   :hidden:

   /frameworks/jax/api-reference-guide/neuron-envvars

* :ref:`jax-neuronx-envvars`

================================================
FILE: frameworks/jax/api-reference-guide/neuron-envvars.rst
================================================

.. _jax-neuronx-envvars:

.. meta::
   :description: JAX NeuronX Environment Variables - AWS Neuron SDK documentation
   :keywords: API reference, AWS Neuron, JAX, JAX NeuronX
   :date-modified: 2026-03-13

JAX NeuronX Environment Variables
======================================

Environment variables allow modifications to JAX NeuronX behavior without requiring code changes to the user script. It is recommended to set them in code or just before invoking the Python process, such as ``NEURON_RT_VISIBLE_CORES=8 python3 <script>``

Get Started with PyTorch Neuron ("torch-neuron") on Ubuntu 20
==============================================================

================================================
FILE: setup/torch-neuron.rst
================================================

.. _setup-torch-neuron:

PyTorch Neuron (``torch-neuron``) Setup
=======================================

.. warning::

   ``torch-neuron`` is for Inf1 instances only (legacy NeuronCore v1). For new projects, use Inf2, Trn1, Trn2, or Trn3 with ``torch-neuronx``. See :doc:`/setup/pytorch/index` for current setup.

   For Inf1 setup instructions, see :doc:`/setup/legacy-inf1/pytorch`.

================================================
FILE: setup/torch-neuronx.rst
================================================

.. _setup-torch-neuronx:

.. meta::
   :description: Install PyTorch NeuronX (torch-neuronx) on AWS Trainium and Inferentia instances using DLAMI, DLC, or manual pip installation
   :keywords: pytorch, neuron, torch-neuronx, installation, setup, trainium, inferentia, trn1, trn2, trn3, inf2, DLAMI, pip
   :date-modified: 2026-03-30

PyTorch Neuron (``torch-neuronx``) Setup
========================================

Install PyTorch with Neuron support for training and inference on Inf2, Trn1, Trn2, and Trn3 instances. Choose from a pre-configured DLAMI, a Docker container, or a manual pip installation. For the full setup guide with all options, see :doc:`Install PyTorch for Neuron `.

.. grid:: 1
   :gutter: 3

   .. grid-item-card:: 🚀 DLAMI Installation
      :link: /setup/pytorch/dlami
      :link-type: doc
      :class-card: sd-border-2

      Pre-configured environment with all dependencies. Recommended for most users.

   .. grid-item-card:: 🚀 Multi-Framework DLAMI
      :link: /setup/multiframework-dlami
      :link-type: doc
      :class-card: sd-border-2

      Pre-configured AMI with PyTorch, JAX, and vLLM virtual environments ready to use.

   .. grid-item-card:: 🐳 Deep Learning Container
      :link: /setup/pytorch/dlc
      :link-type: doc
      :class-card: sd-border-2

      Pre-configured Docker images from AWS ECR for containerized deployments.
   .. grid-item-card:: 🔧 Manual Installation
      :link: /setup/pytorch/manual
      :link-type: doc
      :class-card: sd-border-2

      Install on bare OS AMIs or existing systems with full control over dependencies.

   .. grid-item-card:: Rocky Linux 9
      :link: setup-rocky-linux-9
      :link-type: ref
      :class-card: sd-border-2

      Install on Rocky Linux 9 using the Rocky-9-EC2-Base AMI.

================================================
FILE: setup/troubleshooting.rst
================================================

.. meta::
   :description: Troubleshooting guide for AWS Neuron SDK installation issues
   :keywords: neuron, troubleshooting, installation, errors, debugging
   :content-type: troubleshooting
   :date-modified: 2026-03-03

Installation Troubleshooting
=============================

Common issues and solutions for Neuron SDK installation.

Module Import Errors
--------------------

ModuleNotFoundError: No module named 'torch_neuronx'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**: Python cannot find the torch_neuronx module after installation.

**Causes**:

- Virtual environment not activated
- Wrong Python version
- Installation failed silently
- Multiple Python installations

**Solutions**:

1. **Verify virtual environment**:

   .. code-block:: bash

      which python
      # Should show virtual environment path, not system Python

2. **Check Python version**:

   .. code-block:: bash

      python --version
      # Should be 3.10, 3.11, or 3.12

3. **Reinstall torch-neuronx**:

   .. code-block:: bash

      pip install --force-reinstall torch-neuronx --extra-index-url=https://pip.repos.neuron.amazonaws.com

4. **Verify installation**:

   .. code-block:: bash

      pip list | grep neuron

ImportError: cannot import name 'neuron' from 'torch'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**: Import error when trying to use Neuron features.

**Cause**: Using PyTorch/XLA syntax with the Native PyTorch backend.

**Solution**: Update code to use Native PyTorch syntax:

.. code-block:: python

   # Old (PyTorch/XLA)
   import torch_xla.core.xla_model as xm
   device = xm.xla_device()

   # New (Native PyTorch)
   import torch
   device = torch.device('neuron')

See :doc:`/frameworks/torch/index` for the complete migration guide.

Device and Runtime Errors
--------------------------

No Neuron devices found
~~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**: ``neuron-ls`` shows no devices or returns an error.

**Causes**:

- Wrong instance type
- Neuron driver not loaded
- Runtime not started

**Solutions**:

1. **Verify instance type**:

   .. code-block:: bash

      curl http://169.254.169.254/latest/meta-data/instance-type
      # Should show inf2.*, trn1.*, trn2.*, trn3.*, or inf1.*

2. **Check Neuron driver**:

   .. code-block:: bash

      lsmod | grep neuron
      # Should show neuron driver loaded

3. **Install/reload driver**:

   .. code-block:: bash

      # Ubuntu/Debian
      sudo apt-get install -y aws-neuronx-dkms

      # Amazon Linux
      sudo yum install -y aws-neuronx-dkms

4. **Restart runtime**:

   .. code-block:: bash

      sudo systemctl restart neuron-monitor
      neuron-ls

RuntimeError: Neuron runtime initialization failed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**: The runtime fails to initialize when running models.

**Causes**:

- Insufficient permissions
- Runtime version mismatch
- Corrupted runtime state

**Solutions**:

1. **Check runtime status**:

   .. code-block:: bash

      sudo systemctl status neuron-monitor

2. **Verify permissions**:

   .. code-block:: bash

      ls -l /dev/neuron*
      # Should be accessible by current user

3. **Reinstall runtime**:
.. code-block:: bash

      sudo apt-get install --reinstall aws-neuronx-runtime-lib

Version Compatibility Issues
-----------------------------

Compiler version mismatch
~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**: Error about an incompatible compiler version.

**Cause**: The neuronx-cc version is incompatible with the framework version.

**Solution**: Install compatible versions:

.. code-block:: bash

   # For PyTorch 2.9
   pip install neuronx-cc==2.15.* --extra-index-url=https://pip.repos.neuron.amazonaws.com

See :doc:`/release-notes/index` for the version compatibility matrix.

Package dependency conflicts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**: pip reports conflicting dependencies.

**Solution**: Use a fresh virtual environment:

.. code-block:: bash

   python3 -m venv ~/fresh_neuron_venv
   source ~/fresh_neuron_venv/bin/activate
   pip install -U pip

   # Install packages in correct order
   pip install torch==2.9.0
   pip install torch-neuronx neuronx-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com

Network and Repository Issues
------------------------------

Cannot connect to Neuron repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**: apt-get or pip cannot reach the Neuron repositories.

**Solutions**:

1. **Verify network connectivity**:

   .. code-block:: bash

      curl -I https://apt.repos.neuron.amazonaws.com
      curl -I https://pip.repos.neuron.amazonaws.com

2. **Check proxy settings** (if behind a corporate proxy):

   .. code-block:: bash

      export https_proxy=http://proxy.example.com:8080
      export http_proxy=http://proxy.example.com:8080

3. **Use alternative index URL**:

   .. code-block:: bash

      pip install torch-neuronx --index-url=https://pip.repos.neuron.amazonaws.com

GPG key expired
~~~~~~~~~~~~~~~

**Symptoms**: "EXPKEYSIG" error during apt-get update.

**Solution**:

.. code-block:: bash

   wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -
   sudo apt-get update -y

Getting Help
------------

If issues persist:

1. **Check release notes**: :doc:`/release-notes/index`
2. **Review documentation**: :doc:`/frameworks/torch/index`
3. **GitHub Issues**: `aws-neuron/aws-neuron-sdk `_
4. **AWS Support**: Open a support case if you have an AWS Support plan

Diagnostic Information
----------------------

When reporting issues, include:
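In addition to the shell output below, a quick Python-side version dump can help pin down environment mismatches. This is a minimal sketch using the standard library's ``importlib.metadata``; the package names are illustrative ones referenced elsewhere in this guide, and any that are absent are simply reported as not installed.

.. code-block:: python

   from importlib.metadata import version, PackageNotFoundError

   # Report installed versions of Neuron-related packages.
   for pkg in ("torch", "torch-neuronx", "neuronx-cc"):
       try:
           print(pkg, version(pkg))
       except PackageNotFoundError:
           print(pkg, "not installed")

..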
code-block:: bash # System information uname -a cat /etc/os-release # Instance type curl http://169.254.169.254/latest/meta-data/instance-type # Neuron devices neuron-ls # Package versions pip list | grep -E "(torch|neuron)" # Driver status lsmod | grep neuron sudo systemctl status neuron-monitor ================================================ FILE: src/benchmark/helper_scripts/llmperf_dp.patch ================================================ diff --git a/src/llmperf/ray_clients/openai_chat_completions_client.py b/src/llmperf/ray_clients/openai_chat_completions_client.py index f2e0a91..74c4027 100644 --- a/src/llmperf/ray_clients/openai_chat_completions_client.py +++ b/src/llmperf/ray_clients/openai_chat_completions_client.py @@ -1,5 +1,6 @@ import json import os +import random import time from typing import Any, Dict @@ -14,6 +15,9 @@ from llmperf import common_metrics @ray.remote class OpenAIChatCompletionsClient(LLMClient): """Client for OpenAI Chat Completions API.""" + def __init__(self): + self.addr_id = 0 + self.addr_select_strategy = 'round-robin' def llm_request(self, request_config: RequestConfig) -> Dict[str, Any]: prompt = request_config.prompt @@ -50,6 +54,13 @@ class OpenAIChatCompletionsClient(LLMClient): address = os.environ.get("OPENAI_API_BASE") if not address: raise ValueError("the environment variable OPENAI_API_BASE must be set.") + # if several addresses of model server exist, select one for each request (1) randomly or (2) round-robin + address_list = address.split(";") + if self.addr_select_strategy == 'round-robin': + address = address_list[self.addr_id] + self.addr_id = (self.addr_id + 1) % len(address_list) + else: + address = random.choice(address_list) key = os.environ.get("OPENAI_API_KEY") if not key: raise ValueError("the environment variable OPENAI_API_KEY must be set.") ================================================ FILE: src/benchmark/helper_scripts/llmperf_reasoning.patch ================================================ diff --git a/src/llmperf/ray_clients/openai_chat_completions_client.py b/src/llmperf/ray_clients/openai_chat_completions_client.py index aeb5fbf..f1b4473 100644 --- a/src/llmperf/ray_clients/openai_chat_completions_client.py +++ b/src/llmperf/ray_clients/openai_chat_completions_client.py @@ -100,7 +100,7 @@ class OpenAIChatCompletionsClient(LLMClient): raise RuntimeError(data["error"]["message"]) delta = data["choices"][0]["delta"] - if delta.get("content", None): + if delta.get("content", None) or delta.get("reasoning_content", None): if not ttft: ttft = time.monotonic() - start_time # time_to_next_token.append(ttft) @@ -109,7 +109,11 @@ class OpenAIChatCompletionsClient(LLMClient): time.monotonic() - most_recent_received_token_time ) most_recent_received_token_time = time.monotonic() - generated_text += delta["content"] + if "reasoning_content" in delta and delta["reasoning_content"]: + chunk_content = delta["reasoning_content"] + else: + chunk_content = delta["content"] + generated_text += chunk_content total_request_time = time.monotonic() - start_time output_throughput = tokens_received / total_request_time ================================================ FILE: src/benchmark/helper_scripts/neuron_perf.patch ================================================ diff --git a/src/llmperf/ray_clients/openai_chat_completions_client.py b/src/llmperf/ray_clients/openai_chat_completions_client.py index f2e0a91..644d5a6 100644 --- a/src/llmperf/ray_clients/openai_chat_completions_client.py +++ 
b/src/llmperf/ray_clients/openai_chat_completions_client.py @@ -92,7 +92,7 @@ class OpenAIChatCompletionsClient(LLMClient): if delta.get("content", None): if not ttft: ttft = time.monotonic() - start_time - time_to_next_token.append(ttft) + # time_to_next_token.append(ttft) else: time_to_next_token.append( time.monotonic() - most_recent_received_token_time diff --git a/token_benchmark_ray.py b/token_benchmark_ray.py index 63216b1..11e0116 100644 --- a/token_benchmark_ray.py +++ b/token_benchmark_ray.py @@ -32,6 +32,7 @@ def get_token_throughput_latencies( stddev_input_tokens: int, mean_output_tokens: int, stddev_output_tokens: int, + tokenizer: str, additional_sampling_params: Optional[Dict[str, Any]] = None, num_concurrent_requests: int = 1, max_num_completed_requests: int = 500, @@ -60,10 +61,8 @@ def get_token_throughput_latencies( """ random.seed(11111) - tokenizer = LlamaTokenizerFast.from_pretrained( - "hf-internal-testing/llama-tokenizer" - ) - get_token_length = lambda text: len(tokenizer.encode(text)) + hf_tokenizer = LlamaTokenizerFast.from_pretrained(tokenizer) + get_token_length = lambda text: len(hf_tokenizer.encode(text)) if not additional_sampling_params: additional_sampling_params = {} @@ -84,7 +83,7 @@ def get_token_throughput_latencies( prompt_tokens_mean=mean_input_tokens, prompt_tokens_stddev=stddev_input_tokens, expect_output_tokens=num_output_tokens, - tokenizer=tokenizer + tokenizer=hf_tokenizer )) start_time = time.monotonic() pbar = tqdm(total=max_num_completed_requests) @@ -118,7 +117,7 @@ def get_token_throughput_latencies( with completed_requests_lock: if num_completed_requests < max_num_completed_requests: if num_output_tokens: - request_metrics[common_metrics.INTER_TOKEN_LAT] /= request_metrics[common_metrics.NUM_OUTPUT_TOKENS] + request_metrics[common_metrics.INTER_TOKEN_LAT] /= num_output_tokens - 1 else: request_metrics[common_metrics.INTER_TOKEN_LAT] = 0 request_metrics[common_metrics.NUM_OUTPUT_TOKENS] = num_output_tokens @@ -155,7 +154,7 @@ def get_token_throughput_latencies( with completed_requests_lock: if num_completed_requests < max_num_completed_requests: if num_output_tokens: - request_metrics[common_metrics.INTER_TOKEN_LAT] /= num_output_tokens + request_metrics[common_metrics.INTER_TOKEN_LAT] /= num_output_tokens - 1 else: request_metrics[common_metrics.INTER_TOKEN_LAT] = 0 request_metrics[common_metrics.NUM_OUTPUT_TOKENS] = num_output_tokens @@ -292,6 +291,7 @@ def run_token_benchmark( additional_sampling_params: str, results_dir: str, user_metadata: Dict[str, Any], + tokenizer: str, ): """ Args: @@ -327,6 +327,7 @@ def run_token_benchmark( stddev_output_tokens=stddev_output_tokens, num_concurrent_requests=num_concurrent_requests, additional_sampling_params=json.loads(additional_sampling_params), + tokenizer=tokenizer, ) if results_dir: @@ -462,6 +463,11 @@ args.add_argument( "name=foo,bar=1. These will be added to the metadata field of the results. 
" ), ) +args.add_argument( + "--tokenizer", + type=str, + default="hf-internal-testing/llama-tokenizer", +) if __name__ == "__main__": env_vars = dict(os.environ) @@ -488,4 +494,5 @@ if __name__ == "__main__": additional_sampling_params=args.additional_sampling_params, results_dir=args.results_dir, user_metadata=user_metadata, + tokenizer=args.tokenizer, ) ================================================ FILE: src/benchmark/tensorflow/distilbert-base-uncased-finetuned-sst-2-english_benchmark.py ================================================ # Add to these lists or change as needed model_names = ["distilbert-base-uncased-finetuned-sst-2-english"] sequence_lengths = [128] batch_sizes = [128] pipeline_sizes = [1] # Silence an irrelevant warning from transformers library import os os.environ["TOKENIZERS_PARALLELISM"] = "false" import numpy as np import neuronperf as npf import neuronperf.tensorflow from transformers import AutoTokenizer, TFAutoModelForSequenceClassification def get_batch(tokenizer, sequence_length, batch_size): sequence = "I am sorry. I really want to like it, but I just can not stand sushi." paraphrase = tokenizer.encode_plus( sequence, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="np", ) inputs = { "input_ids": np.concatenate([paraphrase["input_ids"]] * batch_size, axis=0), "attention_mask": np.concatenate([paraphrase["attention_mask"]] * batch_size, axis=0), } return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Benchmark print("Benchmarking {}".format(filename)) reports = npf.tensorflow.benchmark(filename, inputs) # View and save results print("======== {} ========".format(filename)) npf.print_reports(reports) npf.write_csv(reports) npf.write_json(reports) ================================================ FILE: src/benchmark/tensorflow/distilbert-base-uncased-finetuned-sst-2-english_compile.py ================================================ # Add to these lists or change as needed model_names = ["distilbert-base-uncased-finetuned-sst-2-english"] sequence_lengths = [128] batch_sizes = [128] pipeline_sizes = [1] # Silence an irrelevant warning from transformers library import os os.environ["TOKENIZERS_PARALLELISM"] = "false" import numpy as np import neuronperf as npf import neuronperf.tensorflow from transformers import AutoTokenizer, TFAutoModelForSequenceClassification def get_batch(tokenizer, sequence_length, batch_size): sequence = "I am sorry. I really want to like it, but I just can not stand sushi." 
paraphrase = tokenizer.encode_plus( sequence, max_length=sequence_length, padding="max_length", truncation=True, return_tensors="np", ) inputs = { "input_ids": np.concatenate([paraphrase["input_ids"]] * batch_size, axis=0), "attention_mask": np.concatenate([paraphrase["attention_mask"]] * batch_size, axis=0), } return inputs if __name__ == "__main__": for model_name in model_names: tokenizer = AutoTokenizer.from_pretrained(model_name) model = TFAutoModelForSequenceClassification.from_pretrained(model_name, return_dict=False) for sequence_length in sequence_lengths: inputs = [ get_batch(tokenizer, sequence_length, batch_size) for batch_size in batch_sizes ] filename = f"{model_name}_sl{sequence_length}.json" # Compile print("Compiling {}".format(filename)) npf.tensorflow.compile( model, inputs, batch_sizes=batch_sizes, pipeline_sizes=pipeline_sizes, filename=filename, model_name=model_name, ) ================================================ FILE: src/examples/mxnet/README.md ================================================

Please view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** ================================================ FILE: src/examples/mxnet/data_parallel/benchmark_utils.py ================================================ import math from collections import Counter import numpy as np class Results(): def __init__(self, batch_size, num_cores=1): self.latency_array = [] self.end_times = [] self.start_times = [] self.batch_size = batch_size self.num_cores = num_cores def add_result(self, latency_array, end_times, start_times): self.latency_array.extend(latency_array) self.end_times.extend(end_times) self.start_times.extend(start_times) def report(self, f, window_size=1): assert(len(self.latency_array) != 0) p50_latency = np.percentile(self.latency_array, 50) p90_latency = np.percentile(self.latency_array, 90) p95_latency = np.percentile(self.latency_array, 95) p99_latency = np.percentile(self.latency_array, 99) p100_latency = np.percentile(self.latency_array, 100) def get_bucket(start, end): bucketed_start = math.floor(start / window_size) * window_size bucketed_end = math.ceil(end / window_size) * window_size # The check is to make sure that we ignore timestamps that are larger than the window size if bucketed_end - bucketed_start == window_size: return bucketed_start else: return None # Divide the timestamps into different buckets bucketed_timestamps = [get_bucket(start, end) for start, end in zip(self.start_times, self.end_times)] # Count the values in each bucket counted_buckets = Counter( item for item in bucketed_timestamps if item is not None) # Normalize each bucket bucket_throughputs = [(key, value / window_size) for key, value in sorted(counted_buckets.items())] busy_throughputs = [value for _, value in bucket_throughputs] max_throughput = max(busy_throughputs) * self.batch_size avg_throughput = sum(busy_throughputs) * self.batch_size / len(busy_throughputs) f.write("\n") f.write( "Maximum throughput = {} sentences/sec\n".format(int(max_throughput))) f.write("Average throughput = {} sentences/sec\n".format(int(avg_throughput))) f.write("\n") f.write("Latency Percentiles:\n") f.write("===\n") f.write("P50 = {} milliseconds\n".format(int(1000*p50_latency))) f.write("P90 = {} milliseconds\n".format(int(1000*p90_latency))) f.write("P95 = {} milliseconds\n".format(int(1000*p95_latency))) f.write("P99 = {} milliseconds\n".format(int(1000*p99_latency))) f.write("P100 = {} milliseconds\n".format(int(1000*p100_latency))) f.write("\n") f.write("Sanity test:\n") f.write("===\n") f.write("Processed - num batches {}\n".format(len(self.latency_array))) f.write(" - batch size {}\n".format(self.batch_size)) f.write(" - num cores {}\n".format(self.num_cores)) ================================================ FILE: src/examples/mxnet/data_parallel/data_parallel_tutorial.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Data Parallel Mode with Gluon MXNet\n", "\n", "In this tutorial, you will compile a Gluon BERT model and run in data-parallel mode to completely utilize the NeuronCores. Here you will benchmark a multi-worker setup and compare it with a single worker.\n", "\n", "This tutorial is intended only for MXNet-1.8.\n", "\n", "In this tutorial, we will be using an inf1.2xlarge with the latest AWS Deep Learning AMI (DLAMI). 
The inf1.2xlarge instance has 1 AWS Inferentia Chip with 4 NeuronCores.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting up your environment\n", "\n", "To run this tutorial, please make sure you deactivate any existing MXNet conda environments you are already using. Install MXNet 1.8 by following the instructions at [MXNet Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/mxnet-setup/mxnet-install.html#develop-on-aws-ml-accelerator-instance). You would also need to change your kernel to use the correct Python environment set up earlier by clicking Kernel->Change Kernel->Python (Neuron MXNet)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install dependencies\n", "\n", "We have to install gluon-nlp to get the BERT model. Run the following command to install:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python -m pip install gluonnlp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compiling BERT Model\n", "\n", "Next, we compile the Gluon BERT model and save it. Once the model is compiled, we use the same model across the entire tutorial.\n", "In this tutorial, we will be using a BERT model with sequence length 32" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import mxnet as mx\n", "import mx_neuron\n", "import gluonnlp as nlp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "BERT_MODEL = 'bert_12_768_12'\n", "BERT_DATA = 'book_corpus_wiki_en_uncased'\n", "batch_size = 1\n", "seq_len = 32\n", "num_cores = 1\n", "dtype = 'float32'\n", "\n", "compiled_model_path = '{}.compiled.{}.{}'.format(BERT_MODEL, batch_size, seq_len)\n", "\n", "model, vocab = nlp.model.get_model(BERT_MODEL,\n", " dataset_name=BERT_DATA,\n", " use_classifier=False,\n", " use_decoder=False, ctx=mx.cpu())\n", " \n", "# Create sample inputs for compilation\n", "words = mx.nd.ones([batch_size, seq_len], name='words', dtype=dtype)\n", "valid_len = mx.nd.ones([batch_size,], name='valid_len', dtype=dtype)\n", "segments = mx.nd.ones([batch_size, seq_len], name='segments', dtype=dtype)\n", "inputs = {'data0': words, 'data1': segments, 'data2': valid_len}\n", "\n", "# Compiler Args ~~ \n", "options = {}\n", "embeddingNames = ['bertmodel0_word_embed_embedding0_fwd', 'bertmodel0_token_type_embed_embedding0_fwd', 'bertencoder0_embedding0']\n", "options.update({'force_incl_node_names': embeddingNames})\n", "options.update({'flags': ['--fp32-cast matmult']}) \n", "\n", "# Compile and save ~~ \n", "model = mx_neuron.compile(model, inputs=inputs, **options)\n", "model.export(compiled_model_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Parallel Mode\n", "\n", "Data Parallel Mode is a setup in which you launch multiple copies of the same model, such that each model is running independently of the others. In other words, each model has its own resources to run inference. \n", "\n", "On an inf1.2xlarge instance, we have 4 NeuronCores. Hence, we can launch 4 models such that each model is loaded on a single NeuronCore. This enables us to process 4 requests concurrently without a linear increase in latency. As a result, the throughput of the system increases when compared to a single model inference.
This would also allow us to utilize all the 4 NeuronCores on the instance.\n", "\n", "Run through the next set of cells to see the difference in throughput as we scale from one model to 4 models running in parallel." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def get_sample_inputs(batch_size, seq_len):\n", " words = np.ones([batch_size, seq_len], dtype=np.float32)\n", " valid_len = np.ones([batch_size,], dtype=np.float32)\n", " segments = np.ones([batch_size, seq_len], dtype=np.float32)\n", " inputs = {'data0': words, 'data1': segments, 'data2': valid_len}\n", " return inputs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, for comparison purposes, we run the setup with 1 worker. To do this, we set num_cores=1. This launches only 1 model running on a single NeuronCore. After running the cell below, note down the latency and throughput of the system" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from parallel import NeuronSimpleDataParallel\n", "from benchmark_utils import Results\n", "import time\n", "import functools\n", "import os\n", "import numpy as np\n", "import warnings\n", "\n", "num_cores = 1\n", "batch_size=1\n", "\n", "# Each worker process should use one core, hence we set\n", "# os.environ['NEURON_RT_NUM_CORES'] = \"1\"\n", "os.environ[\"NEURON_RT_NUM_CORES\"] = \"1\"\n", "\n", "#Result aggregation class (code in benchmark_utils.py)\n", "results = Results(batch_size, num_cores)\n", "def result_handler(output, start, end):\n", " elapsed = end - start\n", " results.add_result([elapsed], [end], [start])\n", "\n", "inputs = get_sample_inputs(batch_size, seq_len)\n", "parallel_neuron_model = NeuronSimpleDataParallel(compiled_model_path, num_cores, inputs)\n", "\n", "#Starting the inference threads\n", "parallel_neuron_model.start_continuous_inference()\n", "\n", "# Warm up the cores\n", "for _ in range(num_cores*4):\n", " parallel_neuron_model.warmup(inputs)\n", " \n", "# Need to run for high number of iterations to benchmark the models\n", "for _ in range(1000):\n", " parallel_neuron_model.infer(inputs)\n", " # Passing the result_handler as a callback function\n", " parallel_neuron_model.add_result(result_handler)\n", "\n", "# Stop inference \n", "parallel_neuron_model.stop()\n", "# Since we are using a multi-process execution with a shared queue, some inferences\n", "# may still be in execution phase. Hence we need to wait till all the inputs are processed\n", "# add_all_results() will collect all the results of requests which are in this state\n", "parallel_neuron_model.add_all_results(result_handler)\n", "\n", "\n", "with open(\"benchmark.txt\", \"w\") as f:\n", " results.report(f, window_size=1)\n", "\n", "with open(\"benchmark.txt\", \"r\") as f:\n", " for line in f:\n", " print(line)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we run the setup with 4 workers. To do this, we set num_cores=4. This launches 4 models, each running on an individual NeuronCore. All the 4 models are running in individual processes; in other words, the models run in parallel. \n", "\n", "To feed the models efficiently, we use a producer-consumer setup, in which all processes running a model act as consumers. All consumers are fed from a shared input queue.\n", "\n", "Now we run the setup below. You may notice that the throughput increases by >2x when compared to a single worker setup."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from parallel import NeuronSimpleDataParallel\n", "from benchmark_utils import Results\n", "import time\n", "import functools\n", "import os\n", "import numpy as np\n", "\n", "num_cores = 4\n", "batch_size=1\n", "\n", "os.environ[\"NEURON_RT_NUM_CORES\"] = \"1\"\n", "\n", "#Result aggregation class (code in bert_benchmark_utils.py)\n", "results = Results(batch_size, num_cores)\n", "def result_handler(output, start, end):\n", " elapsed = end - start\n", " results.add_result([elapsed], [end], [start])\n", "\n", "inputs = get_sample_inputs(batch_size, seq_len)\n", "parallel_neuron_model = NeuronSimpleDataParallel(compiled_model_path, num_cores, inputs)\n", "\n", "#Starting the inference threads\n", "parallel_neuron_model.start_continuous_inference()\n", "\n", "# Warm up the cores\n", "for _ in range(num_cores*4):\n", " parallel_neuron_model.warmup(inputs)\n", " \n", "# Need to run for high number of iterations to benchmark the models\n", "for _ in range(5000):\n", " parallel_neuron_model.infer(inputs)\n", " # Passing the result_handler as a callback function\n", " parallel_neuron_model.add_result(result_handler)\n", "\n", "# Stop inference \n", "parallel_neuron_model.stop()\n", "# Since we are using a multi-process execution with a shared queue, some inferences\n", "# may still be in execution phase. Hence we need to wait till all the inputs are processed\n", "# add_all_results() will collect all the results of requests which are in this state\n", "parallel_neuron_model.add_all_results(result_handler)\n", "\n", "\n", "with open(\"benchmark.txt\", \"w\") as f:\n", " results.report(f, window_size=1)\n", "\n", "with open(\"benchmark.txt\", \"r\") as f:\n", " for line in f:\n", " print(line)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: src/examples/mxnet/data_parallel/parallel.py ================================================ import mxnet as mx import mx_neuron import os from time import time from queue import Queue from multiprocessing import Process, Manager def consumer(model_file, sample_input, input_queue, result_queue): sym, args, aux = mx.model.load_checkpoint(model_file, 0) sample_input = {key: mx.nd.array(v) for key, v in sample_input.items()} args.update(sample_input) model = sym.bind(mx.cpu(), args=args, aux_states=aux, grad_req="null") while True: inputs, input_id = input_queue.get() input_queue.task_done() # Stop execution if stopping condition is recieved if inputs == "stop": break inputs = {key: mx.nd.array(v) for key, v in inputs.items()} start = time() results = model.forward(**inputs) results[0].wait_to_read() # Make the output iterable - if it is not already a tuple or list if not isinstance(results, tuple) or isinstance(results, list): results = [results] end = time() if input_id != -1: result_queue.put((results, start, end, input_id)) class NeuronSimpleDataParallel: def __init__(self, model_file, num_neuron_cores, sample_input): self.num_neuron_cores = num_neuron_cores self.sample_input = sample_input self.model_path = model_file # Create shared input queue and output queue manager = Manager() 
        self.input_queue = manager.Queue(maxsize=num_neuron_cores * 16)
        self.result_queue = manager.Queue(maxsize=num_neuron_cores * 16)
        self.processes = [
            Process(
                target=consumer,
                args=(
                    self.model_path,
                    self.sample_input,
                    self.input_queue,
                    self.result_queue,
                ),
            )
            for _ in range(num_neuron_cores)
        ]
        self.input_id = 0
        self.input_dict = set()

    def start_continuous_inference(self):
        for p in self.processes:
            p.start()

    def warmup(self, batch):
        self.input_queue.put((batch, -1))

    def infer(self, batch):
        self.input_id += 1
        self.input_dict.add(self.input_id)
        self.input_queue.put((batch, self.input_id))

    def stop(self):
        for _ in range(self.num_neuron_cores):
            self.input_queue.put(("stop", -1))

    def add_result(self, callback_fn):
        if not self.result_queue.empty():
            result, start, end, input_id = self.result_queue.get()
            self.input_dict.remove(input_id)
            self.result_queue.task_done()
            callback_fn(result, start, end)

    def add_all_results(self, callback_fn):
        # Drain any requests that are still in flight, then join the workers
        while len(self.input_dict):
            self.add_result(callback_fn)
        for p in self.processes:
            p.join()

================================================
FILE: src/examples/mxnet/mxnet-gluon-tutorial.ipynb
================================================

{ "cells": [ { "cell_type": "markdown", "id": "4dcf9bb1", "metadata": {}, "source": [ "## MXNet 1.8: Getting Started with Gluon Tutorial\n", "\n", "In this tutorial you will compile and deploy resnet-50 using the newly supported MXNet 1.8 and Gluon API on an Inf1 instance. This tutorial is only supported with MXNet 1.8.\n", "\n", "This Jupyter notebook should be run on an inf1.6xlarge instance since you will be loading and compiling several large models.\n", "\n", "To run this tutorial, please make sure you deactivate any existing MXNet conda environments you are already using. Install MXNet 1.8 by following the instructions at [MXNet Setup Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/mxnet-setup/mxnet-install.html#install-neuron-mxnet). You would also need to change your kernel to use the correct Python environment set up earlier by clicking Kernel->Change Kernel->Python (Neuron MXNet)" ] }, { "cell_type": "markdown", "id": "83eb578b", "metadata": {}, "source": [ "## Compile\n", "\n", "A trained model must be compiled to an Inferentia target before it can run on Inferentia. In this step we compile a pre-trained ResNet50 and export it as a compiled MXNet checkpoint.\n", "\n", "Compilation will take a few minutes. At the end of compilation, the files resnet-50_compiled-0000.params and resnet-50_compiled-symbol.json will be created in the local directory.\n", "\n", "To check the supported operations for the uncompiled model or information on Neuron subgraphs for the compiled model, please see [Neuron Check Model](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-tools/tutorial-neuron-check-model.html#neuron-check-model)."
] }, { "cell_type": "code", "execution_count": null, "id": "88c41e01", "metadata": { "scrolled": true }, "outputs": [], "source": [ "import os\n", "import mxnet as mx\n", "import mx_neuron as neuron\n", "import numpy as np\n", "\n", "path='http://data.mxnet.io/models/imagenet/'\n", "mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params')\n", "mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json')\n", "block = mx.gluon.nn.SymbolBlock.imports('resnet-50-symbol.json',\\\n", "    ['data', 'softmax_label'], 'resnet-50-0000.params', ctx=mx.cpu())\n", "\n", "block.hybridize()\n", "\n", "# Compile for Inferentia using Neuron\n", "inputs = { \"data\" : mx.nd.ones([1,3,224,224], name='data', dtype='float32'), 'softmax_label' : mx.nd.ones([1], name='softmax_label', dtype='float32') }\n", "block = neuron.compile(block, inputs=inputs)\n", "\n", "# Save the compiled model\n", "block.export(\"resnet-50_compiled\", 0, block)" ] }, { "cell_type": "code", "execution_count": null, "id": "6337e0ec", "metadata": {}, "outputs": [], "source": [ "!ls" ] }, { "cell_type": "markdown", "id": "5a9af0c7", "metadata": {}, "source": [ "## Deploy\n", "\n", "Deploy on Inferentia to see inference results such as the following:\n", "```\n", "probability=0.643591, class=n02123045 tabby, tabby cat\n", "probability=0.184392, class=n02123159 tiger cat\n", "probability=0.105063, class=n02124075 Egyptian cat\n", "probability=0.030101, class=n02127052 lynx, catamount\n", "probability=0.016112, class=n02129604 tiger, Panthera tigris\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "960c6aa9", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import mxnet as mx\n", "import mx_neuron as neuron\n", "\n", "path='http://data.mxnet.io/models/imagenet/'\n", "mx.test_utils.download(path+'synset.txt')\n", "\n", "fname = mx.test_utils.download('https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg?raw=true')\n", "img = mx.image.imread(fname)  # read the test image\n", "img = mx.image.imresize(img, 224, 224)  # resize\n", "img = img.transpose((2, 0, 1))  # channel first\n", "img = img.expand_dims(axis=0)  # batchify into (batch, RGB, width, height)\n", "img = img.astype(dtype='float32')\n", "\n", "block = mx.gluon.nn.SymbolBlock.imports('resnet-50_compiled-symbol.json',\\\n", "    ['data', 'softmax_label'], 'resnet-50_compiled-0000.params', ctx=mx.cpu())\n", "softmax = mx.nd.random_normal(shape=(1,))\n", "\n", "with open('synset.txt', 'r') as f:\n", "    labels = [l.rstrip() for l in f]\n", "\n", "out = block(img, softmax).asnumpy()\n", "\n", "prob = np.squeeze(out)\n", "a = np.argsort(prob)[::-1]  # print the top-5\n", "for i in a[0:5]:\n", "    print('probability=%f, class=%s' %(prob[i], labels[i]))" ] }, { "cell_type": "raw", "id": "4f15e776", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 5 }

================================================
FILE: src/examples/mxnet/resnet50/resnet50.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "id": "wrapped-soccer", "metadata": {}, "source": [ "# Running Neuron Apache MXNet ResNet50 on Inferentia" ] }, {
"cell_type": "markdown", "id": "appreciated-daily", "metadata": {}, "source": [ "## Introduction:\n", "In this tutorial we will compile and deploy ResNet50 model for Inferentia.\n", "In this tutorial we provide two main sections:\n", "\n", "1.Compile the ResNet50 model.\n", "\n", "2.Infer the compiled model.\n", "\n", "Before running the following verify this Jupyter notebook is running “conda_aws_neuron_mxnet_p36” kernel. You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page.\n", "Neuron supports Python module, Symbol APIs and the C predict API. The following quick start example uses the Symbol API.\n", "\n", "### Warning\n", "This tutorial was tested on MXNet-1.5\n", "\n", "MXNet-1.5 entered maintenance mode and require Neuron runtime 1.0, please see : [MXNet-1.5 enters maintainence mode](../../../../release-notes/maintenance.html)\n", "\n", "To setup development environment for MXNet-1.5 see installation instructions for Neuron 1.15.1 : [Neuron-1.15.1 MXNet install](../../../../archive/mxnet-neuron/setup/mxnet-install.html)" ] }, { "cell_type": "markdown", "id": "advance-rebound", "metadata": {}, "source": [ "## Compile model on Neuron\n", "The following step will compile the resnet50 model. Compilation will take a few minutes on inf1.6xlarge. At the end of compilation, the files resnet-50_compiled-0000.params and resnet-50_compiled-symbol.json will be created in local directory." ] }, { "cell_type": "code", "execution_count": null, "id": "alpha-publication", "metadata": {}, "outputs": [], "source": [ "import mxnet as mx\n", "import numpy as np\n", "\n", "path='http://data.mxnet.io/models/imagenet/'\n", "mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params')\n", "mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json')\n", "sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)\n", "\n", "# Compile for Inferentia using Neuron\n", "inputs = { \"data\" : mx.nd.ones([1,3,224,224], name='data', dtype='float32') }\n", "sym, args, aux = mx.contrib.neuron.compile(sym, args, aux, inputs)\n", "\n", "#save compiled model\n", "mx.model.save_checkpoint(\"resnet-50_compiled\", 0, sym, args, aux)" ] }, { "cell_type": "code", "execution_count": null, "id": "technical-reason", "metadata": {}, "outputs": [], "source": [ "!ls" ] }, { "cell_type": "markdown", "id": "meaningful-substance", "metadata": {}, "source": [ "## Deploy on Inferentia\n", "Using same instance to deploy the model. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "cooked-jonathan", "metadata": {}, "outputs": [], "source": [ "import mxnet as mx\n", "import numpy as np\n", "\n", "path='http://data.mxnet.io/models/imagenet/'\n", "mx.test_utils.download(path+'synset.txt')\n", "\n", "fname = mx.test_utils.download('https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg?raw=true')\n", "img = mx.image.imread(fname)# convert into format (batch, RGB, width, height)\n", "img = mx.image.imresize(img, 224, 224) # resize\n", "img = img.transpose((2, 0, 1)) # Channel first\n", "img = img.expand_dims(axis=0) # batchify\n", "img = img.astype(dtype='float32')\n", "\n", "sym, args, aux = mx.model.load_checkpoint('resnet-50_compiled', 0)\n", "softmax = mx.nd.random_normal(shape=(1,))\n", "args['softmax_label'] = softmax\n", "args['data'] = img\n", "\n", "# Inferentia context\n", "ctx = mx.neuron()\n", "\n", "exe = sym.bind(ctx=ctx, args=args, aux_states=aux, grad_req='null')\n", "\n", "with open('synset.txt', 'r') as f:\n", " labels = [l.rstrip() for l in f]\n", "\n", "exe.forward(data=img)\n", "prob = exe.outputs[0].asnumpy()# print the top-5\n", "prob = np.squeeze(prob)\n", "a = np.argsort(prob)[::-1]\n", "for i in a[0:5]:\n", " print('probability=%f, class=%s' %(prob[i], labels[i]))\n", " \n", "# Sample output will look like below:\n", "#probability=0.634792, class=n02123045 tabby, tabby cat\n", "#probability=0.193601, class=n02123159 tiger cat\n", "#probability=0.103627, class=n02124075 Egyptian cat\n", "#probability=0.031604, class=n02127052 lynx, catamount\n", "#probability=0.015892, class=n02129604 tiger, Panthera tigris" ] } ], "metadata": { "kernelspec": { "display_name": "Environment (conda_aws_neuron_mxnet_p36)", "language": "python", "name": "conda_aws_neuron_mxnet_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: src/examples/mxnet/resnet50_neuroncore_groups.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Neuron Apache MXNet - Configurations for NeuronCore Groups Using Resnet50\n", "\n", "\n", "\n", "## Introduction:\n", "\n", "In this tutorial we will compile and deploy Resnet-50 model in parallel using the concept of NeuronCore Groups on an Inf1 instance. This Jupyter notebook should be run on an instance which is inf1.6xlarge or larger. For simplicity we will run this tutorial on inf1.6xlarge but in real life scenario the compilation should be done on a compute instance and the deployment on inf1 instance to save costs. \n", "\n", "Set environment variable NEURON_RT_NUM_CORES to the total number of Neuron cores that will be utilized. The consecutive NeuronCore groups will be created by Neuron Runtime and place the models to the cores according to the compiled size.\n", "\n", "Note that in order to map a model to a group, the model must be compiled to fit within the group size. To limit the number of NeuronCores during compilation, use compiler_args dictionary with field “–neuroncore-pipeline-cores“ set to the group size. 
For example, if NEURON_RT_NUM_CORES=4 and two models compiled with “--neuroncore-pipeline-cores=3” and “--neuroncore-pipeline-cores=1” were loaded, the first model would occupy NC0-2 and the second model would occupy NC3." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```\n", "compile_args = {'--neuroncore-pipeline-cores' : 2}\n", "sym, args, auxs = neuron.compile(sym, args, auxs, inputs, **compile_args)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "In this tutorial we provide two main sections:\n", "\n", "1. Compile the ResNet50 model for Neuron\n", "\n", "2. Run inference using NeuronCore Groups\n", "\n", "Please use environment `conda_aws_neuron_mxnet_p36`.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compile model for Neuron\n", "\n", "A model must be compiled for the Inferentia target before it can be used on Inferentia. In the following we compile the model with the flag --neuroncore-pipeline-cores set to 2 and run it. The files resnet-50_compiled-0000.params and resnet-50_compiled-symbol.json will be created in the local directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import mxnet as mx\n", "import numpy as np\n", "\n", "import mx_neuron as neuron\n", "\n", "path='http://data.mxnet.io/models/imagenet/'\n", "mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params')\n", "mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json')\n", "sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)\n", "\n", "# Compile for Inferentia using Neuron, fit to NeuronCore group size of 2\n", "inputs = { \"data\" : mx.nd.ones([1,3,224,224], name='data', dtype='float32') }\n", "compile_args = {'--neuroncore-pipeline-cores' : 2}\n", "sym, args, aux = neuron.compile(sym, args, aux, inputs, **compile_args)\n", "\n", "# Save the compiled model\n", "mx.model.save_checkpoint(\"resnet-50_compiled\", 0, sym, args, aux)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run inference using NeuronCore Groups\n", "\n", "Within the framework, the model can be mapped to specific cores using the ```ctx=mx.neuron(N)``` context, where N specifies the starting NeuronCore index on which to deploy the model.
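As a hypothetical sketch (sym_a/sym_b stand in for the two separately compiled checkpoints from the example above, each with its own args/aux):\n", "\n", "```python\n", "# sym_a compiled with --neuroncore-pipeline-cores=3, sym_b with 1, NEURON_RT_NUM_CORES=4\n", "exe_a = sym_a.bind(ctx=mx.neuron(0), args=args_a, aux_states=aux_a, grad_req='null')  # occupies NC0-2\n", "exe_b = sym_b.bind(ctx=mx.neuron(3), args=args_b, aux_states=aux_b, grad_req='null')  # occupies NC3\n", "```\n", "\n", "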
For more information, see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/appnotes/perf/flex-eg.html .\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "mx.test_utils.download(path+'synset.txt')\n", "\n", "fname = mx.test_utils.download('https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg?raw=true')\n", "img = mx.image.imread(fname)  # read the test image\n", "img = mx.image.imresize(img, 224, 224)  # resize\n", "img = img.transpose((2, 0, 1))  # channel first\n", "img = img.expand_dims(axis=0)  # batchify into (batch, RGB, width, height)\n", "img = img.astype(dtype='float32')\n", "\n", "sym, args, aux = mx.model.load_checkpoint('resnet-50_compiled', 0)\n", "softmax = mx.nd.random_normal(shape=(1,))\n", "args['softmax_label'] = softmax\n", "args['data'] = img\n", "\n", "# Must be set before the first execution initializes the Neuron runtime\n", "os.environ[\"NEURON_RT_NUM_CORES\"] = '4'\n", "\n", "# Inferentia context - starting at NeuronCore index 1, the 2-core compiled\n", "# model skips NC0 and is placed onto NC1,2\n", "ctx = mx.neuron(1)\n", "\n", "exe = sym.bind(ctx=ctx, args=args, aux_states=aux, grad_req='null')\n", "\n", "with open('synset.txt', 'r') as f:\n", "    labels = [l.rstrip() for l in f]\n", "\n", "exe.forward(data=img)\n", "prob = exe.outputs[0].asnumpy()  # print the top-5\n", "prob = np.squeeze(prob)\n", "a = np.argsort(prob)[::-1]\n", "for i in a[0:5]:\n", "    print('probability=%f, class=%s' %(prob[i], labels[i]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can experiment with different NeuronCore group combinations and different models." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Troubleshooting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If not enough NeuronCores are provided, an error message will be displayed:\n", "\n", "```\n", "mxnet.base.MXNetError: [04:01:39] src/operator/subgraph/neuron/./neuron_util.h:541: Check failed: rsp.status().code() == 0: Failed load model with Neuron-RTD Error.
Neuron-RTD Status Code: 9, details: \"\"\n", "\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Environment (conda_aws_neuron_mxnet_p36)", "language": "python", "name": "conda_aws_neuron_mxnet_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: src/examples/neuron-monitor/neuron-monitor-grafana.json ================================================ { "annotations": { "list": [ { "builtIn": 1, "datasource": "-- Grafana --", "enable": true, "hide": true, "iconColor": "rgba(0, 211, 255, 1)", "name": "Annotations & Alerts", "type": "dashboard" } ] }, "editable": true, "gnetId": null, "graphTooltip": 0, "id": 2, "iteration": 1605138719380, "links": [], "panels": [ { "datasource": "Prometheus", "fieldConfig": { "defaults": { "custom": { "align": null, "filterable": false }, "mappings": [], "thresholds": { "mode": "absolute", "steps": [ { "color": "green", "value": null }, { "color": "red", "value": 80 } ] } }, "overrides": [ { "matcher": { "id": "byName", "options": "Value" }, "properties": [ { "id": "custom.width", "value": 163 } ] }, { "matcher": { "id": "byName", "options": "Field" }, "properties": [ { "id": "custom.width", "value": 450 } ] }, { "matcher": { "id": "byName", "options": "ami_id" }, "properties": [ { "id": "custom.width", "value": 217 } ] }, { "matcher": { "id": "byName", "options": "instance_type" }, "properties": [ { "id": "custom.width", "value": 391 } ] }, { "matcher": { "id": "byName", "options": "Prometheus instance" }, "properties": [ { "id": "custom.width", "value": 641 } ] } ] }, "gridPos": { "h": 8, "w": 24, "x": 0, "y": 0 }, "id": 8, "options": { "showHeader": true, "sortBy": [] }, "pluginVersion": "7.2.1", "repeat": null, "targets": [ { "expr": "instance_info", "format": "table", "instant": true, "interval": "", "legendFormat": "", "refId": "A" } ], "timeFrom": null, "timeShift": null, "title": "Instance Info", "transformations": [ { "id": "organize", "options": { "excludeByName": { "Time": true, "Value": true, "__name__": true, "ami_id": false, "instance": true, "job": true }, "indexByName": { "Time": 0, "Value": 7, "__name__": 1, "availability_zone": 8, "instance": 5, "instance_id": 2, "instance_name": 3, "instance_type": 4, "job": 6, "region": 9, "subnet_id": 10 }, "renameByName": { "Value": "", "availability_zone": "Availability Zone", "instance": "", "instance_id": "Instance ID", "instance_name": "Instance Name", "instance_type": "Instance Type", "region": "Region", "subnet_id": "Subnet" } } } ], "type": "table" }, { "datasource": null, "fieldConfig": { "defaults": { "custom": {}, "mappings": [], "thresholds": { "mode": "absolute", "steps": [ { "color": "super-light-yellow", "value": null } ] } }, "overrides": [] }, "gridPos": { "h": 5, "w": 3, "x": 0, "y": 8 }, "id": 36, "options": { "colorMode": "value", "graphMode": "none", "justifyMode": "auto", "orientation": "auto", "reduceOptions": { "calcs": [ "last" ], "fields": "", "values": false }, "textMode": "auto" }, "pluginVersion": "7.2.1", "targets": [ { "expr": "count(instance_info)\n", "interval": "", "legendFormat": "", "refId": "A" } ], "timeFrom": null, "timeShift": null, "title": "Instance Count", "type": "stat" }, { "datasource": "Prometheus", "fieldConfig": { "defaults": { "custom": {}, "mappings": [], 
"thresholds": { "mode": "absolute", "steps": [ { "color": "light-blue", "value": null } ] }, "unit": "none" }, "overrides": [] }, "gridPos": { "h": 5, "w": 3, "x": 3, "y": 8 }, "id": 10, "options": { "colorMode": "value", "graphMode": "none", "justifyMode": "center", "orientation": "auto", "reduceOptions": { "calcs": [ "mean" ], "fields": "", "values": false }, "textMode": "auto" }, "pluginVersion": "7.2.1", "targets": [ { "expr": "sum (system_vcpu_count)", "instant": true, "interval": "", "legendFormat": "", "refId": "A" } ], "timeFrom": null, "timeShift": null, "title": "vCPU Count", "type": "stat" }, { "datasource": null, "fieldConfig": { "defaults": { "custom": {}, "mappings": [], "thresholds": { "mode": "percentage", "steps": [ { "color": "green", "value": null }, { "color": "#EAB839", "value": 70 }, { "color": "orange", "value": 80 }, { "color": "semi-dark-red", "value": 90 } ] }, "unit": "percentunit" }, "overrides": [] }, "gridPos": { "h": 5, "w": 3, "x": 6, "y": 8 }, "id": 20, "options": { "orientation": "auto", "reduceOptions": { "calcs": [ "mean" ], "fields": "", "values": false }, "showThresholdLabels": true, "showThresholdMarkers": true }, "pluginVersion": "7.2.1", "targets": [ { "expr": "avg(sum by (instance_id) (system_vcpu_usage_ratio))", "instant": true, "interval": "", "legendFormat": "", "refId": "A" } ], "timeFrom": null, "timeShift": null, "title": "vCPU Utilization", "type": "gauge" }, { "datasource": "Prometheus", "fieldConfig": { "defaults": { "custom": {}, "mappings": [], "thresholds": { "mode": "percentage", "steps": [ { "color": "green", "value": null }, { "color": "yellow", "value": 70 }, { "color": "orange", "value": 80 }, { "color": "red", "value": 90 } ] }, "unit": "percentunit" }, "overrides": [] }, "gridPos": { "h": 5, "w": 3, "x": 9, "y": 8 }, "id": 16, "options": { "orientation": "auto", "reduceOptions": { "calcs": [ "mean" ], "fields": "", "values": false }, "showThresholdLabels": true, "showThresholdMarkers": true }, "pluginVersion": "7.2.1", "targets": [ { "expr": "avg(system_memory_used_bytes / system_memory_total_bytes)", "instant": true, "interval": "", "legendFormat": "", "refId": "A" } ], "timeFrom": null, "timeShift": null, "title": "Host Memory Usage", "type": "gauge" }, { "datasource": "Prometheus", "fieldConfig": { "defaults": { "custom": {}, "mappings": [], "thresholds": { "mode": "absolute", "steps": [ { "color": "rgb(191, 151, 105)", "value": null } ] } }, "overrides": [] }, "gridPos": { "h": 5, "w": 3, "x": 12, "y": 8 }, "id": 12, "options": { "colorMode": "value", "graphMode": "none", "justifyMode": "center", "orientation": "auto", "reduceOptions": { "calcs": [ "mean" ], "fields": "", "values": false }, "textMode": "auto" }, "pluginVersion": "7.2.1", "targets": [ { "expr": "count(neuroncore_utilization_ratio > 0)", "instant": true, "interval": "", "legendFormat": "", "refId": "A" } ], "timeFrom": null, "timeShift": null, "title": "NeuronCores in Use", "transformations": [], "type": "stat" }, { "datasource": "Prometheus", "fieldConfig": { "defaults": { "custom": { "align": null, "filterable": false }, "mappings": [], "thresholds": { "mode": "percentage", "steps": [ { "color": "red", "value": null }, { "color": "orange", "value": 5 }, { "color": "yellow", "value": 20 }, { "color": "green", "value": 35 } ] }, "unit": "percentunit" }, "overrides": [] }, "gridPos": { "h": 5, "w": 3, "x": 15, "y": 8 }, "id": 4, "interval": "", "options": { "orientation": "auto", "reduceOptions": { "calcs": [ "mean" ], "fields": "", "values": false }, 
"showThresholdLabels": true, "showThresholdMarkers": true }, "pluginVersion": "7.2.1", "targets": [ { "expr": "avg(neuroncore_utilization_ratio)", "instant": true, "interval": "", "legendFormat": "", "refId": "A" } ], "timeFrom": null, "timeShift": null, "title": "NeuronCore Utilization", "type": "gauge" }, { "datasource": "Prometheus", "description": "", "fieldConfig": { "defaults": { "custom": {}, "mappings": [], "thresholds": { "mode": "percentage", "steps": [ { "color": "green", "value": null } ] }, "unit": "cps" }, "overrides": [] }, "gridPos": { "h": 5, "w": 3, "x": 18, "y": 8 }, "id": 6, "options": { "colorMode": "value", "graphMode": "area", "justifyMode": "auto", "orientation": "auto", "reduceOptions": { "calcs": [ "mean" ], "fields": "", "values": false }, "textMode": "auto" }, "pluginVersion": "7.2.1", "targets": [ { "expr": "sum(rate(execution_status_total{status_type=\"completed\"}[1m]))", "hide": false, "instant": true, "interval": "", "legendFormat": "", "refId": "A" } ], "timeFrom": null, "timeShift": null, "title": "Execution Success Rate", "transformations": [], "type": "stat" }, { "datasource": "Prometheus", "description": "", "fieldConfig": { "defaults": { "custom": {}, "mappings": [], "thresholds": { "mode": "absolute", "steps": [ { "color": "green", "value": null }, { "color": "red", "value": 1 } ] }, "unit": "cps" }, "overrides": [] }, "gridPos": { "h": 5, "w": 3, "x": 21, "y": 8 }, "id": 18, "options": { "colorMode": "value", "graphMode": "area", "justifyMode": "auto", "orientation": "auto", "reduceOptions": { "calcs": [ "mean" ], "fields": "", "values": false }, "textMode": "auto" }, "pluginVersion": "7.2.1", "targets": [ { "expr": "sum(rate(execution_status_total{status_type!=\"completed\"}[1m]))", "instant": true, "interval": "", "legendFormat": "", "refId": "A" } ], "timeFrom": null, "timeShift": null, "title": "Execution Error Rate", "type": "stat" }, { "aliasColors": { "Inf Error Rate": "semi-dark-red", "Inf Success Rate": "light-green" }, "bars": false, "dashLength": 10, "dashes": false, "datasource": null, "fieldConfig": { "defaults": { "custom": {} }, "overrides": [] }, "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 12, "x": 0, "y": 13 }, "hiddenSeries": false, "id": 32, "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true }, "percentage": false, "pluginVersion": "7.2.1", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "sum(rate(execution_status_total{status_type=\"completed\"}[1m]))", "interval": "", "legendFormat": "Execution Success Rate", "refId": "A" }, { "expr": "sum(rate(execution_status_total{status_type!=\"completed\"}[1m]))", "interval": "", "legendFormat": "Execution Error Rate", "refId": "B" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Execution Status Rates", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:547", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:548", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, 
{ "aliasColors": { "p0": "dark-green", "p1": "semi-dark-green", "p100": "semi-dark-red", "p25": "light-green", "p50": "super-light-green", "p75": "super-light-red", "p99": "light-red", "{percentile=\"p0\"}": "dark-green", "{percentile=\"p1\"}": "semi-dark-green", "{percentile=\"p100\"}": "dark-red", "{percentile=\"p25\"}": "light-green", "{percentile=\"p50\"}": "super-light-green", "{percentile=\"p75\"}": "light-red", "{percentile=\"p99\"}": "semi-dark-red" }, "bars": false, "dashLength": 10, "dashes": false, "datasource": null, "description": "", "fieldConfig": { "defaults": { "custom": {}, "mappings": [], "thresholds": { "mode": "absolute", "steps": [ { "color": "green", "value": null }, { "color": "red", "value": 80 } ] }, "unit": "s" }, "overrides": [] }, "fill": 0, "fillGradient": 0, "gridPos": { "h": 12, "w": 12, "x": 12, "y": 13 }, "hiddenSeries": false, "id": 34, "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true }, "percentage": false, "pluginVersion": "7.2.1", "pointradius": 1, "points": true, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "avg by (percentile) (execution_latency_seconds)", "interval": "", "legendFormat": "{{percentile}}", "refId": "A" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Execution Latency", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:61", "format": "s", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:62", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": null, "fieldConfig": { "defaults": { "custom": {}, "unit": "percentunit" }, "overrides": [] }, "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 0, "y": 25 }, "hiddenSeries": false, "id": 30, "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true }, "percentage": false, "pluginVersion": "7.2.1", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "avg by (neuroncore) (neuroncore_utilization_ratio)", "interval": "", "legendFormat": "nc{{neuroncore}}", "refId": "A" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "NeuronCore Utilization", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:493", "format": "percentunit", "label": null, "logBase": 1, "max": "1", "min": "0", "show": true }, { "$$hashKey": "object:494", "format": "short", "label": null, "logBase": 1, "max": "100", "min": "0", "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": { "Runtime system CPU usage ": "light-red", "Runtime user CPU usage ": "light-green" }, "bars": false, 
"dashLength": 10, "dashes": false, "datasource": "Prometheus", "fieldConfig": { "defaults": { "custom": {}, "mappings": [], "thresholds": { "mode": "absolute", "steps": [ { "color": "green", "value": null }, { "color": "red", "value": 80 } ] }, "unit": "percentunit" }, "overrides": [] }, "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 8, "y": 25 }, "hiddenSeries": false, "id": 2, "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true }, "percentage": false, "pluginVersion": "7.2.1", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": true, "steppedLine": false, "targets": [ { "expr": "avg by (usage_type) (neuron_runtime_vcpu_usage_ratio)", "format": "time_series", "instant": false, "interval": "", "legendFormat": "Neuron Runtime {{usage_type}} CPU usage ", "refId": "A" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Neuron Runtime vCPU Usage", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:385", "format": "percentunit", "label": null, "logBase": 1, "max": "1", "min": "0", "show": true }, { "$$hashKey": "object:386", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": { "host": "rgb(0, 217, 255)", "neuron_device": "super-light-orange" }, "bars": false, "dashLength": 10, "dashes": false, "datasource": null, "fieldConfig": { "defaults": { "custom": {}, "unit": "bytes" }, "overrides": [] }, "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 16, "y": 25 }, "hiddenSeries": false, "id": 28, "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true }, "percentage": false, "pluginVersion": "7.2.1", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "avg by (memory_location) (sum by (instance_id, memory_location) (neuron_runtime_memory_used_bytes))", "interval": "", "legendFormat": "{{memory_location}}", "refId": "A" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Neuron Runtime Used Memory", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:439", "format": "bytes", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:440", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": { "Memory Usage": "rgb(0, 217, 255)", "NeuronCore Usage": "light-orange", "vCPU Usage": "light-blue" }, "bars": false, "dashLength": 10, "dashes": false, "datasource": null, "fieldConfig": { "defaults": { "custom": {}, "unit": "percentunit" }, "overrides": [] }, "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 0, "y": 37 }, "hiddenSeries": false, "id": 22, "legend": { 
"avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true }, "percentage": false, "pluginVersion": "7.2.1", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "avg(system_memory_used_bytes / system_memory_total_bytes)", "instant": false, "interval": "", "legendFormat": "Memory Usage", "refId": "A" }, { "expr": "avg(sum by (instance_id) (system_vcpu_usage_ratio))", "instant": false, "interval": "", "legendFormat": "vCPU Usage", "refId": "B" }, { "expr": "avg(neuroncore_utilization_ratio)", "instant": false, "interval": "", "legendFormat": "NeuronCore Usage", "refId": "C" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Host System Utilization", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:664", "format": "percentunit", "label": null, "logBase": 1, "max": "1", "min": "0", "show": true }, { "$$hashKey": "object:665", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": { "system": "light-red", "user": "light-green" }, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "fieldConfig": { "defaults": { "custom": {}, "unit": "percentunit" }, "overrides": [] }, "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 8, "y": 37 }, "hiddenSeries": false, "id": 24, "legend": { "avg": false, "current": false, "max": false, "min": false, "show": true, "total": false, "values": false }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true }, "percentage": false, "pluginVersion": "7.2.1", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": true, "steppedLine": false, "targets": [ { "expr": "avg by (usage_type) (system_vcpu_usage_ratio)", "interval": "", "legendFormat": "{{usage_type}}", "refId": "A" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Host vCPU Usage", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:876", "format": "percentunit", "label": null, "logBase": 1, "max": "1", "min": "0", "show": true }, { "$$hashKey": "object:877", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": { "Memory Usage Bytes": "rgb(223, 180, 0)", "Memory Usage Percent": "rgb(0, 217, 255)" }, "bars": false, "dashLength": 10, "dashes": false, "datasource": "Prometheus", "fieldConfig": { "defaults": { "custom": {}, "unit": "short" }, "overrides": [ { "matcher": { "id": "byName", "options": "Memory Usage Percent" }, "properties": [ { "id": "unit", "value": "percentunit" } ] }, { "matcher": { "id": "byName", "options": "Memory Usage Bytes" }, "properties": [ { "id": "unit", "value": "bytes" } ] } ] }, "fill": 1, "fillGradient": 0, "gridPos": { "h": 12, "w": 8, "x": 16, "y": 37 }, "hiddenSeries": false, "id": 26, "legend": { "avg": false, 
"current": false, "max": false, "min": false, "show": true, "total": false, "values": false }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true }, "percentage": false, "pluginVersion": "7.2.1", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [ { "$$hashKey": "object:711" }, { "$$hashKey": "object:931", "alias": "Memory Usage Bytes", "yaxis": 2 } ], "spaceLength": 10, "stack": false, "steppedLine": false, "targets": [ { "expr": "avg(system_memory_used_bytes / system_memory_total_bytes)", "instant": false, "interval": "", "legendFormat": "Memory Usage Percent", "refId": "A" }, { "expr": "avg(system_memory_used_bytes)", "instant": false, "interval": "", "legendFormat": "Memory Usage Bytes", "refId": "B" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Host Memory Usage", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:689", "format": "percentunit", "label": "", "logBase": 1, "max": "1", "min": "0", "show": true }, { "$$hashKey": "object:690", "decimals": null, "format": "bytes", "label": "", "logBase": 1, "max": null, "min": "0", "show": true } ], "yaxis": { "align": false, "alignLevel": null } } ], "refresh": "5s", "schemaVersion": 26, "style": "dark", "tags": [], "templating": { "list": [ { "datasource": "Prometheus", "filters": [], "hide": 0, "label": "", "name": "Filters", "skipUrlSync": false, "type": "adhoc" } ] }, "time": { "from": "now-6h", "to": "now" }, "timepicker": {}, "timezone": "", "title": "neuron-monitor", "uid": "EqWNYf5Mz", "version": 68 } ================================================ FILE: src/examples/pytorch/bert_tutorial/README.md ================================================

Please view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** ================================================ FILE: src/examples/pytorch/bert_tutorial/THIRD ================================================ ================================================ FILE: src/examples/pytorch/bert_tutorial/THIRD PARTY LICENSE.txt ================================================ ** transformers; version 2.8.0 -- https://github.com/huggingface/transformers Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team. Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved. Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. 
For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. 
You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) 
The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. ================================================ FILE: src/examples/pytorch/bert_tutorial/bert_benchmark_utils.py ================================================ import torch import torch.neuron import os import sys import csv import math from collections import Counter import numpy as np class BertTestDataset(torch.utils.data.Dataset): """Bert test dataset.""" def __init__(self, tsv_file, tokenizer, max_length=128, transform=None): """ Args: csv_file (string): Path to the csv file with annotations. tokenizer (callable = hugging face tokenizer): Takes a string and encodes to standard input tensor set max_length (int): Maximum length that all input tensors will be padded to transform (callable, optional): Optional transform to be applied on a sample. """ with open(tsv_file, "r") as f: reader = csv.reader(f, delimiter="\t", quotechar=None) self.lines = list(reader) self.lines.pop(0) self.tokenizer = tokenizer self.max_length = max_length self.transform = transform def __len__(self): return len(self.lines) def __getitem__(self, idx): if torch.is_tensor(idx): idx = idx.tolist() s1_raw = self.lines[idx][3] if isinstance(s1_raw, bytes): s1_raw = s1_raw.decode("utf-8", "ignore") s2_raw = self.lines[idx][4] if isinstance(s2_raw, bytes): s2_raw = s2_raw.decode("utf-8", "ignore") quality = self.lines[idx][0] encoded = self.tokenizer.encode_plus(s1_raw, s2_raw, add_special_tokens=True, return_tensors='pt', max_length=self.max_length, padding='max_length', truncation=True) sample = {'encoded': encoded, 'quality': quality} if self.transform: sample = self.transform(sample) return sample class BertResults(): def __init__(self, batch_size, num_cores=1): self.correct_count = 0 self.inference_count = 0 self.latency_array = [] self.end_times = [] self.start_times = [] self.batch_size = batch_size self.num_cores = num_cores def add_result(self, correct_count, inference_count, latency_array, end_times, start_times): self.correct_count += correct_count self.inference_count += inference_count self.latency_array.extend(latency_array) self.end_times.extend(end_times) self.start_times.extend(start_times) def report(self, f, window_size=1): assert(len(self.latency_array) != 0) p50_latency = np.percentile(self.latency_array, 50) p90_latency = np.percentile(self.latency_array, 90) p95_latency = np.percentile(self.latency_array, 95) p99_latency = np.percentile(self.latency_array, 99) p100_latency = np.percentile(self.latency_array, 100) def get_bucket(start, end): bucketed_start = math.floor(start / window_size) * window_size bucketed_end = math.ceil(end / window_size) * window_size # The check is to make sure that we ignore timestamps that are larger than the window size if bucketed_end - 
bucketed_start == window_size: return bucketed_start else: return None # Divide the timestamps into different buckets bucketed_timestamps = [get_bucket(start, end) for start, end in zip(self.start_times, self.end_times)] # Count the values in each bucket counted_buckets = Counter( item for item in bucketed_timestamps if item is not None) # Normalize each bucket bucket_throughputs = [(key, value / window_size) for key, value in sorted(counted_buckets.items())] busy_throughputs = [value for _, value in bucket_throughputs] max_throughput = max(busy_throughputs) * self.batch_size avg_throughput = sum(busy_throughputs) * self.batch_size / len(busy_throughputs) f.write("\n") f.write( "Maximum throughput = {} sentences/sec\n".format(int(max_throughput))) f.write("Average throughput = {} sentences/sec\n".format(int(avg_throughput))) f.write("\n") f.write("Latency Percentiles:\n") f.write("===\n") f.write("P50 = {} milliseconds\n".format(int(1000*p50_latency))) f.write("P90 = {} milliseconds\n".format(int(1000*p90_latency))) f.write("P95 = {} milliseconds\n".format(int(1000*p95_latency))) f.write("P99 = {} milliseconds\n".format(int(1000*p99_latency))) f.write("P100 = {} milliseconds\n".format(int(1000*p100_latency))) f.write("\n") f.write("Accuracy:\n") f.write("===\n") if self.inference_count == 0: self.inference_count = 1 accuracy = float(self.correct_count) / float(self.inference_count) f.write("Accuracy = {}% \n".format(round(100*accuracy, 2))) f.write("\n") f.write("Sanity test:\n") f.write("===\n") f.write("Processed - num batches {}\n".format(len(self.latency_array))) f.write(" - batch size {}\n".format(self.batch_size)) f.write(" - num cores {}\n".format(self.num_cores)) ================================================ FILE: src/examples/pytorch/bert_tutorial/glue_mrpc_dev.tsv ================================================ Quality #1 ID #2 ID #1 String #2 String 1 1355540 1355592 He said the foodservice pie business doesn 't fit the company 's long-term growth strategy . " The foodservice pie business does not fit our long-term growth strategy . 0 2029631 2029565 Magnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war . His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war . 0 487993 487952 The dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat . The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent . 1 1989515 1989458 The AFL-CIO is waiting until October to decide if it will endorse a candidate . The AFL-CIO announced Wednesday that it will decide in October whether to endorse a candidate before the primaries . 0 1783137 1782659 No dates have been set for the civil or the criminal trial . No dates have been set for the criminal or civil cases , but Shanley has pleaded not guilty . 1 3039165 3039036 Wal-Mart said it would check all of its million-plus domestic workers to ensure they were legally employed . It has also said it would review all of its domestic employees more than 1 million to ensure they have legal status . 0 1490811 1490840 While dioxin levels in the environment were up last year , they have dropped by 75 percent since the 1970s , said Caswell . The Institute said dioxin levels in the environment have fallen by as much as 76 percent since the 1970s . 
1 426112 426210 This integrates with Rational PurifyPlus and allows developers to work in supported versions of Java , Visual C # and Visual Basic .NET. IBM said the Rational products were also integrated with Rational PurifyPlus , which allows developers to work in Java , Visual C # and VisualBasic .Net.
1 1439663 1439808 The top rate will go to 4.45 percent for all residents with taxable incomes above $ 500,000 . For residents with incomes above $ 500,000 , the income-tax rate will increase to 4.45 percent .
1 3147370 3147525 The results appear in the January issue of Cancer , an American Cancer Society journal , being published online today . The results appear in the January issue of Cancer , an American Cancer Society ( news - web sites ) journal , being published online Monday .
1 3300040 3299992 The delegates said raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers . Bin Laden ’ s men pointed out that raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers .
0 524136 524119 " Sanitation is poor ... there could be typhoid and cholera , " he said . " Sanitation is poor , drinking water is generally left behind . . . there could be typhoid and cholera . "
0 969512 969295 The broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 . The technology-laced Nasdaq Composite Index was down 25.36 points , or 1.53 percent , at 1,628.26 .
1 1685339 1685429 The only announced Republican to replace Davis is Rep. Darrell Issa of Vista , who has spent $ 1.71 million of his own money to force a recall . So far the only declared major party candidate is Rep. Darrell Issa , a Republican who has spent $ 1.5 million of his own money to fund the recall .
1 1967578 1967664 The decision to issue new guidance has been prompted by intelligence passed to Britain by the FBI in a secret briefing in late July . Scotland Yard 's decision to issue new guidance has been prompted by new intelligence passed to Britain by the FBI in late July .
1 2047034 2046820 Unable to find a home for him , a judge told mental health authorities they needed to find supervised housing and treatment for DeVries somewhere in California . The judge had told the state Department of Mental Health to find supervised housing and treatment for DeVries somewhere in California .
1 2046630 2046644 The decision came a year after Whipple ended federal oversight of the district 's racial balance , facilities , budget , and busing . The decision came a year after Whipple ended federal oversight of school busing as well as the district 's racial balance , facilities and budget .
0 2221603 2221633 In midafternoon trading , the Nasdaq composite index was up 8.34 , or 0.5 percent , to 1,790.47 . The Nasdaq Composite Index .IXIC dipped 8.59 points , or 0.48 percent , to 1,773.54 .
1 129995 129864 Morgan Stanley raised its rating on the beverage maker to " overweight " from " equal-weight " saying in part that pricing power with its bottlers should improve in 2004 . Morgan Stanley raised its rating on the company to " overweight " from " equal-weight , " saying the beverage maker 's pricing power with bottlers should improve in 2004 .
0 919683 919782 The pound also made progress against the dollar , reached fresh three-year highs at $ 1.6789 . The British pound flexed its muscle against the dollar , last up 1 percent at $ 1.6672 .
0 970740 971209 Friday , Stanford ( 47-15 ) blanked the Gamecocks 8-0 . Stanford ( 46-15 ) has a team full of such players this season .
1 2745055 2745022 Last month Intel raised its revenue guidance for the quarter to between $ 7.6 billion and $ 7.8 billion . At the end of the second quarter , Intel initially predicted sales of between $ 6.9 billion and $ 7.5 billion .
0 2199097 2199072 The driver , Eugene Rogers , helped to remove children from the bus , Wood said . At the accident scene , the driver was " covered in blood " but helped to remove children , Wood said .
1 1609290 1609098 ONG KONG , July 9 Tens of thousands of demonstrators gathered tonight before the legislature building here to call for free elections and the resignation of Hong Kong 's leader . Tens of thousands of demonstrators gathered yesterday evening to stand before this city 's legislature building and call for free elections and the resignation of Hong Kong 's leader .
1 1597193 1597119 Saddam loyalists have been blamed for sabotaging the nation 's infrastructure , as well as frequent attacks on U.S. soldiers . Hussein loyalists have been blamed for sabotaging the nation 's infrastructure and attacking US soldiers .
1 2758944 2758975 Its closest living relatives are a family frogs called sooglossidae that are found only in the Seychelles in the Indian Ocean . Its closest relative is found in the Seychelles Archipelago , near Madagascar in the Indian Ocean .
0 2584416 2584653 Cooley said he expects Muhammad will similarly be called as a witness at a pretrial hearing for Malvo . Lee Boyd Malvo will be called as a witness Wednesday in a pretrial hearing for fellow sniper suspect John Allen Muhammad .
1 86007 86373 " Instead of pursuing the most imminent and real threats - international terrorists , " Graham said , " this Bush administration chose to settle old scores . " " Instead of pursuing the most imminent and real threats - international terrorists - this Bush administration has chosen to settle old scores , " Graham said .
1 1602860 1602844 He said they lied on a sworn affidavit that requires them to list prior marriages . Morgenthau said the women , all U.S. citizens , lied on a sworn affidavit that requires them to list prior marriages .
1 1201306 1201329 The association said 28.2 million DVDs were rented in the week that ended June 15 , compared with 27.3 million VHS cassettes . The Video Software Dealers Association said 28.2 million DVDs were rented out last week , compared to 27.3 million VHS cassettes .
0 461779 461815 With these assets , Funny Cide has a solid chance to become the first Triple Crown winner since Affirmed in 1978 . Funny Cide is looking to become horse racing 's first Triple Crown winner in a generation .
1 1438666 1438643 Intel was disappointed and assessing its " options in the event Mr. Hamidi resumes his spamming activity against Intel , " spokesman Chuck Mulloy said . Intel spokesman Chuck Mulloy said the company was disappointed and assessing its " options in the event Mr. Hamidi resumes his spamming activity against Intel . "
1 3261484 3261306 Mr Annan also warned the US should not use the war on terror as an excuse to suppress " long-cherished freedoms " . Annan warned that the dangers of extremism after September 11 should not be used as an excuse to suppress " long-cherished " freedoms .
1 1277539 1277527 At community colleges , tuition will jump to $ 2,800 from $ 2,500 . Community college students will see their tuition rise by $ 300 to $ 2,800 or 12 percent .
1 3035788 3035918 He made a point of saying during Tuesdays debate that the Confederate flag was a racist symbol . Though Dean made a point of saying during the debate that the Confederate flag is a racist symbol .
0 132553 132725 Bush wanted " to see an aircraft landing the same way that the pilots saw an aircraft landing , " White House press secretary Ari Fleischer said yesterday . On Tuesday , before Byrd 's speech , Fleischer said Bush wanted ' ' to see an aircraft landing the same way that the pilots saw an aircraft landing .
0 2259788 2259747 On Monday the Palestinian Prime Minister , Mahmoud Abbas , will report to the Palestinian parliament on his Government 's achievements in its first 100 days in office . Palestinian Prime Minister Mahmoud Abbas must defend the record of his first 100 days in office before Parliament today as the death toll in the occupied territories continues to rise .
0 2307064 2307235 The civilian unemployment rate improved marginally last month -- slipping to 6.1 percent -- even as companies slashed payrolls by 93,000 . The civilian unemployment rate improved marginally last month _ sliding down to 6.1 percent _ as companies slashed payrolls by 93,000 amid continuing mixed signals about the nation 's economic health .
1 3046488 3046824 Per-user pricing is $ 29 for Workplace Messaging , $ 89 for Team Collaboration and $ 35 for Collaborative Learning . Workplace Messaging is $ 29 , Workplace Team Collaboration is $ 89 , and Collaborative Learning is $ 35 .
1 86020 86007 " Instead of pursuing the most imminent and real threats – international terrorism – this Bush administration chose to settle old scores , " Mr. Graham said . " Instead of pursuing the most imminent and real threats - international terrorists , " Graham said , " this Bush administration chose to settle old scores . "
0 1100998 1100441 SARS has killed about 800 people and affected more than 8400 since being detected in China in November . SARS has killed about 800 people and sickened more than 8,400 worldwide , mostly in Asia .
1 2268396 2268480 Authorities had no evidence to suggest the two incidents were connected . There was no immediate evidence that the two incidents were connected , police said .
0 1984039 1983986 " Jeremy 's a good guy , " Barber said , adding : " Jeremy is living the dream life of the New York athlete . He also said Shockey is " living the dream life of a New York athlete .
0 2697659 2697747 Ratliff 's daughters , Margaret and Martha Ratliff , were adopted by Peterson after their mother 's death . Peterson helped raise Ratliff 's two daughters , Margaret and Martha Ratliff , who supported him throughout the trial .
0 2175939 2176090 After losing as much as 84.56 earlier , the Dow Jones industrial average closed up 22.81 , or 0.2 percent , at 9,340.45 . In midday trading , the Dow Jones industrial average lost 68.84 , or 0.7 percent , to 9,248.80 .
1 886618 886456 Rumsfeld , who has been feuding for two years with Army leadership , passed over nine active-duty four-star generals . Rumsfeld has been feuding for a long time with Army leadership , and he passed over nine active-duty four-star generals .
1 588637 588864 Consumers who said jobs are difficult to find jumped from 29.4 to 32.6 , while those claiming work was plentiful slipped from 13 to 12.6 . Consumers who said jobs are difficult to find jumped to 32.6 from 29.4 , while those saying work was plentiful slipped to 12.6 from 13 in April .
0 2252795 2252970 He has no immediate plans for television advertising , believing it is unnecessary this early . A Lieberman aide said there were no immediate plans for television advertising .
1 1756329 1756394 " I think it happened very quickly , " Houston Police Department homicide investigator Phil Yochum said of the crime . " I think it happened very quickly , " said Investigator Phil Yochum of the Houston Police Department 's homicide division .
1 1673112 1673068 United issued a statement saying it will " work professionally and cooperatively with all its unions . " Senior vice president Sara Fields said the airline " will work professionally and cooperatively with all our unions . "
1 2357324 2357271 " But they never climb out of the pot of beer again . " It 's just that they never climb out of the beer again . "
1 780408 780363 Chief financial officer Andy Bryant has said that hike had a greater affect volume than officials expected . Bryant has said that hike had a greater effect on demand than officials expected .
1 821523 821385 Robert Liscouski , the Assistant Secretary of Homeland Security for Infrastructure Protection , will oversee NCSD . NCSD 's chief will be Robert Liscouski , the assistant secretary of Homeland Security for Infrastructure Protection .
1 2304696 2304863 HP 's shipments increased 48 percent year-over-year , compared to an increase of 31 percent for Dell . HPs shipments increased 48 per cent year-on-year , compared to an increase of 31 per cent for Dell .
1 2531749 2531607 Chirac , who can pardon a law-breaker , refused Humbert 's request last year but kept in close touch with the family . Chirac , who has the authority to pardon law-breakers , refused Humbert 's request to be allowed to die last year but kept in close touch with the family .
1 3180014 3179967 The charges allege that he was part of the conspiracy to kill and kidnap persons in a foreign country . The government now charges that Sattar conspired with Rahman to kill and kidnap individuals in foreign countries .
1 726966 726945 In the 2002 study , the margin of error ranged from 1.8 to 4.4 percentage points . It has a margin of error of plus or minus three to four percentage points .
1 2638861 2638982 Mr. Clinton 's national security adviser , Sandy Berger , said that the White House wasn 't informed of the FBI activities . Clinton ’ s national security adviser , Sandy Berger , said in an interview that the White House was not informed of the FBI activities .
1 2495223 2495307 " This decision is clearly incorrect , " FTC Chairman Timothy Muris said in a written statement . The decision is " clearly incorrect , " FTC Chairman Tim Muris said .
1 55187 54831 Prosecutors allege that Nichols and co-conspirator Timothy McVeigh worked together to prepare a bomb that destroyed the Alfred P. Murrah Federal Building . Prosecutors allege that Nichols and coconspirator Timothy McVeigh worked together to prepare a 4,000-pound fuel-and-fertilizer bomb that destroyed the Murrah building .
0 2763381 2763517 Terri Schiavo , 39 , is expected to die sometime in the next two weeks in the Tampa-area hospice where she has spent the past several years . Terri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler .
1 1990975 1991132 Secretary of State Colin Powell designated the Chechen leader believed responsible for last year 's hostage standoff in a Moscow theater as a threat to U.S. security Friday . U.S. Secretary of State Colin Powell on Friday designated Chechen rebel leader Shamil Basayev a threat to the security of the United States and to U.S. citizens .
1 2204353 2204418 " Today , we are trying to convey this problem to Russian President Vladimir Putin and US President George W Bush . " " Today , we are trying to convey this problem to Russian President Vladimir Putin ( news - web sites ) and President Bush ( news - web sites ) . "
1 60122 60445 That would be a potential setback to Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries . The inquiry may hinder Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries .
1 961836 962243 PeopleSoft also said its board had officially rejected Oracle 's offer . Thursday morning , PeopleSoft 's board rejected the Oracle takeover offer .
0 3140260 3140288 The Dow Jones industrial average ended the day down 10.89 at 9,837.94 , after advancing 111.04 Wednesday . The Dow Jones industrial average fell 10.89 points , or 0.11 percent , to 9,837.94 .
1 1720166 1720115 Cortisol levels in the saliva of day care children were highest and rose most steeply in those judged by day care center personnel to be the shyest . Cortisol levels in the saliva of day-care children were highest and rose most steeply in those whom day-care centre staffed judged to be the shyest .
1 2573262 2573319 " The idea that Tony Abbott is in some way a one-dimensional political head-kicker couldn 't be more wrong , " Mr Howard said . " The idea that Tony Abbott is in some way a one-dimensional political head kicker couldn 't be more wrong . "
0 1353356 1353174 " Biotech products , if anything , may be safer than conventional products because of all the testing , " Fraley said , adding that 18 countries have adopted biotechnology . " Biotech products , if anything , may be safer than conventional products because of all the testing , " said Robert Fraley , Monsanto 's executive vice president .
1 2738677 2738741 The rate of skin cancer has tripled since the 1950s in Norway and Sweden , according to the study . The study also found that skin cancer nearly tripled in Norway and Sweden since the 1950s .
1 1638813 1639087 We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , " Rumsfeld said . Rather , the US acted because the administration saw " existing evidence in a new light , through the prism of our experience on September 11 " .
1 1605350 1605425 Trans fat makes up only 1 percent to 3 percent of the total fat Americans consume , compared with 14 percent for saturated fat . Trans fat accounts for 2.5 percent of Americans ' daily calories , compared to 11 percent to 12 percent for saturated fat .
1 2494149 2494073 However , a recent slide in prices and OPEC 's expectations of a surge in oil inventories have compounded its fears about a further softening of the market . A 14 percent slide in crude prices this month and expectations of a build up in oil inventories compounded OPEC 's fears of a further softening of the market .
1 3023029 3023229 Peterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son . Peterson , 31 , is charged with two counts of first-degree murder in the slayings of his wife , Laci , and their unborn son , Conner .
1 1351550 1351155 Carlson on Tuesday said he would not recuse himself from the case . Service officials said Carlson refused to recuse himself from the case .
1 981185 981234 The program will grow to include ports in Dubai , Turkey and Malaysia , among others . The program will be expanded to include areas of the Middle East such as Dubai , Turkey and Malaysia , Mr. Ridge said .
0 2111629 2111786 McCabe said he was considered a witness , not a suspect . " He is not considered a suspect , " McCabe said .
1 655498 655391 The woman was exposed to the SARS virus while in the hospital but was not a health care worker , said Dr. Colin D ’ Cunha , Ontario ’ s commissioner of public health . The woman was exposed to the SARS virus while in the hospital but was not a health-care worker , said Dr Colin D 'Cunha , Ontario 's commissioner of public health .
1 533823 533909 He added that those " are not solely American principles , nor are they exclusively Western . " " These are not solely American principles nor are they exclusively Western , " Rumsfeld said .
1 581592 581570 " If we don 't march into Tehran , I think we will be in pretty good shape , " he said . " As long as we don 't march on Tehran , I think we are going to be in pretty good shape , " he said .
0 1010655 1010430 On Saturday , a 149mph serve against Agassi equalled Rusedski 's world record . On Saturday , Roddick equalled the world record with a 149 m.p.h. serve in beating Andre Agassi .
1 2241925 2242066 Chad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new technologies and methods to communicate more quickly and efficiently . Chad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new ways to communicate .
1 2796978 2797024 " APEC leaders are painfully aware that security and prosperity are inseparable , " Thai Prime Minister Thaksin Shinawatra told business leaders . " APEC leaders are painfully aware that security and prosperity are inseparable , " Thaksin said .
0 101746 101775 Danbury prosecutor Warren Murray could not be reached for comment Monday . Prosecutors could not be reached for comment after the legal papers were obtained late Monday afternoon .
1 327839 327748 Wittig resigned last year after being indicted on federal bank fraud charges involving a real estate loan unrelated to Westar business . Wittig resigned in late November about two weeks after being indicted on bank fraud charges in a real estate case unrelated to the company .
0 2988297 2988555 Shattered Glass , " starring Hayden Christensen as Stephen Glass , debuted well with $ 80,000 in eight theaters . " Shattered Glass " _ starring Hayden Christensen as Stephen Glass , The New Republic journalist fired for fabricating stories _ debuted well with $ 80,000 in eight theaters .
1 2217613 2217659 He was arrested Friday night at an Alpharetta seafood restaurant while dining with his wife , singer Whitney Houston . He was arrested again Friday night at an Alpharetta restaurant where he was having dinner with his wife .
0 2128530 2128455 However , EPA officials would not confirm the 20 percent figure . Only in the past few weeks have officials settled on the 20 percent figure .
1 2208376 2208198 University of Michigan President Mary Sue Coleman said in a statement on the university 's Web site , " Our fundamental values haven 't changed . " Our fundamental values haven 't changed , " Mary Sue Coleman , president of the university , said in a statement in Ann Arbor .
1 1980654 1980641 The first products are likely to be dongles costing between US $ 100 and US $ 150 that will establish connections between consumer electronics devices and PCs . The first products will likely be dongles costing $ 100 to $ 150 that will establish connections between consumer electronics devices and PCs .
0 589579 589557 However , Lapidus expects foreign brands ' sales to be up 4 percent , driven by strong truck sales at Honda Motor Co . Lapidus expects Ford to be down 5 percent , Chrysler down 10 percent and foreign brands up 4 percent driven by strong truck sales at Honda .
1 1636060 1635946 Michel , who remains in the government , denied that US pressure had provoked the government 's move . Michel , who has stayed in the new government , denied that it was U.S. pressure which had provoked the government 's move .
1 1630585 1630657 Some of the computers also are used to send spam e-mail messages to drum up traffic to the sites . Some are also used to send spam e-mail messages to boost traffic to the sites .
0 447728 447699 Indonesia 's army has often been accused of human rights abuses during GAM 's battle for independence , charges it has generally denied while accusing the separatists of committing rights violations . Indonesia 's army has been accused of human rights abuses during its earlier battles with GAM , charges it has generally denied .
1 1606495 1606619 Bush also hoped to polish his anti-AIDS credentials in Uganda , which has been hailed as an African pioneer in fighting the killer disease . President Bush flies to Uganda Friday hoping to polish his anti- AIDS credentials in a country hailed as an African pioneer in fighting the epidemic .
1 1550897 1550977 Later this year , the command will send trainers with soldiers from four North African nations on patrolling and intelligence gathering missions . This fall the command will send trainers to work with soldiers from four North African nations on patrolling and gathering intelligence .
0 490376 490490 The reports helped overcome investor jitters after the euro briefly hit an all-time high against the dollar Tuesday . Stocks slipped at the open after the euro hit record highs against the dollar .
1 3084554 3084612 Sales for the quarter beat expectations , rising 37 percent year-on-year to 1.76 billion euros . Sales rose 37 per cent year-on-year to 1.76bn , beating expectations .
1 315647 315778 If the MTA 's appeal to a higher court is successful , the $ 2 bus and subway base fare won 't be rolled back . If the MTA 's appeal is successful , the $ 2 bus and subway base fare won 't change .
1 3428298 3428362 Robert Walsh , 40 , remained in critical but stable condition Friday at Staten Island University Hospital 's north campus . Walsh , also 40 , was in critical but stable condition at Staten Island University Hospital last night .
1 2523564 2523358 The Guru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS ( Basic Input Output System ) update and a troubleshooting-assistance feature called Black Box . The µGuru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS update and a troubleshooting-assistance feature called Black Box .
1 2079200 2079131 U.S. corporate bond yield spreads tightened in spotty trading on Friday as Wall Street labored to get back on its feet after the largest power outage ever in North America . U.S. stocks rose slightly on feather-light volume on Friday , as Wall Street regrouped after the biggest-ever power outage in North America .
1 818091 817811 The company said it would issue revised guidance for the full fiscal year next month when it releases its Q2 results . The company said it would renew its guidance for 2003 when it announces its second quarter results in mid-July .
1 1580638 1580663 " I stand 100 percent by it , and I think our intelligence services gave us the correct information at the time . " I stand 100 percent by it , and I think that our intelligence services gave us the correct intelligence and information at the time , " Blair said .
0 1919740 1919926 " I don 't know if the person I 'm talking to now may end up being someone else at another time that may not follow the rules , " Parrish said . " I don 't know whether the person I 'm talking to now may end up being someone else , " Parrish said .
1 2748287 2748550 " I think it 's going to be a close vote , but I think the grant proposal is going to win , " McConnell said . " I think it 's going to be a close vote , but I think the grant proposal 's going to win , " said Sen. Mitch McConnell , assistant majority leader .
1 3394891 3394775 Twenty-eight people were believed to have been spending Christmas Day with the caretaker of the St Sophia 's camp , when the mudslide smashed into two cabins . Twenty-seven people were believed to have been spending Christmas Day with the caretaker of Saint Sophia Camp , a Greek Orthodox facility , when the mudslide roared through .
0 2963943 2963880 One , Capt. Doug McDonald , remained hospitalized in critical condition on Thursday . Her 20-year-old sister , Allyson , was severely burned and remained hospitalized in critical condition .
0 1865364 1865251 The United States finally relented during President Bush 's visit to Africa earlier this month . During President Bush 's trip to Africa earlier this month , however , Washington said it would support the increase .
1 263690 263819 " There is no conscious policy of the United States , I can assure you of this , to move the dollar at all , " he said . He also said there is no conscious policy by the United States to move the value of the dollar .
1 283751 283290 It 's the first such drill since the September 11 terrorist attacks on New York and Washington . It is the nation 's first large-scale counterterrorism exercise since the Sept . 11 terrorist attacks .
1 2517014 2516995 Myanmar 's pro-democracy leader Aung San Suu Kyi will return home late Friday but will remain in detention after recovering from surgery at a Yangon hospital , her personal physician said . Myanmar 's pro-democracy leader Aung San Suu Kyi will be kept under house arrest following her release from a hospital where she underwent surgery , her personal physician said Friday .
1 1330643 1330622 According to the Merchant Marine Ministry , the 37-year-old ship is registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands . The Baltic Sky is a 37-year-old ship registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands .
1 3111452 3111428 In an unusual move , the U.S. Patent and Trademark Office is reconsidering a patent affecting Internet pages that critics contend could disrupt millions of Web sites . In an unusual move that critics contend could disrupt millions of Web sites , the U.S. Patent and Trademark Office is reconsidering a patent affecting Internet pages .
0 1167835 1167651 Kansas Department of Health and Environment records show there were 88 abortions performed on girls age 14 and younger last year . Statistics from the Kansas Department of Health and Environment show that 11,844 abortions were performed in the state last year .
0 1423836 1423708 A European Union spokesman said the Commission was consulting EU member states " with a view to taking appropriate action if necessary " on the matter . Laos 's second most important export destination - said it was consulting EU member states ' ' with a view to taking appropriate action if necessary ' ' on the matter .
1 2090911 2091154 Waiting crowds filling the streets on both sides overwhelmed the peacekeepers soon after daylight , sweeping past the barbed wire barricades . But waiting crowds filling the streets rushed the bridges soon after daylight , overrunning razor-wire barricades .
1 2265271 2265152 Barry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products not sold in the United States . Barry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products unknown to the American market .
1 3062202 3062308 By skirting the FDA 's oversight , Eagan said , the quality of the imported drugs is " less predictable " than for those obtained in the United States . By skirting the FDA 's oversight , Eagan said the quality of the imported drugs is " less predictable " than U.S. drugs .
1 2155514 2155377 He said : " For the first time there is an easy and affordable way of making this treasure trove of BBC content available to all . " " For the first time , there is an easy and affordable way of making this treasure trove of BBC content available to all , " Dyke said .
1 1552068 1551928 Three such vigilante-style attacks forced the hacker organizer , who identified himself only as " Eleonora [ 67 ] , " to extend the contest until 7 p.m. EST Sunday . Three such vigilante-style attacks forced the hacker organiser , who identified himself only as " Eleonora67 ] , " to extend the contest until 8am ( AEST ) today .
1 936978 937500 Eric Gagne pitched a perfect ninth for his 23rd save in as many opportunities . Gagne struck out two in a perfect ninth inning for his 23rd save .
0 985015 984975 One way or another , Harry Potter And The Order Of The Phoenix will be in your hands by Saturday . Just about everything about " Harry Potter and the Order of the Phoenix " will set records .
1 1430357 1430425 " Allison just proves you don 't need to wait until August or September to have a disaster , " said Josh Lichter , a meteorologist with the Houston-Galveston weather office . " Allison just proves you don 't need to wait until August or September to have a disaster , " Lichter said .
1 3039310 3039413 Today , analysts say , UN members can no longer ignore the shifts since the September 11 2001 attacks . On Wednesday , analysts say , UN members can no longer ignore the shifts since the attacks in the US of September 11 2001 .
1 34513 34742 Police say CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the United States . Mr McKinlay said that CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the US .
1 368067 368018 Chiron already has nearly 20 percent acceptances from PowderJect 's shareholders . Chiron has acceptances from holders of nearly 20 percent of PowderJect shares .
0 611663 611716 Ernst & Young has denied any wrongdoing and plans to fight the allegations . Ernst & Young has denied the SEC 's claims , and called its recommendations " irresponsible " .
1 98432 98657 The attack followed several days of disturbances in the city where American soldiers exchanged fire with an unknown number of attackers as civilians carried out demonstrations against the American presence . The attack came after several days of disturbance in the city in which U.S. soldiers exchanged fire with an unknown number of attackers as civilians protested the American presence .
1 3039007 3038845 No company employee has received an individual target letter at this time . She said no company official had received " an individual target letter at this time . "
1 1708040 1708062 Second-quarter results reflected a gain of 10 cents per diluted share , while the 2002 results included a loss of 19 cents per diluted share . The second-quarter results had a non-operating gain of 10 cents a share while the 2002 second-quarter performance had a net non-operating loss of 19 cents a share .
0 1757264 1757375 He allegedly told his ex-wife in an angry phone call that he had no intention of following their new custody agreement . The two had battled over custody and he allegedly told her in an angry phone call that he had no intention of following their new custody agreement .
1 383417 383558 Worldwide , more than 50 million people have seen " Les Miz , " with gross receipts of $ 1.8 billion . Worldwide , Les Misérables has been seen by over 50 million people , with a total gross of over $ 2 billion .
0 2766112 2766084 In fiction : Edward P. Jones ( " The Known World " ) and Scott Spencer ( " A Ship Made of Paper " ) . The fifth nominee for fiction is Scott Spencer , for A Ship Made of Paper .
1 1261116 1261234 " Overwhelmingly the Windows brand really resonated with them . " " Windows was the part of the experience that really resonated with people . "
1 3028143 3028234 The Centers for Medicare and Medicaid Services , the federal agency that runs Medicare , last year began a similar effort for nursing homes . The Centers for Medicare and Medicaid launched a similar consumer tool for nursing homes last year .
0 249699 249623 Vivace was founded in 1999 and has raised over $ 118 million in three rounds of venture financing . During difficult times for technology venture capital , Vivace raised over $ 118 million in three rounds of venture financing .
0 3448488 3448449 The Dow Jones industrial average < .DJI > added 28 points , or 0.27 percent , at 10,557 , hitting its highest level in 21 months . The Dow Jones industrial average < .DJI > rose 49 points , or 0.47 percent , to 10,578 .
1 2749322 2749663 The Democratic candidates also began announcing their fund-raising totals before Wednesday 's deadline to file quarterly reports with the Federal Election Commission . The Democratic candidates also began announcing their fund-raising totals in advance of the deadline today to file quarterly reports with the Federal Election Commission .
0 2204592 2204588 Sun Microsystems Inc. on Thursday said it had added 100 new third-party systems and 100 new components to its Hardware Compatibility List for the Solaris x86 operating system Platform Edition . The vendor has added 100 new third-party systems and 100 new components to the operating system 's Hardware Compatibility List ( HCL ) .
1 2889005 2888954 Prosecutors said PW Marketing violated the state 's 1998 anti-spam law by sending unsolicited e-mail without a toll-free number for recipients to call to stop additional mailings . Prosecutors said PW Marketing violated the 1998 anti-spam law because these unsolicited e-mails were sent without a free call number for recipients to phone to stop additional mailings .
0 1657632 1657619 The Neighbours star and singer spent yesterday resting at her family home in Sydney and will have more tests today . Goodrem spent yesterday resting in her family home in Sydney and will have more tests today to determine her exact treatment .
0 555617 555528 The 3 rd Armored Cavalry Regiment is 5,200 strong and the largest combat unit at Fort Carson . Broomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment .
1 2396937 2396818 " The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , " the Fed said in a statement accompanying the unanimous decision . " The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , " the policy-setting Federal Open Market Committee said .
0 2339738 2339771 " It is bad for Symbian , " said Per Lindberg , analyst at Dresdner Kleinwort Wasserstein . " Motorola has displayed clear disloyalty " to Symbian , said Per Lindberg , an analyst at Dresdner Kleinwort Wasserstein in London .
0 1616174 1616206 Bob Richter , a spokesman for House Speaker Tom Craddick , had no comment about the ruling . Bob Richter , spokesman for Craddick , R-Midland , said the speaker had not seen the ruling and could not comment .
1 635783 635802 But Ms Ward said the headroom under its financial covenants was " tight " and that there could be another downgrade if Southcorp breached any of its banking covenants . But Ms Ward said the headroom under its financial covenants was " tight " and that there could be a rating downgrade if Southcorp did breach any banking covenants .
1 3444633 3444733 He added : ``I 've never heard of more reprehensiblebehaviour by a doctor . The Harrisons ’ lawyer Paul LiCalsi said : “ I ’ ve never heard of more reprehensible behaviour by a doctor .
1 555553 555528 Broomhead was assigned to 2nd Squadron , 3rd Armor Cavalry Regiment , based at Fort Carson . Broomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment .
1 1112021 1111925 Other staff members , however , defended the document , saying it would still help policy-makers and the agency improve efforts to address the climate issue . Some E.P.A. staff members defended the document , saying that although pared down it would still help policy makers and the agency address the climate issue .
0 2749410 2749625 President Bush raised a record-breaking $ 49.5 million for his re-election campaign over the last three months , with contributions from 262,000 Americans , the president 's campaign chairman said Tuesday . President Bush has raised $ 83.9 million since beginning his re-election campaign in May , and has $ 70 million of that left to spend , his campaign said Tuesday .
1 1629064 1629043 An episode is declared when the ozone reaches .20 parts per million parts of air for one hour . A Stage 1 episode is declared when ozone levels reach 0.20 parts per million .
1 789691 789665 " He may not have been there , " the defence official said on Thursday . " He may not have been there , " said a defence official speaking on condition of anonymity .
1 844421 844679 The U.N. troops are in Congo to protect U.N. installations and personnel , and they can only fire in self defense and have been unable to stem the violence . The troops - whose mandate is to protect U.N. installations and personnel - can only fire in self-defense and have been unable to stem the violence .
1 58540 58567 North American markets grabbed early gains Monday morning , as earnings season begins to slow and economic indicators take the spotlight . North American futures pointed to a strong start to the first trading session of the week Monday , as earnings season slows and economic indicators take the spotlight .
1 781439 781461 Xerox itself paid a $ 10 million fine last year to settle similar SEC charges . Xerox itself previously paid a $ 10-million penalty to settle the SEC accusations .
1 1909579 1909408 " This deal makes sense for both companies , " said National Chief Executive Brian Halla . " This deal makes sense for both companies , " Halla said in a prepared statement .
0 787432 787464 The blasts killed two people and injured more than 150 others . The Atlanta Olympic Games attack killed one woman and injured more than 100 other people .
0 52758 52343 Morrill 's wife , Ellie , sobbed and hugged Bondeson 's sister-in-law during the service . At the service Morrill 's widow , Ellie , sobbed and hugged Bondeson 's sister-in-law as people consoled her .
1 1675025 1675047 Spansion products are to be available from both AMD and Fujitsu , AMD said . Spansion Flash memory solutions are available worldwide from AMD and Fujitsu .
1 2131318 2131372 About 1,500 police will be deployed for the visit . Around 1,500 police are to be deployed at Niigata for the ferry 's visit .
1 325763 325928 Gamarekian told The News she remembers only the woman 's first name - and refused to reveal it . She told the New York Daily News she remembers only the intern 's first name , which she refused to reveal .
1 2638975 2638855 One of the FBI ’ s key operatives , who had a falling out with the bureau , provided an account of the operation at a friend ’ s closed immigration court proceeding . One of the FBI 's key operatives , who has had a falling-out with the bureau , provided an account of the operation at a friend 's closed immigration court proceeding .
1 2198694 2198937 A nationally board certified teacher with a master 's degree , Kelley makes a salary of $ 65,000 in his 30th year . A nationally board certified teacher with a master 's degree , Kelley , in his 30th year teaching , makes $ 65,000 .
1 1825432 1825301 A man arrested for allegedly threatening to shoot and kill a city councilman from Queens was ordered held on $ 100,000 bail during an early morning court appearance Saturday . The Queens man arrested for allegedly threatening to shoot City Councilman Hiram Monserrate was held on $ 100,000 bail Saturday , a spokesman for the Queens district attorney said .
1 2906104 2906322 They were being held Sunday in the Camden County Jail on $ 100,000 bail . They remained in Camden County Jail on Sunday on $ 100,000 bail .
1 722278 722383 Ms Stewart , the chief executive , was not expected to attend . Ms Stewart , 61 , its chief executive officer and chairwoman , did not attend .
0 101747 101777 Christina 's aunt , Shelley Riling , said the defense 's claims were preposterous . Christina 's aunt , Shelley Riling , said she will address the court .
1 2224884 2224819 The Justice Department Aug. 19 gave pre-clearance for the Oct. 7 date for the election to recall Gov. Gray Davis , saying it would not affect minority voting rights . The Justice Department on Aug. 19 sanctioned the Oct. 7 date for recall election , saying it would not affect voting rights .
0 977938 978162 Lord Falconer hailed the changes as " a new beginning as far as the courts , Crown Prosecution Service and police are concerned " . " It 's a new beginning as far as the courts , Crown Prosecution Service and police are concerned , making the criminal justice system work better . "
0 1015010 1014963 GE stock closed at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange . GE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange .
1 1513190 1513246 At least 27 US troops have been killed in hostile fire since Bush 's statement . At least 26 American troops have been killed in hostile fire since major combat was officially declared over on May 1 .
1 2385348 2385394 A recent poll showed Edwards with a narrow lead in South Carolina , and he plans a rally there later on Tuesday . A recent poll showed Edwards in a virtual four-way tie at the top in South Carolina , and he plans a rally there later on Tuesday .
1 2317018 2317252 November 17 's last victim was British defence attache Stephen Saunders , who was shot on an Athens road in June 2000 . November 17 's last victim was British defense attache Stephen Saunders , who was shot and killed at point-blank range on a busy Athens road in June 2000 .
0 1831696 1831660 The agency charged that one WD Energy worker discussed false reporting with traders at two other energy companies . The agency found further that a WD Energy employee discussed false reporting with traders at two other energy companies , which the CFTC didn 't identify .
1 1528383 1528083 Zulifquar Ali , a worshipper slightly wounded by shrapnel , said the assailants first targeted the mosque 's security guards . Witness Zulfiqar Ali , who was slightly wounded by shrapnel , said the attackers had focused on the mosque 's guards .
1 917965 918315 For the second year in a row , rises in hospital costs accounted for much of the inflation , accounting for 51 percent of the overall cost increase . For the second year in a row , rises in hospital costs dominated the increase , accounting for 51 percent of the overall cost spiral .
0 3218713 3218830 Q : Can I buy coverage for prescription drugs right away ? Congress has added a new benefit - an option to buy insurance coverage for prescription drugs .
1 221079 221003 The airline also said it has the option to buy 380 more airplanes , orders that would be split evenly between the two manufacturers . The airline has the option to buy 380 more , split evenly between the two manufacturers .
1 2546175 2546198 Dr Mark McClean , Jonathan 's family doctor , said if the drug had been administered earlier Jonathan would have retained more of his brain functions . Dr Mark McClean , the family 's GP , said had the drug been administered to Jonathan earlier , he would have retained more of his brain function .
0 799346 799268 The chain operates more than 3,400 stores , and has annual revenue of about $ 15.8 billion . The chain , which has been under new management since late 1999 , has more than 3,400 stores and $ 15.8 billion in annual revenue .
0 2673104 2673130 All patients developed some or all of the symptoms of E. coli food poisoning : bloody diarrhea , vomiting , abdominal cramping and nausea . Symptoms of the E. coli infection include bloody diarrhea , nausea , vomiting and abdominal cramping .
1 1354501 1354476 Federal regulators have turned from sour to sweet on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings Inc. and Dreyer 's Grand Ice Cream Inc . Federal regulators have changed their minds on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings and Dreyer 's Grand Ice Cream .
1 3070979 3070949 Environmental campaigners are using this weekend ’ s lunar eclipse to highlight the huge increase in light pollution across the UK . Environmental campaigners used the eclipse to highlight the surge in light pollution across Britain .
0 1264509 1264471 Available July 7 , the software supports the Solaris , IBM AIX , Red Hat Linux and Windows operating systems . The OpForce product currently works with Solaris , AIX , Red Hat Linux and Windows servers .
1 103280 103431 Justice Minister Martin Cauchon and Prime Minister Jean Chrétien have both said the Liberal government will introduce legislation soon to decriminalize possession of small amounts of pot for personal use . Justice Minister Martin Cauchon and Prime Minister Jean Chretien both have said the government will introduce legislation to decriminalize possession of small amounts of pot .
0 110731 110648 But Chauncey Billups demonstrated he 's also capable of big games , scoring 77 points over the final two games against the Magic . Billups scored 77 points in the final two games of the first-round series against the Magic .
1 2274844 2274714 Kelly killed himself after being exposed as the source for a BBC report which claimed the government had embellished evidence of Iraq 's banned weapons to justify the war . He killed himself after being exposed as the source for a BBC report which claimed the government exaggerated the case for war against Iraq .
0 1050307 1050144 And it 's going to be a wild ride , " said Allan Hoffenblum , a Republican consultant . Now the rest is just mechanical , " said Allan Hoffenblum , a Republican consultant .
1 2810634 2810670 While the Ibrahims had one separation operation , Goodrich and Dr. David Staffenberg plan about three for the Aguirres , with several weeks between each . Instead of one long operation to separate the twins , Goodrich and Dr. David Staffenberg plan about three , with several weeks between each .
1 3073773 3073779 Lay had contended that turning over the documents would violate his Fifth Amendment right against self-incrimination . Lay had refused to turn over the papers , asserting his Fifth Amendment right against self-incrimination .
0 261202 260995 The WHO experts didn 't say how many cases in Hebei were in rural areas . Hebei has reported 191 cases and eight deaths , though the WHO experts did not say how many were in rural areas .
1 1824224 1824209 Nearly 300 mutinous troops who seized a Manila shopping and apartment complex demanding the government resign gave up and retreated peacefully after some 19 hours . Mutinous troops who seized a Manila shopping and apartment complex demanding the government resign ended a 19-hour standoff late Sunday and returned to barracks without a shot fired .
1 548867 548785 In three years , Lend Lease has slipped from a top-five stock , when its share price was around $ 24 , to 37th . In the space of three years , Lend Lease has slipped from a top-five 5 stock when its share price hovered around $ 24 to 37th on the list .
0 2796658 2796682 About two hours later , his body , wrapped in a blanket , was found dumped a few blocks away . Then his body was dumped a few blocks away , found in a driveway on Argyle Road .
1 1808166 1808434 Columbia broke up over Texas upon re-entry on Feb. 1 . Columbia broke apart in the skies above Texas on Feb. 1 .
1 853475 853342 A year or two later , 259 , or 10 per cent , of the youths reported that they had started to smoke , or had taken just a few puffs . Within two years , 259 , or 10 percent , of the youths reported they had started to smoke or had at least taken a few puffs .
0 977772 977804 The Lord Chancellor was guardian of the Great Seal , used to stamp all official documents from the sovereign . Falconer will hold on , for now , to the Lord Chancellor 's Great Seal , used to sign off instructions from the sovereign .
1 577854 578500 Cindy Yeast , a 50-year-old Washington-area publicist , says she began taking supplements two years ago in part to avoid mild dementia that affects her elderly parents . She started taking supplements two years ago - partly to stave off mild dementia that affects her elderly parents .
1 2829194 2829229 The two are not related , but have referred to each other as father and son . He 's not related to Malvo , but the two have referred to each other as father and son .
1 2074182 2074668 Gibson said last month in a press statement that " neither I nor my film are anti-Semitic . Gibson said in a June statement that he and his film are not anti-Semitic .
0 2758265 2758282 The world 's largest software company said it recognized the difficulty the multiple patches posed for companies , and set out to make it easier for them to apply the updates . The world 's largest software company said it recognized the difficulty the multiple patches posed for companies trying to apply them .
1 1958079 1958143 The Dow Jones industrial average .DJI ended up 64.64 points , or 0.71 percent , at 9,191.09 , according to the latest available data . The blue-chip Dow Jones industrial average .DJI added 38 points , or 0.42 percent , to 9,165 .
1 544217 544325 The vote came just two days after Kurds swept City Council elections , taking the largest single block of votes on the 30-seat council . The vote for mayor followed City Council elections that gave Kurds the largest block of votes on the 30-seat council .
1 2385288 2385256 Large swells and dangerous surf already were being felt along sections of the coast . Already large swells and dangerous surf have arrived along the mid-Atlantic .
0 2324708 2325028 Based on a separate survey of households , the unemployment rate fell in August to 6.1 percent from 6.2 percent . Labor Department analysts discounted a slight improvement in the national unemployment rate , which fell in August to 6.1 percent from 6.2 percent .
1 2139506 2139427 " We will work with the board to ensure a smooth transition . " He said federal regulators would work with the corporation to ensure a " smooth transition . "
1 2965576 2965701 Gasps could be heard in the courtroom when the photo was displayed . Gasps could be heard as the photo was projected onto the screen .
1 2931098 2931144 Gilead had earnings of $ 73.1 million , or 33 cents a share , compared with $ 20.8 million , or 10 cents , in the year-ago quarter . Quarterly profit climbed to $ 73.1 million , or 33 cents a share , from $ 20.8 million , or 10 cents , a year earlier , the company said .
0 644788 644816 " I had one bad stretch of holes that put me out of contention to win , " Woods said . " I had one bad stretch of holes that put me out of contention , " Woods said , referring to his 42 on the front nine Saturday .
0 2551891 2551563 The poll had a margin of error of plus or minus 2 percentage points . It had a margin of sampling error of plus or minus four percentage points and was conducted Thursday through Saturday .
1 1089053 1089297 Sen. Patrick Leahy of Vermont , the committee 's senior Democrat , later said the problem is serious but called Hatch 's suggestion too drastic . Sen. Patrick Leahy , the committee 's senior Democrat , later said the problem is serious but called Hatch 's idea too drastic a remedy to be considered .
1 3435735 3435717 The broad Standard & Poor 's 500 < .SPX > eased 0.37 of a point , or 0.03 percent , at 1,121 . The Standard & Poor 's 500 Index < .SPX > slipped 0.26 point , or 0.02 percent , to 1,121.96 .
0 1954 2142 Watertown , Saugus and Framingham also are going smoke-free Monday , joining a growing number of cities around the country . Along with Boston , Watertown , Saugus and Framingham also are going smoke-free Monday .
1 3400796 3400822 That is evident from their failure , three times in a row , to get a big enough turnout to elect a president . Three times in a row , they failed to get a big _ enough turnout to elect a president .
1 1220668 1220801 We firmly believe we have an absolute right to use the common word ' spike ' as the name of our network . " We firmly believe that we have an absolute right to use the common word ' spike ' to name our network .
1 1889954 1889847 Sources who knew of the bidding said last week that cable TV company Comcast Corp. was also looking at VUE . Late last week , sources told Reuters cable TV company Comcast Corp. CMCSA.O also was looking at buying VUE assets .
1 315785 315653 But MTA officials appropriated the money to the 2003 and 2004 budgets without notifying riders or even the MTA board members considering the 50-cent hike , Hevesi found . MTA officials appropriated the surplus money to later years ' budgets without notifying riders or the MTA board members when the 50-cent hike was being considered , he said .
0 1521034 1520582 White , who had suffered kidney failure from years of high blood pressure , died at Cedars-Sinai Medical Center around 9 : 30 a.m. , said manager Ned Shankman . White , who had kidney failure from years of high blood pressure , had been undergoing dialysis and had been hospitalized since a September stroke .
1 2083598 2083810 About 10 percent of high school and 16 percent of elementary students must be proficient at math . In math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient .
1 1910610 1910455 The legal ruling follows three days of intense speculation Hewlett-Packard Co. may be bidding for the company . The legal ruling follows three days of wild volatility in RIM 's stock over speculation that PC giant Hewlett-Packard Co. may be bidding for the company .
1 3113791 3113782 The European Commission , the EU 's antitrust enforcer , is expected to issue its decision next spring — unless a settlement is reached . The European Commission is expected to issue its decision in the case next spring — unless a settlement is reached .
1 3214517 3214483 " So Sebastian did his best to convincingly confess to a crime that he didn 't commit in order to survive , " she told jurors . " Sebastian did his best to confess convincingly to a crime he didn 't do in order to survive , " Ms. Richardson declared .
0 2083612 2083810 Twenty percent of Latino students and 23 percent of black students performed at proficient or higher . In math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient .
1 661390 661218 He is charged in three bombings in Atlanta including a blast at the 1996 Olympics and one in Alabama . He is charged in three bombings in Atlanta - including a blast at the 1996 Olympics - along with the bombing in Alabama .
1 1269572 1269682 The men were remanded in custody and are due to appear again before court on July 8 . They were remanded in custody and will appear in court again on July 8 .
1 1095780 1095652 " No matter who becomes the sponsor for stock-car racing 's top series , NASCAR will need an all-star event , " Wheeler said in a statement . No matter who becomes the sponsor for stock-car racings top series , NASCAR will need an all-star event , Wheeler said Tuesday .
1 116294 116332 The Phillies were upset that Counsell had stolen second in the sixth inning with Arizona leading 7-1 . The Phillies were apparently upset when Counsell stole during the sixth with the Diamondbacks up 7-1 .
1 941617 941673 He said his hatred for such people grew from these discussions and had helped convince him violence was the answer . His hatred for these people had germinated from these discussions and helped cement his belief that violence was the panacea .
1 2640607 2640576 " There is no need for one deadline for all to create the ASEAN Economic Community , " Thaksin said . Thus , he said , there did not have to one deadline to create the economic community .
1 3310210 3310286 The announcement was made during the recording of a Christmas concert attended by top Vatican cardinals , bishops , and many elite from Italian society , witnesses said . The broadside came during the recording on Saturday night of a Christmas concert attended by top Vatican cardinals , bishops and many elite of Italian society , witnesses said .
1 3376093 3376101 The additional contribution brings total U.S. food aid to North Korea this year to 100,000 tonnes . The donation of 60,000 tons brings the total of U.S. contributions for the year to 100,000 .
1 1549586 1549609 Leon Williams ' body was found inside his third-floor apartment at 196 Bay St. , in Tompkinsville . The dead man , Leon Williams , was found in his third-floor apartment .
1 460211 460445 The player 's eyes were bloodshot and a blood-alcohol test produced a reading of 0.18 - well above Tennessee 's level of presumed intoxication of 0.10 , the report said . He failed a field sobriety test and a blood-alcohol test produced a reading of 0.18 – well above Tennessee 's level of presumed intoxication of 0.10 , the report said .
1 1196962 1197061 But Virgin wants to operate Concorde on routes to New York , Barbados and Dubai . Branson said that his preference would be to operate a fully commercial service on routes to New York , Barbados and Dubai .
0 862804 862715 He tried to fight off officers and was taken to a hospital after a police dog bit him but was later released . Cruz tried to fight off officers and was hospitalized after a police dog bit him , Sgt. Steve Dixon said .
1 1726935 1726879 The announcement , which economists said was not a surprise , may be bittersweet for the millions of Americans without jobs . Economists said the announcement was not a surprise , and politicians said it offered little comfort to the millions of Americans without jobs .
0 331980 332110 Asked if the delegates could leave on Friday , police intelligence chief in Aceh , Surya Dharma , told reporters they could not because they did not have proper permission . Asked if the delegates could leave on Friday , police intelligence chief Surya Dharma told reporters : " Of course they may not go .
1 173879 173832 Dealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid the yen 's rise against the dollar . Dealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid ever-falling domestic interest rates .
0 2834988 2835026 Iran has until the end of the month to satisfy the agency it has no plans for nuclear weapons . The Iranians have until the end of the month to answer all the agency 's questions about their past nuclear activities .
1 2587300 2587243 Her father , Florin Cioaba , the king of Transylvania 's Gypsies , had her brought back and she was married against her will . Her father , Roma King Florin Cioaba , had her brought back and she was promptly married against her will .
0 554905 554627 Claire had advanced to the third round of the 76th annual Scripps Howard National Spelling Bee . One by one they strolled to the microphone , all 251 youngsters in the 76th Scripps Howard National Spelling Bee .
1 1912524 1912648 Citigroup Inc . C.N , the world 's largest financial services company , on Wednesday promoted Marjorie Magner to chairman and chief executive of its global consumer group . Citigroup ( C ) on Wednesday named Marjorie Magner chairman and chief executive of its colossal global consumer business .
1 3255597 3255668 " They 've been in the stores for over six weeks , " says Carney . The quarterlies usually stay in stores for between six to eight weeks , " Carney added .
1 629316 629289 Let me just say this : the evidence that we have of weapons of mass destruction was evidence drawn up and accepted by the joint intelligence community . " The evidence that we had of weapons of mass destruction was drawn up and accepted by the Joint Intelligence Committee , " he said .
1 54181 53570 Ridge said no actual explosives or other harmful substances will be used . Ridge said no real explosives or harmful devices will be used in the exercise .
1 723557 724115 Thus far , Stewart 's company appears ready to stand behind her . For now , the company 's management appears to be standing behind Stewart .
0 2607718 2607708 But late Thursday night , the campaign issued a statement saying there would be no news conference and no big announcement . But late yesterday , the campaign and the state Democratic Party said there would be no news conference .
1 753858 753890 There 's also a flaw that results because IE does not implement an appropriate block on a file download dialog box . The second vulnerability is a result of IE not implementing a block on a file download dialog box .
1 587009 586969 Another $ 100-million in savings will come from management layoffs and pay cuts . The airline expects to save another $ 100-million a year through management layoffs and pay cuts .
1 308567 308525 He called on Prime Minister John Howard to establish a royal commission on child sex abuse . The Senate motion also called on Prime Minister John Howard to hold a royal commission into child sex abuse .
0 665419 665612 " We think that the United States of America should support the free speech of all groups , " Mr. White said , objecting to Mr. Olson 's recommendation . We think that the United States of America should support the free speech of all groups , he said .
1 2763517 2763576 Terri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler . The tube was removed Wednesday from Terri Schiavo , 39 , at the Tampa Bay-area hospice where she has lived for several years .
0 3107118 3107136 After 18 months , Nissen found that Lipitor stopped plaque buildup in the patients ' arteries . After 18 months , the atorvastatin patients had no change in the plaque in their arteries .
1 780604 780466 Toll , Australia 's second-largest transport company , last week offered NZ75 a share for Tranz Rail . Toll last week offered to buy the company for NZ75c a share , or $ NZ158 million .
0 1989213 1989116 " This child was literally neglected to death , " Armstrong County District Attorney Scott Andreassi said . Armstrong County District Attorney Scott Andreassi said the many family photos in the home did not include Kristen .
1 1462409 1462504 Wal-Mart , the nation 's largest private employer , has expanded its antidiscrimination policy to protect gay and lesbian employees , company officials said Tuesday . Wal-Mart Stores Inc . , the nation 's largest private employer , will now include gays and lesbians in its anti-discrimination policy , company officials said Wednesday .
1 260952 260924 Metro , bus and local rail services in France 's four largest towns -- Paris , Lyon , Lille and Marseille -- were severely disrupted , Europe 1 radio reported . Subway , bus and suburban rail services in France 's four largest cities -- Paris , Lyon , Lille and Marseille -- were severely disrupted , transport authorities said .
1 1224743 1225510 In the undergraduate case , Rehnquist said the use of race was not " narrowly tailored " to achieve the university 's asserted interest in diversity . Rehnquist wrote that the system was not narrowly tailored to achieve the interest in educational diversity .
0 3329379 3329416 SP2 is basically about security enhancements to Windows , such as the improved Internet Connection Firewall ( ICF ) . The firewall in the current Windows XP was known as the Internet Connection Firewall ( ICF ) .
1 2362761 2362698 A landslide in central Chungchong province derailed a Seoul-bound train and 28 passengers were injured , television said . In central Chungchong province , a landslide caused a Seoul-bound Saemaeul Express train to derail , injuring 28 people , local television said .
0 1465073 1464854 They will help draft a plan to attack obesity that Kraft will implement over three to four years . The team will help draft a plan by the end of the year to attack obesity .
1 195728 196099 But that amount would probably be impossible to pass in the Senate , where Republican moderates have refused to go above $ 350 billion . Such an amount would probably be unable to summon a majority of the Senate , where Republican moderates have refused to go above $ 350 billion .
1 2587767 2587673 In the clash with police , Lt. Mothana Ali said about 1,000 demonstrators had gone to the station demanding jobs . In Baghdad , police Lieut . Mothana Ali said about 1,000 demonstrators arrived at the station demanding jobs . 0 1490044 1489975 Corixa shares rose 54 cents to $ 7.74 yesterday on the Nasdaq Stock Market . Shares of Corixa rose 54 cents , or about 8 percent , to close at $ 7.74 . 1 958161 957782 Committee approval , expected today , would set the stage for debate on the Senate floor beginning Monday . That would clear the way for debate in the full Senate beginning on Monday . 1 1033204 1033365 O 'Brien was charged with leaving the scene of a fatal accident , a felony . Bishop Thomas O 'Brien , 67 , was booked on a charge of leaving the scene of a fatal accident . 0 2996241 2996734 Tom Hamilton said his daughter was conscious and alert and in stable condition after the attack Friday morning . Bethany , who remained in stable condition after the attack Friday morning , talked of the attack Saturday . 0 2015389 2015410 The Calgary woman , who is in her twenties , donated blood on Aug. 7 . The woman -- who has no symptoms of illness -- donated blood Aug. 7 . 1 221515 221509 Quattrone lawyer John W. Keker said his client is innocent . In a statement Monday , his lawyer John Keker said ``Frank Quattrone is innocent . 0 2283737 2283794 In the weeks leading up to the execution , several Florida officials received anonymous threatening letters . Several Florida officials connected to the case have received threatening letters , accompanied by rifle bullets . 1 2826681 2826474 The disagreement over online music sales was disclosed in documents filed last week with the judge and made available by the court yesterday . The fight over online music sales was disclosed in documents made available Monday by the court . 1 2249237 2249305 Parson was charged with intentionally causing and attempting to cause damage to protected computers . Parson is charged with one count of intentionally causing damage to a protected computer . 1 389239 389299 " The court and the public need to know much more of the details of the defendant 's seemingly massive fraud , " the judge said . " The court and the public need to know more of the defendants ' seemingly massive fraud , " he said . 1 2652187 2652218 The U.S. Supreme Court will hear arguments on Wednesday on whether companies can be sued under the Americans with Disabilities Act for refusing to rehire rehabilitated drug users . The high court will hear arguments today on whether companies can be sued under the ADA for refusing to rehire rehabilitated drug users . 1 2945693 2945847 The IRS said taxpayers can avoid undelivered checks by having refunds deposited directly into their checking or savings accounts . The IRS said taxpayers can avoid problems with lost or stolen refunds by having refunds deposited directly into personal checking or savings accounts . 1 2065523 2065836 " More than 70,000 men and women from bases in Southern California were deployed in Iraq . In all , more than 70,000 troops based in Southern California were deployed to Iraq . 1 2222998 2223097 BP shares slipped 0.8 percent to 433.50 pence ( $ 6.85 ) each in afternoon trading on the London Stock Exchange . BP shares slipped 48 cents to $ 41.72 Friday in trading on the New York Stock Exchange . 1 2561999 2561941 Because of the accounting charge , the company now says it lost $ 1.04 billion , or 32 cents a share , in the quarter ended June 30 . 
Including the charge , the Santa Clara , Calif.-based company said Monday it lost $ 1.04 billion , or 32 cents per share , in the period ending June 30 . 0 2324704 2325023 Friday 's report raised new worries that a weak job market could shackle the budding economic recovery despite a slight improvement in the overall unemployment rate . U.S. companies slashed payrolls for a seventh straight month in August , raising new worries that a weak jobs market could shackle the budding economic recovery . 1 2336453 2336545 Federal Emergency Management Administration designated $ 20 million to establish the registry . The registry was launched with $ 20 million from the Federal Emergency Management Agency . 1 720572 720486 BREAST cancer cases in the UK have hit an all-time high with more than 40,000 women diagnosed with the disease each year , Cancer Re-search UK revealed yesterday . Cases of breast cancer in Britain have reached a record high , with the number of women diagnosed with the disease passing the 40,000 mark for the first time . 1 1605818 1605806 " It was never our intention to sell the product , " said Health Minister Anne McClellan , a skeptic of medical marijuana use . " It was never the intention of us to sell product , " federal Health Minister Anne McLellan said yesterday in Edmonton . 0 2440680 2440474 GM , the world 's largest automaker , has 115,000 active UAW workers and another 340,000 retirees and spouses . They cover more than 300,000 UAW workers and 500,000 retirees and spouses . 0 726399 726078 Rosenthal is hereby sentenced to custody of the Federal Bureau of prisons for one day with credit for time served , " Breyer said to tumultuous cheers in the courtroom . " Rosenthal is hereby sentenced to custody of the Federal Bureau of Prisons for one day with credit for time served . " 1 533903 533818 " We are committed to helping the Iraqi people get on the path to a free society , " Rumsfeld said in a speech to the Council on Foreign Relations . " We are committed to helping the Iraqi people get on the path to a free society , " he said . 1 1166473 1166857 Mr. Young said he was disappointed that the government didn 't see the severe acute respiratory syndrome crisis as worthy of federal disaster-relief money . Young said he was disappointed the government didn 't see the SARS crisis as worthy of federal disaster relief money . 1 144089 143697 The 12-nation currency has risen by 33 percent against the dollar over the past 15 months . The euro is up 9 percent against the dollar in the past six weeks . 1 3439854 3439874 In February 2000 , the officers — Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy — were acquitted of all charges in the killing . The officers -- Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy -- were acquitted in 2000 of state murder charges . 1 3464314 3464302 I was surprised it turned out me talking and the president just listening . " I was surprised it turned out me talking and the president just listening . . . It was mostly a monologue . " 1 2008984 2009175 The state 's House delegation currently consists of 17 Democrats and 15 Republicans . Democrats hold a 17-15 edge in the state 's U.S. House delegation . 0 816867 816831 Freddie also said Leland C. Brendsel will retire as chairman and chief executive and resign from the board . He replaces Leland Brendsel , 61 , who retired as chairman and chief executive . 
1 192285 192327 We 'll be listening carefully to the [ IAEA ] director general 's report at the next board meeting . " We 'll be listening carefully to the ( IAEA ) director-general 's report at the next board meeting . " 1 2688145 2688162 In that position , Elias will report to Joe Tucci , president and CEO of EMC . As executive vice president of new ventures , Elias will report to Joe Tucci , EMC 's president and chief executive . 1 3294207 3294290 But with the PM due to leave tomorrow afternoon for personal reasons there was a risk he might not be present when the final decision was made . But with the Prime Minister due to leave tomorrow , a day early , he may not be present when the final decision is made . 0 205100 205145 A pro-independence radical , Miodrag Zivkovic , of the Liberal Alliance , came in second with 31 percent of the vote . Miodrag Zivkovic , of the Liberal Alliance of Montenegro , won 31 percent of the vote while the independent Dragan Hajdukovic got four percent . 0 3242051 3241897 Mr. Kerkorian tried unsuccessfully to take over Chrysler in 1995 , but did win representation on its board . Kerkorian and Tracinda had also tried to take over Chrysler in 1995 . 0 1076861 1077018 Glover spoke at a news conference that included about 20 relatives of the victims . About 20 family members of the victims were invited to the news conference . 1 2095803 2095786 Drax faced a financial crisis late last year after it lost its most lucrative sales contract , held with insolvent utility TXU Europe . Drax ’ s troubles began late last year when it lost its most lucrative sales contract , with the insolvent utility TXU Europe . 1 2112330 2112376 But I would rather be talking about high standards than low standards . " " I would rather be talking about positive numbers rather than negative . 1 3389318 3389271 It was not immediately known how many people were on flight UTA 141 , which could carry 141 passengers and crew . It was still not known exactly how many people were on the plane , which could carry 141 passengers and crew . 1 698948 698933 The market remains pinned in a narrow range after a powerful rally drove the broad Standard & Poor 's 500 index .SPX up more than 20 percent since mid-March . The market remains pinned in a narrow range after a powerful rally pushed the broad S & P 500 index up more than 20 percent since mid-March . 1 539585 539355 Witnesses said they believed the man planned to crash the Launceston-bound Qantas flight 1737 , which was carrying 47 passengers and six crew . Witnesses believe he wanted to crash Flight 1737 , which had 47 passengers and six crew . 1 684848 684557 As Samudra sat down to hear the indictment , he looked over to his nine lawyers and shouted ``God is Great ' ' three times . As he sat down to hear the indictment , Samudra looked over to his nine lawyers and shouted " Takbir ! " , or " Proclaim ! " , a religious rallying cry . 1 347017 347002 In hardest-hit Taipei , traffic has disappeared from once bustling streets , ubiquitous department stores stand mostly empty and restaurants are eerily quiet . In hardest-hit Taipei , traffic has disappeared from once-bustling streets and department stores and restaurants are virtually empty . 1 1592037 1592076 In a statement , Lee said he " no longer believes that Viacom deliberately intended to trade on my name when naming Spike TV . 
" Spike Lee no longer believes that Viacom deliberately intended to trade on his name by calling its own venture " Spike TV , " according to a statement read in court Tuesday . 0 3013483 3013540 Singapore Prime Minister Goh Chok Tong says China plays an important role in the integration of Asia , including managing the stresses and strains both within and between countries . HAINAN PROVINCE , China : Singapore Prime Minister Goh Chok Tong said China plays an important role in the integration of Asia . 1 2020252 2020081 The worm attacks Windows computers via a hole in the operating system , an issue Microsoft on July 16 had warned about . The worm attacks Windows computers via a hole in the operating system , which Microsoft warned of 16 July . 0 2614947 2614904 The premium edition adds OfficeFront Page 2003 , Acceleration Server 2000 , and SQL Server 2000 . The premium edition adds ISA Server , SQL Server and a specialized edition of BizTalk 2004 . 0 1744257 1744378 In the year-ago quarter , the steelmaker recorded a profit of $ 16.2 million , or 15 cents per share , on sales of $ 1.14 billion . In the second quarter last year , AK Steel reported a profit of $ 16.2 million , or 15 cents a share . 0 1119721 1119714 Sony claimed that the reader 's capacitance sensing technology cannot be fooled by paper copies and does not require cleaning . Its capacitance sensing technology electronically reads a fingerprint ; Sony says it can 't be fooled by paper copies and doesn 't require cleaning . 1 1186754 1187056 Amazon.com shipped out more than a million copies of the new book , making Saturday the largest distribution day of a single item in e-commerce history . Amazon.com shipped more than a million copies by Saturday afternoon , making Saturday the largest distribution day of a single item in e-commerce history . 1 2842562 2842582 The show 's closure affected third-quarter earnings per share by a penny . The company said this impacted earnings by a penny a share . 0 431076 431242 After the two-hour meeting on May 14 , publisher Arthur O. Sulzberger Jr . , executive editor Howell Raines and managing editor Gerald Boyd pledged quick remedies to staff grievances . The committee will make recommendations to Publisher Arthur Sulzberger , Executive Editor Howell Raines and Managing Editor Gerald Boyd . 1 1393764 1393984 It 's been a busy couple of days for security gurus assigned to keep their companies safe and sound . It 's been a busy couple of days for enterprise security gurus tasked with the job of keeping their companies safe and sound . 0 2916199 2916164 Lu reclined in a soft chair wearing a woolly coat near the blackened capsule . " It 's great to be back home , " said Lu , dressed in a woolly coat near the blackened capsule . 1 2530671 2530542 Gov. Bob Riley proposed the budget cuts after Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 . After Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 , Riley forecast significant cuts in state programs . 1 219064 218969 " It is probably not the easiest time to come in and take over the shuttle program , but then again , I look forward to the challenge , " he said . " It 's probably not the easiest time to come in and take over the shuttle program , but I look forward to the challenge , " Parsons told reporters at NASA headquarters . 0 2377289 2377259 Estonia 's place in the European mainstream and safeguard its independence regained in 1991 . 
Estonia was forcibly incorporated in the Soviet Union in 1940 and regained its independence only in 1991 . 0 2110220 2110199 Franklin County Judge-Executive Teresa Barton said a firefighter was struck by lightning and was taken to the Frankfort Regional Medical Center . A county firefighter , was struck by lightning and was in stable condition at Frankfort Regional Medical Center . 0 1864253 1863810 Police suspected that Shaichat , 20 , had been abducted either by Palestinians or by Israeli Arabs . Nobody claimed responsibility for Schaichat 's death , but police suspect that the 20-year-old soldier was abducted either by Palestinians or Israeli Arabs . 0 3150803 3150839 During this year 's August to October quarter , Lowe 's opened 38 new stores , including two relocations . During the third quarter , Lowe 's opened 38 new stores and now has 932 stores in 45 states . 0 969381 969512 The technology-laced Nasdaq Composite Index < .IXIC > declined 25.78 points , or 1.56 percent , to 1,627.84 . The broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 . 1 271891 271839 Sony said the PSP would also feature a 4.5-inch LCD screen , Memory Stick expansion slots . It also features a 4.5 in back-lit LCD screen and memory expansion facilities . 0 2829648 2829613 Clinton did not mention that two Democratic senators , Charles Robb of Virginia and Wendell Ford of Kentucky , voted to shelve the McCain bill . Two Democrats , Sen. Charles Robb of Virginia and Wendell Ford of Kentucky , voted with the 40 Republicans . 1 886904 887158 Some of the company 's software developers will join Microsoft , but details haven 't been finalized , said Mike Nash , corporate vice president of Microsoft 's security business unit . Some of the companys software developers will join Microsoft , but details havent been finalized , said Mike Nash , corporate vice president of Microsofts security business unit . 0 2632692 2632767 Wal-Mart has said it plans to open at least 40 Supercenters in the state in the coming years ; analysts expect four or more to be in San Diego County . At least 40 of the outlets will be in California , and analysts expect four or more to be in San Diego County . 1 2240399 2240149 Cintas is battling efforts to unionize 17,000 of its workers and to let unions organize the workers by signing cards , rather than by a lengthy election process . Cintas is battling efforts to unionize 17,000 of its workers and labor 's demands to let its workers organize by signing cards , rather than by a lengthy election process . 1 805457 805985 The opposition would resort to rolling mass action " at strategic times of our choice and without warning to the dictatorship , " he said . " From now onwards we will embark on rolling mass action at strategic times of our choice and without any warning to the dictatorship , " he said . 1 2896308 2896334 Federal Agriculture Minister Warren Truss said the Government still did not know the real reason the sheep were rejected at the Saudi port of Jeddah on August 21 . He said the Government still did not know the real reason the original Saudi buyer pulled out on August 21 . 1 2110775 2110924 Tom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said that scenario is one among many that investigators are considering . 
Tom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said investigators are considering the scenario . 1 1762569 1762526 Hester said Sanmina was the best fit among several purchase offers the company received from electronics manufacturers and computer makers . Hester said Sanmina 's offer was the best among several Newisys received from electronics manufacturers and computer makers . 0 2706154 2706185 The other inmate fell but Selenski shimmed down the makeshift rope to a second-story roof and used the mattress to scale a razor-wire fence , Fischi said . After the other inmate fell , Selenski used the mattress to scale a 10-foot , razor-wire fence , Fischi said . 1 1057995 1057778 The hearing , expected to last a week , will determine whether Akbar faces a court-martial . The purpose of the hearing is to determine whether Akbar should be court-martialled . 1 1386884 1386857 He said he has begun a court action to seize Beacon Hill 's assets and has frozen more than $ 13 million Beacon Hill had when it closed . He said he has initiated a forfeiture action in court and frozen more than $ 13 million Beacon Hill had when it closed . 1 3093023 3092996 Speaking for the first time yesterday , Brigitte 's maternal aunt said his family was unaware he had was in prison or that he had remarried . Brigitte 's maternal aunt said his family was unaware he had been sent to prison , or that he had remarried in Sydney . 1 1661381 1661317 " Close co-operation between our law enforcement agencies , close co-operation between our intelligence services lie at the heart of the ongoing fight against terrorism . " Close cooperation between regional law enforcement agencies and intelligence services was at the heart of the fight against terrorism , he said . 0 2926039 2925982 The mother of a Briton held by Colombian guerrillasspoke of her relief yesterday after hearing that he might be freed in the next few weeks . The parents of a Briton being held hostage by Colombian rebels spoke yesterday of their optimism that he would be freed in time for his birthday next month . 0 637168 637447 We strongly disagree with Novell 's position and view it as a desperate measure to curry favor with the Linux community . McBride characterized Novell 's move as " a desperate measure to curry favor with the Linux community . " 1 696677 696932 After more than two years ' detention under the State Security Bureau , the four were found guilty of subversion in Beijing 's No. 1 Intermediate Court last Wednesday . After more than two years in detention by the State Security Bureau , the four were found guilty last Wednesday of subversion . 1 3122429 3122305 Mr Russell , 46 , a coal miner from Brisbane , said : " They are obviously hurting , so we are basically going over there to help them . " " They are obviously hurting so we are basically going over there to help them , " Russell , 46 , said . 1 1348909 1348954 The New York Democrat and former first lady has said she will not run for the White House in 2004 , but has not ruled out a race in later years . The former first lady has said she will not run for the White House in 2004 but has not ruled out a race later on . 0 162203 162101 It does not affect the current Windows Media Player 9.0 Series . Windows Media Player has had security problems before . 0 71501 71627 The seizure took place at 4 a.m. on March 18 , just hours before the first American air assault . The time was about 4 a.m. 
on March 18 , just hours before the first pinpoint missiles rained down on the capital . 1 2907762 2907649 Donations stemming from the Sept . 11 attacks helped push up contributions to human service organizations and large branches of the United Way by 15 percent and 28.6 percent , respectively . Donations stemming from the Sept . 11 attacks helped push up contributions to human service organizations by 15 percent and to large branches of the United Way by 28.6 percent . 1 2167771 2167744 In May , Mr. Hatfill said he was struck by a vehicle being driven by an FBI employee who was tailing him in Georgetown . Last May , Hatfill was struck by a vehicle being driven by an FBI employee who was tailing him in Washington 's Georgetown neighborhood . 1 3320577 3320553 " I will support a constitutional amendment which would honor marriage between a man and a woman , codify that , " he said . " If necessary , I will support a constitutional amendment which would honour marriage between a man and a woman , codify that . " 1 849291 849442 IBM of the US and Infineon Technologies of Germany will today announce a technological development that could threaten multi-billion dollar memory chip markets . IBMof the US andInfineon Technologies of Germany willon Tuesdayannounce a technological development that could threaten multi-billion dollar memory chip markets . 0 763948 763991 Costa 's semifinal opponent is Spaniard Juan Carlos Ferrero , whom he beat in last year 's final . Costa will play Juan Carlos Ferrero next in a rematch of last year 's final . 1 1908763 1908744 A former employee of a local power company pleaded guilty Wednesday to setting off a bomb that knocked out a power substation during the Winter Olympics last year . A former Utah Power meter reader pleaded guilty Wednesday to bombing a power substation during the 2002 Winter Olympics . 0 1876120 1876059 Thyroid hormones are known to help in weight loss by stimulating metabolism - and cutting cholesterol - but come with the unwanted side effect of speeding up the heartbeat . Thyroid hormones are known to help in weight loss by stimulating metabolism , and they can help cut cholesterol too . 1 518089 518133 Judge Craig Doran said it wasn 't his role to determine if Hovan was " an evil man " but maintained that " he has committed an evil act . " Judge Craig Doran said he couldn 't determine if Hovan was " an evil man " but said he " has committed an evil act . " 0 224932 224868 The Hartford shares rose $ 2.88 , or 6.6 percent , to close Monday at $ 46.50 on the New York Stock Exchange . Shares of Hartford rose $ 2.88 to $ 46.50 in New York Stock Exchange composite trading . 1 1771131 1771091 It also offers a built-in NAND flash boot loader so that high-density NAND flash memory can be used without having to install an additional support chip . The S3C2440 has a built-in NAND flash boot loader , for example , so that high-density NAND flash memory can be installed without an additional support chip . 0 2728425 2728251 It decided instead to issue them before the stock market opened Monday after the downgrade of its debt late Friday by Moody 's , the credit rating agency . It decided instead to issue them before the stock market opened Monday to counteract the downgrade of its debt late Friday by Moody 's to one step above junk status . 0 953733 953537 Altria shares fell 2.5 percent or $ 1.11 to $ 42.57 and were the Dow 's biggest percentage loser . 
Its shares fell $ 9.61 to $ 50.26 , ranking as the NYSE 's most-active issue and its biggest percentage loser . 1 349215 349241 It will be followed in November by a third movie , " The Matrix Revolutions . " The film is the second of a trilogy , which will wrap up in November with " The Matrix Revolutions . " 1 2919853 2919804 Massachusetts regulators and the Securities and Exchange Commission on Tuesday pressed securities fraud charges against Putnam Investments and two of its former portfolio managers for alleged improper mutual fund trading . State and federal securities regulators filed civil charges against Putnam Investments and two portfolio managers in the ever-expanding mutual fund trading scandal . 1 954526 954607 He is blocking them until the Air Force assigns four additional C-130 cargo planes to Gowen Field , an Idaho Air National Guard base in Boise . He is holding them up until the Air Force agrees to assign four additional C-130 cargo planes to the Idaho Air National Guard . 1 69773 69792 Cisco pared spending to compensate for sluggish sales . In response to sluggish sales , Cisco pared spending . 0 2823575 2823513 The study , published Monday in the journal Molecular Brain Research , is likely to also apply to humans , its authors said . The study , conducted on the brains of developing mice , was being published today in the journal Molecular Brain Research . 1 2455942 2455978 My decision today is not based on any one event . " Governor Rowland said his decision was " not based on any one event . " 1 131979 131957 Nelson , 27 , is being retried on civil-rights charges stemming from the disturbance which led to Rosenbaum 's death . Nelson , 27 , is being retried on civil rights charges stemming from the disturbance that led to Rosenbaum 's death . 0 2010705 2010779 " The government elements who have been causing trouble are still in place . The government elements who have been causing trouble are still in place , they are attacking us . " 1 54142 53641 Next Monday at about 2 p.m. ( CST ) , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms . Around the same time , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms . 1 1015249 1015204 Wal-Mart Stores Inc . , Kohl 's Corp. , Family Dollar Stores Inc. and Big Lots Inc. were among the merchants posting May sales that fell below Wall Street 's modest expectations . Wal- Mart , Kohl 's Corp. , Family Dollar Stores Inc . , and Big Lots Inc. posted May sales that fell below Wall Street 's modest expectations . 0 753928 753890 The patch also fixes a vulnerability that results because IE does not implement an appropriate block on a file download dialog box . The second vulnerability is a result of IE not implementing a block on a file download dialog box . 1 3022833 3023029 Peterson , a former fertilizer salesman , is charged with murder in the deaths of his 27-year-old wife and the baby boy she was carrying . Peterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son . 0 751520 751373 SPOT products run a Microsoft operating system and the company 's DirectBand radio technology developed with SCA Data Systems . The DirectBand network was developed with the assistance of SCA Data Systems . 0 218848 218851 He replaces Ron Dittemore , who announced his resignation in April . Dittemore announced his plans to resign on April 23 . 
1 3181118 3181443 Detectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , of the arrest shortly after Perry was apprehended . Shortly after his arrest , detectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , a medical assistant , about the development . 1 515581 515752 They were among about 40 people attending the traditional Jewish ceremony colored by some non-traditional touches . He said about 40 people attended the traditional Jewish ceremony colored by some nontraditional touches . 1 347022 347003 Taiwan had been relatively free of the viral infection until a fiasco at a Taipei hospital in late April caused the number of infections to skyrocket . Taiwan had been relatively free of the viral infection until a severe outbreak at a Taipei hospital in late April . 1 3311600 3311633 Mr. Rowland attended a party in South Windsor for the families of Connecticut National Guard soldiers called to active duty . Rowland was making an appearance at a holiday party for families of Connecticut National Guard soldiers assigned to duty in Iraq and Afghanistan . 0 3439114 3439084 Ross Garber , Rowland 's lawyer , said Tuesday he would attend the meeting and would ask to speak on the issue . Ross Garber , Rowland 's legal counsel , said the governor would have no comment on the condo deal . 0 487951 488007 The euro was at 1.5281 versus the Swiss franc EURCHF = , up 0.2 percent on the session , after hitting its highest since mid-2001 around 1.5292 earlier in the session . The euro was steady versus the Swiss franc after hitting its highest since mid-2001 of 1.5261 earlier in the session . 0 314997 315030 On the stand Wednesday , she said she was referring only to the kissing . On the stand Wednesday , she testified that she was referring to the kissing before the alleged rape . 0 4733 4557 Garner said the group would probably be expanded to include , for example , a Christian and perhaps another Sunni leader . The group has already met several times and Gen. Garner said it probably will be expanded to include a Christian and perhaps another Sunni Muslim leader . 1 2820371 2820525 Blair 's Foreign Secretary Jack Straw was to take his place on Monday to give a statement to parliament on the European Union . Blair 's office said his Foreign Secretary Jack Straw would take his place on Monday to give a statement to parliament on the EU meeting the prime minister attended last week . 1 801552 801516 " There were more people surrounding the clubhouse than the Unabomber 's house up in the hills , " Baker said . " There are more people surrounding the clubhouse than surrounded the Unabomber 's home in the hills . 1 1704987 1705268 Charles O. Prince , 53 , was named as Mr. Weill 's successor . Mr. Weill 's longtime confidant , Charles O. Prince , 53 , was named as his successor . 1 396041 396188 Officials are also meeting with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world . Canadian officials were also expected to meet yesterday with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world . 0 1014983 1014963 GE stock closed Friday at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange . GE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange . 
1 2320654 2320666 The Midwestern research center will focus on the development of diagnostic , therapeutic and vaccine products for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague . The Midwestern center will focus on diagnosis , treatment and vaccines for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague . 1 1057876 1057778 The hearing is to determine whether there is enough evidence to order Akbar to a general court-martial proceeding . The purpose of the hearing is to determine whether Akbar should be court-martialled . 0 2116843 2116883 In the United States , heart attacks kill about 460,000 year , in Canada about 80,000 . In the United States , heart attacks kill about 460,000 yearly , according to the National Institutes of Health . 1 1461629 1461781 Ninety-five percent of international cargo to the United States is carried by ship . Ships carry 95 percent of international cargo to the United States . 0 374015 374162 " It 's a major victory for Maine , and it 's a major victory for other states . The Maine program could be a model for other states . 1 2493369 2493428 News that oil producers were lowering their output starting in November exacerbated a sell-off that was already under way on Wall Street . News that the Organization of Petroleum Exporting Countries was lowering output starting in November exacerbated a stock sell-off already under way yesterday . 1 490355 490378 They note that after several weeks of rallies on upbeat earnings , investors are looking for stronger evidence of a recovery before sending stocks higher . After several weeks of market rallies on upbeat earnings , many investors are looking for more concrete signs of an economic recovery . 1 2691044 2691264 Most economists had expected a more dire report , with many anticipating the fifth month of job losses in six months . Most economists had been expecting a far more dire report , with many expecting to see the fifth month of job losses in six months in September . 1 1831453 1831491 But software license revenues , a measure financial analysts watch closely , decreased 21 percent to $ 107.6 million . License sales , a key measure of demand , fell 21 percent to $ 107.6 million . 1 2380695 2380822 King , brand-name writer , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters . Stephen King , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters from the National Book Foundation . 1 2577517 2577531 The Denver-based natural gas producer and marketer said the inaccurate reporting was discovered after it received a subpoena from the U.S. Commodity Futures Trading Commission . The natural gas producer and marketer said the inaccurate reporting was discovered in response to a subpoena from the U.S. Commodity Futures Trading Commission , or CFTC . 1 3267026 3266930 The steel tariffs , which the U.S. president imposed in March 2002 , will officially end at midnight , instead of March 2005 as initially planned . The U.S. steel tariffs , which Bush imposed in March 2002 , were to officially end at midnight Thursday ( 0500 GMT ) , instead of March 2005 as initially planned . 1 360875 360943 Business Week 's online edition reported on Friday that WorldCom and the SEC could announce a settlement as early as Monday . BusinessWeek Online has learned that the settlement could come as early as Monday , May 19 . 
1 162632 162653 Only one of the five buildings in the Baghdad compound of the United Nations Development Program escaped being burned , the UN said on its Web site . Only one of the five buildings in the compound in Baghdad run by the UN Development Program , escaped being burned , the UN said on its Web site . 1 1128884 1128865 Shares of Salix have rocketed 64 percent since Axcan made its first offer on April 10 . Since the initial takeover offer , Salix shares have risen about 35 percent . 1 3264732 3264648 The jury verdict , reached Wednesday after less than four hours of deliberation , followed a 2 week trial , during which Waagner represented himself . The quick conviction followed a 2 1 / 2 week trial , during which the Venango County man represented himself . 1 1721433 1721267 It 's happened five times in the last 11 years : A disaster puts this Southwestern town in the headlines during the summer tourist season . It 's happened five times in the last decade : A disaster puts this tourist town in the headlines during summer , its busiest season . 0 146112 146127 The broader Standard & Poor 's 500 Index .SPX edged down 9 points , or 0.98 percent , to 921 . The technology-laced Nasdaq Composite Index < .IXIC > shed 15 points , or 0.98 percent , to 1,492 . 1 389117 389052 The company emphasized that McDonald 's USA does not import any raw beef or hamburger patties from Canada for McDonald 's use in the United States . McDonald 's said in a statement that it does not import any raw beef or hamburger patties from Canada for use in the United States . 1 872784 872834 Gregory Parseghian , a former investment banker , was appointed chief executive . Greg Parseghian was appointed the new chief executive . 0 2977500 2977547 Their contract will expire at 12 : 01 a.m. Wednesday instead of 12 : 01 a.m. Sunday , said Rian Wathen , organizing director for United Food and Commercial Workers Local 700 . " It has outraged the membership , " said Rian Wathen , organizing director of United Food and Commercial Workers Local 700 . 1 3107137 3107119 But plaque volume increased by 2.7 percent in pravastatin patients . The volume of plaque in Pravachol patients ' arteries rose by 3 % . 1 1619244 1619274 Today in the US , the book - kept under wraps by its publishers , G. P. Putnam 's Sons , since its inception - will appear in bookstores . Tomorrow the book , kept under wraps by G. P. Putnam 's Sons since its inception , will appear in bookstores . 0 3061836 3062031 The S & P / TSX composite rose 87.74 points on the week , while the TSX Venture Exchange composite gained 44.49 points . On the week , the Dow Jones industrial average rose 11.56 points , while the Nasdaq Stock Market gained 39.42 points . 1 485999 486011 Ex-KGB agent Putin added that the Beatles were considered ' propaganda of an alien ideology ' . In Soviet times the Beatles ' music " was considered propaganda of an alien ideology . 
================================================
FILE: src/examples/pytorch/bert_tutorial/parallel.py
================================================
from concurrent import futures
import torch
import torch.neuron
import os
from time import time
from queue import Queue
import warnings


def consumer(model, input_queue):
    while True:
        inputs, input_id, callback_fn = input_queue.get()
        try:
            # Stop execution if the stop sentinel is received
            if inputs == "stop":
                break

            start = time()
            results = model(*inputs)

            # Make the output iterable - if it is not already a tuple or list
            if not isinstance(results, (tuple, list)):
                results = [results]
            end = time()

            if callback_fn is not None:
                callback_fn(results, input_id, start, end)
        finally:
            # Account for every item pulled off the queue so Queue.join() works
            input_queue.task_done()


class NeuronSimpleDataParallel():

    def __init__(self, model_file, num_neuron_cores, batch_size=1):
        self.num_neuron_cores = num_neuron_cores
        self.batch_size = batch_size

        os.environ['NEURON_RT_NUM_CORES'] = str(num_neuron_cores)

        # Construct a list of models - the runtime places one copy per NeuronCore
        self.models = [torch.jit.load(model_file)
                       for i in range(num_neuron_cores)]

        # Create a shared input queue; the bound keeps producers from racing
        # too far ahead of the consumers
        self.input_queue = Queue(maxsize=num_neuron_cores * 16)

        self.executor = futures.ThreadPoolExecutor(
            max_workers=num_neuron_cores)

    def eval(self):
        for model in self.models:
            model.eval()

    def train(self):
        for model in self.models:
            model.train()

    def start_continuous_inference(self):
        # One consumer thread per model, all draining the same queue
        for model in self.models:
            self.executor.submit(consumer, model, self.input_queue)

    def infer(self, batch, input_id, callback_fn):
        self.input_queue.put((batch, input_id, callback_fn))

    def stop(self):
        # Enqueue one stop sentinel per worker so every consumer exits, then
        # wait until all queued work (including the sentinels) is processed
        for _ in range(self.num_neuron_cores):
            self.input_queue.put(("stop", -1, None))
        self.input_queue.join()
        self.executor.shutdown()


================================================
FILE: src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Compiling and Deploying HuggingFace Pretrained BERT\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introduction\n", "\n", "In this tutorial we will compile and deploy the BERT-base version of HuggingFace 🤗 Transformers BERT for Inferentia. The full list of HuggingFace's pretrained BERT models can be found in the BERT section of this page: https://huggingface.co/transformers/pretrained_models.html. \n", "\n", "This Jupyter notebook should be run on an inf1.6xlarge or larger instance. Only the compilation step of this tutorial requires an inf1.6xlarge, not the inference itself. For simplicity we will run the whole tutorial on an inf1.6xlarge, but in a real-life scenario the compilation should be done on a compute instance and the deployment on an inf1 instance to save costs.\n", "\n", "Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](../../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \"Kernel -> Change Kernel\" option on the top of this Jupyter notebook page." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Dependencies:\n", "This tutorial requires the following pip packages:\n", "\n", "- `torch-neuron`\n", "- `neuron-cc[tensorflow]`\n", "- `transformers`\n", "\n", "Most of these packages will be installed when configuring your environment using the Neuron PyTorch setup guide. The additional dependencies must be installed here."
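, "\n", "\n", "As a quick sanity check (not part of the original setup steps), you can confirm that this kernel can see the Neuron packages before continuing:\n", "\n", "```python\n", "import torch\n", "import torch.neuron  # raises ImportError if torch-neuron is missing from this kernel\n", "print(torch.__version__)\n", "```"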
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Suppresses tokenizer warnings, making errors easier to detect\n", "%env TOKENIZERS_PARALLELISM=True\n", "!pip install --upgrade \"transformers==4.6.0\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compile the model into an AWS Neuron optimized TorchScript\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tensorflow # to work around a protobuf version conflict issue\n", "import torch\n", "import torch.neuron\n", "from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig\n", "import transformers\n", "import os\n", "import warnings\n", "\n", "# Setting up NeuronCore groups for inf1.6xlarge with 16 cores\n", "num_cores = 16 # This value should be 4 on inf1.xlarge and inf1.2xlarge\n", "os.environ['NEURON_RT_NUM_CORES'] = str(num_cores)\n", "\n", "# Build tokenizer and model\n", "tokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased-finetuned-mrpc\")\n", "model = AutoModelForSequenceClassification.from_pretrained(\"bert-base-cased-finetuned-mrpc\", return_dict=False)\n", "\n", "# Set up some example inputs\n", "sequence_0 = \"The company HuggingFace is based in New York City\"\n", "sequence_1 = \"Apples are especially bad for your health\"\n", "sequence_2 = \"HuggingFace's headquarters are situated in Manhattan\"\n", "\n", "max_length=128\n", "paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n", "not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n", "\n", "# Run the original PyTorch model on the compilation example\n", "paraphrase_classification_logits = model(**paraphrase)[0]\n", "\n", "# Convert example inputs to a format that is compatible with TorchScript tracing\n", "example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']\n", "example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']\n", "\n", "# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\n", "model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)\n", "\n", "# Verify the TorchScript works on both example inputs\n", "paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)\n", "not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)\n", "\n", "# Save the TorchScript for later use\n", "model_neuron.save('bert_neuron.pt')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may inspect `model_neuron.graph` to see which parts run on CPU and which run on the accelerator. All native `aten` operators in the graph will run on CPU." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(model_neuron.graph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### Deploy the AWS Neuron optimized TorchScript\n", "\n", "To deploy the AWS Neuron optimized TorchScript, you may choose to load the saved TorchScript from disk and skip the slow compilation."
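, "\n", "\n", "If you load the saved model in a fresh process (for example, on a separate deployment instance), import `torch.neuron` first so the Neuron operators are registered before deserialization. A minimal sketch, assuming `bert_neuron.pt` has been copied to the working directory:\n", "\n", "```python\n", "import torch\n", "import torch.neuron  # registers the Neuron operators used by the compiled graph\n", "\n", "model_neuron = torch.jit.load('bert_neuron.pt')  # loads the compiled artifact, no recompilation\n", "```"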
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load TorchScript back\n", "model_neuron = torch.jit.load('bert_neuron.pt')\n", "# Verify the TorchScript works on both example inputs\n", "paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)\n", "not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)\n", "classes = ['not paraphrase', 'paraphrase']\n", "paraphrase_prediction = paraphrase_classification_logits_neuron[0][0].argmax().item()\n", "not_paraphrase_prediction = not_paraphrase_classification_logits_neuron[0][0].argmax().item()\n", "print('BERT says that \"{}\" and \"{}\" are {}'.format(sequence_0, sequence_2, classes[paraphrase_prediction]))\n", "print('BERT says that \"{}\" and \"{}\" are {}'.format(sequence_0, sequence_1, classes[not_paraphrase_prediction]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's run the model in parallel across multiple NeuronCores. First we define two helper functions: one to pad incomplete batches and one to count correct predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_input_with_padding(batch, batch_size, max_length):\n", " ## Reformulate the batch into three batch tensors - default batch size batches the outer dimension\n", " encoded = batch['encoded']\n", " inputs = torch.squeeze(encoded['input_ids'], 1)\n", " attention = torch.squeeze(encoded['attention_mask'], 1)\n", " token_type = torch.squeeze(encoded['token_type_ids'], 1)\n", " quality = list(map(int, batch['quality']))\n", "\n", " if inputs.size()[0] != batch_size:\n", " print(\"Input size = {} - padding\".format(inputs.size()))\n", " remainder = batch_size - inputs.size()[0]\n", " zeros = torch.zeros( [remainder, max_length], dtype=torch.long )\n", " inputs = torch.cat( [inputs, zeros] )\n", " attention = torch.cat( [attention, zeros] )\n", " token_type = torch.cat( [token_type, zeros] )\n", "\n", " assert(inputs.size()[0] == batch_size and inputs.size()[1] == max_length)\n", " assert(attention.size()[0] == batch_size and attention.size()[1] == max_length)\n", " assert(token_type.size()[0] == batch_size and token_type.size()[1] == max_length)\n", "\n", " return (inputs, attention, token_type), quality\n", "\n", "def count(output, quality):\n", " assert output.size(0) >= len(quality)\n", " correct_count = 0\n", " count = len(quality)\n", " \n", " batch_predictions = [ row.argmax().item() for row in output ]\n", "\n", " for a, b in zip(batch_predictions, quality):\n", " if int(a) == int(b):\n", " correct_count += 1\n", "\n", " return correct_count, count" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data parallel inference\n", "In the cell below, we use the data parallel approach for inference. In this approach, we load multiple models, all of them running in parallel. Each model is loaded onto a single NeuronCore. In the implementation below, we launch 16 models, thereby utilizing all 16 cores on an inf1.6xlarge.\n", "\n", "> Note: If you decrease `num_cores` in the cells above, restart the notebook and run the `!sudo rmmod neuron; sudo modprobe neuron` step in cell 2 to clear the Neuron cores.\n", "\n", "Since we can run more than one model concurrently, the throughput of the system goes up. To achieve the maximum gain in throughput, we need to feed the models efficiently so as to keep them busy at all times. In the setup below, this is done by using a producer-consumer model. We maintain a common Python queue shared across all the models.
The common queue enables feeding data continuously to the models." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from parallel import NeuronSimpleDataParallel\n", "from bert_benchmark_utils import BertTestDataset, BertResults\n", "import time\n", "import functools\n", "\n", "max_length = 128\n", "num_cores = 16\n", "batch_size = 1\n", "\n", "tsv_file=\"glue_mrpc_dev.tsv\"\n", "\n", "data_set = BertTestDataset( tsv_file=tsv_file, tokenizer=tokenizer, max_length=max_length )\n", "data_loader = torch.utils.data.DataLoader(data_set, batch_size=batch_size, shuffle=True)\n", "\n", "# Result aggregation class (code in bert_benchmark_utils.py)\n", "results = BertResults(batch_size, num_cores)\n", "def result_handler(output, result_id, start, end, input_dict):\n", " correct_count, inference_count = count(output[0], input_dict.pop(result_id))\n", " elapsed = end - start\n", " results.add_result(correct_count, inference_count, [elapsed], [end], [start])\n", "\n", "parallel_neuron_model = NeuronSimpleDataParallel('bert_neuron.pt', num_cores)\n", "\n", "# Start the inference threads\n", "parallel_neuron_model.start_continuous_inference()\n", "\n", "# Warm up the cores\n", "z = torch.zeros( [batch_size, max_length], dtype=torch.long )\n", "batch = (z, z, z)\n", "for _ in range(num_cores*4):\n", " parallel_neuron_model.infer(batch, -1, None)\n", " \n", "input_dict = {}\n", "input_id = 0\n", "for _ in range(30):\n", " for batch in data_loader:\n", " batch, quality = get_input_with_padding(batch, batch_size, max_length)\n", " input_dict[input_id] = quality\n", " callback_fn = functools.partial(result_handler, input_dict=input_dict)\n", " parallel_neuron_model.infer(batch, input_id, callback_fn)\n", " input_id += 1\n", "\n", "# Stop inference\n", "parallel_neuron_model.stop()\n", "\n", "\n", "with open(\"benchmark.txt\", \"w\") as f:\n", " results.report(f, window_size=1)\n", "\n", "with open(\"benchmark.txt\", \"r\") as f:\n", " for line in f:\n", " print(line)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now recompile the model with a larger batch size of six sentence pairs." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "batch_size = 6\n", "\n", "example_inputs_paraphrase = (\n", " torch.cat([paraphrase['input_ids']] * batch_size, 0), \n", " torch.cat([paraphrase['attention_mask']] * batch_size, 0), \n", " torch.cat([paraphrase['token_type_ids']] * batch_size, 0)\n", ")\n", "\n", "# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\n", "model_neuron_batch = torch.neuron.trace(model, example_inputs_paraphrase)\n", "\n", "## Save the batched model\n", "model_neuron_batch.save('bert_neuron_b{}.pt'.format(batch_size))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rerun inference with batch size 6." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from parallel import NeuronSimpleDataParallel\n", "from bert_benchmark_utils import BertTestDataset, BertResults\n", "import time\n", "import functools\n", "\n", "max_length = 128\n", "num_cores = 16\n", "batch_size = 6\n", "\n", "data_set = BertTestDataset( tsv_file=tsv_file, tokenizer=tokenizer, max_length=max_length )\n", "data_loader = torch.utils.data.DataLoader(data_set, batch_size=batch_size, shuffle=True)\n", "\n", "# Result aggregation class (code in bert_benchmark_utils.py)\n", "results = BertResults(batch_size,
num_cores)\n", "def result_handler(output, result_id, start, end, input_dict):\n", " correct_count, inference_count = count(output[0], input_dict.pop(result_id))\n", " elapsed = end - start\n", " results.add_result(correct_count, inference_count, [elapsed], [end], [start])\n", "\n", "parallel_neuron_model = NeuronSimpleDataParallel('bert_neuron_b{}.pt'.format(batch_size), num_cores)\n", "\n", "# Start the inference threads\n", "parallel_neuron_model.start_continuous_inference()\n", "\n", "# Add to the input queue to warm up all cores\n", "z = torch.zeros( [batch_size, max_length], dtype=torch.long )\n", "batch = (z, z, z)\n", "for _ in range(num_cores*4):\n", " parallel_neuron_model.infer(batch, -1, None)\n", "\n", "input_dict = {}\n", "input_id = 0\n", "for _ in range(30):\n", " for batch in data_loader:\n", " batch, quality = get_input_with_padding(batch, batch_size, max_length)\n", " input_dict[input_id] = quality\n", " callback_fn = functools.partial(result_handler, input_dict=input_dict)\n", " parallel_neuron_model.infer(batch, input_id, callback_fn)\n", " input_id += 1\n", "\n", "# Stop inference\n", "parallel_neuron_model.stop()\n", "\n", "with open(\"benchmark_b{}.txt\".format(batch_size), \"w\") as f:\n", " results.report(f, window_size=1)\n", "\n", "with open(\"benchmark_b{}.txt\".format(batch_size), \"r\") as f:\n", " for line in f:\n", " print(line)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.9" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 4 }

================================================
FILE: src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert_shared_weights.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Parallel HuggingFace Pretrained BERT with Weight Sharing (Deduplication)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introduction\n", "\n", "In this tutorial we will compile and deploy the BERT-base version of HuggingFace 🤗 Transformers BERT for Inferentia, with an additional demonstration of the Weight Sharing (Deduplication) feature.\n", "\n", "To use the [Weight Sharing (Deduplication) feature](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-configurable-parameters.html#shared-weights-neuron-rt-multi-instance-shared-weights), you must set the Neuron Runtime environment variable NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS to \"TRUE\" together with the [core placement API](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuron/api-core-placement.html) (``torch_neuron.experimental.neuron_cores_context()``).\n", "\n", "This Jupyter notebook should be run on an inf1.6xlarge or larger instance. Only the compilation step of this tutorial requires an inf1.6xlarge, not the inference itself.
For simplicity we run this tutorial on an inf1.6xlarge, but in a real-life scenario the compilation should be done on a compute instance and the deployment on an inf1 instance to save costs.\n", "\n", "Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](../../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \"Kernel -> Change Kernel\" option on the top of this Jupyter notebook page." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Dependencies:\n", "This tutorial requires the following pip packages:\n", "\n", "- `torch-neuron`\n", "- `neuron-cc[tensorflow]`\n", "- `transformers`\n", "\n", "Most of these packages will be installed when configuring your environment using the Neuron PyTorch setup guide. The additional dependencies must be installed here." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%env TOKENIZERS_PARALLELISM=True #Suppresses tokenizer warnings making errors easier to detect\n", "!pip install --upgrade \"transformers==4.6.0\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compile the model into an AWS Neuron optimized TorchScript\n", "\n", "This step compiles the model into an AWS Neuron optimized TorchScript, and saves it in the file ``bert_neuron.pt``. This step is the same as in the pretrained BERT tutorial without the Shared Weights feature. We use batch 1 for simplicity." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import tensorflow # to workaround a protobuf version conflict issue\n", "import torch\n", "import torch.neuron\n", "from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig\n", "import transformers\n", "import os\n", "import warnings\n", "\n", "\n", "# Build tokenizer and model\n", "tokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased-finetuned-mrpc\")\n", "model = AutoModelForSequenceClassification.from_pretrained(\"bert-base-cased-finetuned-mrpc\", return_dict=False)\n", "\n", "# Setup some example inputs\n", "sequence_0 = \"The company HuggingFace is based in New York City\"\n", "sequence_1 = \"Apples are especially bad for your health\"\n", "sequence_2 = \"HuggingFace's headquarters are situated in Manhattan\"\n", "\n", "max_length=128\n", "paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n", "not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n", "\n", "# Run the original PyTorch model on the compilation example\n", "paraphrase_classification_logits = model(**paraphrase)[0]\n", "\n", "# Convert example inputs to a format that is compatible with TorchScript tracing\n", "example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']\n", "example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']\n", "\n", "# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\n", "model_neuron = torch.neuron.trace(model, example_inputs_paraphrase)\n", "\n", "# Verify the TorchScript works on both example inputs\n", "paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)\n", 
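"# Illustrative addition (not in the original tutorial): the Neuron logits should closely\n", "# match the CPU logits computed above; small numerical differences from compilation are expected\n", "print('Max logit difference:', (paraphrase_classification_logits - paraphrase_classification_logits_neuron[0]).abs().max().item())\n", 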
"not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)\n", "\n", "# Save the TorchScript for later use\n", "model_neuron.save('bert_neuron.pt')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### Deploy the AWS Neuron optimized TorchScript\n", "\n", "To deploy the AWS Neuron optimized TorchScript, you may choose to load the saved TorchScript from disk and skip the slow compilation. This step is the same as the pretrained BERT tutorial without Shared Weights feature" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Load TorchScript back\n", "model_neuron = torch.jit.load('bert_neuron.pt')\n", "# Verify the TorchScript works on both example inputs\n", "paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase)\n", "not_paraphrase_classification_logits_neuron = model_neuron(*example_inputs_not_paraphrase)\n", "classes = ['not paraphrase', 'paraphrase']\n", "paraphrase_prediction = paraphrase_classification_logits_neuron[0][0].argmax().item()\n", "not_paraphrase_prediction = not_paraphrase_classification_logits_neuron[0][0].argmax().item()\n", "print('BERT says that \"{}\" and \"{}\" are {}'.format(sequence_0, sequence_2, classes[paraphrase_prediction]))\n", "print('BERT says that \"{}\" and \"{}\" are {}'.format(sequence_0, sequence_1, classes[not_paraphrase_prediction]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define two helper functions to pad input and to count correct results." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def get_input_with_padding(batch, batch_size, max_length):\n", " ## Reformulate the batch into three batch tensors - default batch size batches the outer dimension\n", " encoded = batch['encoded']\n", " inputs = torch.squeeze(encoded['input_ids'], 1)\n", " attention = torch.squeeze(encoded['attention_mask'], 1)\n", " token_type = torch.squeeze(encoded['token_type_ids'], 1)\n", " quality = list(map(int, batch['quality']))\n", "\n", " if inputs.size()[0] != batch_size:\n", " print(\"Input size = {} - padding\".format(inputs.size()))\n", " remainder = batch_size - inputs.size()[0]\n", " zeros = torch.zeros( [remainder, max_length], dtype=torch.long )\n", " inputs = torch.cat( [inputs, zeros] )\n", " attention = torch.cat( [attention, zeros] )\n", " token_type = torch.cat( [token_type, zeros] )\n", "\n", " assert(inputs.size()[0] == batch_size and inputs.size()[1] == max_length)\n", " assert(attention.size()[0] == batch_size and attention.size()[1] == max_length)\n", " assert(token_type.size()[0] == batch_size and token_type.size()[1] == max_length)\n", "\n", " return (inputs, attention, token_type), quality\n", "\n", "def count(output, quality):\n", " assert output.size(0) >= len(quality)\n", " correct_count = 0\n", " count = len(quality)\n", " \n", " batch_predictions = [ row.argmax().item() for row in output ]\n", "\n", " for a, b in zip(batch_predictions, quality):\n", " if int(a)==int(b):\n", " correct_count += 1\n", "\n", " return correct_count, count" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data parallel inference\n", "In the below cell, we use the data parallel approach for inference. In this approach, we load multiple models, all of them running in parallel. Each model is loaded onto a single NeuronCore via the core placement API (``torch_neuron.experimental.neuron_cores_context()``). 
We also set Neuron Runtime environment variable ``NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS`` to \"TRUE\" as required to use the Weight Sharing feature.\n", "\n", "In the below implementation, we launch 16 models, thereby utilizing all 16 cores on an inf1.6xlarge.\n", "\n", "> Note: If you decrease ``num_cores`` in the cells below, restart the notebook and run the `!sudo rmmod neuron; sudo modprobe neuron` step in cell 2 to clear the Neuron cores.\n", "\n", "Since we can run more than one model concurrently, the overall system throughput goes up. To achieve the maximum gain in throughput, we need to feed the models efficiently so as to keep them busy at all times. In the below setup, we use parallel threads to feed data continuously to the models.\n", "\n", "When running the cell below, you can monitor the Inferentia device activities by running ``neuron-top`` in another terminal. You will see that \"Device Used Memory\" is 1.6GB total, and the model instance loaded onto NeuronDevice 0 NeuronCore 0 uses the most device memory (272MB) while the other model instances loaded onto other NeuronCores use less device memory (92MB). This shows the effect of using Shared Weights as the device memory usage is lower. If you change ``NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS`` to \"FALSE\" you will see that \"Device Used Memory\" is 3.2GB, and the model instances loaded onto NeuronDevice 0 NeuronCore 0 and 1 use the most device memory (360MB) while the other model instances now use 180MB each." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from bert_benchmark_utils import BertTestDataset, BertResults\n", "import time\n", "import functools\n", "import os\n", "import torch.neuron as torch_neuron\n", "from concurrent import futures\n", "\n", "# Setting up NeuronCore groups for inf1.6xlarge with 16 cores\n", "num_cores = 16 # This value should be 4 on inf1.xlarge and inf1.2xlarge\n", "os.environ['NEURON_RT_NUM_CORES'] = str(num_cores)\n", "os.environ['NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS'] = 'TRUE'\n", "#os.environ['NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS'] = 'FALSE'\n", "\n", "max_length = 128\n", "batch_size = 1\n", "\n", "tsv_file=\"glue_mrpc_dev.tsv\"\n", "\n", "data_set = BertTestDataset( tsv_file=tsv_file, tokenizer=tokenizer, max_length=max_length )\n", "data_loader = torch.utils.data.DataLoader(data_set, batch_size=batch_size, shuffle=True)\n", "\n", "#Result aggregation class (code in bert_benchmark_utils.py)\n", "results = BertResults(batch_size, num_cores)\n", "def result_handler(output, result_id, start, end, input_dict):\n", " correct_count, inference_count = count(output[0], input_dict.pop(result_id))\n", " elapsed = end - start\n", " results.add_result(correct_count, inference_count, [elapsed], [end], [start])\n", "\n", "with torch_neuron.experimental.neuron_cores_context(start_nc=0, nc_count=num_cores):\n", " model = torch.jit.load('bert_neuron.pt')\n", "\n", "# Warm up the cores\n", "z = torch.zeros( [batch_size, max_length], dtype=torch.long )\n", "batch = (z, z, z)\n", "for _ in range(num_cores*4):\n", " model(*batch)\n", "\n", "# Prepare the input data\n", "batch_list = []\n", "for batch in data_loader:\n", " batch, quality = get_input_with_padding(batch, batch_size, max_length)\n", " batch_list.append((batch, quality))\n", "\n", "# One thread running a model on one core\n", "def one_thread(feed_data, quality):\n", " start = time.time()\n", " result = model(*feed_data)\n", " end = time.time() \n", " return 
result[0], quality, start, end\n", "\n", "# Launch more threads than models/cores to keep them busy\n", "processes = []\n", "with futures.ThreadPoolExecutor(max_workers=num_cores*2) as executor:\n", " # extra loops to help you see activities in neuron-top\n", " for _ in range(10):\n", " for input_id, (batch, quality) in enumerate(batch_list):\n", " processes.append(executor.submit(one_thread, batch, quality))\n", "\n", "results = BertResults(batch_size, num_cores)\n", "for _ in futures.as_completed(processes): \n", " (output, quality, start, end) = _.result() \n", " correct_count, inference_count = count(output, quality)\n", " results.add_result(correct_count, inference_count, [start - end], [start], [end])\n", "\n", "with open(\"benchmark.txt\", \"w\") as f:\n", " results.report(f, window_size=1)\n", "\n", "with open(\"benchmark.txt\", \"r\") as f:\n", " for line in f:\n", " print(line)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python (torch-neuron)", "language": "python", "name": "aws_neuron_venv_pytorch_inf1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: src/examples/pytorch/byoc_sm_bert_tutorial/code/inference.py ================================================ import os import json import tensorflow # to workaround a protobuf version conflict issue import torch import torch.neuron from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig JSON_CONTENT_TYPE = 'application/json' def model_fn(model_dir): tokenizer_init = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc") model_file =os.path.join(model_dir, 'neuron_compiled_model.pt') model_neuron = torch.jit.load(model_file) # print("using {}".format(model_file)) return (model_neuron, tokenizer_init) def input_fn(serialized_input_data, content_type=JSON_CONTENT_TYPE): if content_type == JSON_CONTENT_TYPE: input_data = json.loads(serialized_input_data) # print(input_data) return input_data else: raise Exception('Requested unsupported ContentType in Accept: ' + content_type) return def predict_fn(input_data, models): # print('Got input Data: {}'.format(input_data)) model_bert, tokenizer = models sequence_0 = input_data[0] sequence_1 = input_data[1] max_length=128 paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt") # Convert example inputs to a format that is compatible with TorchScript tracing example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids'] # Verify the TorchScript works on example inputs paraphrase_classification_logits_neuron = model_bert(*example_inputs_paraphrase) classes = ['not paraphrase', 'paraphrase'] paraphrase_prediction = paraphrase_classification_logits_neuron[0][0].argmax().item() out_str = 'BERT says that "{}" and "{}" are {}'.format(sequence_0, sequence_1, classes[paraphrase_prediction]) return out_str def output_fn(prediction_output, accept=JSON_CONTENT_TYPE): if accept == JSON_CONTENT_TYPE: return json.dumps(prediction_output), accept 
raise Exception('Requested unsupported ContentType in Accept: ' + accept) ================================================ FILE: src/examples/pytorch/byoc_sm_bert_tutorial/container/Dockerfile ================================================ FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuron:1.7.1-neuron-py36-ubuntu18.04 # Install packages RUN pip install "transformers==4.7.0" # CMD ["/usr/local/bin/entrypoint.sh"] ================================================ FILE: src/examples/pytorch/byoc_sm_bert_tutorial/sagemaker_container_neuron.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "id": "4674f667", "metadata": {}, "source": [ "# Deploy a pretrained PyTorch BERT model from HuggingFace on Amazon SageMaker with Neuron container" ] }, { "cell_type": "markdown", "id": "b3e39838", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "markdown", "id": "a92c454f", "metadata": {}, "source": [ "In this tutorial we will deploy a pretrained BERT Base model from HuggingFace Transformers on SageMaker, using the [AWS Deep Learning Containers](https://github.com/aws/deep-learning-containers). We will use the same model as shown in the [Neuron Tutorial \"PyTorch - HuggingFace Pretrained BERT Tutorial\"](../../../../frameworks/torch/torch-neuronx/tutorials/training/bert.html#). We will compile the model and build a custom AWS Deep Learning Container, to include the HuggingFace Transformers Library. \n", "\n", "This Jupyter Notebook should run on a ml.c5.4xlarge SageMaker Notebook instance. You can set up your SageMaker Notebook instance by following the [Get Started with Amazon SageMaker Notebook Instances](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-console.html) documentation. \n", "\n", "> We recommend increasing the size of the base root volume of your SageMaker notebook instance, to accommodate the models and containers built locally. A root volume of 10 GB should suffice. 
\n" ] }, { "cell_type": "markdown", "id": "37445ad2", "metadata": {}, "source": [ "## Install Dependencies:" ] }, { "cell_type": "markdown", "id": "3ecd765f", "metadata": {}, "source": [ "This tutorial requires the following pip packages:" ] }, { "cell_type": "markdown", "id": "cae3092c", "metadata": {}, "source": [ "- torch-neuron\n", "- neuron-cc[tensorflow]\n", "- transformers" ] }, { "cell_type": "code", "execution_count": null, "id": "066c3731", "metadata": {}, "outputs": [], "source": [ "%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\n", "!pip install --upgrade --no-cache-dir torch-neuron neuron-cc[tensorflow] torchvision torch --extra-index-url=https://pip.repos.neuron.amazonaws.com\n", "!pip install --upgrade --no-cache-dir 'transformers==4.6.0'" ] }, { "cell_type": "markdown", "id": "a4796d3a", "metadata": {}, "source": [ "## Compile the model into an AWS Neuron optimized TorchScript" ] }, { "cell_type": "code", "execution_count": null, "id": "6fe85f8e", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch_neuron\n", "\n", "from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig" ] }, { "cell_type": "code", "execution_count": null, "id": "0c5c253a", "metadata": {}, "outputs": [], "source": [ "# Build tokenizer and model\n", "tokenizer = AutoTokenizer.from_pretrained(\"bert-base-cased-finetuned-mrpc\")\n", "model = AutoModelForSequenceClassification.from_pretrained(\"bert-base-cased-finetuned-mrpc\", return_dict=False)\n", "\n", "# Setup some example inputs\n", "sequence_0 = \"The company HuggingFace is based in New York City\"\n", "sequence_1 = \"Apples are especially bad for your health\"\n", "sequence_2 = \"HuggingFace's headquarters are situated in Manhattan\"\n", "\n", "max_length=128\n", "paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n", "not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors=\"pt\")\n", "\n", "# Run the original PyTorch model on compilation exaple\n", "paraphrase_classification_logits = model(**paraphrase)[0]\n", "\n", "# Convert example inputs to a format that is compatible with TorchScript tracing\n", "example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']\n", "example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']" ] }, { "cell_type": "code", "execution_count": null, "id": "44255ada", "metadata": {}, "outputs": [], "source": [ "%%time\n", "# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron\n", "# This step may need 3-5 min\n", "model_neuron = torch.neuron.trace(model, example_inputs_paraphrase, verbose=1, compiler_workdir='./compilation_artifacts')" ] }, { "cell_type": "markdown", "id": "5c4752ac", "metadata": {}, "source": [ "You may inspect **model_neuron.graph** to see which part is running on CPU versus running on the accelerator. All native **aten** operators in the graph will be running on CPU." 
] }, { "cell_type": "code", "execution_count": null, "id": "dc00889e", "metadata": {}, "outputs": [], "source": [ "# See which part is running on CPU versus running on the accelerator.\n", "print(model_neuron.graph)" ] }, { "cell_type": "markdown", "id": "775fb30d", "metadata": {}, "source": [ "Save the compiled model, so it can be packaged and sent to S3." ] }, { "cell_type": "code", "execution_count": null, "id": "027c4f53", "metadata": {}, "outputs": [], "source": [ "# Save the TorchScript for later use\n", "model_neuron.save('neuron_compiled_model.pt')" ] }, { "cell_type": "markdown", "id": "d362c579", "metadata": {}, "source": [ "### Package the pre-trained model and upload it to S3\n", "\n", "To make the model available for the SageMaker deployment, you will TAR the serialized graph and upload it to the default Amazon S3 bucket for your SageMaker session. " ] }, { "cell_type": "code", "execution_count": null, "id": "29c7f7b4", "metadata": {}, "outputs": [], "source": [ "# Now you'll create a model.tar.gz file to be used by SageMaker endpoint\n", "!tar -czvf model.tar.gz neuron_compiled_model.pt" ] }, { "cell_type": "code", "execution_count": null, "id": "1beadca0", "metadata": {}, "outputs": [], "source": [ "import boto3\n", "import time\n", "from sagemaker.utils import name_from_base\n", "import sagemaker" ] }, { "cell_type": "code", "execution_count": null, "id": "06ad87d4", "metadata": {}, "outputs": [], "source": [ "# upload model to S3\n", "role = sagemaker.get_execution_role()\n", "sess=sagemaker.Session()\n", "region=sess.boto_region_name\n", "bucket=sess.default_bucket()\n", "sm_client=boto3.client('sagemaker')" ] }, { "cell_type": "code", "execution_count": null, "id": "5205ec55", "metadata": {}, "outputs": [], "source": [ "model_key = '{}/model/model.tar.gz'.format('inf1_compiled_model')\n", "model_path = 's3://{}/{}'.format(bucket, model_key)\n", "boto3.resource('s3').Bucket(bucket).upload_file('model.tar.gz', model_key)\n", "print(\"Uploaded model to S3:\")\n", "print(model_path)" ] }, { "cell_type": "markdown", "id": "e8b425d4", "metadata": {}, "source": [ "## Build and Push the container" ] }, { "cell_type": "markdown", "id": "430e6ed2", "metadata": {}, "source": [ "The following shell code shows how to build the container image using docker build and push the container image to ECR using docker push.\n", "The Dockerfile in this example is available in the ***container*** folder.\n", "Here's an example of the Dockerfile:\n", "\n", "```Dockerfile\n", "FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuron:1.7.1-neuron-py36-ubuntu18.04\n", "\n", "# Install packages \n", "RUN pip install \"transformers==4.7.0\"\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "3970025d", "metadata": {}, "outputs": [], "source": [ "!cat container/Dockerfile" ] }, { "cell_type": "markdown", "id": "62f78b0f", "metadata": {}, "source": [ "Before running the next cell, make sure your SageMaker IAM role has access to ECR. If not, you can attache the role `AmazonEC2ContainerRegistryPowerUser` to your IAM role ARN, which allows you to upload image layers to ECR. 
\n", "\n", "It takes 5 minutes to build docker images and upload image to ECR" ] }, { "cell_type": "code", "execution_count": null, "id": "ecd51acf", "metadata": {}, "outputs": [], "source": [ "%%sh\n", "\n", "# The name of our algorithm\n", "algorithm_name=neuron-py36-inference\n", "\n", "cd container\n", "\n", "account=$(aws sts get-caller-identity --query Account --output text)\n", "\n", "# Get the region defined in the current configuration (default to us-west-2 if none defined)\n", "region=$(aws configure get region)\n", "region=${region:-us-west-2}\n", "\n", "fullname=\"${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest\"\n", "\n", "# If the repository doesn't exist in ECR, create it.\n", "\n", "aws ecr describe-repositories --repository-names \"${algorithm_name}\" > /dev/null 2>&1\n", "\n", "if [ $? -ne 0 ]\n", "then\n", " aws ecr create-repository --repository-name \"${algorithm_name}\" > /dev/null\n", "fi\n", "\n", "# Get the login command from ECR in order to pull down the SageMaker PyTorch image\n", "aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com\n", "# Build the docker image locally with the image name and then push it to ECR\n", "# with the full name.\n", "docker build -t ${algorithm_name} . --build-arg REGION=${region}\n", "docker tag ${algorithm_name} ${fullname}\n", "\n", "# Get the login command from ECR and execute it directly\n", "aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com\n", "docker push ${fullname}" ] }, { "cell_type": "markdown", "id": "e4f6bbda", "metadata": {}, "source": [ "## Deploy Container and run inference based on the pretrained model" ] }, { "cell_type": "markdown", "id": "64e65e31", "metadata": {}, "source": [ "To deploy a pretrained PyTorch model, you'll need to use the PyTorch estimator object to create a PyTorchModel object and set a different entry_point.\n", "\n", "You'll use the PyTorchModel object to deploy a PyTorchPredictor. This creates a SageMaker Endpoint -- a hosted prediction service that we can use to perform inference." ] }, { "cell_type": "code", "execution_count": null, "id": "f343d3b1", "metadata": {}, "outputs": [], "source": [ "import sys\n", "\n", "!{sys.executable} -m pip install Transformers" ] }, { "cell_type": "code", "execution_count": null, "id": "2bd73b77", "metadata": {}, "outputs": [], "source": [ "import os\n", "import boto3\n", "import sagemaker\n", "\n", "role = sagemaker.get_execution_role()\n", "sess = sagemaker.Session()\n", "\n", "bucket = sess.default_bucket()\n", "prefix = \"inf1_compiled_model/model\"\n", "\n", "# Get container name in ECR\n", "client=boto3.client('sts')\n", "account=client.get_caller_identity()['Account']\n", "\n", "my_session=boto3.session.Session()\n", "region=my_session.region_name\n", "\n", "algorithm_name=\"neuron-py36-inference\"\n", "ecr_image='{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, algorithm_name)\n", "print(ecr_image)" ] }, { "cell_type": "markdown", "id": "9298f2a7", "metadata": {}, "source": [ "An implementation of *model_fn* is required for inference script.\n", "We are going to implement our own **model_fn** and **predict_fn** for Hugging Face Bert, and use default implementations of **input_fn** and **output_fn** defined in sagemaker-pytorch-containers.\n", "\n", "In this example, the inference script is put in ***code*** folder. 
Run the next cell to see it:\n" ] }, { "cell_type": "code", "execution_count": null, "id": "cfea75b6", "metadata": {}, "outputs": [], "source": [ "!pygmentize code/inference.py" ] }, { "cell_type": "markdown", "id": "1b31a7b8", "metadata": {}, "source": [ "Path of compiled pretrained model in S3:" ] }, { "cell_type": "code", "execution_count": null, "id": "61f3556e", "metadata": {}, "outputs": [], "source": [ "key = os.path.join(prefix, \"model.tar.gz\")\n", "pretrained_model_data = \"s3://{}/{}\".format(bucket, key)\n", "print(pretrained_model_data)" ] }, { "cell_type": "markdown", "id": "e7557a5f", "metadata": {}, "source": [ "The model object is defined using the SageMaker Python SDK's PyTorchModel, passing in the S3 path of the compiled model and the entry_point. The endpoint's entry point for inference is defined by model_fn, as seen in the previous code block that prints out **inference.py**. The model_fn function will load the model and the required tokenizer.\n", "\n", "Note that **image_uri** must point to the user's own ECR image." ] }, { "cell_type": "code", "execution_count": null, "id": "0bd99768", "metadata": {}, "outputs": [], "source": [ "from sagemaker.pytorch.model import PyTorchModel\n", "\n", "pytorch_model = PyTorchModel(\n", " model_data=pretrained_model_data,\n", " role=role,\n", " source_dir=\"code\",\n", " framework_version=\"1.7.1\",\n", " entry_point=\"inference.py\",\n", " image_uri=ecr_image\n", ")\n", "\n", "# Let SageMaker know that we've already compiled the model via neuron-cc\n", "pytorch_model._is_compiled_model = True" ] }, { "cell_type": "markdown", "id": "67439fe7", "metadata": {}, "source": [ "The arguments to the deploy function allow us to set the number and type of instances that will be used for the Endpoint.\n", "\n", "Here you will deploy the model to a single **ml.inf1.2xlarge** instance.\n", "It may take 6-10 min to deploy." ] }, { "cell_type": "code", "execution_count": null, "id": "d771fc7c", "metadata": {}, "outputs": [], "source": [ "%%time\n", "\n", "predictor = pytorch_model.deploy(initial_instance_count=1, instance_type=\"ml.inf1.2xlarge\")" ] }, { "cell_type": "code", "execution_count": null, "id": "ab6342f3", "metadata": {}, "outputs": [], "source": [ "print(predictor.endpoint_name)" ] }, { "cell_type": "markdown", "id": "059537d9", "metadata": {}, "source": [ "Since in input_fn we declared that the incoming requests are JSON-encoded, we need to use a JSON serializer to encode the outgoing data into a JSON string. Likewise, since we declared the return content type to be a JSON string, we need to use a JSON deserializer to parse the response." ] }, { "cell_type": "code", "execution_count": null, "id": "29e82f90", "metadata": {}, "outputs": [], "source": [ "predictor.serializer = sagemaker.serializers.JSONSerializer()\n", "predictor.deserializer = sagemaker.deserializers.JSONDeserializer()" ] }, { "cell_type": "markdown", "id": "d006ea03", "metadata": {}, "source": [ "Now the SageMaker endpoint is invoked with a list of sentences to get predictions."
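, "\n", "Equivalently (illustrative sketch, not required for the tutorial), the endpoint can be invoked directly through the ``boto3`` runtime client:\n", "\n", "```python\n", "import boto3, json\n", "smr = boto3.client('sagemaker-runtime')\n", "response = smr.invoke_endpoint(\n", "    EndpointName=predictor.endpoint_name,\n", "    ContentType='application/json',\n", "    Body=json.dumps(['sentence one', 'sentence two']))\n", "print(json.loads(response['Body'].read()))\n", "```"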
] }, { "cell_type": "code", "execution_count": null, "id": "325a87f8", "metadata": {}, "outputs": [], "source": [ "%%time\n", "result = predictor.predict(\n", " [\n", " \"Never allow the same bug to bite you twice.\",\n", " \"The best part of Amazon SageMaker is that it makes machine learning easy.\",\n", " ]\n", ")\n", "print(result)" ] }, { "cell_type": "code", "execution_count": null, "id": "4a12410d", "metadata": {}, "outputs": [], "source": [ "%%time\n", "result = predictor.predict(\n", " [\n", " \"The company HuggingFace is based in New York City\",\n", " \"HuggingFace's headquarters are situated in Manhattan\",\n", " ]\n", ")\n", "print(result)" ] }, { "cell_type": "markdown", "id": "a72dfd16", "metadata": {}, "source": [ "## Benchmarking your endpoint\n", "\n", "The following cells create a load test for your endpoint. You first define some helper functions: `inference_latency` runs the endpoint request, collects cliend side latency and any errors, `random_sentence` builds random to be sent to the endpoint. " ] }, { "cell_type": "code", "execution_count": null, "id": "088d0e75", "metadata": {}, "outputs": [], "source": [ "import numpy as np \n", "import datetime\n", "import math\n", "import time\n", "import boto3 \n", "import matplotlib.pyplot as plt\n", "from joblib import Parallel, delayed\n", "import numpy as np\n", "from tqdm import tqdm\n", "import random" ] }, { "cell_type": "code", "execution_count": null, "id": "038d9953", "metadata": {}, "outputs": [], "source": [ "def inference_latency(model,*inputs):\n", " \"\"\"\n", " infetence_time is a simple method to return the latency of a model inference.\n", "\n", " Parameters:\n", " model: torch model onbject loaded using torch.jit.load\n", " inputs: model() args\n", "\n", " Returns:\n", " latency in seconds\n", " \"\"\"\n", " error = False\n", " start = time.time()\n", " try:\n", " results = model(*inputs)\n", " except:\n", " error = True\n", " results = []\n", " return {'latency':time.time() - start, 'error': error, 'result': results}" ] }, { "cell_type": "code", "execution_count": null, "id": "d6b200ac", "metadata": {}, "outputs": [], "source": [ "def random_sentence():\n", " \n", " s_nouns = [\"A dude\", \"My mom\", \"The king\", \"Some guy\", \"A cat with rabies\", \"A sloth\", \"Your homie\", \"This cool guy my gardener met yesterday\", \"Superman\"]\n", " p_nouns = [\"These dudes\", \"Both of my moms\", \"All the kings of the world\", \"Some guys\", \"All of a cattery's cats\", \"The multitude of sloths living under your bed\", \"Your homies\", \"Like, these, like, all these people\", \"Supermen\"]\n", " s_verbs = [\"eats\", \"kicks\", \"gives\", \"treats\", \"meets with\", \"creates\", \"hacks\", \"configures\", \"spies on\", \"retards\", \"meows on\", \"flees from\", \"tries to automate\", \"explodes\"]\n", " p_verbs = [\"eat\", \"kick\", \"give\", \"treat\", \"meet with\", \"create\", \"hack\", \"configure\", \"spy on\", \"retard\", \"meow on\", \"flee from\", \"try to automate\", \"explode\"]\n", " infinitives = [\"to make a pie.\", \"for no apparent reason.\", \"because the sky is green.\", \"for a disease.\", \"to be able to make toast explode.\", \"to know more about archeology.\"]\n", " \n", " return (random.choice(s_nouns) + ' ' + random.choice(s_verbs) + ' ' + random.choice(s_nouns).lower() or random.choice(p_nouns).lower() + ' ' + random.choice(infinitives))\n", "\n", "print([random_sentence(), random_sentence()])" ] }, { "cell_type": "markdown", "id": "e2945dde", "metadata": {}, "source": [ "The following cell 
creates `number_of_clients` concurrent threads to run `number_of_runs` requests. Once completed, a `boto3` CloudWatch client will query for the server-side latency metrics for comparison. " ] }, { "cell_type": "code", "execution_count": null, "id": "69c047e3", "metadata": {}, "outputs": [], "source": [ "# Defining Auxiliary variables\n", "number_of_clients = 2\n", "number_of_runs = 1000\n", "t = tqdm(range(number_of_runs),position=0, leave=True)\n", "\n", "# Starting parallel clients\n", "cw_start = datetime.datetime.utcnow()\n", "\n", "results = Parallel(n_jobs=number_of_clients,prefer=\"threads\")(delayed(inference_latency)(predictor.predict,[random_sentence(), random_sentence()]) for mod in t)\n", "avg_throughput = t.total/t.format_dict['elapsed']\n", "\n", "cw_end = datetime.datetime.utcnow() \n", "\n", "# Computing metrics and print\n", "latencies = [res['latency'] for res in results]\n", "errors = [res['error'] for res in results]\n", "error_p = sum(errors)/len(errors) *100\n", "p50 = np.quantile(latencies[-1000:],0.50) * 1000\n", "p90 = np.quantile(latencies[-1000:],0.90) * 1000\n", "p95 = np.quantile(latencies[-1000:],0.95) * 1000\n", "\n", "print(f'Avg Throughput: {avg_throughput:.1f}\\n')\n", "print(f'50th Percentile Latency:{p50:.1f} ms')\n", "print(f'90th Percentile Latency:{p90:.1f} ms')\n", "print(f'95th Percentile Latency:{p95:.1f} ms\\n')\n", "print(f'Errors percentage: {error_p:.1f} %\\n')\n", "\n", "# Querying CloudWatch\n", "print('Getting Cloudwatch:')\n", "cloudwatch = boto3.client('cloudwatch')\n", "statistics=['SampleCount', 'Average', 'Minimum', 'Maximum']\n", "extended=['p50', 'p90', 'p95', 'p100']\n", "\n", "# Give 5 minute buffer to end\n", "cw_end += datetime.timedelta(minutes=5)\n", "\n", "# Period must be 1, 5, 10, 30, or multiple of 60\n", "# Calculate closest multiple of 60 to the total elapsed time\n", "factor = math.ceil((cw_end - cw_start).total_seconds() / 60)\n", "period = factor * 60\n", "print('Time elapsed: {} seconds'.format((cw_end - cw_start).total_seconds()))\n", "print('Using period of {} seconds\\n'.format(period))\n", "\n", "cloudwatch_ready = False\n", "# Keep polling CloudWatch metrics until datapoints are available\n", "while not cloudwatch_ready:\n", " time.sleep(30)\n", " print('Waiting 30 seconds ...')\n", " # Must use default units of microseconds\n", " model_latency_metrics = cloudwatch.get_metric_statistics(MetricName='ModelLatency',\n", " Dimensions=[{'Name': 'EndpointName',\n", " 'Value': predictor.endpoint_name},\n", " {'Name': 'VariantName',\n", " 'Value': \"AllTraffic\"}],\n", " Namespace=\"AWS/SageMaker\",\n", " StartTime=cw_start,\n", " EndTime=cw_end,\n", " Period=period,\n", " Statistics=statistics,\n", " ExtendedStatistics=extended\n", " )\n", " # SampleCount should equal number_of_runs once the datapoints are available\n", " if len(model_latency_metrics['Datapoints']) > 0:\n", " print('{} latency datapoints ready'.format(model_latency_metrics['Datapoints'][0]['SampleCount']))\n", " # ModelLatency is reported in microseconds; divide by 1000 to get milliseconds\n", " side_avg = model_latency_metrics['Datapoints'][0]['Average'] / 1000.0\n", " side_p50 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p50'] / 1000.0\n", " side_p90 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p90'] / 1000.0\n", " side_p95 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p95'] / 1000.0\n", " side_p100 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p100'] / 1000.0\n", " \n", " print(f'50th Percentile Latency:{side_p50:.1f} ms')\n", " print(f'90th Percentile Latency:{side_p90:.1f} 
ms')\n", " print(f'95th Percentile Latency:{side_p95:.1f} ms\\n')\n", "\n", " cloudwatch_ready = True\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "9035e681", "metadata": {}, "source": [ "### Cleanup\n", "Endpoints should be deleted when no longer in use, to avoid costs." ] }, { "cell_type": "code", "execution_count": null, "id": "1284ef3f", "metadata": {}, "outputs": [], "source": [ "predictor.delete_endpoint(predictor.endpoint)" ] }, { "cell_type": "code", "execution_count": null, "id": "5af53873", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.9" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: src/examples/pytorch/libtorch_demo/bert_neuronx/compile.py ================================================ import torch from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig import transformers import os import warnings from detect_instance import get_instance_type, get_num_neuroncores instance_type = get_instance_type() print(f"Detected instance type: {instance_type}") if 'inf1' in instance_type: print(" - using torch_neuron.trace") from torch_neuron import trace else: print(" - using torch_neuronx.xla_impl.trace") from torch_neuronx.xla_impl.trace import trace print() os.environ['TOKENIZERS_PARALLELISM']='false' batch_size = 6 # Setting up NeuronCore groups for inf1.6xlarge with 16 cores num_cores = get_num_neuroncores(instance_type) print(f"Number of cores = {num_cores}") os.environ['NEURON_RT_NUM_CORES'] = str(num_cores) # Build tokenizer and model tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc") model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False) # Setup some example inputs sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "Apples are especially bad for your health" sequence_2 = "HuggingFace's headquarters are situated in Manhattan" max_length=128 paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt") not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt") # Convert example inputs to a format that is compatible with TorchScript tracing example_inputs_paraphrase = ( torch.cat([paraphrase['input_ids']] * batch_size,0), torch.cat([paraphrase['attention_mask']] * batch_size,0), torch.cat([paraphrase['token_type_ids']] * batch_size,0) ) # Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron try: model_neuron = trace(model, example_inputs_paraphrase) except Exception as e: print(e) print("libtorch_demo: Model tracing failed - check tutorial steps and preconditions") print("libtorch_demo: If this does not resolve your issue - Report a bug at ") print("https://github.com/aws-neuron/aws-neuron-sdk/issues") exit(1) # Verify the TorchScript works on both example inputs try: paraphrase_classification_logits_neuron = model_neuron(*example_inputs_paraphrase) 
except: print("libtorch_demo: Neuron runtime failed - check tutorial steps and preconditions") print("libtorch_demo: If this does not resolve your issue - Report a bug at ") print("https://github.com/aws-neuron/aws-neuron-sdk/issues") exit(1) # Save the TorchScript for later use model_neuron.save(f'bert_neuron_b{batch_size}.pt') ================================================ FILE: src/examples/pytorch/libtorch_demo/bert_neuronx/detect_instance.py ================================================ import torch import torch_neuronx from typing import Optional INSTANCETYPE_TO_NEURONCORES = { "inf1.xlarge": 4, "inf1.2xlarge": 4, "inf1.6xlarge": 16, "inf2.xlarge": 2, "inf2.8xlarge": 2, "inf2.24xlarge": 12, "inf2.48xlarge": 24, "inf1.24xlarge": 64, "trn1.2xlarge": 2, "trn1.32xlarge": 32, } def get_instance_type() -> str: """Try to obtain the instance type.""" try: from urllib.request import Request, urlopen req = Request("http://169.254.169.254/latest/api/token", method="PUT") req.add_header("X-aws-ec2-metadata-token-ttl-seconds", "21600") with urlopen(req) as response: token = response.read().decode("utf-8") req = Request("http://169.254.169.254/latest/meta-data/instance-type") req.add_header("X-aws-ec2-metadata-token", token) with urlopen(req) as response: instance_type = response.read().decode("utf-8") return instance_type except: # noqa: E722, there are various ways above code can fail and we don't care return None def get_num_neuroncores(instance_type: Optional[str] = None) -> int: """ Try to obtain the maximum number of NeuronCores available on this instance. Args: instance_type: The Neuron instance type. Autodetermined from current instance if not provided. Returns: The number of NeuronCores (or 2 if the type is unknown). """ try: if not instance_type: instance_type = get_instance_type() return INSTANCETYPE_TO_NEURONCORES[instance_type] except KeyError: num_cores = get_num_neuroncores_v3() return num_cores def get_num_neuroncores_v3() -> int: """ Retrieve the number of NeuronCores visible to this process. Returns: The number of visible neuron cores. Raises: RuntimeError: If the Neuron runtime cannot be initialized. This most commonly occurs when executing on an instance with no Neuron devices available or when no Neuron devices are visible to the process. 
""" runtime = torch.classes.neuron.Runtime() try: nc_count = runtime.get_visible_nc_count() except RuntimeError as e: raise RuntimeError( "Neuron runtime cannot be initialized; cannot determine the number of available NeuronCores" # noqa: E501 ) from e return nc_count ================================================ FILE: src/examples/pytorch/libtorch_demo/clean.sh ================================================ #!/bin/bash echo "Clean up constructed files" rm -rf bert_neuron_b6.pt example-app tokenizers venv/ libtorch/ tokenizers_binding/lib/ tokenizers_binding/venv all_metrics.csv venv ================================================ FILE: src/examples/pytorch/libtorch_demo/example_app/README.txt ================================================ AWS NEURON TORCHLIB DEMO FOR C++ ================================ For the full tutorial, please refer to: https://awsdocs-neuron.readthedocs-hosted.com ================================================ FILE: src/examples/pytorch/libtorch_demo/example_app/build.sh ================================================ #!/bin/bash # Installation script to build with torch dependency from /usr/local set -x # Find paths for local packages PATH_TOKENIZERS_LIB=../tokenizers_binding/lib PATH_TORCH=../libtorch PATH_TORCH_INC=${PATH_TORCH}/include PATH_TORCH_LIB=${PATH_TORCH}/lib PATH_NEURON_LIB=${PATH_TORCH}/lib if [ ! -e "${PATH_TORCH_LIB}/libnrt.so.1" ] && [ -e "/opt/aws/neuron/lib/libnrt.so.1" ] then PATH_NEURON_LIB=/opt/aws/neuron/lib/ fi g++ utils.cpp example_app.cpp \ -o ../example-app \ -O2 \ -D_GLIBCXX_USE_CXX11_ABI=1 \ -I${PATH_TORCH_INC} \ -L${PATH_TOKENIZERS_LIB} \ -L${PATH_NEURON_LIB} \ -L${PATH_TORCH_LIB} \ -Wl,-rpath,libtorch/lib \ -Wl,-rpath,tokenizers_binding/lib \ -Wl,-rpath,$PATH_NEURON_LIB \ -Wl,-no-as-needed \ -ltokenizers \ -ltorchneuron \ -ltorch_cpu \ -lc10 \ -lpthread \ -lnrt \ -std=c++17 ================================================ FILE: src/examples/pytorch/libtorch_demo/example_app/core_count.hpp ================================================ #pragma once /* * Copyright 2021, Amazon.com, Inc. or its affiliates. 
All Rights Reserved */ #ifdef __cplusplus extern "C" { #endif typedef enum { NRT_SUCCESS = 0, NRT_FAILURE = 1, NRT_INVALID = 2, NRT_INVALID_HANDLE = 3, NRT_RESOURCE = 4, NRT_TIMEOUT = 5, NRT_HW_ERROR = 6, NRT_QUEUE_FULL = 7, NRT_LOAD_NOT_ENOUGH_NC = 9, NRT_UNSUPPORTED_NEFF_VERSION = 10, NRT_FAIL_HOST_MEM_ALLOC = 11, NRT_EXEC_BAD_INPUT = 1002, NRT_EXEC_COMPLETED_WITH_NUM_ERR = 1003, NRT_EXEC_COMPLETED_WITH_ERR = 1004, NRT_EXEC_NC_BUSY = 1005, NRT_COLL_PENDING = 1100, } NRT_STATUS; NRT_STATUS nrt_get_total_nc_count(uint32_t *nc_count); #ifdef __cplusplus } #endif ================================================ FILE: src/examples/pytorch/libtorch_demo/example_app/example_app.cpp ================================================ #include <torch/script.h> #include <algorithm> #include <atomic> #include <cassert> #include <chrono> #include <cmath> #include <condition_variable> #include <cstdlib> #include <functional> #include <iostream> #include <mutex> #include <thread> #include <vector> #include "utils.hpp" #include "core_count.hpp" #include "../tokenizers_binding/remote_rust_tokenizer.h" typedef std::vector<std::vector<long>> Input; namespace { // some hardcoded parameters that could be read from a config file const size_t seq_len = 128; const size_t batch_size = 6; uint32_t num_neuron_cores = 0; const size_t cores_per_model = 1; const size_t num_runs_per_neuron_core = 2000; // these token ids are particular to a vocabulary, could be parsed from vocab file const long start_token = 101; const long end_token = 102; } // construct a single input: input_ids, attention_mask, and token_type_ids from two input sentences Input get_input(const std::string& sentence_1, const std::string& sentence_2) { // ensure the concatenated sentences + separator tokens do not exceed the compiled sequence length assert(sentence_1.size() + sentence_2.size() + 3 <= seq_len); // tokenize the input sentence using the HuggingFace Tokenizers library std::vector<long> input_ids(seq_len, 0); input_ids[0] = start_token; size_t pos = 1; // current write position in input_ids // tokenize sentence_1 and copy to output buffer std::vector<uint32_t> buffer(seq_len, 0); remote_rust_encode(sentence_1.c_str(), buffer.data(), buffer.size()); for (size_t i = 0; i < seq_len && buffer[i]; i++, pos++) { input_ids[pos] = buffer[i]; } // mark end of sentence_1 input_ids[pos++] = end_token; const size_t sentence_2_start = pos; // tokenize sentence_2 and copy to output buffer std::fill(buffer.begin(), buffer.end(), 0); remote_rust_encode(sentence_2.c_str(), buffer.data(), buffer.size()); for (size_t i = 0; i < seq_len && buffer[i]; i++, pos++) { input_ids[pos] = buffer[i]; } // mark end of sentence_2 input_ids[pos++] = end_token; // construct attention mask std::vector<long> attention_mask(seq_len, 0); for (size_t i = 0; i < seq_len; ++i) attention_mask[i] = input_ids[i] ? 1 : 0; // token type ids are 0s for sentence_1 (incl. separators), 1s for sentence_2 std::vector<long> token_type_ids(seq_len, 0); for (size_t i = sentence_2_start; i < seq_len; i++) { if (!attention_mask[i]) break; token_type_ids[i] = 1; } return {input_ids, attention_mask, token_type_ids}; } // reshape a vector of inputs into a proper batch std::vector<torch::jit::IValue> get_batch(const std::vector<Input>& inputs) { // must be given a full batch assert(inputs.size() == batch_size); torch::Tensor input_ids_tensor = torch::zeros({batch_size, seq_len}, at::kLong); torch::Tensor attention_mask_tensor = torch::zeros({batch_size, seq_len}, at::kLong); torch::Tensor token_type_ids_tensor = torch::zeros({batch_size, seq_len}, at::kLong); const auto opts = torch::TensorOptions().dtype(torch::kLong); for (size_t i = 0; i < batch_size; i++) { input_ids_tensor.slice(0, i, i+1) = torch::from_blob((void*)inputs[i][0].data(), {seq_len}, opts); attention_mask_tensor.slice(0, i, i+1) = torch::from_blob((void*)inputs[i][1].data(), {seq_len}, opts); token_type_ids_tensor.slice(0, i, i+1) = torch::from_blob((void*)inputs[i][2].data(), {seq_len}, opts); } return {input_ids_tensor, attention_mask_tensor, token_type_ids_tensor}; } int sanity_check(const std::string& model_filename) { // load the model auto model = get_model(model_filename); // construct some example inputs const std::string sentence_1 = "The company HuggingFace is based in New York City"; const std::string sentence_2 = "Apples are especially bad for your health"; const std::string sentence_3 = "HuggingFace's headquarters are situated in Manhattan"; const auto paraphrase = get_input(sentence_1, sentence_3); const auto not_paraphrase = get_input(sentence_1, sentence_2); // batch the inputs 50/50 positive/negative std::vector<Input> inputs(batch_size); for (size_t i = 0; i < batch_size; ++i) { if (i < batch_size / 2) { inputs[i] = paraphrase; } else { inputs[i] = not_paraphrase; } } const auto batch = get_batch(inputs); // forward pass const auto output = model.forward(batch); // interpret output const auto output_tensor = output.toTuple()->elements()[0].toTensor(); const auto paraphrase_probabilities = torch::softmax(output_tensor[0], 0); const auto not_paraphrase_probabilities = torch::softmax(output_tensor[batch_size-1], 0); const auto paraphrase_0 = std::round(paraphrase_probabilities[0].item<float>() * 100); const auto paraphrase_1 = std::round(paraphrase_probabilities[1].item<float>() * 100); const auto not_paraphrase_0 = std::round(not_paraphrase_probabilities[0].item<float>() * 100); const auto not_paraphrase_1 = std::round(not_paraphrase_probabilities[1].item<float>() * 100); std::cout << sentence_1 << std::endl << sentence_3 << std::endl; std::cout << "not paraphrase: " << paraphrase_0 << "%" << std::endl; std::cout << "paraphrase: " << paraphrase_1 << "%" << std::endl; if (paraphrase_0 >= paraphrase_1) return -1; std::cout << std::endl; std::cout << sentence_1 << std::endl << sentence_2 << std::endl; std::cout << "not paraphrase: " << not_paraphrase_0 << "%" << std::endl; std::cout << "paraphrase: " << not_paraphrase_1 << "%" << std::endl; if (not_paraphrase_0 <= not_paraphrase_1) return -2; return 0; } void benchmark(const std::string& model_filename, const std::vector<torch::jit::IValue>& batch, std::condition_variable& warmup_cv, std::atomic_size_t& warmup_count, std::condition_variable& ready_cv) { // load model and warmup auto model = get_model(model_filename); model.forward(batch); std::cout << "." << std::flush; --warmup_count; warmup_cv.notify_one(); // wait for ready signal std::mutex ready_mutex; std::unique_lock<std::mutex> lk(ready_mutex); ready_cv.wait(lk); // benchmark for (size_t i = 0; i < num_runs_per_neuron_core; i++) { if (i == num_runs_per_neuron_core/2) std::cout << "." << std::flush; model.forward(batch); } } int main(int argc, char *argv[]) { if (argc < 2) { std::cerr << "Usage: ./example_app neuron_traced_model.pt [--sanity]" << std::endl; return -1; } if( nrt_get_total_nc_count( &num_neuron_cores ) != NRT_SUCCESS ) { std::cerr << "Could not determine number of cores - aborting!" << std::endl; return -1; } // let the runtime know which NeuronCores should be visible (e.g. "0-15") setenv("NEURON_RT_VISIBLE_CORES", get_visible_cores_str(num_neuron_cores, cores_per_model).c_str(), true); if (argc >= 3 && std::string("--sanity") == argv[2]) { return sanity_check(argv[1]); } /*************************************************************************/ // prepare inputs, prepare models, and perform warmup inference std::cout << "Getting ready" << std::flush; const auto input = get_input("This sentence is for benchmarking.", "For benchmarking, use this sentence."); const auto batch = get_batch(std::vector<Input>(batch_size, input)); std::condition_variable warmup_cv, ready_cv; std::atomic_size_t warmup_count(num_neuron_cores); std::vector<std::thread> threads(num_neuron_cores); for (size_t i = 0; i < threads.size(); i++) { threads[i] = std::move(std::thread(benchmark, argv[1], batch, std::ref(warmup_cv), std::ref(warmup_count), std::ref(ready_cv))); } // wait for warmup to complete auto is_warmup_complete = [](std::atomic_size_t& warmup_count) { return warmup_count.load() == 0; }; std::mutex warmup_mutex; std::unique_lock<std::mutex> lk(warmup_mutex); warmup_cv.wait(lk, std::bind(is_warmup_complete, std::ref(warmup_count))); std::cout << std::endl; /*************************************************************************/ // begin timed benchmarking std::cout << "Benchmarking" << std::flush; // signal workers to begin benchmarking and wait for completion const auto start_time = std::chrono::high_resolution_clock::now(); ready_cv.notify_all(); for (auto& thread : threads) thread.join(); const auto end_time = std::chrono::high_resolution_clock::now(); std::cout << std::endl; // report statistics const float elapsed = (end_time - start_time) / std::chrono::seconds(1); const size_t num_inferences = num_neuron_cores * num_runs_per_neuron_core; const float throughput = (float)(num_inferences * batch_size) / elapsed; std::cout << "Completed " << num_inferences << " operations in " << elapsed << " seconds => " << throughput << " pairs / second" << std::endl; std::cout << std::endl; std::cout << "====================" << std::endl; std::cout << "Summary information:" << std::endl; std::cout << "====================" << std::endl; std::cout << "Batch size = " << batch_size << std::endl; std::cout << "Num neuron cores = " << num_neuron_cores << std::endl; std::cout << "Num runs per neuron core = " << num_runs_per_neuron_core << std::endl; return 0; } ================================================ FILE: src/examples/pytorch/libtorch_demo/example_app/utils.cpp ================================================ #include "utils.hpp" #include "../tokenizers_binding/remote_rust_tokenizer.h" #include <cstddef> #include <random> #include <sstream> #include <string> std::string get_visible_cores_str(size_t num_neuron_cores, size_t cores_per_model) { std::ostringstream oss; oss << "0-" << ((num_neuron_cores * cores_per_model) - 1); return oss.str(); } std::string get_uuid() { 
// xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx // M = version = 4, (4 bits, 0100 = 0x4) // N = variant = 1, (2 bits, 10XX = 0x{8, 9, A, B}) static const char *chars = "0123456789abcdef"; static std::random_device rd; static std::mt19937 mt(rd()); static std::uniform_int_distribution<> dist(0, 15); std::stringstream ss; for (size_t i = 0; i < 37; i++) { const int index = dist(mt); ss << chars[index]; } // variant bits are 10XX std::stringstream variant_ss; size_t variant; variant_ss << std::hex << chars[dist(mt)]; variant_ss >> variant; variant = 0x8 | (0x3 & variant); ss.seekp(9); ss << "-"; ss.seekp(14); ss << "-4"; ss.seekp(19); ss << "-" << std::hex << variant; ss.seekp(24); ss << "-"; return ss.str(); } torch::jit::script::Module get_model(const std::string& filename) { torch::jit::script::Module model = torch::jit::load(filename); // If you're using a model traced with torch-neuron >= 1.8, // the section below is no longer necessary. It was a workaround // for a runtime issue when loading identical copies of a model. // This is redundant in the new flow, but left to provide a future // pointer on torchscript graph manipulation if needed // this next section adds a unique uuid to the graph, so that the neuron runtime // will load the graph multiple times instead of reusing a previously loaded copy /* auto fwd = model.get_method("forward"); auto& fn = static_cast<torch::jit::GraphFunction&>(fwd.function()); auto graph = fn.graph(); torch::jit::Inline(*graph); for (auto node : graph->nodes()) { if (std::string(node->kind().toQualString()).rfind("neuron::forward") == 0) { auto uuid_input_tensor = node->inputs()[1]; if (std::string(uuid_input_tensor->node()->kind().toQualString()).rfind("prim::Constant") == 0) { // we clone the tensor to retain ownership of "the blob" after it goes out of scope const std::string uuid = get_uuid(); torch::Tensor t = torch::from_blob((void*)uuid.c_str(), {36}, torch::kUInt8).clone(); // if we don't move the insertion point so that the copy of the constant appears after the operator, // the inference will crash graph->setInsertPoint(node); torch::jit::Value *val = graph->insertConstant(t); node->replaceInputWith(uuid_input_tensor, val); // ensure a valid graph graph->lint(); } } } */ return model; } ================================================ FILE: src/examples/pytorch/libtorch_demo/example_app/utils.hpp ================================================ #ifndef __UTILS_HPP__ #define __UTILS_HPP__ #include <torch/script.h> std::string get_visible_cores_str(size_t num_neuron_cores, size_t cores_per_model); std::string get_uuid(); torch::jit::script::Module get_model(const std::string& filename); #endif // __UTILS_HPP__ ================================================ FILE: src/examples/pytorch/libtorch_demo/neuron.patch ================================================ From 3f126613c47e4261d0e86520cb6e85c5713e2b15 Mon Sep 17 00:00:00 2001 From: Stephen Dunn Date: Tue, 26 Jan 2021 22:55:40 +0000 Subject: [PATCH] Adds AWS Neuron native C++ interface --- diff --git a/tokenizers/Cargo.toml b/tokenizers/Cargo.toml index c0f1aff..9767da7 100644 --- a/tokenizers/Cargo.toml +++ b/tokenizers/Cargo.toml @@ -19,6 +19,7 @@ exclude = [ "rust-toolchain", "target/*", "Cargo.lock", "benches/*.txt", "benche name = "tokenizers" path = "src/lib.rs" bench = false +crate-type = ["rlib", "cdylib"] [[bench]] name = "bpe_benchmark" diff --git a/tokenizers/src/lib.rs b/tokenizers/src/lib.rs index eb89b93..2392f28 100644 --- a/tokenizers/src/lib.rs +++ b/tokenizers/src/lib.rs @@ -145,6 +145,8 @@ pub mod tokenizer; // Re-export from 
 
 // Re-export from tokenizer
 pub use tokenizer::*;
 
+mod neuron;
+
 // Re-export also parallelism utils
 pub use utils::parallelism;

diff --git a/tokenizers/src/neuron.rs b/tokenizers/src/neuron.rs
new file mode 100644
index 0000000..af4a679
--- /dev/null
+++ b/tokenizers/src/neuron.rs
@@ -0,0 +1,25 @@
+use crate::tokenizer::Tokenizer;
+use std::ffi::CStr;
+use std::os::raw::c_char;
+
+// cached tokenizer
+static mut TOKENIZER: Option<Tokenizer> = None;
+
+#[no_mangle]
+pub unsafe extern "C" fn remote_rust_encode(input_arr: *const c_char, output_arr: *mut u32, output_arr_len: u32) {
+    // load the pretrained tokenizer up if we haven't already
+    let tokenizer = TOKENIZER.get_or_insert_with(|| Tokenizer::from_file("./tokenizer.json").unwrap());
+
+    // convert input from C -> Rust
+    let cstr = CStr::from_ptr(input_arr);
+    let input = cstr.to_str().unwrap();
+
+    // tokenize raw text
+    let encoding = tokenizer.encode(input, false).unwrap();
+
+    // hand the output back to C across shared memory
+    let output = std::slice::from_raw_parts_mut(output_arr, output_arr_len as usize);
+    for (i, token) in &mut encoding.get_ids().to_vec().iter().enumerate() {
+        output[i] = *token;
+    }
+}
\ No newline at end of file

================================================
FILE: src/examples/pytorch/libtorch_demo/run_tests.sh
================================================
#!/bin/bash
set -e

if [ "$#" -ne 1 ]; then
    echo "usage: ./run_tests.sh model_filename.pt"
    exit 1
fi

echo -e "\nRunning tokenization sanity checks.\n"
pushd tokenizers_binding >/dev/null 2>&1
chmod +x run_python.sh run.sh
(./run_python.sh && ./run.sh) || { echo "Sanity checks failed."; exit 2; }
popd >/dev/null 2>&1

echo -e "\nTokenization sanity checks passed."
echo -e "Running end-to-end sanity check.\n"
(./example-app $1 --sanity) || { echo "Sanity check failed."; exit 3; }
echo -e "\nSanity check passed.\n"

================================================
FILE: src/examples/pytorch/libtorch_demo/setup.sh
================================================
#!/bin/bash
set -eEx  # fail on error

TORCH_VERSION=$(python -c "import torch; v=torch.__version__.split('+')[0]; print(f'{v}')")

# Parse CLI args
while [ "$1" != "" ]; do
    case $1 in
        --torch-version ) shift
                          TORCH_VERSION=$1
                          ;;
    esac
    shift
done
echo "Using PyTorch version ${TORCH_VERSION}"

# Python setup
PYTHON=python3
PYTHON_VERSION=$($PYTHON --version | cut -f2 -d' ' | cut -f1,2 -d'.')
echo "Python version is '$PYTHON_VERSION'"

OLD_TOOL_CHAIN=$($PYTHON -c \
    "from bert_neuronx.detect_instance import get_instance_type; print('inf1' in get_instance_type())")
if [ "$OLD_TOOL_CHAIN" == "True" ]; then
    TORCH_VERSION="1.13"
    echo "- Detected inf1 - using version ${TORCH_VERSION}"
else
    echo "- Detected inf2 or trn1 - using version ${TORCH_VERSION}"
fi
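# Illustrative summary of the detection above (an informal sketch, not exhaustive):
#   inf1.*          -> OLD_TOOL_CHAIN=True  -> pin TORCH_VERSION=1.13 (torch-neuron toolchain)
#   inf2.* / trn1.* -> OLD_TOOL_CHAIN=False -> keep the detected torch version (torch-neuronx)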
# checkout tokenizers and apply neuron patch
if [ ! -e "tokenizers" ]; then
    git clone https://github.com/huggingface/tokenizers.git
    cp neuron.patch tokenizers/neuron.patch
    pushd tokenizers
    git checkout d8c4388166cad8f0216dfc485efd6207a3275af2
    git apply neuron.patch
    rm neuron.patch
    popd
fi

# build tests
pushd tokenizers_binding
chmod +x build.sh
./build.sh
popd
cp -f tokenizers_binding/tokenizer.json .

# setup torch
if [ ! -e "libtorch" ]; then
    # Use different download paths based on PyTorch version
    MAJOR_VERSION=$(echo "${TORCH_VERSION}" | cut -d. -f1)
    MINOR_VERSION=$(echo "${TORCH_VERSION}" | cut -d. -f2)
    if [ "$MAJOR_VERSION" -gt 2 ] || ([ "$MAJOR_VERSION" -eq 2 ] && [ "$MINOR_VERSION" -ge 8 ]); then
        wget -q https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-${TORCH_VERSION}%2Bcpu.zip
        unzip -q libtorch-shared-with-deps-${TORCH_VERSION}+cpu.zip
        rm -f libtorch-shared-with-deps-${TORCH_VERSION}+cpu.zip
    else
        wget -q https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-${TORCH_VERSION}%2Bcpu.zip
        unzip -q libtorch-cxx11-abi-shared-with-deps-${TORCH_VERSION}+cpu.zip
        rm -f libtorch-cxx11-abi-shared-with-deps-${TORCH_VERSION}+cpu.zip
    fi
fi

# get libneuron_op.so and install into libtorch
$PYTHON -m pip install --upgrade "transformers==4.40.0"
$PYTHON bert_neuronx/compile.py
site_pkgs_dir=$($PYTHON -c "import site; print(site.getsitepackages()[0])")
if [ "$OLD_TOOL_CHAIN" == "True" ]
then
    cp -f $(find $site_pkgs_dir -exec find {} -type f -name 'libtorchneuron.so' \; -quit | grep torch_neuron) libtorch/lib/
    cp -f $(find $site_pkgs_dir -exec find {} -type f -name 'libnrt.so' \; -quit) libtorch/lib/
    cp -f $(find $site_pkgs_dir -exec find {} -type f -name 'libnrt.so.1' \; -quit) libtorch/lib/
else
    cp -f $(find $site_pkgs_dir -exec find {} -type f -name 'libtorchneuron.so' \; -quit | grep torch_neuronx) libtorch/lib/
fi

# compile example app
pushd example_app
chmod +x build.sh
./build.sh
popd

chmod +x run_tests.sh
echo "Successfully completed setup"

================================================
FILE: src/examples/pytorch/libtorch_demo/tokenizers_binding/build.sh
================================================
#!/bin/bash

# clean old artifacts
rm -f tokenizer_test >/dev/null 2>&1
rm -rf lib >/dev/null 2>&1

# build shared library
if [ $# -eq 0 ]; then
    pushd ../tokenizers/tokenizers
    echo "Building release test..."
    cargo build --release
    popd
    cp -r ../tokenizers/tokenizers/target/release lib
    g++ -O3 -o tokenizer_test tokenizer_test.cpp -L./lib -ltokenizers
else
    pushd ../tokenizers/tokenizers
    echo "Building debug test..."
    cargo build
    popd
    cp -r ../tokenizers/tokenizers/target/debug lib
    g++ -O0 -o tokenizer_test tokenizer_test.cpp -L./lib -ltokenizers
fi
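# Invocation sketch: the argument count alone selects the build flavor:
#   ./build.sh          # release: cargo build --release, test binary compiled with -O3
#   ./build.sh debug    # debug:   cargo build, test binary compiled with -O0
# (any argument triggers the debug branch; "debug" is just a readable choice)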
if [ ! -e "tokenizer.json" ]; then
    wget https://huggingface.co/bert-base-cased-finetuned-mrpc/raw/main/tokenizer.json
fi

================================================
FILE: src/examples/pytorch/libtorch_demo/tokenizers_binding/remote_rust_tokenizer.h
================================================
#ifndef __REMOTE_RUST_TOKENIZER_H__
#define __REMOTE_RUST_TOKENIZER_H__

#include <stdint.h>

extern "C" {
extern void remote_rust_encode(const char *input_arr, uint32_t* output_arr, uint32_t output_arr_len);
}

#endif // __REMOTE_RUST_TOKENIZER_H__

================================================
FILE: src/examples/pytorch/libtorch_demo/tokenizers_binding/run.sh
================================================
#!/bin/bash
set -e
LD_LIBRARY_PATH=./lib ./tokenizer_test

================================================
FILE: src/examples/pytorch/libtorch_demo/tokenizers_binding/run_python.sh
================================================
#!/bin/bash
set -e
python tokenizer_test.py

================================================
FILE: src/examples/pytorch/libtorch_demo/tokenizers_binding/tokenizer_test.cpp
================================================
#include <iostream>
#include <chrono>    // timing
#include <cstdint>   // rust interface types
#include <iomanip>   // std::setprecision
#include <sstream>   // parse args
#include <cstring>
#include <vector>
#include "remote_rust_tokenizer.h"

#define DEFAULT_NUM_TESTS 10000u

int main(int argc, char *argv[]) {
    // prepare some input to tokenize
    const uint32_t seq_len = 128;
    const std::vector<uint32_t> ground_truth = {
        1409, 1917, 2947, 16193, 117, 1142, 3087, 1209,
        1129, 22559, 2200, 1656, 155, 8954, 119
    };
    const char *input_arr = "If everything goes smoothly, this text will be tokenized inside Rust.";
    uint32_t* output_arr = new uint32_t[seq_len];
    std::memset(output_arr, 0, sizeof(uint32_t) * seq_len);

    // call rust tokenizer
    remote_rust_encode(input_arr, output_arr, seq_len);

    // check output
    std::cout << "Sanity check ";
    for (size_t i = 0; i < ground_truth.size(); ++i) {
        if (output_arr[i] != ground_truth[i]) {
            std::cerr << "failed at: " << i << ", " << output_arr[i]
                      << " != " << ground_truth[i] << std::endl;
            return -1;
        }
    }
    std::cout << "passed." << std::endl;

    // run timed test
    uint32_t num_tests = DEFAULT_NUM_TESTS;
    if (argc >= 3 && !strcmp("--num_tests", argv[1])) {
        std::istringstream iss(argv[2]);
        iss >> num_tests;
    }

    // one progress dot per ~10% of tests; clamp to 1 to avoid a zero modulus
    uint32_t ten_percent = num_tests / 10;
    if (ten_percent == 0) ten_percent = 1;

    std::cout << "Begin " << num_tests << " timed tests." << std::endl;
    auto start = std::chrono::high_resolution_clock::now();
    for (uint32_t test_num = 0; test_num < num_tests; ++test_num) {
        if (test_num % ten_percent == 0) {
            std::cout << "." << std::flush;
        }
        remote_rust_encode(input_arr, output_arr, seq_len);
    }
    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration<double>(end - start);
    std::cout << std::endl
              << "End timed tests." << std::endl
              << "C++ took " << std::setprecision(3) << duration.count() << " seconds."
<< std::endl; return 0; } ================================================ FILE: src/examples/pytorch/libtorch_demo/tokenizers_binding/tokenizer_test.py ================================================ from transformers import AutoTokenizer import argparse import time from tqdm import tqdm parser = argparse.ArgumentParser() parser.add_argument('--num_tests', type=int, default=10_000) args = parser.parse_args() tokenizer = AutoTokenizer.from_pretrained('bert-base-cased-finetuned-mrpc') start = time.time() for _ in tqdm(range(args.num_tests), desc='Tokenizing'): tokenizer.encode("If everything goes smoothly, this text will be tokenized inside Rust.") end = time.time() print('Python took {:.2f} seconds.'.format(end - start)) ================================================ FILE: src/examples/pytorch/libtorch_demo/trace_bert_neuron.py ================================================ import torch import torch_neuron from transformers import AutoTokenizer, AutoModelForSequenceClassification # Build tokenizer and model tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc") model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False) # Setup some example inputs sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" max_length = 128 batch_size = 6 paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt") example_inputs_paraphrase = ( torch.cat([paraphrase['input_ids']] * batch_size, 0), torch.cat([paraphrase['attention_mask']] * batch_size, 0), torch.cat([paraphrase['token_type_ids']] * batch_size, 0) ) # Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron model_neuron_batch = torch_neuron.trace(model, example_inputs_paraphrase) # Save the batched model model_neuron_batch.save('bert_neuron_b{}.pt'.format(batch_size)) ================================================ FILE: src/examples/pytorch/mnist_mlp/train_monitor.py ================================================ import os import time import torch import torch.nn as nn import torch.nn.functional as F from torchvision.datasets import mnist from torch.optim import SGD from torch.utils.data import DataLoader from torchvision.transforms import ToTensor # XLA imports import torch_xla.core.xla_model as xm # Declare 3-layer MLP for MNIST dataset class MLP(nn.Module): def __init__(self, input_size = 28 * 28, output_size = 10, layers = [120, 84]): super(MLP, self).__init__() self.fc1 = nn.Linear(input_size, layers[0]) self.fc2 = nn.Linear(layers[0], layers[1]) self.fc3 = nn.Linear(layers[1], output_size) def forward(self, x): x = F.relu(self.fc1(x)) x = F.relu(self.fc2(x)) x = self.fc3(x) return F.log_softmax(x, dim=1) # Load MNIST train dataset train_dataset = mnist.MNIST(root='./MNIST_DATA_train', \ train=True, download=True, transform=ToTensor()) def main(): # Prepare data loader train_loader = DataLoader(train_dataset, batch_size=32) # Fix the random number generator seeds for reproducibility torch.manual_seed(0) # XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance) device = 'xla' # Move model to device and declare optimizer and loss function model = MLP().to(device) optimizer = torch.optim.SGD(model.parameters(), lr=0.01) loss_fn = torch.nn.NLLLoss() # Run the training loop print('----------Training ---------------') for run in range(0, 1000): print(f'Run {run}') 
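        # NOTE: the 1000-iteration outer run loop keeps this training job alive for a
        # long time, presumably so that monitoring tools such as neuron-top or
        # neuron-monitor have a live workload to observe; the train_tb.py variant
        # below performs a single pass and reports throughput instead.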
model.train() for idx, (train_x, train_label) in enumerate(train_loader): optimizer.zero_grad() train_x = train_x.view(train_x.size(0), -1) train_x = train_x.to(device) train_label = train_label.to(device) output = model(train_x) loss = loss_fn(output, train_label) loss.backward() optimizer.step() xm.mark_step() # XLA: collect ops and run them in XLA runtime if idx < 2: # skip warmup iterations start = time.time() # Save checkpoint for evaluation os.makedirs("checkpoints", exist_ok=True) checkpoint = {'state_dict': model.state_dict()} # XLA: use xm.save instead of torch.save to ensure states are moved back to cpu # This can prevent "XRT memory handle not found" at end of test.py execution xm.save(checkpoint,'checkpoints/checkpoint.pt') print('----------End Training ---------------') ================================================ FILE: src/examples/pytorch/mnist_mlp/train_tb.py ================================================ import os import time import torch import torch.nn as nn import torch.nn.functional as F from torchvision.datasets import mnist from torch.optim import SGD from torch.utils.data import DataLoader from torchvision.transforms import ToTensor # XLA imports import torch_xla.core.xla_model as xm from torch.utils.tensorboard import SummaryWriter # Declare 3-layer MLP for MNIST dataset class MLP(nn.Module): def __init__(self, input_size = 28 * 28, output_size = 10, layers = [120, 84]): super(MLP, self).__init__() self.fc1 = nn.Linear(input_size, layers[0]) self.fc2 = nn.Linear(layers[0], layers[1]) self.fc3 = nn.Linear(layers[1], output_size) def forward(self, x): x = F.relu(self.fc1(x)) x = F.relu(self.fc2(x)) x = self.fc3(x) return F.log_softmax(x, dim=1) # Load MNIST train dataset train_dataset = mnist.MNIST(root='./MNIST_DATA_train', \ train=True, download=True, transform=ToTensor()) def main(): # Prepare data loader train_loader = DataLoader(train_dataset, batch_size=32) # Fix the random number generator seeds for reproducibility torch.manual_seed(0) # XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance) device = 'xla' # Move model to device and declare optimizer and loss function model = MLP().to(device) optimizer = torch.optim.SGD(model.parameters(), lr=0.01) loss_fn = torch.nn.NLLLoss() # Use SummaryWriter to generate logs for TensorBoard writer = SummaryWriter('./output') # Run the training loop print('----------Training ---------------') model.train() start = time.time() for idx, (train_x, train_label) in enumerate(train_loader): optimizer.zero_grad() train_x = train_x.view(train_x.size(0), -1) train_x = train_x.to(device) train_label = train_label.to(device) output = model(train_x) loss = loss_fn(output, train_label) writer.add_scalar("step loss", loss, idx) # add the step loss to the TensorBoard logs loss.backward() optimizer.step() xm.mark_step() # XLA: collect ops and run them in XLA runtime if idx < 2: # skip warmup iterations start = time.time() # Compute statistics interval = idx - 2 # skip warmup iterations throughput = interval / (time.time() - start) print("Train throughput (iter/sec): {}".format(throughput)) print("Final loss is {:0.4f}".format(loss.detach().to('cpu'))) # Ensure TensorBoard logs are all written writer.flush() # Save checkpoint for evaluation os.makedirs("checkpoints", exist_ok=True) checkpoint = {'state_dict': model.state_dict()} # XLA: use xm.save instead of torch.save to ensure states are moved back to cpu # This can prevent "XRT memory handle not found" at end of test.py execution 
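    # A minimal sketch of the distinction the comment above describes
    # (assuming a torch_xla environment):
    #   torch.save(checkpoint, path)  # may serialize live XLA device tensors
    #   xm.save(checkpoint, path)     # syncs tensors to CPU first, then saves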
xm.save(checkpoint,'checkpoints/checkpoint.pt') print('----------End Training ---------------') if __name__ == '__main__': main() ================================================ FILE: src/examples/pytorch/neuronx_distributed/t5-inference/t5-inference-tutorial.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# [Broken] T5 inference with Tensor Parallelism" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an extension to the [t5 inference tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.html). Here we will use NeuronxDistributed to improve the inference performance using tensor parallelism.\n", "\n", "This tutorial has the following main sections:\n", "\n", "1. Install dependencies\n", "1. Plug in `NeuronxDistributed` layers into T5\n", "1. Compile the T5 model\n", "1. Run distributed inference with beam search \n", "\n", "This Jupyter notebook should be run on an Inf2 instance (`inf2.24xlarge`) or a Trn1 instance (`trn1.32xlarge`)\n", "\n", "> The tutorial works for t5 and flan-t5 models. In this notebook we will run distributed inference with flan-t5-xl." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Install dependencies\n", "\n", "The code in this tutorial is written for Jupyter Notebooks. To use Jupyter Notebook on the Neuron instance, you\n", "can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\n", "\n", "Run the notebook by cloning aws-neuron-sdk\n", "```\n", "git clone https://github.com/aws-neuron/aws-neuron-sdk.git\n", "cd aws-neuron-sdk/src/examples/pytorch/neuronx_distributed/t5-inference/\n", "```\n", "\n", "Once done, execute `t5-inference-tutorial.ipynb`\n", "\n", "It is recommended to go through the [t5 inference tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.html) before you start this tutorial. \n", "In addition to the dependencies in the [t5 inference tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.html), we need to install neuronx-distributed. \n", "\n", "This tutorial requires the following pip packages:\n", "\n", "- `torch-neuronx`\n", "- `neuronx-cc`\n", "- `transformers`\n", "- `optimum-neuron`\n", "- `neuronx-distributed`\n", "\n", "Most of these packages will be installed when configuring your environment using the Trn1/Inf2 [ setup guide ](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/neuron-setup/pytorch/neuronx/ubuntu/torch-neuronx-ubuntu20.html#setup-torch-neuronx-ubuntu20). The additional dependencies must be installed here:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! pip install --upgrade transformers==4.33.1 optimum-neuron neuronx_distributed --extra-index-url https://pip.repos.neuron.amazonaws.com" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Pull the latest version of the compiler\n", "! pip install --upgrade neuronx-cc>=2.11 --no-deps" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's update numpy to a newer version \n", "!
pip install --upgrade \"numpy>=1.22.2,<2\" --no-deps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plug in NeuronxDistributed layers into T5\n", "\n", "We extend the Hugging Face T5 model to use the `NeuronxDistributed` parallel layers. To do so, we simply swap the linear layers in the `T5LayerSelfAttention`, `T5LayerCrossAttention`, and `T5LayerFF` definitions with `ColumnParallelLinear` and `RowParallelLinear`. We also need to swap the `Embedding` layer with `ParallelEmbedding`.\n", "\n", "Let us take the example of T5Attention. The [attention block](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_t5.py#L363-L366) has q, k, v, and o linear layers. \n", "The multi-head attention block uses q, k and v to compute the attention scores. The attention scores are then passed through o to compute the attention block output. \n", "So let us swap the q, k and v layers with `ColumnParallelLinear` and o with `RowParallelLinear`. Having a `RowParallelLinear` follow a `ColumnParallelLinear` is a performance optimization: the attention scores computed with q, k and v are already split across Neuron devices, and the row-parallel layer can consume this sharded output directly. \n", "The embedding layer is simply swapped with the `ParallelEmbedding`.\n", "\n", "```\n", "class ParallelAttention(T5Attention):\n", " def __init__(self, config: T5Config, has_relative_attention_bias=False):\n", " super().__init__(config, has_relative_attention_bias)\n", " # Per attention head and per partition values\n", " world_size = parallel_state.get_tensor_model_parallel_size()\n", " self.num_attention_heads_per_partition = divide(self.n_heads, world_size)\n", " self.hidden_size_per_partition = self.num_attention_heads_per_partition * self.key_value_proj_dim\n", "\n", " # Mesh TensorFlow initialization to avoid scaling before softmax\n", " self.q = ColumnParallelLinear(self.d_model,\n", " self.inner_dim,\n", " bias=False,\n", " gather_output=False)\n", " self.k = ColumnParallelLinear(self.d_model,\n", " self.inner_dim,\n", " bias=False,\n", " gather_output=False)\n", " self.v = ColumnParallelLinear(self.d_model,\n", " self.inner_dim,\n", " bias=False,\n", " gather_output=False)\n", " self.o = RowParallelLinear(self.inner_dim,\n", " self.d_model,\n", " bias=False,\n", " input_is_parallel=True)\n", "\n", " if self.has_relative_attention_bias:\n", " self.relative_attention_bias = ParallelEmbedding(self.relative_attention_num_buckets, self.n_heads)\n", " self.n_heads = self.num_attention_heads_per_partition\n", "...\n", "```\n", "\n", "You can find all the modified T5 layers defined in [t5_model_layers.py](https://github.com/aws-neuron/aws-neuron-sdk/tree/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5_model_layers.py). \n", "\n", "\n", "Once we have the modified T5 layers, we can plug the modified T5Attention and T5LayerFF into the pretrained model. Here is how you do that.
\n", "\n", "```\n", "def load_pretrained_with_parallel_attn(model_name):\n", " \n", " model = T5ForConditionalGeneration.from_pretrained(model_name, torch_dtype=\"auto\")\n", "\n", " # Parallel implementation of Attention modules.\n", " from t5_model_layers import ParallelSelfAttention, ParallelFF, ParallelCrossAttention\n", "\n", " for index, block in enumerate(model.decoder.block):\n", " if index == 0:\n", " block.layer[0] = ParallelSelfAttention(model.config,\n", " has_relative_attention_bias=True)\n", " else:\n", " block.layer[0] = ParallelSelfAttention(model.config)\n", " block.layer[1] = ParallelCrossAttention(model.config)\n", " block.layer[2] = ParallelFF(model.config)\n", " # Load the weights into the parallel layers \n", " neuronx_distributed.parallel_layers.load(model_name + \".pt\", model, sharded=False)\n", "\n", " return model\n", "\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compile the parallel T5 model\n", "\n", "Let us set some model parameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_name = \"google/flan-t5-xl\" \n", "max_length = 128\n", "num_beams = 4\n", "tp_degree = 8 # tensor parallelism degree" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download and save the model that we want to trace. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from transformers import T5ForConditionalGeneration\n", "\n", "model = T5ForConditionalGeneration.from_pretrained(model_name, torch_dtype=\"auto\")\n", "torch.save({\"model\":model.state_dict()}, model_name.split(\"/\")[-1] + \".pt\")\n", "model.config.use_cache = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To run HuggingFace T5 models on Neuron, we need to make a couple of changes. Let us reuse the code from the [t5 inference tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/torch-neuronx/t5-inference-tutorial.html) which makes T5 compatible with Neuron. For your convenience, the code is copied into [wrapper.py](https://github.com/aws-neuron/aws-neuron-sdk/tree/master/src/examples/pytorch/neuronx_distributed/t5-inference/wrapper.py) and [t5_models.py](https://github.com/aws-neuron/aws-neuron-sdk/tree/master/src/examples/pytorch/neuronx_distributed/t5-inference/t5_models.py). This notebook will import these files. \n", "\n", "The only change made to this code is that we use `neuronx_distributed.trace` instead of `torch_neuronx.trace`. \n", "\n", "Let us trace the encoder and decoder. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import t5_models \n", "import neuronx_distributed\n", "import time \n", "\n", "# This can take up to 20 minutes\n", "encoder_compile_start_time = time.time()\n", "traced_encoder = t5_models.parallel_trace_encoder(model_name, max_length, num_beams, tp_degree)\n", "print(\"Encoder compilation time {}\".format(time.time() - encoder_compile_start_time))\n", "\n", "neuronx_distributed.trace.parallel_model_save(traced_encoder, \"TracedParallelEncoder.pt\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This can take up to 15 minutes\n", "decoder_compile_start_time = time.time()\n", "traced_decoder = t5_models.parallel_trace_decoder(model, model_name, num_beams, max_length, tp_degree)\n", "print(\"Decoder compilation time {}\".format(time.time() - decoder_compile_start_time))\n", "\n", "neuronx_distributed.trace.parallel_model_save(traced_decoder, \"TracedParallelDecoder.pt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inference with the traced parallel T5 model\n", "\n", "With the traced model, let us try using beam search for inference." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Results:\n", "1 Lassen Sie uns gutes Essen essen.\n", "2 Lassen Sie uns gut essen.\n", "3 Lassen Sie uns gutes Essen zu essen.\n", "4 Lassen Sie uns gutes Essen zu sich nehmen.\n" ] } ], "source": [ "import neuronx_distributed\n", "from wrapper import T5Wrapper\n", "from transformers import T5Tokenizer\n", "\n", "\n", "num_return_sequences = 4\n", "\n", "traced_encoder = neuronx_distributed.trace.parallel_model_load(\"TracedParallelEncoder.pt\")\n", "traced_decoder = neuronx_distributed.trace.parallel_model_load(\"TracedParallelDecoder.pt\")\n", "\n", "tokenizer = T5Tokenizer.from_pretrained(model_name)\n", "model = T5Wrapper.from_pretrained(model_name)\n", "\n", "model.encoder = traced_encoder\n", "model.decoder = traced_decoder\n", "setattr(model.encoder, 'main_input_name', 'input_ids') # Attribute required by beam search\n", "\n", "output = model.parallel_infer(tokenizer=tokenizer,\n", " prompt=\"translate English to German: Lets eat good food.\",\n", " max_length=max_length,\n", " num_beams=num_beams,\n", " num_return_sequences=num_return_sequences,\n", " device=\"xla\")\n", "\n", "results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]\n", "\n", "print('Results:')\n", "for i, summary in enumerate(results):\n", " print(i + 1, summary)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Benchmarking\n", "\n", "Let us benchmark the per token decoder latency" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let us install NeuronPerf. We will use it to measure the performance.\n", "! 
pip install neuronperf --extra-index-url=https://pip.repos.neuron.amazonaws.com" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os \n", "import neuronperf as npf\n", "\n", "d_model = model.config.d_model\n", "model_dir = \"TracedParallelDecoder.pt\"\n", "decoder_run_count = 128\n", "\n", "def load_fn(model_path, **kwargs):\n", " return neuronx_distributed.trace.parallel_model_load(model_path)\n", " \n", "# NeuronPerf can't see tp_degree at the moment, so just expose all cores\n", "def env_setup_fn(*_):\n", " del os.environ[\"NEURON_RT_VISIBLE_CORES\"]\n", "\n", "def benchmark():\n", "\n", " # Create some sample inputs for the decoder\n", " decoder_input_ids = torch.ones((num_beams, 1), dtype=torch.int64)\n", " decoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int32)\n", " encoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int64)\n", " encoder_hidden_states = torch.ones((num_beams, max_length, d_model), dtype=torch.float32)\n", " beam_idx = torch.arange(0, num_beams, dtype=torch.int64)\n", " beam_scores = torch.zeros((num_beams,), dtype=torch.float)\n", "\n", " inputs = (decoder_input_ids,\n", " decoder_attention_mask,\n", " encoder_hidden_states,\n", " encoder_attention_mask,\n", " beam_idx,\n", " beam_scores)\n", "\n", " reports = npf.benchmark(\n", " load_fn,\n", " model_dir,\n", " [inputs], \n", " batch_sizes=1,\n", " n_models=1,\n", " max_infers=decoder_run_count,\n", " workers_per_model=1, # no bottleneck on model inputs, so 1 is fine\n", " env_setup_fn=env_setup_fn,\n", " multiprocess=False,\n", " )\n", " \n", " report = reports[0]\n", "\n", " # let's update throughput to be tokens / second and add a new record\n", " latency_in_s = report[\"latency_ms_avg\"] / 1000\n", " tokens_per_s = decoder_run_count / latency_in_s\n", " report[\"throughput_avg\"] = tokens_per_s\n", " \n", " # display and save results\n", " npf.print_reports(reports, cols=[\"throughput_avg\", \"latency_ms_p50\", \"latency_ms_p99\"])\n", " print(f\"Results saved to: {npf.write_json(reports[0])}\")\n", "\n", "benchmark()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's benchmark inference as a whole, including sampling. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import torch\n", "import neuronx_distributed\n", "import neuronperf as npf\n", "\n", "from transformers import T5Tokenizer\n", "from wrapper import T5Wrapper\n", "\n", "tokenizer = T5Tokenizer.from_pretrained(model_name)\n", "\n", "generated_token_count = 0\n", "\n", "class Wrapper(torch.nn.Module):\n", " def __init__(self, \n", " traced_encoder,\n", " traced_decoder):\n", " super().__init__()\n", " self.model = T5Wrapper.from_pretrained(model_name)\n", " self.model.encoder = traced_encoder\n", " self.model.decoder = traced_decoder\n", " setattr(self.model.encoder, 'main_input_name', 'input_ids') # Attribute required by beam search\n", "\n", " def forward(self, *inputs):\n", " input_ids = inputs[0]['input_ids']\n", " attention_mask = inputs[0]['attention_mask']\n", " return self.model.parallel_infer(input_ids=input_ids,\n", " attention_mask=attention_mask,\n", " max_length=max_length,\n", " num_beams=num_beams,\n", " num_return_sequences=num_return_sequences)\n", "\n", "def load_fn(filename, **kwargs):\n", " traced_encoder = neuronx_distributed.trace.parallel_model_load(filename + \"TracedParallelEncoder.pt\")\n", " traced_decoder = neuronx_distributed.trace.parallel_model_load(filename + \"TracedParallelDecoder.pt\")\n", " return Wrapper(traced_encoder, traced_decoder)\n", "\n", "# NeuronPerf can't see tp_degree at the moment, so just expose all cores\n", "def env_setup_fn(*_):\n", " del os.environ[\"NEURON_RT_VISIBLE_CORES\"]\n", "\n", "def preprocess_fn(inputs):\n", " \n", " encoding = []\n", " for text in inputs:\n", " batch_encoding = tokenizer(text, \n", " max_length=max_length, \n", " truncation=True, \n", " padding='max_length',\n", " return_tensors=\"pt\")\n", " input_ids = batch_encoding['input_ids']\n", " attention_mask = batch_encoding['attention_mask']\n", " encoding.append({\"input_ids\": input_ids,\n", " \"attention_mask\": attention_mask})\n", " return encoding\n", "\n", "def postprocess_fn(outputs):\n", " output = [tokenizer.decode(seq) for seq in outputs]\n", " global generated_token_count \n", " generated_token_count = len(outputs[0])\n", " return output\n", "\n", "def benchmark():\n", " inputs = [\"summarize: The Inflation Reduction Act lowers prescription drug costs, health care costs, and energy costs. It's the most aggressive action on tackling the climate crisis in American history, which will lift up American workers and create good-paying, union jobs across the country. It'll lower the deficit and ask the ultra-wealthy and corporations to pay their fair share. 
And no one making under $400,000 per year will pay a penny more in taxes.\"]\n", " reports = npf.benchmark(\n", " load_fn,\n", " \"\", # Model dir\n", " [inputs], \n", " batch_sizes=1,\n", " n_models=1,\n", " max_infers=5,\n", " max_duration=0, # sampling can take a while, so let's not timeout\n", " workers_per_model=1, \n", " env_setup_fn=env_setup_fn,\n", " preprocess_fn=preprocess_fn,\n", " postprocess_fn=postprocess_fn,\n", " multiprocess=False,\n", " )\n", " \n", " report = reports[0]\n", "\n", " report[\"throughput_avg\"] = round(generated_token_count / (report[\"latency_ms_avg\"] / 1000), 2)\n", " report[\"latency_per_token_ms_p50\"] = round((report[\"latency_ms_p50\"])/generated_token_count, 2)\n", " report[\"latency_per_token_ms_p99\"] = round((report[\"latency_ms_p99\"])/generated_token_count, 2)\n", "\n", " # display and save results\n", " npf.print_reports(reports, cols=[\"throughput_avg\", \"latency_per_token_ms_p50\", \"latency_per_token_ms_p99\"])\n", " print(f\"Results saved to: {npf.write_json(report)}\")\n", "\n", "benchmark()" ] } ], "metadata": { "kernelspec": { "display_name": "aws_neuron_venv_pytorch", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: src/examples/pytorch/neuronx_distributed/t5-inference/t5_model_layers.py ================================================ from neuronx_distributed.parallel_layers import parallel_state from neuronx_distributed.parallel_layers.layers import BaseParallelLinear, ColumnParallelLinear, RowParallelLinear, ParallelEmbedding from neuronx_distributed.parallel_layers.utils import divide import torch from torch import nn from torch.nn.parameter import Parameter from transformers import T5Config from transformers.activations import ACT2FN from transformers.pytorch_utils import find_pruneable_heads_and_indices from transformers.models.t5.modeling_t5 import T5Attention, T5LayerSelfAttention, T5LayerNorm,\ T5LayerCrossAttention, T5LayerFF, T5DenseGatedActDense, T5DenseActDense from transformers import T5ForConditionalGeneration import neuronx_distributed def prune_linear_layer(layer: BaseParallelLinear, index: torch.LongTensor, dim: int = 0) -> BaseParallelLinear: """ Prune a linear layer to keep only entries in index. Used to remove heads. Args: layer (`BaseParallelLinear`): The layer to prune. index (`torch.LongTensor`): The indices to keep in the layer. dim (`int`, *optional*, defaults to 0): The dimension on which to keep the indices. Returns: `BaseParallelLinear`: The pruned layer as a new layer with `requires_grad=True`. 
""" index = index.to(layer.weight.device) W = layer.weight.index_select(dim, index).clone().detach() if layer.bias is not None: if dim == 1: b = layer.bias.clone().detach() else: b = layer.bias[index].clone().detach() new_size = list(layer.weight.size()) new_size[dim] = len(index) new_layer = ColumnParallelLinear(new_size[1], new_size[0], bias=layer.bias is not None, gather_output=False).to(layer.weight.device) new_layer.weight.requires_grad = False new_layer.weight.copy_(W.contiguous()) new_layer.weight.requires_grad = True if layer.bias is not None: new_layer.bias.requires_grad = False new_layer.bias.copy_(b.contiguous()) new_layer.bias.requires_grad = True return new_layer class ParallelAttention(T5Attention): def __init__(self, config: T5Config, has_relative_attention_bias=False): super().__init__(config, has_relative_attention_bias) # Per attention head and per partition values world_size = parallel_state.get_tensor_model_parallel_size() self.num_attention_heads_per_partition = divide( self.n_heads, world_size) self.hidden_size_per_partition = self.num_attention_heads_per_partition * self.key_value_proj_dim # Mesh TensorFlow initialization to avoid scaling before softmax self.q = ColumnParallelLinear(self.d_model, self.inner_dim, bias=False, gather_output=False) self.k = ColumnParallelLinear(self.d_model, self.inner_dim, bias=False, gather_output=False) self.v = ColumnParallelLinear(self.d_model, self.inner_dim, bias=False, gather_output=False) self.o = RowParallelLinear(self.inner_dim, self.d_model, bias=False, input_is_parallel=True) if self.has_relative_attention_bias: self.relative_attention_bias = ParallelEmbedding(self.relative_attention_num_buckets, self.n_heads) self.n_heads = self.num_attention_heads_per_partition def prune_heads(self, heads): if len(heads) == 0: return heads, index = find_pruneable_heads_and_indices( heads, self.num_attention_heads_per_partition, self.key_value_proj_dim, self.pruned_heads ) # Prune linear layers self.q = prune_linear_layer(self.q, index) self.k = prune_linear_layer(self.k, index) self.v = prune_linear_layer(self.v, index) self.o = prune_linear_layer(self.o, index, dim=1) # Update hyper params self.num_attention_heads_per_partition = self.num_attention_heads_per_partition - len(heads) self.hidden_size_per_partition = self.key_value_proj_dim * self.num_attention_heads_per_partition self.pruned_heads = self.pruned_heads.union(heads) def compute_bias(self, query_length, key_length, device=None): """Compute binned relative position bias""" if device is None: device = self.relative_attention_bias.weight.device context_position = torch.arange(query_length, dtype=torch.long, device=device)[:, None] memory_position = torch.arange(key_length, dtype=torch.long, device=device)[None, :] relative_position = memory_position - context_position # shape (query_length, key_length) relative_position_bucket = self._relative_position_bucket( relative_position, # shape (query_length, key_length) bidirectional=(not self.is_decoder), num_buckets=self.relative_attention_num_buckets, max_distance=self.relative_attention_max_distance, ) values = self.relative_attention_bias( relative_position_bucket) tp_rank = parallel_state.get_tensor_model_parallel_rank() values = values[:, :, tp_rank * self.num_attention_heads_per_partition:(tp_rank + 1) * self.num_attention_heads_per_partition] # values = self.relative_attention_bias( # relative_position_bucket) # shape (query_length, key_length, num_heads) values = values.permute([2, 0, 1]).unsqueeze( 0) # shape (1, num_heads, 
query_length, key_length) # print("Values shape is: ", values.shape) return values def forward( self, hidden_states, mask=None, key_value_states=None, position_bias=None, past_key_value=None, layer_head_mask=None, query_length=None, use_cache=False, output_attentions=False, ): """ Self-attention (if key_value_states is None) or attention over source sentence (provided by key_value_states). """ # Input is (batch_size, seq_length, dim) # Mask is (batch_size, key_length) (non-causal) or (batch_size, key_length, key_length) # past_key_value[0] is (batch_size, n_heads, q_len - 1, dim_per_head) self.is_decoder = True batch_size, seq_length = hidden_states.shape[:2] real_seq_length = seq_length if past_key_value is not None: assert ( len(past_key_value) == 2 ), f"past_key_value should have 2 past states: keys and values. Got {len(past_key_value)} past states" real_seq_length += past_key_value[0].shape[2] if query_length is None else query_length key_length = real_seq_length if key_value_states is None else key_value_states.shape[1] def shape(states): """projection""" return states.view(batch_size, -1, self.num_attention_heads_per_partition, self.key_value_proj_dim).transpose(1, 2) def unshape(states): """reshape""" return states.transpose(1, 2).contiguous().view(batch_size, -1, self.hidden_size_per_partition) def project(hidden_states, proj_layer, key_value_states, past_key_value): """projects hidden states correctly to key/query states""" if key_value_states is None: # self-attn # (batch_size, n_heads, seq_length, dim_per_head) hidden_states = shape(proj_layer(hidden_states)) elif past_key_value is None: # cross-attn # (batch_size, n_heads, seq_length, dim_per_head) hidden_states = shape(proj_layer(key_value_states)) if past_key_value is not None: # import pdb; pdb.set_trace() if key_value_states is None: # self-attn # (batch_size, n_heads, key_length, dim_per_head) hidden_states = torch.cat([past_key_value, hidden_states], dim=2) elif past_key_value.shape[2] != key_value_states.shape[1]: # checking that the `sequence_length` of the `past_key_value` is the same as # the provided `key_value_states` to support prefix tuning # cross-attn # (batch_size, n_heads, seq_length, dim_per_head) hidden_states = shape(proj_layer(key_value_states)) else: # cross-attn hidden_states = past_key_value return hidden_states # get query states query_states = shape( self.q(hidden_states)) # (batch_size, n_heads, seq_length, dim_per_head) # get key/value states key_states = project( hidden_states, self.k, key_value_states, past_key_value[0] if past_key_value is not None else None ) value_states = project( hidden_states, self.v, key_value_states, past_key_value[1] if past_key_value is not None else None ) # compute scores scores = torch.matmul( query_states, key_states.transpose(3, 2) ) # equivalent of torch.einsum("bnqd,bnkd->bnqk", query_states, key_states), compatible with onnx op>9 if position_bias is None: if not self.has_relative_attention_bias: position_bias = torch.zeros( (1, self.num_attention_heads_per_partition, real_seq_length, key_length), device=scores.device, dtype=scores.dtype ) if self.gradient_checkpointing and self.training: position_bias.requires_grad = True else: position_bias = self.compute_bias(real_seq_length, key_length, device=scores.device) # if key and values are already calculated # we want only the last query position bias if past_key_value is not None: position_bias = position_bias[:, :, -hidden_states.size(1):, :] if mask is not None: print(position_bias.shape, mask.shape, flush=True) 
position_bias = position_bias + mask # (batch_size, n_heads, seq_length, key_length) if self.pruned_heads: mask = torch.ones(position_bias.shape[1]) mask[list(self.pruned_heads)] = 0 position_bias_masked = position_bias[:, mask.bool()] else: position_bias_masked = position_bias # print("Scores is: ", scores.shape) # print("position_bias_masked: ", position_bias_masked.shape) # print(scores.dtype, position_bias_masked.dtype) scores += position_bias_masked attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as( scores ) # (batch_size, n_heads, seq_length, key_length) attn_weights = nn.functional.dropout( attn_weights, p=self.dropout, training=self.training ) # (batch_size, n_heads, seq_length, key_length) # Mask heads if we want to if layer_head_mask is not None: attn_weights = attn_weights * layer_head_mask attn_output = unshape( torch.matmul(attn_weights, value_states)) # (batch_size, seq_length, dim) attn_output = self.o(attn_output) print(self.is_decoder,use_cache, flush=True) present_key_value_state = (key_states, value_states) if ( self.is_decoder and use_cache) else None outputs = (attn_output,) + (present_key_value_state,) + (position_bias,) if output_attentions: outputs = outputs + (attn_weights,) return outputs class ParallelSelfAttention(T5LayerSelfAttention): def __init__(self, config, has_relative_attention_bias=False): super().__init__(config, has_relative_attention_bias=False) self.SelfAttention = ParallelAttention(config, has_relative_attention_bias=has_relative_attention_bias) self.layer_norm = T5LayerNorm(config.d_model, eps=config.layer_norm_epsilon) self.dropout = nn.Dropout(config.dropout_rate) class ParallelCrossAttention(T5LayerCrossAttention): def __init__(self, config): super().__init__(config) self.EncDecAttention = ParallelAttention(config, has_relative_attention_bias=False) self.layer_norm = T5LayerNorm(config.d_model, eps=config.layer_norm_epsilon) self.dropout = nn.Dropout(config.dropout_rate) class ParallelDenseActDense(T5DenseActDense): def __init__(self, config: T5Config): super().__init__(config) self.wi = ColumnParallelLinear(config.d_model, config.d_ff, gather_output=False, bias=False) self.wo = RowParallelLinear(config.d_ff, config.d_model, input_is_parallel=True, bias=False) self.dropout = nn.Dropout(config.dropout_rate) self.act = ACT2FN[config.dense_act_fn] class ParallelDenseGatedActDense(T5DenseGatedActDense): def __init__(self, config: T5Config): super().__init__(config) self.wi_0 = ColumnParallelLinear(config.d_model, config.d_ff, gather_output=False, bias=False) self.wi_1 = ColumnParallelLinear(config.d_model, config.d_ff, gather_output=False, bias=False) self.wo = RowParallelLinear(config.d_ff, config.d_model, input_is_parallel=True, bias=False) self.dropout = nn.Dropout(config.dropout_rate) self.act = ACT2FN[config.dense_act_fn] class ParallelFF(T5LayerFF): def __init__(self, config: T5Config): super().__init__(config) if config.is_gated_act: self.DenseReluDense = ParallelDenseGatedActDense(config) else: self.DenseReluDense = ParallelDenseActDense(config) self.layer_norm = T5LayerNorm(config.d_model, eps=config.layer_norm_epsilon) self.dropout = nn.Dropout(config.dropout_rate) def load_pretrained_with_parallel_attn(model_name): model = T5ForConditionalGeneration.from_pretrained(model_name, torch_dtype="auto") # Parallel implementation of Attention modules. 
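    # Layout of each T5 decoder block in the Hugging Face implementation,
    # which the loop below relies on:
    #   block.layer[0] -> self-attention  (replaced with ParallelSelfAttention)
    #   block.layer[1] -> cross-attention (replaced with ParallelCrossAttention)
    #   block.layer[2] -> feed-forward    (replaced with ParallelFF)
    # Only block 0 owns the relative attention bias, hence the index == 0 special case.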
from t5_model_layers import ParallelSelfAttention, ParallelFF, ParallelCrossAttention for index, block in enumerate(model.decoder.block): if index == 0: block.layer[0] = ParallelSelfAttention(model.config, has_relative_attention_bias=True) else: block.layer[0] = ParallelSelfAttention(model.config) block.layer[1] = ParallelCrossAttention(model.config) block.layer[2] = ParallelFF(model.config) # Load the weights into the parallel layers neuronx_distributed.parallel_layers.load(model_name.split("/")[-1] + ".pt", model, sharded=False) return model ================================================ FILE: src/examples/pytorch/neuronx_distributed/t5-inference/t5_models.py ================================================ import torch import neuronx_distributed from functools import partial from transformers import T5Tokenizer, T5ForConditionalGeneration from wrapper import EncoderWrapper, DecoderWrapper from t5_model_layers import load_pretrained_with_parallel_attn def get_wrapped_encoder(max_length, num_beams, tp_degree, model_name): model = load_pretrained_with_parallel_attn(model_name) encoder = EncoderWrapper(model.encoder, model.decoder, model.config, num_beams, max_length, "xla", num_beams, tp_degree=tp_degree) encoder.eval() # We are aliasing the cache so that the cache always stays on device. aliases = {} for i in range(len(encoder.past_key_values_sa)): aliases[encoder.past_key_values_sa[i]] = i for i in range(len(encoder.past_key_values_ca)): aliases[encoder.past_key_values_ca[i]] = len(encoder.past_key_values_sa) + i return encoder, aliases def get_wrapped_decoder(max_length, num_beams, tp_degree, model_name): model = load_pretrained_with_parallel_attn(model_name) decoder = DecoderWrapper(decoder=model.decoder, lm_head=model.lm_head, model_config=model.config, num_beams=num_beams, max_length=max_length, device="xla", tp_degree=tp_degree) decoder.eval() num_outputs_from_trace = 3 if num_beams > 1 else 1 aliases = {} for i in range(len(decoder.past_key_values_sa)): aliases[decoder.past_key_values_sa[i]] = i + num_outputs_from_trace for i in range(len(decoder.past_key_values_ca)): aliases[decoder.past_key_values_ca[i]] = len(decoder.past_key_values_sa) + i + num_outputs_from_trace return decoder, aliases def parallel_trace_encoder(model_name: str, max_length: int, num_beams: int, tp_degree: int): print("starting encoder parallel trace") tokenizer = T5Tokenizer.from_pretrained(model_name) get_encoder_callable = partial(get_wrapped_encoder, max_length, num_beams, tp_degree, model_name) # Trace encoder batch_encoding = tokenizer("translate English to German: Lets go home now", max_length=max_length, truncation=True, padding='max_length', return_tensors="pt") input_ids = batch_encoding['input_ids'] attention_mask = batch_encoding['attention_mask'] # Here we are tracing the encoder and cache together. The cache is marked as state and we are aliasing.
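    # How the alias map is consumed (a sketch based on get_wrapped_encoder above):
    #   aliases[cache_tensor] = i  marks output i of the traced graph as writing
    #   back into that pre-allocated cache tensor, so KV-cache updates stay on
    #   device between decoding steps instead of round-tripping through the host.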
traced_encoder = neuronx_distributed.trace.parallel_model_trace(get_encoder_callable, ( input_ids, attention_mask, ), tp_degree=tp_degree, compiler_workdir="/tmp/encoder/", ) setattr(traced_encoder, 'main_input_name', 'input_ids') # Attribute required by beam search print("completed encoder parallel trace") return traced_encoder def parallel_trace_decoder(model: T5ForConditionalGeneration, model_name: str, num_beams: int, max_length: int, tp_degree: int): print("starting decoder trace") get_decoder_callable = partial(get_wrapped_decoder, max_length, num_beams, tp_degree, model_name) # We create mock inputs so we can trace the decoder decoder_input_ids = torch.ones((num_beams, 1), dtype=torch.int64) decoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int32) encoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int64) encoder_hidden_states = torch.ones((num_beams, max_length, model.config.d_model), dtype=torch.float32) beam_idx = torch.arange(0, num_beams, dtype=torch.int64) beam_scores = torch.zeros((num_beams,), dtype=torch.float) traced_decoder = neuronx_distributed.trace.parallel_model_trace(get_decoder_callable, ( decoder_input_ids, decoder_attention_mask, encoder_hidden_states, encoder_attention_mask, beam_idx, beam_scores ), tp_degree=tp_degree, compiler_workdir="/tmp/decoder/", ) print("completed decoder trace") return traced_decoder ================================================ FILE: src/examples/pytorch/neuronx_distributed/t5-inference/wrapper.py ================================================ import torch import neuronx_distributed import torch_xla.core.xla_model as xm from transformers import T5Tokenizer, T5ForConditionalGeneration from transformers.modeling_outputs import BaseModelOutput, Seq2SeqLMOutput from transformers.models.t5.modeling_t5 import T5Stack, T5LayerCrossAttention from transformers.generation.utils import ModelOutput from typing import Any, Dict, List, Optional, Tuple, Union from transformers.generation.beam_search import BeamScorer, BeamSearchScorer from optimum.neuron.generation import NeuronGenerationMixin from transformers.generation.logits_process import ( LogitsProcessorList, ) from transformers.generation.stopping_criteria import ( MaxLengthCriteria, MaxTimeCriteria, StoppingCriteriaList, validate_stopping_criteria, ) from transformers.generation.utils import ( BeamSearchDecoderOnlyOutput, BeamSearchEncoderDecoderOutput, BeamSearchOutput, GreedySearchOutput, ) class T5Wrapper(T5ForConditionalGeneration, NeuronGenerationMixin): def _prepare_encoder_decoder_kwargs_for_generation( self, inputs_tensor: torch.Tensor, model_kwargs, model_input_name: Optional[str] = None ) -> Dict[str, Any]: encoder = self.get_encoder() model_kwargs["encoder_outputs"]: ModelOutput = encoder(inputs_tensor, model_kwargs["attention_mask"]) return model_kwargs # Override to cut the input_ids to just the last token def prepare_inputs_for_generation( self, input_ids, past_key_values=None, attention_mask=None, head_mask=None, decoder_head_mask=None, decoder_attention_mask=None, cross_attn_head_mask=None, use_cache=None, encoder_outputs=None, **kwargs, ): # cut decoder_input_ids as past is cached input_ids = input_ids[:, -1:] return { "decoder_input_ids": input_ids, "past_key_values": past_key_values, "encoder_outputs": encoder_outputs, "attention_mask": attention_mask, "head_mask": head_mask, "decoder_head_mask": decoder_head_mask, "decoder_attention_mask": decoder_attention_mask, "cross_attn_head_mask": cross_attn_head_mask, "use_cache":
use_cache, } ''' We update the cache in the decoder trace, so let's override the _update_model_kwargs_for_xla_generation in NeuronGenerationMixin ''' def _update_model_kwargs_for_xla_generation( self, model_kwargs: Dict[str, Any], batch_size: int, is_encoder_decoder: bool = False, standardize_cache_format: bool = False, max_length: Optional[int] = None, seq_length: Optional[int] = None, use_cache: bool = True, ) -> Dict[str, Any]: def _update_attention(model_kwargs, is_encoder_decoder): """Updates the appropriate attention mask -- encoder-decoder models use `decoder_attention_mask`""" attention_mask_name = "decoder_attention_mask" if is_encoder_decoder else "attention_mask" attention_mask = model_kwargs.pop(attention_mask_name) attention_mask_update_slice = torch.ones( (batch_size, 1), dtype=attention_mask.dtype, device=attention_mask.device ) attention_mask = torch.cat([attention_mask[:, 1:], attention_mask_update_slice], dim=-1) mask = {attention_mask_name: attention_mask} return mask mask = _update_attention(model_kwargs, is_encoder_decoder) # sets the updated variables (mask and past_key_values) model_kwargs.update(mask) # Set a mock cache tensor model_kwargs["past_key_values"] = torch.tensor([]) return model_kwargs def _reorder_cache(self, past_key_values, beam_idx): ''' This is needed for beam search and not greedy sampling. We reorder the cache within the trace so we can skip it in modeling_t5.py; hence this override of _reorder_cache ''' self.beam_idx = beam_idx return past_key_values def infer(self, tokenizer: T5Tokenizer, prompt: str, max_length: int, num_beams: int, num_return_sequences: int, device: str): batch_encoding = tokenizer(prompt, max_length=max_length, truncation=True, padding='max_length', return_tensors="pt") past_key_values = self.encoder(batch_encoding['input_ids'],batch_encoding['attention_mask']) decoder_attention_mask = torch.cat([torch.zeros((1, max_length-1), dtype=torch.int32), torch.ones((1, 1), dtype=torch.int32)], axis=1) # copy the new cache state to the decoder if device == "xla": for state, tensor in zip(self.decoder.parameters(), past_key_values): state.copy_(tensor) else: # First half of the cache is self attention and the rest is cross attention self.decoder.past_key_values_sa = past_key_values[:len(past_key_values)//2] self.decoder.past_key_values_ca = past_key_values[len(past_key_values)//2:] output = self.generate(**batch_encoding, max_length=max_length, num_beams=num_beams, num_return_sequences=num_return_sequences, do_sample=False, use_cache=True, decoder_attention_mask=decoder_attention_mask, encoder_outputs={"last_hidden_state": torch.ones((1,128,1))}) # Pass fake encoder_outputs so the transformers code will not invoke the encoder return output def parallel_infer(self, max_length: int, num_beams: int, num_return_sequences: int, device: str = None, tokenizer: T5Tokenizer = None, prompt: str = None, input_ids: torch.Tensor = None, attention_mask: torch.Tensor = None): if input_ids is None or attention_mask is None: batch_encoding = tokenizer(prompt, max_length=max_length, truncation=True, padding='max_length', return_tensors="pt") else: batch_encoding = { 'input_ids' : input_ids, 'attention_mask': attention_mask } past_key_values = self.encoder(batch_encoding['input_ids'],batch_encoding['attention_mask']) decoder_attention_mask = torch.cat([torch.zeros((1, max_length-1), dtype=torch.int32), torch.ones((1, 1), dtype=torch.int32)], axis=1) # Here the encoder now returns the cache as device tensors, so we directly assign # the cache device tensor
to the decoder's cache (which is also a device tensor). # We thereby avoid the copy and always use pre-allocated memory. for model_tp_decoder, model_tp_encoder in zip(self.decoder.models, self.encoder.models): model_tp_decoder.load_state_dict(model_tp_encoder.state_dict(), strict=True) # Pass fake encoder_outputs so the transformers code will not invoke the encoder output = self.generate(**batch_encoding, max_length=max_length, num_beams=num_beams, num_return_sequences=num_return_sequences, do_sample=False, use_cache=True, decoder_attention_mask=decoder_attention_mask, encoder_outputs={"last_hidden_state": torch.ones((1,128,1))}) return output def forward( self, attention_mask: Optional[torch.FloatTensor] = None, decoder_input_ids: Optional[torch.LongTensor] = None, decoder_attention_mask: Optional[torch.BoolTensor] = None, encoder_outputs: Optional[Tuple[Tuple[torch.Tensor]]] = None, beam_scores = None, **kwargs ) -> Union[Tuple[torch.FloatTensor], Seq2SeqLMOutput]: hidden_states = encoder_outputs["last_hidden_state"] if not hasattr(self, 'beam_idx'): # Inferring the number of beams from the attention mask num_beams = attention_mask.shape[0] self.beam_idx = torch.arange(0, num_beams, dtype=torch.int64) decoder_outputs = self.decoder( decoder_input_ids, decoder_attention_mask, hidden_states, attention_mask, self.beam_idx, beam_scores ) # lm_logits = decoder_outputs[0] next_token_scores = decoder_outputs[0] next_tokens = decoder_outputs[1] next_indices = decoder_outputs[2] return next_token_scores, next_tokens, next_indices def beam_search( self, input_ids: torch.LongTensor, beam_scorer: BeamScorer, logits_processor: Optional[LogitsProcessorList] = None, stopping_criteria: Optional[StoppingCriteriaList] = None, max_length: Optional[int] = None, pad_token_id: Optional[int] = None, eos_token_id: Optional[Union[int, List[int]]] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, output_scores: Optional[bool] = None, return_dict_in_generate: Optional[bool] = None, synced_gpus: Optional[bool] = False, seq_length: Optional[int] = None, **model_kwargs, ) -> Union[BeamSearchOutput, torch.LongTensor]: logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList() stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList() pad_token_id = pad_token_id if pad_token_id is not None else self.generation_config.pad_token_id eos_token_id = eos_token_id if eos_token_id is not None else self.generation_config.eos_token_id if isinstance(eos_token_id, int): eos_token_id = [eos_token_id] output_scores = output_scores if output_scores is not None else self.generation_config.output_scores output_attentions = ( output_attentions if output_attentions is not None else self.generation_config.output_attentions ) output_hidden_states = ( output_hidden_states if output_hidden_states is not None else self.generation_config.output_hidden_states ) batch_size = len(beam_scorer._beam_hyps) num_beams = beam_scorer.num_beams batch_beam_size, cur_len = input_ids.shape # Overwrite cur_len cur_len = seq_length if num_beams * batch_size != batch_beam_size: raise ValueError( f"Batch dimension of `input_ids` should be {num_beams * batch_size}, but is {batch_beam_size}."
) # init attention / hidden states / scores tuples scores = () if (return_dict_in_generate and output_scores) else None beam_indices = ( tuple(() for _ in range(batch_beam_size)) if (return_dict_in_generate and output_scores) else None ) # initialise score of first beam with 0 and the rest with -1e9. This makes sure that only tokens # of the first beam are considered to avoid sampling the exact same tokens across all beams. # beam_scores = torch.zeros((batch_size, num_beams), dtype=torch.float, device=input_ids.device) beam_scores_device = "cpu" beam_scores = torch.zeros((batch_size, num_beams), dtype=torch.float, device=beam_scores_device) beam_scores[:, 1:] = -1e9 beam_scores = beam_scores.view((batch_size * num_beams,)) while True: # prepare model inputs # From max_length-sized input_ids, select first # cur_len - 1 values. update_indices = torch.stack( [torch.arange(input_ids.size(0)), torch.tensor(cur_len - 1).repeat(input_ids.size(0))], dim=-1 ) input_ids_ = input_ids[update_indices[:, 0], update_indices[:, 1], None] model_inputs = self.prepare_inputs_for_generation(input_ids_, **model_kwargs) next_token_scores, next_tokens, next_indices = self( **model_inputs, return_dict=True, output_attentions=output_attentions, output_hidden_states=output_hidden_states, beam_scores=beam_scores ) # stateless beam_outputs = beam_scorer.process( input_ids.to("cpu")[:, :cur_len], next_token_scores.to("cpu"), next_tokens.to("cpu"), next_indices.to("cpu"), pad_token_id=pad_token_id, eos_token_id=eos_token_id, beam_indices=beam_indices, ) beam_scores = beam_outputs["next_beam_scores"] beam_next_tokens = beam_outputs["next_beam_tokens"] beam_idx = beam_outputs["next_beam_indices"] update_indices = torch.stack( [torch.arange(batch_beam_size), torch.tensor(cur_len - 1).repeat(batch_beam_size)], dim=-1 ) update_indices_2 = torch.stack( [torch.arange(batch_beam_size), torch.tensor(cur_len).repeat(batch_beam_size)], dim=-1 ) # First select beam_indices device = input_ids.device beam_idx_device = beam_idx.to(device=input_ids.device) input_ids[:, :] = input_ids[beam_idx_device.long(), :] # Then append new tokens input_ids[update_indices_2[:, 0], update_indices_2[:, 1], None] = beam_next_tokens.unsqueeze(-1).to(device).to(torch.long) input_ids = input_ids * 1 # Hack to materialize tensor # update generated ids, model inputs, and length for next step model_kwargs = self._update_model_kwargs_for_xla_generation( model_kwargs, batch_size=batch_beam_size, is_encoder_decoder=self.config.is_encoder_decoder, max_length=stopping_criteria.max_length, seq_length=cur_len, use_cache=model_kwargs["use_cache"], ) if model_kwargs["past_key_values"] is not None: model_kwargs["past_key_values"] = self._reorder_cache(model_kwargs["past_key_values"], beam_idx.to(torch.int64)) if return_dict_in_generate and output_scores: beam_indices = tuple((beam_indices[beam_idx[i]] + (beam_idx[i],) for i in range(len(beam_indices)))) # increase cur_len cur_len = cur_len + 1 # stop when each sentence is finished, or if we exceed the maximum length stop_criterion_1 = beam_scorer.is_done if isinstance(stopping_criteria, list): if len(stopping_criteria) == 1: stopping_criteria = stopping_criteria[0] # Cases that can be handled in XLA without requiring # non-padded input_ids if isinstance(stopping_criteria, MaxLengthCriteria): stop_criterion_2 = cur_len >= stopping_criteria.max_length elif isinstance(stopping_criteria, MaxTimeCriteria): stop_criterion_2 = stopping_criteria(input_ids, scores) else: # Other cases will be handled on CPU batch_size, _ = 
input_ids.shape input_ids_cpu = input_ids.to("cpu") mask = torch.cat( [torch.ones(batch_size, cur_len), torch.zeros(batch_size, input_ids.shape[1] - cur_len)], dim=1 ).bool() input_ids_cpu = torch.masked_select(input_ids_cpu, mask).reshape((batch_size, cur_len)) scores_cpu = scores.to("cpu") if torch.is_tensor(scores) else scores stop_criterion_2 = stopping_criteria(input_ids_cpu, scores_cpu) if stop_criterion_1 or stop_criterion_2: if not synced_gpus: break else: this_peer_finished = True sequence_outputs = beam_scorer.finalize( input_ids.to("cpu"), beam_scores.to("cpu"), next_tokens.to("cpu"), next_indices.to("cpu"), pad_token_id=pad_token_id, eos_token_id=eos_token_id, max_length=stopping_criteria.max_length, beam_indices=beam_indices, ) for k, v in sequence_outputs.items(): if type(v) == torch.Tensor: sequence_outputs[k] = sequence_outputs[k].to(input_ids.device) return sequence_outputs["sequences"] def greedy_search( self, input_ids: torch.LongTensor, logits_processor: Optional[LogitsProcessorList] = None, stopping_criteria: Optional[StoppingCriteriaList] = None, max_length: Optional[int] = None, pad_token_id: Optional[int] = None, eos_token_id: Optional[Union[int, List[int]]] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, output_scores: Optional[bool] = None, return_dict_in_generate: Optional[bool] = None, seq_length: Optional[int] = None, streamer: Optional["BaseStreamer"] = None, **model_kwargs, ) -> Union[GreedySearchOutput, torch.LongTensor]: """ Overriding greedy sampling to use next tokens returned from the Neuron device instead of logits. """ # init values logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList() use_cache = model_kwargs["use_cache"] if "use_cache" in model_kwargs else False stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList() pad_token_id = pad_token_id if pad_token_id is not None else self.generation_config.pad_token_id eos_token_id = eos_token_id if eos_token_id is not None else self.generation_config.eos_token_id if isinstance(eos_token_id, int): eos_token_id = [eos_token_id] eos_token_id_tensor = torch.tensor(eos_token_id).to(input_ids.device) if eos_token_id is not None else None output_scores = output_scores if output_scores is not None else self.generation_config.output_scores output_attentions = ( output_attentions if output_attentions is not None else self.generation_config.output_attentions ) output_hidden_states = ( output_hidden_states if output_hidden_states is not None else self.generation_config.output_hidden_states ) # init attention / hidden states / scores tuples scores = () if (return_dict_in_generate and output_scores) else None decoder_attentions = () if (return_dict_in_generate and output_attentions) else None cross_attentions = () if (return_dict_in_generate and output_attentions) else None decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None # keep track of which sequences are already finished unfinished_sequences = torch.ones(input_ids.shape[0], dtype=torch.long, device=input_ids.device) this_peer_finished = False # used by synced_gpus only while True: # prepare model inputs # From max_length-sized input_ids, select first # seq_length - 1 values.
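# Worked example of the fixed-shape update used below (an illustration, not part of the original flow):
# with max_length=8 and seq_length=3, input_ids keeps its [batch, 8] shape for the whole generation
# loop; on this step the model consumes the token at column 2 and the newly generated token is
# written into column 3, so tensor shapes never change and the compiled XLA graph is reused at
# every step instead of being recompiled for each new sequence length.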
if model_kwargs.get("past_key_values") is None: input_ids_ = input_ids[:, :seq_length] else: update_indices = torch.stack( [torch.arange(input_ids.size(0)), torch.tensor(seq_length - 1).repeat(input_ids.size(0))], dim=-1, ) input_ids_ = input_ids[update_indices[:, 0], update_indices[:, 1], None] model_inputs = self.prepare_inputs_for_generation(input_ids_, **model_kwargs) # forward pass to get next token output = self( **model_inputs, return_dict=True, output_attentions=output_attentions, output_hidden_states=output_hidden_states, ) next_tokens = output[0] # finished sentences should have their next token be a padding token if eos_token_id is not None: if pad_token_id is None: raise ValueError("If `eos_token_id` is defined, make sure that `pad_token_id` is defined.") next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences) # update generated ids, model inputs, and length for next step batch_size, _ = input_ids.shape update_indices = torch.stack( [torch.arange(batch_size), torch.tensor(seq_length).repeat(batch_size)], dim=-1 ) input_ids[update_indices[:, 0], update_indices[:, 1]] = next_tokens[:] model_kwargs = self._update_model_kwargs_for_xla_generation( model_kwargs, batch_size=batch_size, is_encoder_decoder=self.config.is_encoder_decoder, max_length=stopping_criteria.max_length, seq_length=seq_length, use_cache=use_cache, ) seq_length += 1 # if eos_token was found in one sentence, set sentence to finished if eos_token_id_tensor is not None: unfinished_sequences = unfinished_sequences.mul( next_tokens.tile(eos_token_id_tensor.shape[0], 1).ne(eos_token_id_tensor.unsqueeze(1)).prod(dim=0) ) # stop when each sentence is finished, or if we exceed the maximum length stop_criterion_1 = unfinished_sequences.max() == 0 if isinstance(stopping_criteria, list): if len(stopping_criteria) == 1: stopping_criteria = stopping_criteria[0] # Cases that can be handled in XLA without requiring # non-padded input_ids if isinstance(stopping_criteria, MaxLengthCriteria): stop_criterion_2 = seq_length >= stopping_criteria.max_length elif isinstance(stopping_criteria, MaxTimeCriteria): stop_criterion_2 = stopping_criteria(input_ids, scores) else: # Other cases will be handled on CPU batch_size, _ = input_ids.shape mask = torch.cat( [torch.ones(batch_size, seq_length), torch.zeros(batch_size, input_ids.shape[1] - seq_length)], dim=1, ).bool() input_ids_cpu = torch.masked_select(input_ids, mask).reshape((batch_size, seq_length)).to("cpu") scores_cpu = scores.to("cpu") if torch.is_tensor(scores) else scores stop_criterion_2 = stopping_criteria(input_ids_cpu, scores_cpu) if stop_criterion_1 or stop_criterion_2: this_peer_finished = True if this_peer_finished: break if streamer is not None: streamer.end() return input_ids class EncoderWrapper(torch.nn.Module): ''' This wrapper converts positional args to kwargs ''' def __init__(self, encoder, decoder, model_config, batch_size, max_length, device, num_beams, tp_degree=None): super().__init__() self.encoder = encoder self.decoder = decoder self.batch_size = batch_size self.max_length = max_length self.model_config = model_config self.device = device self.num_beams = num_beams self.num_attention_heads_per_partition = model_config.num_heads self.tp_degree = tp_degree if self.tp_degree is not None: self.num_attention_heads_per_partition = model_config.num_heads // neuronx_distributed.parallel_layers.parallel_state.get_tensor_model_parallel_size() self.past_key_values_sa = 
torch.nn.ParameterList([torch.nn.Parameter(torch.ones((self.num_beams,self.num_attention_heads_per_partition,self.max_length-1,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(model_config.num_decoder_layers * 2)]) self.past_key_values_ca = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((self.num_beams,self.num_attention_heads_per_partition,self.max_length,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(model_config.num_decoder_layers * 2)]) def forward(self, input_ids, attention_mask): ''' This is the core functionality we want to trace. ''' encoder_output = self.encoder(input_ids=input_ids, attention_mask=attention_mask, output_attentions=False, output_hidden_states=False) last_hidden_state = encoder_output["last_hidden_state"] encoder_hidden_states = torch.concat([tensor.unsqueeze(0).repeat(self.num_beams, 1, 1) for tensor in last_hidden_state]) decoder_blocks = self.decoder.block present_key_value_states_sa = [] present_key_value_states_ca = [] for i, block in enumerate(decoder_blocks): # Cross attention has to be initialized with the encoder hidden state cross_attention: T5LayerCrossAttention = block.layer[1] attention = cross_attention.EncDecAttention def shape(states): """projection""" return states.view(self.batch_size, -1, self.num_attention_heads_per_partition, attention.key_value_proj_dim).transpose(1, 2) key_states = shape(attention.k(encoder_hidden_states)) value_states = shape(attention.v(encoder_hidden_states)) if self.tp_degree is None: # cross_attn_kv_state present_key_value_states_ca.append(key_states) present_key_value_states_ca.append(value_states) # Self attention kv states are initialized to zeros. present_key_value_states_sa.append(torch.zeros((self.batch_size, # key states self.model_config.num_heads, self.max_length-1, self.model_config.d_kv), dtype=torch.float32, device=self.device)) present_key_value_states_sa.append(torch.zeros((self.batch_size, # value states self.model_config.num_heads, self.max_length-1, self.model_config.d_kv), dtype=torch.float32, device=self.device)) else: # We want to copy the cross attention states (key_states and value_states) into the decoder trace. # One way of doing it is to get the encoder trace to return the kv states as an output and then we can pass it to the decoder trace # as an input. But this requires a copy from device to CPU and back. # # There is no good way to keep the output within the device yet. Until we build that feature, we use this workaround. # The workaround uses input_output_aliasing to map the output kv state to an input parameter. The output present_key_value_states_ca # represents the cross attention kv states and is aliased to a similarly named parameter. # # Why are we multiplying past_key_values_ca with 0 and adding it to the key or value state? # The trace API will remove any variables that are not used to compute the output tensor. As the past_key_values parameter is not # being used to compute the kv cache, it would be removed. To avoid that, we use it in an operation that computes the output # but at the same time does not affect the output.
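# To make the aliasing trick concrete: if `alias_param` is one of the pre-allocated cache
# parameters and `fresh` is the newly computed key or value tensor, then
#     out = (alias_param * 0) + fresh
# is numerically identical to `fresh`, but it keeps `alias_param` alive in the traced graph,
# so the output can be aliased onto the parameter's device buffer instead of the parameter
# being pruned away during tracing.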
present_key_value_states_ca.append((self.past_key_values_ca[i*2] * 0) + key_states) present_key_value_states_ca.append((self.past_key_values_ca[i*2+1] * 0) + value_states) present_key_value_states_sa.append(self.past_key_values_sa[i*2]*torch.zeros((self.batch_size, self.num_attention_heads_per_partition, self.max_length-1, self.model_config.d_kv), dtype=torch.float32, device="xla")) present_key_value_states_sa.append(self.past_key_values_sa[i*2+1]*torch.zeros((self.batch_size, self.num_attention_heads_per_partition, self.max_length-1, self.model_config.d_kv), dtype=torch.float32, device="xla")) return present_key_value_states_sa + present_key_value_states_ca class DecoderWrapper(torch.nn.Module): def __init__(self, decoder: T5Stack, lm_head: torch.nn.Linear, model_config, num_beams: int, max_length: int, device: str, tp_degree=None): super().__init__() self.decoder = decoder self.lm_head = lm_head self.model_dim=model_config.d_model self.device = device self.num_beams = num_beams self.batch_size = 1 self.config = model_config num_heads=model_config.num_heads num_decoder_layers=model_config.num_decoder_layers self.num_attention_heads_per_partition = num_heads if tp_degree is not None: self.num_attention_heads_per_partition = num_heads // neuronx_distributed.parallel_layers.parallel_state.get_tensor_model_parallel_size() # (num_beams, n_heads, seq_length, dim_per_head) if device == "cpu": self.past_key_values_sa = [torch.ones((num_beams,num_heads,max_length-1,model_config.d_kv), dtype=torch.float32) for _ in range(num_decoder_layers * 2)] self.past_key_values_ca = [torch.ones((num_beams,num_heads,max_length,model_config.d_kv), dtype=torch.float32) for _ in range(num_decoder_layers * 2)] elif device == "xla": self.past_key_values_sa = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((num_beams,self.num_attention_heads_per_partition,max_length-1,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(num_decoder_layers * 2)]) self.past_key_values_ca = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((num_beams,self.num_attention_heads_per_partition,max_length,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(num_decoder_layers * 2)]) def update_past(self, past_key_values): new_past_sa = [] new_past_ca = [] for past_layer in past_key_values: new_past_layer = list(past_layer) for i in range(len(new_past_layer[:2])): new_past_layer[i] = past_layer[i][:, :, 1:] new_past_sa += [new_past_layer[:2],] new_past_ca += [new_past_layer[2:],] return new_past_sa, new_past_ca def reorder_cache(self, past_key_values, beam_idx): for i in range(len(past_key_values)): past_key_values[i] = torch.index_select(past_key_values[i], 0, beam_idx) return past_key_values def forward(self, input_ids, decoder_attention_mask, encoder_hidden_states, encoder_attention_mask, beam_idx, beam_scores, **kwargs): if self.num_beams > 1: # We reorder the cache based on the beams selected in each iteration. Required step for beam search. past_key_values_sa = self.reorder_cache(self.past_key_values_sa, beam_idx) past_key_values_ca = self.reorder_cache(self.past_key_values_ca, beam_idx) else: # We do not need to reorder for greedy sampling past_key_values_sa = self.past_key_values_sa past_key_values_ca = self.past_key_values_ca # The cache is stored in a flattened form. We order the cache per layer before passing it to the decoder. # Each layer has 4 tensors, so we group by 4.
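# For example, with 2 decoder layers the flattened caches
#     past_key_values_sa = [k0_sa, v0_sa, k1_sa, v1_sa]
#     past_key_values_ca = [k0_ca, v0_ca, k1_ca, v1_ca]
# are regrouped into [[k0_sa, v0_sa, k0_ca, v0_ca], [k1_sa, v1_sa, k1_ca, v1_ca]].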
past_key_values = [[*past_key_values_sa[i*2:i*2+2], *past_key_values_ca[i*2:i*2+2]] for i in range(0, int(len(past_key_values_ca)/2))] decoder_output = self.decoder( input_ids=input_ids, attention_mask=decoder_attention_mask, past_key_values=past_key_values, encoder_hidden_states=encoder_hidden_states, encoder_attention_mask=encoder_attention_mask, use_cache=True, output_attentions=False, output_hidden_states=False) last_hidden_state = decoder_output['last_hidden_state'] past_key_values = decoder_output['past_key_values'] if self.config.tie_word_embeddings: last_hidden_state = last_hidden_state * (self.model_dim**-0.5) lm_logits = self.lm_head(last_hidden_state) past_key_values_sa, past_key_values_ca = self.update_past(past_key_values) # We flatten the cache to a single array. This is required for the input-output aliasing to work past_key_values_sa = [vec for kv_per_layer in past_key_values_sa for vec in kv_per_layer] past_key_values_ca = [vec for kv_per_layer in past_key_values_ca for vec in kv_per_layer] if self.device == "cpu": self.past_key_values_sa = past_key_values_sa self.past_key_values_ca = past_key_values_ca # Moving the top-k computation inside the traced decoder next_token_logits = lm_logits[:, -1, :] if self.num_beams > 1: logit_max, _ = torch.max(next_token_logits, dim=-1, keepdim=True) logsumexp = torch.log(torch.exp(next_token_logits - logit_max).sum(dim=-1, keepdim=True)) next_token_scores = next_token_logits - logit_max - logsumexp next_token_scores = next_token_scores + beam_scores[:, None].expand_as(next_token_scores) # reshape for beam search vocab_size = next_token_scores.shape[-1] next_token_scores = next_token_scores.view(self.batch_size, self.num_beams * vocab_size) next_token_scores = next_token_scores * 1 # Sample 2 next tokens for each beam (so we have some spare tokens and match output of beam search) next_token_scores, next_tokens = torch.topk( next_token_scores, 2 * self.num_beams, dim=1, largest=True, sorted=True ) next_indices = torch.div(next_tokens, vocab_size, rounding_mode="floor") next_tokens = next_tokens % vocab_size return [next_token_scores, next_tokens, next_indices] + past_key_values_sa + past_key_values_ca else: # Greedy next_tokens = torch.argmax(next_token_logits, dim=-1) return [next_tokens] + past_key_values_sa + past_key_values_ca ================================================ FILE: src/examples/pytorch/pipeline_tutorial/neuroncore_pipeline_pytorch.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "id": "variable-character", "metadata": {}, "source": [ "# Using NeuronCore Pipeline with PyTorch" ] }, { "cell_type": "markdown", "id": "valued-economics", "metadata": {}, "source": [ "In this tutorial you compile a pretrained BERT base model from HuggingFace 🤗 Transformers, using the NeuronCore Pipeline feature of the AWS Neuron SDK. You benchmark model latency of the pipeline parallel mode and compare it with the usual data parallel (multi-worker) deployment.\n", "\n", "This tutorial is intended to run on an inf1.6xlarge instance running the latest AWS Deep Learning AMI (DLAMI). The inf1.6xlarge instance size has four AWS Inferentia chips, for a total of 16 NeuronCores.\n", "\n", "Verify that this Jupyter notebook is running the Python or Conda kernel environment that was set up according to the [PyTorch Installation Guide](../../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). 
You can select the kernel from the \"Kernel -> Change Kernel\" option on the top of this Jupyter notebook page.\n", "\n", "> __Note:__ Do not execute this tutorial using \"Run -> Run all cells\" option. " ] }, { "cell_type": "markdown", "id": "private-authentication", "metadata": {}, "source": [ "## Install Dependencies:\n", "This tutorial requires the following pip packages:\n", "\n", "- `torch-neuron`\n", "- `neuron-cc[tensorflow]`\n", "- `transformers`\n", "\n", "Most of these packages will be installed when configuring your environment using the Neuron PyTorch setup guide. The additional HuggingFace 🤗 Transformers dependency must be installed here." ] }, { "cell_type": "code", "execution_count": null, "id": "romantic-accident", "metadata": {}, "outputs": [], "source": [ "%env TOKENIZERS_PARALLELISM=True #Suppresses tokenizer warnings, making errors easier to detect\n", "!pip install --upgrade \"transformers==4.6.0\"" ] }, { "cell_type": "markdown", "id": "prompt-australian", "metadata": {}, "source": [ "## Compiling a BERT base model for a single NeuronCore" ] }, { "cell_type": "markdown", "id": "aging-biodiversity", "metadata": {}, "source": [ "To run a HuggingFace [BERTModel](https://huggingface.co/transformers/model_doc/bert.html#bertmodel) on Inferentia, you only need to add a single extra line of code to the usual 🤗 Transformers PyTorch implementation, after importing the torch_neuron framework. \n", "\n", "Add the argument `return_dict=False` to the BERT transformers model so it can be traced with [TorchScript](https://pytorch.org/docs/stable/jit.html). TorchScript is a way to create serializable and optimizable models from PyTorch code. \n", "\n", "Enable padding to a maximum sequence length of 128, to test the model's performance with a realistic payload size. You can adapt this sequence length to your application's requirement. \n", "\n", "You can adapt the original example on the [BertModel forward pass docstring](https://huggingface.co/transformers/model_doc/bert.html#transformers.BertModel.forward) according to the following cell.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "stretch-preview", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch_neuron\n", "from transformers import BertTokenizer, BertModel\n", "\n", "from joblib import Parallel, delayed \n", "import numpy as np\n", "from tqdm import tqdm\n", "\n", "import os\n", "import time \n", "\n", "\n", "tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\n", "model = BertModel.from_pretrained('bert-base-uncased',return_dict=False)\n", "\n", "inputs = tokenizer(\"Hello, my dog is cute\",return_tensors=\"pt\",max_length=128,padding='max_length',truncation=True)\n" ] }, { "cell_type": "markdown", "id": "conceptual-aberdeen", "metadata": {}, "source": [ "The one extra line required is the call to the `torch.neuron.trace()` method. This call compiles the model and returns the forward method of the torch `nn.Module`, which you can use to run inference. \n", "\n", "The compiled graph can be saved using the `torch.jit.save` function and restored using the `torch.jit.load` function for inference on Inf1 instances. 
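As a minimal sketch (reusing the `neuron_model` object created in the next cell), saving and restoring would look like:\n", "\n", "```python\n", "neuron_model.save('bert-base-uncased-neuron.pt')\n", "restored_model = torch.jit.load('bert-base-uncased-neuron.pt')\n", "```\n", "\n", "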
During inference, the previously compiled artifacts will be loaded into the Neuron Runtime for inference execution.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "secondary-exclusive", "metadata": {}, "outputs": [], "source": [ "neuron_model = torch.neuron.trace(model, \n", " example_inputs = (inputs['input_ids'],inputs['attention_mask']),\n", " verbose=1)\n" ] }, { "cell_type": "markdown", "id": "atmospheric-stewart", "metadata": {}, "source": [ "## Running the BERT base model on a single NeuronCore\n", "With the model already available in memory, you can time one execution and check for the latency on the single inference call. You will load the model into Inferentia with a single inference call. A large \"wall time\" is expected when you first run the next cell; running the cell twice will show the actual inference latency:" ] }, { "cell_type": "code", "execution_count": null, "id": "approved-reputation", "metadata": {}, "outputs": [], "source": [ "%%time\n", "# The following line tests inference and should be executed on Inf1 instance family. \n", "outputs = neuron_model(*(inputs['input_ids'],inputs['attention_mask']))" ] }, { "cell_type": "markdown", "id": "great-collective", "metadata": {}, "source": [ "You can also check for the throughput of the single model running on a single NeuronCore.\n", "\n", "The sequential inference test (for loop) does not measure all the performance one can achieve in an instance with multiple NeuronCores. To improve hardware utilization you can run parallel inference requests over multiple model workers, which you'll test in the Data Parallel Bonus Section below." ] }, { "cell_type": "code", "execution_count": null, "id": "framed-reference", "metadata": {}, "outputs": [], "source": [ "%%time\n", "for _ in tqdm(range(100)):\n", " outputs = neuron_model(*(inputs['input_ids'],inputs['attention_mask'])) " ] }, { "cell_type": "markdown", "id": "super-innocent", "metadata": {}, "source": [ "Save the compiled model for later use:" ] }, { "cell_type": "code", "execution_count": null, "id": "express-greensboro", "metadata": {}, "outputs": [], "source": [ "neuron_model.save('bert-base-uncased-neuron.pt')" ] }, { "cell_type": "markdown", "id": "modified-government", "metadata": {}, "source": [ "## Compiling a BERT base model for 16 NeuronCores\n", "\n", "Our next step is to compile the same model for all 16 NeuronCores available in the inf1.6xlarge and check the performance difference when running pipeline parallel inferences. " ] }, { "cell_type": "code", "execution_count": null, "id": "compound-initial", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch_neuron\n", "from transformers import BertTokenizer, BertModel\n", "\n", "from joblib import Parallel, delayed \n", "import numpy as np\n", "from tqdm import tqdm\n", "\n", "import os\n", "import time \n", "\n", "\n", "tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\n", "model = BertModel.from_pretrained('bert-base-uncased',return_dict=False)\n", "\n", "inputs = tokenizer(\"Hello, my dog is cute\",return_tensors=\"pt\",max_length=128,padding='max_length',truncation=True)\n" ] }, { "cell_type": "markdown", "id": "universal-desperate", "metadata": {}, "source": [ "To enable pipeline mode during compilation, you only need to add the compiler flag `--neuroncore-pipeline-cores` and set the number of desired cores. 
The cell below sets up a `neuroncore_pipeline_cores` variable, which you can set to the available number of NeuronCores on the instance: _inf1.6xlarge_ has 16 NeuronCores in 4 Inferentia chips. \n" ] }, { "cell_type": "code", "execution_count": null, "id": "passing-masters", "metadata": {}, "outputs": [], "source": [ "# Number of Cores in the Pipeline Mode\n", "neuroncore_pipeline_cores = 16 # This value should be 4 on an inf1.xlarge\n", "\n", "# Compiling for neuroncore-pipeline-cores='16'\n", "neuron_pipeline_model = torch.neuron.trace(model,\n", " example_inputs = (inputs['input_ids'],inputs['attention_mask']),\n", " verbose=1,\n", " compiler_args = ['--neuroncore-pipeline-cores', str(neuroncore_pipeline_cores)]\n", " )" ] }, { "cell_type": "markdown", "id": "enhanced-swedish", "metadata": {}, "source": [ "## Running the BERT base model on 16 NeuronCores\n", "Next, time one execution and check for the latency on the single inference call over 16 cores. You will load the model into Inferentia with a single inference call. A large \"wall time\" is expected when you first run the next cell; running the cell twice will show the actual inference latency:" ] }, { "cell_type": "code", "execution_count": null, "id": "expressed-trinity", "metadata": {}, "outputs": [], "source": [ "%%time\n", "# The following line tests inference and should be executed on Inf1 instance family. \n", "outputs = neuron_pipeline_model(*(inputs['input_ids'],inputs['attention_mask']))" ] }, { "cell_type": "markdown", "id": "located-graphic", "metadata": {}, "source": [ "Check also for the throughput of the single model running over 16 NeuronCores. \n", "\n", "The sequential inference test (for loop) does not measure all the performance one can achieve with Pipeline mode. As the inference runs in streaming fashion, at least 15 cores are waiting for a new call until the last one processes the first call. This results in low NeuronCore utilization. To improve hardware utilization you will require parallel inference requests, which you'll test in the next section." ] }, { "cell_type": "code", "execution_count": null, "id": "hydraulic-calcium", "metadata": {}, "outputs": [], "source": [ "for _ in tqdm(range(100)):\n", " outputs = neuron_pipeline_model(*(inputs['input_ids'],inputs['attention_mask']))\n", " " ] }, { "cell_type": "markdown", "id": "patent-victoria", "metadata": {}, "source": [ "## Load Testing the Pipeline Parallel Mode\n", "\n", "To put the 16-NeuronCore group to the test, a client has to run concurrent requests to the model. In this Notebook setup you achieve it by creating a thread pool with `Joblib.Parallel`, with all workers on the pool running one inference call. \n", "\n", "You can define a new method called `inference_latency()` so that you measure the amount of time each inference call takes." ] }, { "cell_type": "code", "execution_count": null, "id": "appointed-adventure", "metadata": {}, "outputs": [], "source": [ "def inference_latency(model,*inputs):\n", " \"\"\"\n", " inference_latency is a simple method to return the latency of a model inference.\n", " \n", " Parameters:\n", " model: torch model object loaded using torch.jit.load\n", " inputs: model() args\n", " \n", " Returns:\n", " latency in seconds\n", " \"\"\"\n", " start = time.time()\n", " _ = model(*inputs)\n", " return time.time() - start" ] }, { "cell_type": "markdown", "id": "environmental-guinea", "metadata": {}, "source": [ "Use `tqdm` to measure total throughput of your experiment, with a nice side-effect of a \"cool progress bar!\". 
The total throughput is expected to be high, so set your experiment range to a large number, here 30k inferences. \n", "\n", "To calculate latency statistics over the returned list of 30k latencies, use the `numpy.quantile()` method." ] }, { "cell_type": "code", "execution_count": null, "id": "played-catch", "metadata": {}, "outputs": [], "source": [ "t = tqdm(range(30000), position=0, leave=True)\n", "latency = Parallel(n_jobs=12,prefer=\"threads\")(delayed(inference_latency)(neuron_pipeline_model,*(inputs['input_ids'],inputs['attention_mask'])) for i in t)\n", "\n", "p50 = np.quantile(latency[-10000:],0.50) * 1000\n", "p95 = np.quantile(latency[-10000:],0.95) * 1000\n", "p99 = np.quantile(latency[-10000:],0.99) * 1000\n", "avg_throughput = t.total/t.format_dict['elapsed']\n", "print(f'Avg Throughput: {avg_throughput:.1f}')\n", "print(f'50th Percentile Latency:{p50:.1f} ms')\n", "print(f'95th Percentile Latency:{p95:.1f} ms')\n", "print(f'99th Percentile Latency:{p99:.1f} ms')" ] }, { "cell_type": "markdown", "id": "exposed-northern", "metadata": {}, "source": [ "Save the compiled model for later use:" ] }, { "cell_type": "code", "execution_count": null, "id": "imperial-complex", "metadata": {}, "outputs": [], "source": [ "# Save the TorchScript graph\n", "neuron_pipeline_model.save('bert-base-uncased-neuron-pipeline.pt')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "abroad-earthquake", "metadata": {}, "source": [ "## Bonus Section - Load Testing Data Parallel Mode" ] }, { "cell_type": "code", "execution_count": null, "id": "therapeutic-detector", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch_neuron\n", "from transformers import BertTokenizer \n", "\n", "from joblib import Parallel, delayed \n", "import numpy as np\n", "from tqdm import tqdm\n", "\n", "import os\n", "import time \n", "\n", "def inference_latency(model,*inputs):\n", " \"\"\"\n", " inference_latency is a simple method to return the latency of a model inference.\n", " \n", " Parameters:\n", " model: torch model object loaded using torch.jit.load\n", " inputs: model() args\n", " \n", " Returns:\n", " latency in seconds\n", " \"\"\"\n", " start = time.time()\n", " _ = model(*inputs)\n", " return time.time() - start\n", "\n", "tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\n", "\n", "inputs = tokenizer(\"Hello, my dog is cute\",return_tensors=\"pt\",max_length=128,padding='max_length',truncation=True)\n" ] }, { "cell_type": "markdown", "id": "legal-terrorist", "metadata": {}, "source": [ "You use the `'NEURON_RT_NUM_CORES'` environment variable to define how many NeuronCores will be used. Set the environment variable to the number of individual workers you want to test in parallel.\n", "\n", "`torch_neuron` will load one model per NeuronCore group until it runs out of cores. At that point, if the Python process continues to spawn more model objects using `torch.jit.load`, `torch_neuron` will start stacking more than one model per core, until the Inferentia chip memory is full. \n", "\n", "Inferentia is able to run inference over all the loaded models, but only one at a time. The Neuron Runtime takes care of dynamically switching the model context as requests come in; no extra worker process management is required. Use 1 model per NeuronCore to achieve maximum performance.\n", "\n", "The following cell creates a list with as many models as NeuronCore Groups and executes a single dummy inference to load the models into Inferentia. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "current-mechanics", "metadata": {}, "outputs": [], "source": [ "import warnings\n", "# Number of data parallel workers\n", "number_of_workers=16 # This number should be 4 on an inf1.xlarge\n", "\n", "# Setting up a data parallel group\n", "os.environ['NEURON_RT_NUM_CORES'] = str(number_of_workers)\n", "\n", "# Loading 'number_of_workers' amount of models in Python memory\n", "model_list = [torch.jit.load('bert-base-uncased-neuron.pt') for _ in range(number_of_workers)]\n", "\n", "# Dummy inference to load models to Inferentia\n", "_ = [mod(*(inputs['input_ids'],inputs['attention_mask'])) for mod in model_list]\n" ] }, { "cell_type": "markdown", "id": "threatened-swaziland", "metadata": {}, "source": [ "Adapt the call to `joblib.Parallel()` iterating over a concatenated version of the `model_list`, to run 'round-robin' calls to each of the model workers. " ] }, { "cell_type": "code", "execution_count": null, "id": "fleet-month", "metadata": {}, "outputs": [], "source": [ "t = tqdm(model_list*1500,position=0, leave=True)\n", "latency = Parallel(n_jobs=number_of_workers,prefer=\"threads\")(delayed(inference_latency)(mod,*(inputs['input_ids'],inputs['attention_mask'])) for mod in t)\n", "\n", "p50 = np.quantile(latency[-10000:],0.50) * 1000\n", "p95 = np.quantile(latency[-10000:],0.95) * 1000\n", "p99 = np.quantile(latency[-10000:],0.99) * 1000\n", "avg_throughput = t.total/t.format_dict['elapsed']\n", "print(f'Avg Throughput: :{avg_throughput:.1f}')\n", "print(f'50th Percentile Latency:{p50:.1f} ms')\n", "print(f'95th Percentile Latency:{p95:.1f} ms')\n", "print(f'99th Percentile Latency:{p99:.1f} ms')" ] }, { "cell_type": "markdown", "id": "aggressive-stevens", "metadata": {}, "source": [ "For this model, despite the larger number of workers, the per-worker latency increases when running a single model per core, which in turn reduces the total throughput. \n", "\n", "This behavior may not repeat if the model memory footprint or the input payload size changes, i.e batch size > 1. We encourage you to experiment with the data parallel and pipeline parallel modes to optimize your application performance. " ] } ], "metadata": { "kernelspec": { "display_name": "Environment (conda_aws_neuron_pytorch_p36)", "language": "python", "name": "conda_aws_neuron_pytorch_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: src/examples/pytorch/resnet50.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ResNet50 model for Inferentia\n", "\n", "\n", "## Introduction:\n", "\n", "In this tutorial we will compile and deploy a ResNet50 model for inference on Inferentia. \n", "\n", "This Jupyter notebook should run on an inf1.6xlarge instance. The inference part of this tutorial requires an inf1 instance, not the compilation stage. For simplicity we will run this tutorial on an inf1.6xlarge, but in real life scenarios the compilation should be done on a compute instance and the deployment on an inf1 instance to save costs. \n", "\n", "In this tutorial we provide three main sections:\n", "\n", "1. Compile the ResNet50 model and infer with a batch size of 1\n", "\n", "2. 
Run the same compiled model on multiple NeuronCores using `torch.neuron.DataParallel` and dynamic batching\n", "\n", "3. Compile the ResNet50 model with a batch size of 5 and run it on multiple NeuronCores using `torch.neuron.DataParallel` for optimal performance on Inferentia\n", "\n", "Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \"Kernel -> Change Kernel\" option on the top of this Jupyter notebook page." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Dependencies:\n", "This tutorial requires the following pip packages:\n", "\n", "- `torch>=1.8`\n", "- `torch-neuron`\n", "- `torchvision`\n", "- `neuron-cc[tensorflow]`\n", "\n", "These will be installed by default when configuring your environment using the Neuron PyTorch setup guide." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compile model for Neuron\n", "\n", "The following step will compile the ResNet50 model for Inferentia. This will take a few minutes. At the end of script execution, the compiled model is saved as `resnet50_neuron.pt` in your local directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torchvision import models, transforms, datasets\n", "import torch_neuron\n", "\n", "# Create an example input for compilation\n", "image = torch.zeros([1, 3, 224, 224], dtype=torch.float32)\n", "\n", "# Load a pretrained ResNet50 model\n", "model = models.resnet50(pretrained=True)\n", "\n", "# Tell the model we are using it for evaluation (not training)\n", "model.eval()\n", "\n", "# Analyze the model - this will show operator support and operator count\n", "torch.neuron.analyze_model(model, example_inputs=[image])\n", "\n", "# Compile the model using torch.neuron.trace to create a Neuron model\n", "# that is optimized for the Inferentia hardware\n", "model_neuron = torch.neuron.trace(model, example_inputs=[image])\n", "\n", "# The output of the compilation step will report the percentage of operators that \n", "# are compiled to Neuron, for example:\n", "#\n", "# INFO:Neuron:The neuron partitioner created 1 sub-graphs\n", "# INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%\n", "# \n", "# We will also be warned if there are operators that are not placed on the Inferentia hardware\n", "\n", "# Save the compiled model\n", "model_neuron.save(\"resnet50_neuron.pt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run inference on Inferentia\n", "\n", "We can use the compiled Neuron model to run inference on Inferentia.\n", "\n", "In the following example, we preprocess a sample image for inference using the CPU model and Neuron model. We compare the predicted labels from the CPU model and Neuron model to verify that they are the same.\n", "\n", "Important: Do not perform inference with a Neuron traced model on a non-Neuron supported instance, as the results will not be calculated properly." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define a preprocessing function\n", "\n", "We define a basic image preprocessing function that loads a sample image and labels, normalizes and batches the image, and transforms the image into a tensor for inference using the compiled Neuron model."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import os\n", "from urllib import request\n", "\n", "# Create an image directory containing a sample image of a small kitten\n", "os.makedirs(\"./torch_neuron_test/images\", exist_ok=True)\n", "request.urlretrieve(\"https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg\",\n", " \"./torch_neuron_test/images/kitten_small.jpg\")\n", "\n", "# Fetch labels to output the top classifications\n", "request.urlretrieve(\"https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json\",\"imagenet_class_index.json\")\n", "idx2label = []\n", "\n", "# Read the labels and create a list to hold them for classification \n", "with open(\"imagenet_class_index.json\", \"r\") as read_file:\n", " class_idx = json.load(read_file)\n", " idx2label = [class_idx[str(k)][1] for k in range(len(class_idx))]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def preprocess(batch_size=1, num_neuron_cores=1):\n", " # Define a normalization function using the ImageNet mean and standard deviation\n", " normalize = transforms.Normalize(\n", " mean=[0.485, 0.456, 0.406],\n", " std=[0.229, 0.224, 0.225])\n", "\n", " # Resize the sample image to [1, 3, 224, 224], normalize it, and turn it into a tensor\n", " eval_dataset = datasets.ImageFolder(\n", " os.path.dirname(\"./torch_neuron_test/\"),\n", " transforms.Compose([\n", " transforms.Resize([224, 224]),\n", " transforms.ToTensor(),\n", " normalize,\n", " ])\n", " )\n", " image, _ = eval_dataset[0]\n", " image = torch.tensor(image.numpy()[np.newaxis, ...])\n", "\n", " # Create a \"batched\" image with enough images to go on each of the available NeuronCores\n", " # batch_size is the per-core batch size\n", " # num_neuron_cores is the number of NeuronCores being used\n", " batch_image = image\n", " for i in range(batch_size * num_neuron_cores - 1):\n", " batch_image = torch.cat([batch_image, image], 0)\n", " \n", " return batch_image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run inference using the Neuron model\n", "\n", "We import the necessary python modules, load the torch-neuron compiled model, and run inference on Inferentia. \n", "\n", "By default, the Neuron model will run on a single NeuronCore. In the next section, we will see how to run the Neuron model on multiple NeuronCores to fully saturate our hardware for optimal performance on Inferentia. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torchvision import models, transforms, datasets\n", "import torch_neuron\n", "\n", "# Get a sample image\n", "image = preprocess()\n", "\n", "# Run inference using the CPU model\n", "output_cpu = model(image)\n", "\n", "# Load the compiled Neuron model\n", "model_neuron = torch.jit.load('resnet50_neuron.pt')\n", "\n", "# Run inference using the Neuron model\n", "output_neuron = model_neuron(image)\n", "\n", "# Verify that the CPU and Neuron predictions are the same by comparing\n", "# the top-5 results\n", "top5_cpu = output_cpu[0].sort()[1][-5:]\n", "top5_neuron = output_neuron[0].sort()[1][-5:]\n", "\n", "# Lookup and print the top-5 labels\n", "top5_labels_cpu = [idx2label[idx] for idx in top5_cpu]\n", "top5_labels_neuron = [idx2label[idx] for idx in top5_neuron]\n", "print(\"CPU top-5 labels: {}\".format(top5_labels_cpu))\n", "print(\"Neuron top-5 labels: {}\".format(top5_labels_neuron))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run Inference using torch.neuron.DataParallel\n", "\n", "To fully leverage the Inferentia hardware we want to use all avaialable NeuronCores. An inf1.xlarge and inf1.2xlarge have four NeuronCores, an inf1.6xlarge has 16 NeuronCores, and an inf1.24xlarge has 64 NeuronCores. For maximum performance on Inferentia hardware, we can use `torch.neuron.DataParallel` to utilize all available NeuronCores.\n", "\n", "`torch.neuron.DataParallel` implements data parallelism at the module level by duplicating the Neuron model on all available NeuronCores and distributing data across the different cores for parallelized inference.\n", "\n", "In the following section, we will run inference using the `torch.neuron.DataParallel` module to fully saturate the Inferentia hardware. We benchmark the model to collect throughput and latency statistics.\n", "\n", "Note: `torch.neuron.DataParallel` is new with Neuron 1.16.0. Please ensure you are using the latest Neuron package to run the following sections. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define a benchmarking function\n", "\n", "We create a function that handles benchmarking the Neuron model to collect throughput and latency metrics. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from time import time\n", "\n", "def benchmark(model, image):\n", " print('Input image shape is {}'.format(list(image.shape)))\n", " \n", " # The first inference loads the model so exclude it from timing \n", " results = model(image)\n", " \n", " # Collect throughput and latency metrics\n", " latency = []\n", " throughput = []\n", "\n", " # Run inference for 100 iterations and calculate metrics\n", " num_infers = 100\n", " for _ in range(num_infers):\n", " delta_start = time()\n", " results = model(image)\n", " delta = time() - delta_start\n", " latency.append(delta)\n", " throughput.append(image.size(0)/delta)\n", " \n", " # Calculate and print the model throughput and latency\n", " print(\"Avg. 
Throughput: {:.0f}, Max Throughput: {:.0f}\".format(np.mean(throughput), np.max(throughput)))\n", " print(\"Latency P50: {:.0f} ms\".format(np.percentile(latency, 50)*1000.0))\n", " print(\"Latency P90: {:.0f} ms\".format(np.percentile(latency, 90)*1000.0))\n", " print(\"Latency P95: {:.0f} ms\".format(np.percentile(latency, 95)*1000.0))\n", " print(\"Latency P99: {:.0f} ms\\n\".format(np.percentile(latency, 99)*1000.0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run Inference using torch.neuron.DataParallel\n", "\n", "We create the `torch.neuron.DataParallel` module using the compiled Neuron model, get a sample image, and benchmark the parallelized model on Neuron." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a torch.neuron.DataParallel module using the compiled Neuron model\n", "# By default, torch.neuron.DataParallel will use four cores on an inf1.xlarge\n", "# or inf1.2xlarge, 16 cores on an inf1.6xlarge, and 64 cores on an inf1.24xlarge\n", "model_neuron_parallel = torch.neuron.DataParallel(model_neuron)\n", "\n", "# Get sample image with batch size=1 per NeuronCore\n", "batch_size = 1\n", "\n", "# For an inf1.xlarge or inf1.2xlarge, set num_neuron_cores = 4\n", "num_neuron_cores = 16\n", "\n", "image = preprocess(batch_size=batch_size, num_neuron_cores=num_neuron_cores)\n", "\n", "# Benchmark the model\n", "benchmark(model_neuron_parallel, image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run inference with dynamic batch sizes\n", "\n", "Batch size has a direct impact on model performance. The Inferentia chip is optimized to run with small batch sizes. This means that a Neuron compiled model can outperform a GPU model, even if running single digit batch sizes.\n", "\n", "As a general best practice, we recommend optimizing your model's throughput by compiling the model with a small batch size and gradually increasing it to find the peak throughput on Inferentia.\n", "\n", "Dynamic batching is a feature that allows you to use tensor batch sizes that the Neuron model was not originally compiled against. This is necessary because the underlying Inferentia hardware will always execute inferences with the batch size used during compilation. Fixed batch size execution allows tuning the input batch size for optimal performance. For example, batch size 1 may be best suited for an ultra-low latency on-demand inference application, while batch size > 1 can be used to maximize throughput for offline inferencing. Dynamic batching is implemented by slicing large input tensors into chunks that match the batch size used during the `torch.neuron.trace` compilation call. \n", "\n", "The `torch.neuron.DataParallel` class automatically enables dynamic batching on eligible models. This allows us to run inference in applications that have inputs with a variable batch size without needing to recompile the model.\n", "\n", "In the following example, we use the same `torch.neuron.DataParallel` module to run inference using several different batch sizes. Notice that latency increases consistently as the batch size increases. Throughput increases as well, up until a certain point where the input size becomes too large to be efficient."
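, "\n", "\n", "As a rough sketch of what this slicing means (hypothetical shapes, assuming a model compiled with batch size 1):\n", "\n", "```python\n", "# a [7, 3, 224, 224] input against a batch-1 compiled model is split into\n", "# 7 chunks of shape [1, 3, 224, 224], which are distributed across the NeuronCores\n", "chunks = torch.split(image, 1, dim=0)\n", "```"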
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# using the same DataParallel model_neuron_parallel model, we can run\n", "# inference on inputs with a variable batch size without recompiling\n", "batch_sizes = [2, 3, 4, 5, 6, 7]\n", "for batch_size in batch_sizes:\n", " print('Batch size: {}'.format(batch_size))\n", " image = preprocess(batch_size=batch_size, num_neuron_cores=num_neuron_cores)\n", " \n", " # Benchmark the model for each input batch size\n", " benchmark(model_neuron_parallel, image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compile and Infer with different batch sizes on multiple NeuronCores\n", "\n", "Dynamic batching using small batch sizes can result in sub-optimal throughput because it involves slicing tensors into chunks and iteratively sending data to the hardware. Using a larger batch size at compilation time can use the Inferentia hardware more efficiently in order to maximize throughput. You can test the tradeoff between individual request latency and total throughput by fine-tuning the input batch size.\n", "\n", "In the following example, we recompile our model using a batch size of 5 and run the model using `torch.neuron.DataParallel` to fully saturate our Inferentia hardware for optimal performance." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create an input with batch size 5 for compilation\n", "batch_size = 5\n", "image = torch.zeros([batch_size, 3, 224, 224], dtype=torch.float32)\n", "\n", "# Recompile the ResNet50 model for inference with batch size 5\n", "model_neuron = torch.neuron.trace(model, example_inputs=[image])\n", "\n", "# Export to saved model\n", "model_neuron.save(\"resnet50_neuron_b{}.pt\".format(batch_size))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run inference with batch size of 5 using the Neuron model compiled for a batch size of 5." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_size = 5\n", "\n", "# Load compiled Neuron model\n", "model_neuron = torch.jit.load(\"resnet50_neuron_b{}.pt\".format(batch_size))\n", "\n", "# Create DataParallel model\n", "model_neuron_parallel = torch.neuron.DataParallel(model_neuron)\n", "\n", "# Get sample image with batch size=5\n", "image = preprocess(batch_size=batch_size, num_neuron_cores=num_neuron_cores)\n", "\n", "# Benchmark the model\n", "benchmark(model_neuron_parallel, image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can experiment with different batch size values to see what gives the best overall throughput on Inferentia." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.9" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: src/examples/pytorch/resnet50_partition.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Manual Partitioning Tutorial\n", "\n", "In this tutorial we will run through how to manually partition a graph. 
There are six steps:\n", "\n", "1. Import ResNet50 code from torchvision and set to evaluation mode\n", "1. Download a test image and preprocess it\n", "1. Run inference on CPU as a baseline\n", "1. Manually partition the graph using Neuron\n", "1. Save the model to be loaded on another instance\n", "1. Inspect the graph to deepen our understanding\n", "\n", "The following is a ResNet50 implementation copied from `torchvision.models.resnet`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**STEP 1:** Import torchvision ResNet50 and run the model on CPU\n", "\n", "Note that training code can be inserted before `model.eval()` if retraining/fine-tuning is necessary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "\n", "\n", "def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):\n", " \"\"\"3x3 convolution with padding\"\"\"\n", " return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,\n", " padding=dilation, groups=groups, bias=False, dilation=dilation)\n", "\n", "\n", "def conv1x1(in_planes, out_planes, stride=1):\n", " \"\"\"1x1 convolution\"\"\"\n", " return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)\n", "\n", "\n", "class Bottleneck(nn.Module):\n", " expansion = 4\n", "\n", " def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,\n", " base_width=64, dilation=1, norm_layer=None):\n", " super(Bottleneck, self).__init__()\n", " if norm_layer is None:\n", " norm_layer = nn.BatchNorm2d\n", " width = int(planes * (base_width / 64.)) * groups\n", " # Both self.conv2 and self.downsample layers downsample the input when stride != 1\n", " self.conv1 = conv1x1(inplanes, width)\n", " self.bn1 = norm_layer(width)\n", " self.conv2 = conv3x3(width, width, stride, groups, dilation)\n", " self.bn2 = norm_layer(width)\n", " self.conv3 = conv1x1(width, planes * self.expansion)\n", " self.bn3 = norm_layer(planes * self.expansion)\n", " self.relu = nn.ReLU(inplace=True)\n", " self.downsample = downsample\n", " self.stride = stride\n", "\n", " def forward(self, x):\n", " identity = x\n", "\n", " out = self.conv1(x)\n", " out = self.bn1(out)\n", " out = self.relu(out)\n", "\n", " out = self.conv2(out)\n", " out = self.bn2(out)\n", " out = self.relu(out)\n", "\n", " out = self.conv3(out)\n", " out = self.bn3(out)\n", "\n", " if self.downsample is not None:\n", " identity = self.downsample(x)\n", "\n", " out += identity\n", " out = self.relu(out)\n", "\n", " return out\n", "\n", "\n", "class ResNet(nn.Module):\n", "\n", " def __init__(self, block, layers, num_classes=1000, zero_init_residual=False,\n", " groups=1, width_per_group=64, replace_stride_with_dilation=None,\n", " norm_layer=None):\n", " super(ResNet, self).__init__()\n", " if norm_layer is None:\n", " norm_layer = nn.BatchNorm2d\n", " self._norm_layer = norm_layer\n", "\n", " self.inplanes = 64\n", " self.dilation = 1\n", " if replace_stride_with_dilation is None:\n", " # each element in the tuple indicates if we should replace\n", " # the 2x2 stride with a dilated convolution instead\n", " replace_stride_with_dilation = [False, False, False]\n", " if len(replace_stride_with_dilation) != 3:\n", " raise ValueError(\"replace_stride_with_dilation should be None \"\n", " \"or a 3-element tuple, got {}\".format(replace_stride_with_dilation))\n", " self.groups = groups\n", " self.base_width = width_per_group\n", " self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, 
stride=2, padding=3,\n", " bias=False)\n", " self.bn1 = norm_layer(self.inplanes)\n", " self.relu = nn.ReLU(inplace=True)\n", " self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\n", " self.layer1 = self._make_layer(block, 64, layers[0])\n", " self.layer2 = self._make_layer(block, 128, layers[1], stride=2,\n", " dilate=replace_stride_with_dilation[0])\n", " self.layer3 = self._make_layer(block, 256, layers[2], stride=2,\n", " dilate=replace_stride_with_dilation[1])\n", " self.layer4 = self._make_layer(block, 512, layers[3], stride=2,\n", " dilate=replace_stride_with_dilation[2])\n", " self.avgpool = nn.AdaptiveAvgPool2d((1, 1))\n", " self.fc = nn.Linear(512 * block.expansion, num_classes)\n", "\n", " for m in self.modules():\n", " if isinstance(m, nn.Conv2d):\n", " nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n", " elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):\n", " nn.init.constant_(m.weight, 1)\n", " nn.init.constant_(m.bias, 0)\n", "\n", " # Zero-initialize the last BN in each residual branch,\n", " # so that the residual branch starts with zeros, and each residual block behaves like an identity.\n", " # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677\n", " if zero_init_residual:\n", " for m in self.modules():\n", " if isinstance(m, Bottleneck):\n", " nn.init.constant_(m.bn3.weight, 0)\n", " elif isinstance(m, BasicBlock):\n", " nn.init.constant_(m.bn2.weight, 0)\n", "\n", " def _make_layer(self, block, planes, blocks, stride=1, dilate=False):\n", " norm_layer = self._norm_layer\n", " downsample = None\n", " previous_dilation = self.dilation\n", " if dilate:\n", " self.dilation *= stride\n", " stride = 1\n", " if stride != 1 or self.inplanes != planes * block.expansion:\n", " downsample = nn.Sequential(\n", " conv1x1(self.inplanes, planes * block.expansion, stride),\n", " norm_layer(planes * block.expansion),\n", " )\n", "\n", " layers = []\n", " layers.append(block(self.inplanes, planes, stride, downsample, self.groups,\n", " self.base_width, previous_dilation, norm_layer))\n", " self.inplanes = planes * block.expansion\n", " for _ in range(1, blocks):\n", " layers.append(block(self.inplanes, planes, groups=self.groups,\n", " base_width=self.base_width, dilation=self.dilation,\n", " norm_layer=norm_layer))\n", "\n", " return nn.Sequential(*layers)\n", "\n", " def forward(self, x):\n", " x = self.conv1(x)\n", " x = self.bn1(x)\n", " x = self.relu(x)\n", " x = self.maxpool(x)\n", "\n", " x = self.layer1(x)\n", " x = self.layer2(x)\n", " x = self.layer3(x)\n", " x = self.layer4(x)\n", "\n", " x = self.avgpool(x)\n", " x = torch.flatten(x, 1)\n", " x = self.fc(x)\n", "\n", " return x\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from torch.utils.model_zoo import load_url as load_state_dict_from_url\n", "\n", "model = ResNet(Bottleneck, [3, 4, 6, 3])\n", "state_dict = load_state_dict_from_url('https://download.pytorch.org/models/resnet50-19c8e357.pth', progress=True)\n", "model.load_state_dict(state_dict)\n", "# you can do some training here, before calling model.eval()\n", "model.eval()\n", "print('ResNet50 model is turned into inference mode')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**STEP 2:** Download a cat image and preprocess it" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "from torchvision import transforms, datasets\n", "from 
tensorflow.keras.applications import resnet50\n", "import urllib.request\n", "\n", "imagedir = './images'\n", "os.makedirs(imagedir, exist_ok=True)\n", "urllib.request.urlretrieve(\n", "    'https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg',\n", "    os.path.join(imagedir, 'kitten_small.jpg'),\n", ")\n", "normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", "                                 std=[0.229, 0.224, 0.225])\n", "eval_dataset = datasets.ImageFolder(\n", "    '.',\n", "    transforms.Compose([\n", "        transforms.Resize([224, 224]),\n", "        transforms.ToTensor(),\n", "        normalize,\n", "    ])\n", ")\n", "image, label = eval_dataset[0]\n", "image = torch.tensor(image.numpy()[np.newaxis, ...])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**STEP 3:** Run inference without Neuron for comparison" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('model inference result:')\n", "print(resnet50.decode_predictions(model(image).detach().numpy(), top=5)[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**STEP 3 (continued):** Run the same inference using `torch.jit.trace` so that we can save and load the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "jit_trace = torch.jit.trace(model, example_inputs=image)\n", "print('jit.trace inference result:')\n", "print(resnet50.decode_predictions(jit_trace(image).detach().numpy(), top=5)[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "jit_trace_filename = 'resnet50_jit_trace.pt'\n", "jit_trace.save(jit_trace_filename)\n", "jit_trace_loaded = torch.jit.load(jit_trace_filename)\n", "print('loaded jit.trace inference result:')\n", "print(resnet50.decode_predictions(jit_trace_loaded(image).detach().numpy(), top=5)[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**STEP 4:** Manually partition the ResNet50 model and execute it\n", "\n", "To generate a Neuron-optimized TorchScript with only layers 1-4 placed on the Neuron runtime, we first define a new module class `ResNetNeuron` inheriting from `ResNet`. We add `torch.neuron.trace` calls in the `trace` method of this module in order to turn the layer submodules into Neuron-optimized ones." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch.neuron\n", "\n", "class ResNetNeuron(ResNet):\n", "\n", "    def trace(self, x):\n", "        x = self.conv1(x)\n", "        x = self.bn1(x)\n", "        x = self.relu(x)\n", "        x = self.maxpool(x)\n", "\n", "        self.layer1 = torch.neuron.trace(self.layer1, x, fallback=False)\n", "        x = self.layer1(x)\n", "\n", "        self.layer2 = torch.neuron.trace(self.layer2, x, fallback=False)\n", "        x = self.layer2(x)\n", "\n", "        self.layer3 = torch.neuron.trace(self.layer3, x, fallback=False)\n", "        x = self.layer3(x)\n", "\n", "        self.layer4 = torch.neuron.trace(self.layer4, x, fallback=False)\n", "\n", "    def forward(self, x):\n", "        x = self.conv1(x)\n", "        x = self.bn1(x)\n", "        x = self.relu(x)\n", "        x = self.maxpool(x)\n", "\n", "        # After running ResNetNeuron::trace, these layers will be placed on Neuron\n", "        x = self.layer1(x)\n", "        x = self.layer2(x)\n", "        x = self.layer3(x)\n", "        x = self.layer4(x)\n", "\n", "        x = self.avgpool(x)\n", "        x = torch.flatten(x, 1)\n", "        x = self.fc(x)\n", "\n", "        return x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now construct the class and run an inference to trigger the `neuron-cc` compiler. 
Watch for the [ \\* ] icon to the left of this cell to disappear and show a number - this will take a minute or two to run" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "model_neuron = ResNetNeuron(Bottleneck, [3, 4, 6, 3])\n", "model_neuron.load_state_dict(state_dict)\n", "model_neuron.eval()\n", "model_neuron.trace(image) # this line triggers neuron-cc compiler\n", "result = model_neuron(image)\n", "print('Neuron optimized model inference result:')\n", "print(resnet50.decode_predictions(result.detach().numpy(), top=5)[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**STEP 5:** Save the model as TorchScript ready to deploy\n", "\n", "To deploy the Neuron-optimized as TorchScript, we use `torch.jit.trace` again to generate TorchScript for the entire mode, including the Neuron-optimized `ScriptModule`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neuron_trace = torch.jit.trace(model_neuron, example_inputs=image)\n", "print('neuron.trace inference result:')\n", "print(resnet50.decode_predictions(neuron_trace(image).detach().numpy(), top=5)[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This Neuron-optimized `ScriptModule` can be saved/loaded easily and be deployed on inf1 instances." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neuron_trace_filename = 'resnet50_neuron_trace.pt'\n", "neuron_trace.save(neuron_trace_filename)\n", "neuron_trace_loaded = torch.jit.load(neuron_trace_filename)\n", "print('loaded neuron.trace inference result:')\n", "print(resnet50.decode_predictions(neuron_trace_loaded(image).detach().numpy(), top=5)[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**STEP 6:** Understanding the neuron graph\n", "\n", "We can inspect the graph property of the Neuron-optimized `ScriptModule` to get an idea of how Neuron-optimization is performed. Each `torch.neuron.trace` call fuses a submodule (layer) into a `neuron::forward`/`NeuronModule` operator." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "neuron_trace.graph" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 2 } ================================================ FILE: src/examples/pytorch/torch-neuronx/bert-base-cased-finetuned-mrpc-inference-on-trn1-tutorial.ipynb ================================================ { "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "e11b2ce1", "metadata": {}, "source": [ "# Compiling and Deploying HuggingFace Pretrained BERT on Trn1 or Inf2" ] }, { "attachments": {}, "cell_type": "markdown", "id": "59a44364", "metadata": {}, "source": [ "## Introduction\n", "\n", "In this tutorial we will compile and deploy a HuggingFace 🤗 Transformers BERT model for accelerated inference on Neuron. In this tutorial, we will be deploying directly on Trn1/Inf2 instances. If you are looking to deploy this model through SageMaker on Inf2 instance, please visit the [Sagemaker samples repository](https://github.com/aws-neuron/aws-neuron-sagemaker-samples/tree/master/inference/inf2-bert-on-sagemaker). 
\n", "\n", "This tutorial will use the [bert-base-cased-finetuned-mrpc](https://huggingface.co/bert-base-cased-finetuned-mrpc) model. This model has 12 layers, 768 hidden dimensions, 12 attention heads, and 110M total parameters. The final layer is a binary classification head that has been trained on the Microsoft Research Paraphrase Corpus (`mrpc`). The input to the model is two sentences and the output of the model is whether or not those sentences are a paraphrase of each other. \n", "\n", "This tutorial has the following main sections:\n", "\n", "1. Install dependencies\n", "1. Compile the BERT model\n", "1. Run inference on Neuron and compare results to CPU\n", "1. Benchmark the model using multicore inference\n", "1. Finding the optimal batch size\n", "\n", "This Jupyter notebook should be run on a Trn1 instance (`trn1.2xlarge` or larger.) or Inf2 instance (`inf2.xlarge` or larger.)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9ceecb92", "metadata": {}, "source": [ "## Install dependencies\n", "\n", "The code in this tutorial is written for Jupyter Notebooks. To use Jupyter Notebook on the Neuron instance, you\n", "can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\n", "\n", "This tutorial requires the following pip packages:\n", "\n", "- `torch-neuronx`\n", "- `neuronx-cc`\n", "- `transformers`\n", "\n", "Most of these packages will be installed when configuring your environment using the Trn1/Inf2 setup guide. The additional dependencies must be installed here:" ] }, { "cell_type": "code", "execution_count": null, "id": "66392b0b", "metadata": {}, "outputs": [], "source": [ "%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\n", "%env HF_HUB_DISABLE_PROGRESS_BARS=1 # Avoids xet progress bar model download error\n", "!pip install --upgrade transformers" ] }, { "cell_type": "markdown", "id": "82533d8e", "metadata": {}, "source": [ "## Compile the model into an AWS Neuron optimized TorchScript\n", "\n", "In the following section, we load the BERT model and tokenizer, get a sample input, run inference on CPU, compile the model for Neuron using `torch_neuronx.trace()`, and save the optimized model as `TorchScript`.\n", "\n", "`torch_neuronx.trace()` expects a tensor or tuple of tensor inputs to use for tracing, so we unpack the tokenizer output using the `encode` function. \n", "\n", "The result of the trace stage will be a static executable where the operations to be run upon inference are determined during compilation. This means that when inferring, the resulting Neuron model must be executed with tensors that are the exact same shape as those provided at compilation time. If a model is given a tensor at inference time whose shape does not match the tensor given at compilation time, an error will occur.\n", "\n", "For language models, the shape of the tokenizer tensors can vary based on the length of input sentence. We can satisfy the Neuron restriction of using a fixed shape input by padding all varying input tensors to a specified length. In a deployment scenario, the padding size should be chosen based on the maximum token length that is expected to occur for the application.\n", "\n", "In the following section we will assume that we will receive a maximum of 128 tokens at inference time. 
We will pad our example inputs by using `padding='max_length'` and to avoid potential errors caused by creating a tensor that is larger than `max_length=128`, we will always tokenize using `truncation=True`." ] }, { "cell_type": "code", "execution_count": null, "id": "0c9aac5e", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch_neuronx\n", "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n", "import transformers\n", "\n", "\n", "def encode(tokenizer, *inputs, max_length=128, batch_size=1):\n", " tokens = tokenizer.encode_plus(\n", " *inputs,\n", " max_length=max_length,\n", " padding='max_length',\n", " truncation=True,\n", " return_tensors=\"pt\"\n", " )\n", " return (\n", " torch.repeat_interleave(tokens['input_ids'], batch_size, 0),\n", " torch.repeat_interleave(tokens['attention_mask'], batch_size, 0),\n", " torch.repeat_interleave(tokens['token_type_ids'], batch_size, 0),\n", " )\n", "\n", "\n", "# Create the tokenizer and model\n", "name = \"bert-base-cased-finetuned-mrpc\"\n", "tokenizer = AutoTokenizer.from_pretrained(name)\n", "model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)\n", "\n", "# Set up some example inputs\n", "sequence_0 = \"The company HuggingFace is based in New York City\"\n", "sequence_1 = \"Apples are especially bad for your health\"\n", "sequence_2 = \"HuggingFace's headquarters are situated in Manhattan\"\n", "\n", "paraphrase = encode(tokenizer, sequence_0, sequence_2)\n", "not_paraphrase = encode(tokenizer, sequence_0, sequence_1)\n", "\n", "# Run the original PyTorch BERT model on CPU\n", "cpu_paraphrase_logits = model(*paraphrase)[0]\n", "cpu_not_paraphrase_logits = model(*not_paraphrase)[0]\n", "\n", "# Compile the model for Neuron\n", "model_neuron = torch_neuronx.trace(model, paraphrase)\n", "\n", "# Save the TorchScript for inference deployment\n", "filename = 'model.pt'\n", "torch.jit.save(model_neuron, filename)" ] }, { "cell_type": "markdown", "id": "53e9605d", "metadata": {}, "source": [ "## Run inference and compare results\n", "\n", "In this section we load the compiled model, run inference on Neuron, and compare the CPU and Neuron outputs.\n", "\n", "NOTE: Although this tutorial section uses one NeuronCore (and the next section uses two NeuronCores), by default each Jupyter notebook Python process will attempt to take ownership of all NeuronCores visible on the instance. For multi-process applications where each process should only use a subset of the NeuronCores on the instance you can use NEURON_RT_NUM_CORES=N or NEURON_RT_VISIBLE_CORES=< list of NeuronCore IDs > when starting the Jupyter notebook as described in [NeuronCore Allocation and Model Placement for Inference](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/core-placement.html)." 
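 ] }, { "cell_type": "markdown", "id": "neuron-core-pinning-note", "metadata": {}, "source": [ "As a minimal sketch of the NOTE above (the core IDs are illustrative, and the canonical approach is to set the variable when starting the Jupyter process), the restriction could also be applied from Python, provided it runs before the first model is loaded in the process:\n", "\n", "```python\n", "import os\n", "\n", "# Must run before the Neuron runtime initializes, i.e. before the first\n", "# torch.jit.load of a Neuron model in this process. Core IDs are illustrative.\n", "os.environ['NEURON_RT_VISIBLE_CORES'] = '0,1'\n", "```"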
] }, { "cell_type": "code", "execution_count": null, "id": "a8d509aa", "metadata": {}, "outputs": [], "source": [ "# Load the TorchScript compiled model\n", "model_neuron = torch.jit.load(filename)\n", "\n", "# Verify the TorchScript works on both example inputs\n", "neuron_paraphrase_logits = model_neuron(*paraphrase)[0]\n", "neuron_not_paraphrase_logits = model_neuron(*not_paraphrase)[0]\n", "\n", "# Compare the results\n", "print('CPU paraphrase logits: ', cpu_paraphrase_logits.detach().numpy())\n", "print('Neuron paraphrase logits: ', neuron_paraphrase_logits.detach().numpy())\n", "print('CPU not-paraphrase logits: ', cpu_not_paraphrase_logits.detach().numpy())\n", "print('Neuron not-paraphrase logits: ', neuron_not_paraphrase_logits.detach().numpy())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a4553cc9", "metadata": {}, "source": [ "## Benchmarking\n", "\n", "In this section we benchmark the performance of the BERT model on Neuron. By default, models compiled with `torch_neuronx` will always execute on a *single* NeuronCore. When loading *multiple* models, the default behavior of the Neuron runtime is to evenly distribute models across all available NeuronCores. The runtime places models on the NeuronCore that has the fewest models loaded to it first. In the following section, we will `torch.jit.load` multiple instances of the model which should each be loaded onto their own NeuronCore. It is not useful to load more copies of a model than the number of NeuronCores on the instance since an individual NeuronCore can only execute one model at a time.\n", "\n", "To ensure that we are maximizing hardware utilization, we must run inferences using multiple threads in parallel. It is nearly always recommended to use some form of threading/multiprocessing and some form of model replication since even the smallest Neuron EC2 instance has 2 NeuronCores available. Applications with no form of threading are only capable of `1 / num_neuron_cores` hardware utilization which becomes especially problematic on large instances.\n", "\n", "One way to view the hardware utilization is by executing the `neuron-top` application in the terminal while the benchmark is executing. If the monitor shows >90% utilization on all NeuronCores, this is a good indication that the hardware is being utilized effectively.\n", "\n", "In this example we load two models, which utilizes all NeuronCores (2) on a `trn1.2xlarge` or `inf2.xlarge` instance. Additional models can be loaded and run in parallel on larger Trn1 or Inf2 instance sizes to increase throughput.\n", "\n", "We define a benchmarking function that loads two optimized BERT models onto two separate NeuronCores, runs multithreaded inference, and calculates the corresponding latency and throughput." 
] }, { "cell_type": "code", "execution_count": null, "id": "c9e14b0d", "metadata": {}, "outputs": [], "source": [ "import time\n", "import concurrent.futures\n", "import numpy as np\n", "\n", "\n", "def benchmark(filename, example, n_models=2, n_threads=2, batches_per_thread=1000):\n", " \"\"\"\n", " Record performance statistics for a serialized model and its input example.\n", "\n", " Arguments:\n", " filename: The serialized torchscript model to load for benchmarking.\n", " example: An example model input.\n", " n_models: The number of models to load.\n", " n_threads: The number of simultaneous threads to execute inferences on.\n", " batches_per_thread: The number of example batches to run per thread.\n", "\n", " Returns:\n", " A dictionary of performance statistics.\n", " \"\"\"\n", "\n", " # Load models\n", " models = [torch.jit.load(filename) for _ in range(n_models)]\n", "\n", " # Warmup\n", " for _ in range(8):\n", " for model in models:\n", " model(*example)\n", "\n", " latencies = []\n", "\n", " # Thread task\n", " def task(model):\n", " for _ in range(batches_per_thread):\n", " start = time.time()\n", " model(*example)\n", " finish = time.time()\n", " latencies.append((finish - start) * 1000)\n", "\n", " # Submit tasks\n", " begin = time.time()\n", " with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:\n", " for i in range(n_threads):\n", " pool.submit(task, models[i % len(models)])\n", " end = time.time()\n", "\n", " # Compute metrics\n", " boundaries = [50, 95, 99]\n", " percentiles = {}\n", "\n", " for boundary in boundaries:\n", " name = f'latency_p{boundary}'\n", " percentiles[name] = np.percentile(latencies, boundary)\n", " duration = end - begin\n", " batch_size = 0\n", " for tensor in example:\n", " if batch_size == 0:\n", " batch_size = tensor.shape[0]\n", " inferences = len(latencies) * batch_size\n", " throughput = inferences / duration\n", "\n", " # Metrics\n", " metrics = {\n", " 'filename': str(filename),\n", " 'batch_size': batch_size,\n", " 'batches': len(latencies),\n", " 'inferences': inferences,\n", " 'threads': n_threads,\n", " 'models': n_models,\n", " 'duration': duration,\n", " 'throughput': throughput,\n", " **percentiles,\n", " }\n", "\n", " display(metrics)\n", "\n", "\n", "def display(metrics):\n", " \"\"\"\n", " Display the metrics produced by `benchmark` function.\n", "\n", " Args:\n", " metrics: A dictionary of performance statistics.\n", " \"\"\"\n", " pad = max(map(len, metrics)) + 1\n", " for key, value in metrics.items():\n", "\n", " parts = key.split('_')\n", " parts = list(map(str.title, parts))\n", " title = ' '.join(parts) + \":\"\n", "\n", " if isinstance(value, float):\n", " value = f'{value:0.3f}'\n", "\n", " print(f'{title :<{pad}} {value}')\n", "\n", "\n", "# Benchmark BERT on Neuron\n", "benchmark(filename, paraphrase)" ] }, { "cell_type": "markdown", "id": "fc374b12", "metadata": {}, "source": [ "## Finding the optimal batch size" ] }, { "cell_type": "markdown", "id": "113acb55", "metadata": {}, "source": [ "Batch size has a direct impact on model performance. The NeuronCore architecture is optimized to maximize throughput with relatively small batch sizes. This means that a Neuron compiled model can outperform a GPU model, even if running single digit batch sizes.\n", "\n", "As a general best practice, we recommend optimizing your model’s throughput by compiling the model with a small batch size and gradually increasing it to find the peak throughput on Neuron. 
To minimize latency, using `batch_size = 1` will nearly always be optimal. This batch size configuration is typically used for on-demand inference applications. To maximize throughput, *usually* `1 < batch_size < 10` is optimal. A configuration which uses a larger batch size is generally ideal for batched on-demand inference or offline batch processing.\n", "\n", "In the following section, we compile BERT for multiple batch size inputs. We then run inference on each batch size and benchmark the performance. Notice that latency increases consistently as the batch size increases. Throughput increases as well, up until a certain point where the input size becomes too large to be efficient." ] }, { "cell_type": "code", "execution_count": null, "id": "be26aafc", "metadata": {}, "outputs": [], "source": [ "# Compile BERT for different batch sizes\n", "for batch_size in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:\n", "    tokenizer = AutoTokenizer.from_pretrained(name)\n", "    model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)\n", "    example = encode(tokenizer, sequence_0, sequence_2, batch_size=batch_size)\n", "    model_neuron = torch_neuronx.trace(model, example)\n", "    filename = f'model_batch_size_{batch_size}.pt'\n", "    torch.jit.save(model_neuron, filename)" ] }, { "cell_type": "code", "execution_count": null, "id": "8f0f6ed2", "metadata": {}, "outputs": [], "source": [ "# Benchmark BERT for different batch sizes\n", "for batch_size in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:\n", "    print('-'*50)\n", "    example = encode(tokenizer, sequence_0, sequence_2, batch_size=batch_size)\n", "    filename = f'model_batch_size_{batch_size}.pt'\n", "    benchmark(filename, example)\n", "    print()" ] } ], "metadata": { "kernelspec": { "display_name": "Python (Neuron PyTorch)", "language": "python", "name": "pytorch_venv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.16" } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: src/examples/pytorch/torch-neuronx/resnet50-inference-on-trn1-tutorial.ipynb ================================================ { "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "6a30ffd9", "metadata": {}, "source": [ "# Compiling and Deploying ResNet50 on Trn1 or Inf2" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ea682fbe", "metadata": {}, "source": [ "## Introduction\n", "\n", "In this tutorial we will compile and deploy a TorchVision ResNet50 model for accelerated inference on Neuron. To get started with\n", "Jupyter Notebook on the Neuron instance you launched, please use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\n", "\n", "This tutorial will use the [resnet50](https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html) model, which is primarily used for arbitrary image classification tasks.\n", "\n", "This tutorial has the following main sections:\n", "\n", "1. Install dependencies\n", "1. Compile the ResNet model\n", "1. Run inference on Neuron and compare results to CPU\n", "1. Benchmark the model using multicore inference\n", "1. Find the optimal batch size\n", "\n", "This Jupyter notebook should be run on a Trn1 instance (`trn1.2xlarge` or larger) 
or an Inf2 instance (`inf2.xlarge` or larger)." ] }, { "attachments": {}, "cell_type": "markdown", "id": "5f60760a", "metadata": {}, "source": [ "## Install dependencies\n", "The code in this tutorial is written for Jupyter Notebooks. To use Jupyter Notebook on the Neuron instance, you\n", "can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\n", "\n", "This tutorial requires the following pip packages:\n", "\n", "- `torch-neuronx`\n", "- `neuronx-cc`\n", "- `torchvision`\n", "- `Pillow`\n", "\n", "Most of these packages will be installed when configuring your environment using the Trn1/Inf2 setup guide. The additional dependencies must be installed here:" ] }, { "cell_type": "code", "execution_count": null, "id": "c44c5df5", "metadata": {}, "outputs": [], "source": [ "# Avoid the xet progress bar model download error. The comment sits on its\n", "# own line because %env treats the rest of the line as the value.\n", "%env HF_HUB_DISABLE_PROGRESS_BARS=1\n", "!pip install Pillow" ] }, { "cell_type": "markdown", "id": "de2efba5", "metadata": {}, "source": [ "## Compile the model into an AWS Neuron optimized TorchScript\n", "\n", "In the following section, we load the model, get a sample input, run inference on CPU, compile the model for Neuron using `torch_neuronx.trace()`, and save the optimized model as `TorchScript`.\n", "\n", "`torch_neuronx.trace()` expects a tensor or tuple of tensor inputs to use for tracing, so we convert the input image into a tensor using the `get_image` function.\n", "\n", "The result of the trace stage will be a static executable where the operations to be run upon inference are determined during compilation. This means that when inferring, the resulting Neuron model must be executed with tensors that are the exact same shape as those provided at compilation time. If a model is given a tensor at inference time whose shape does not match the tensor given at compilation time, an error will occur. \n", "\n", "In the following section, we assume that we will receive an image shape of `[1, 3, 224, 224]` at inference time."
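 ] }, { "cell_type": "markdown", "id": "fixed-shape-note", "metadata": {}, "source": [ "As a small illustration of this restriction, a hypothetical guard (`check_shape` is our own helper for this sketch, not part of `torch_neuronx`) could fail fast before invoking the traced model:\n", "\n", "```python\n", "def check_shape(tensor, expected=(1, 3, 224, 224)):\n", "    # Traced Neuron models only accept the tensor shape used at compilation\n", "    # time, so fail fast with a clear message instead of a runtime error.\n", "    if tuple(tensor.shape) != expected:\n", "        raise ValueError(f'expected shape {expected}, got {tuple(tensor.shape)}')\n", "    return tensor\n", "```"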
] }, { "cell_type": "code", "execution_count": null, "id": "1650de1f", "metadata": {}, "outputs": [], "source": [ "import os\n", "import urllib\n", "from PIL import Image\n", "\n", "import torch\n", "import torch_neuronx\n", "from torchvision import models\n", "from torchvision.transforms import functional\n", "\n", "\n", "def get_image(batch_size=1, image_shape=(224, 224)):\n", " # Get an example input\n", " filename = \"000000039769.jpg\"\n", " if not os.path.exists(filename):\n", " url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", " urllib.request.urlretrieve(url, filename)\n", " image = Image.open(filename).convert('RGB')\n", " image = functional.resize(image, (image_shape))\n", " image = functional.to_tensor(image)\n", " image = torch.unsqueeze(image, 0)\n", " image = torch.repeat_interleave(image, batch_size, 0)\n", " return (image, )\n", "\n", "\n", "# Create the model\n", "model = models.resnet50(pretrained=True)\n", "model.eval()\n", "\n", "# Get an example input\n", "image = get_image()\n", "\n", "# Run inference on CPU\n", "output_cpu = model(*image)\n", "\n", "# Compile the model\n", "model_neuron = torch_neuronx.trace(model, image)\n", "\n", "# Save the TorchScript for inference deployment\n", "filename = 'model.pt'\n", "torch.jit.save(model_neuron, filename)" ] }, { "cell_type": "markdown", "id": "25f453f8", "metadata": {}, "source": [ "## Run inference and compare results\n", "\n", "In this section we load the compiled model, run inference on Neuron, and compare the CPU and Neuron outputs using the ImageNet classes." ] }, { "cell_type": "code", "execution_count": null, "id": "b4a203aa", "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "# Load the TorchScript compiled model\n", "model_neuron = torch.jit.load(filename)\n", "\n", "# Run inference using the Neuron model\n", "output_neuron = model_neuron(*image)\n", "\n", "# Compare the results\n", "print(f\"CPU tensor: {output_cpu[0][0:10]}\")\n", "print(f\"Neuron tensor: {output_neuron[0][0:10]}\")\n", "\n", "# Download and read the ImageNet classes\n", "urllib.request.urlretrieve(\"https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json\",\"imagenet_class_index.json\")\n", "with open(\"imagenet_class_index.json\", \"r\") as file:\n", " class_id = json.load(file)\n", " id2label = [class_id[str(i)][1] for i in range(len(class_id))]\n", "\n", "# Lookup and print the top-5 labels\n", "top5_cpu = output_cpu[0].sort()[1][-5:]\n", "top5_neuron = output_neuron[0].sort()[1][-5:]\n", "top5_labels_cpu = [id2label[idx] for idx in top5_cpu]\n", "top5_labels_neuron = [id2label[idx] for idx in top5_neuron]\n", "print(f\"CPU top-5 labels: {top5_labels_cpu}\")\n", "print(f\"Neuron top-5 labels: {top5_labels_neuron}\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c96389ae", "metadata": {}, "source": [ "## Benchmarking\n", "\n", "In this section we benchmark the performance of the ResNet model on Neuron. By default, models compiled with `torch_neuronx` will always execute on a *single* NeuronCore. When loading *multiple* models, the default behavior of the Neuron runtime is to evenly distribute models across all available NeuronCores. The runtime places models on the NeuronCore that has the fewest models loaded to it first. In the following section, we will `torch.jit.load` multiple instances of the model which should each be loaded onto their own NeuronCore. 
It is not useful to load more copies of a model than the number of NeuronCores on the instance since an individual NeuronCore can only execute one model at a time.\n", "\n", "To ensure that we are maximizing hardware utilization, we must run inferences using multiple threads in parallel. It is nearly always recommended to use some form of threading/multiprocessing and some form of model replication since even the smallest Neuron EC2 instance has 2 NeuronCores available. Applications with no form of threading are only capable of `1 / num_neuron_cores` hardware utilization which becomes especially problematic on large instances.\n", "\n", "One way to view the hardware utilization is by executing the `neuron-top` application in the terminal while the benchmark is executing. If the monitor shows >90% utilization on all NeuronCores, this is a good indication that the hardware is being utilized effectively.\n", "\n", "In this example we load two models, which utilizes all NeuronCores (2) on a `trn1.2xlarge` or `inf2.xlarge` instance. Additional models can be loaded and run in parallel on larger Trn1 or Inf2 instance sizes to increase throughput.\n", "\n", "We define a benchmarking function that loads two optimized ResNet models onto two separate NeuronCores, runs multithreaded inference, and calculates the corresponding latency and throughput." ] }, { "cell_type": "code", "execution_count": null, "id": "9657ae4f", "metadata": {}, "outputs": [], "source": [ "import time\n", "import concurrent.futures\n", "import numpy as np\n", "\n", "\n", "def benchmark(filename, example, n_models=2, n_threads=2, batches_per_thread=1000):\n", " \"\"\"\n", " Record performance statistics for a serialized model and its input example.\n", "\n", " Arguments:\n", " filename: The serialized torchscript model to load for benchmarking.\n", " example: An example model input.\n", " n_models: The number of models to load.\n", " n_threads: The number of simultaneous threads to execute inferences on.\n", " batches_per_thread: The number of example batches to run per thread.\n", "\n", " Returns:\n", " A dictionary of performance statistics.\n", " \"\"\"\n", "\n", " # Load models\n", " models = [torch.jit.load(filename) for _ in range(n_models)]\n", "\n", " # Warmup\n", " for _ in range(8):\n", " for model in models:\n", " model(*example)\n", "\n", " latencies = []\n", "\n", " # Thread task\n", " def task(model):\n", " for _ in range(batches_per_thread):\n", " start = time.time()\n", " model(*example)\n", " finish = time.time()\n", " latencies.append((finish - start) * 1000)\n", "\n", " # Submit tasks\n", " begin = time.time()\n", " with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:\n", " for i in range(n_threads):\n", " pool.submit(task, models[i % len(models)])\n", " end = time.time()\n", "\n", " # Compute metrics\n", " boundaries = [50, 95, 99]\n", " percentiles = {}\n", "\n", " for boundary in boundaries:\n", " name = f'latency_p{boundary}'\n", " percentiles[name] = np.percentile(latencies, boundary)\n", " duration = end - begin\n", " batch_size = 0\n", " for tensor in example:\n", " if batch_size == 0:\n", " batch_size = tensor.shape[0]\n", " inferences = len(latencies) * batch_size\n", " throughput = inferences / duration\n", "\n", " # Metrics\n", " metrics = {\n", " 'filename': str(filename),\n", " 'batch_size': batch_size,\n", " 'batches': len(latencies),\n", " 'inferences': inferences,\n", " 'threads': n_threads,\n", " 'models': n_models,\n", " 'duration': duration,\n", " 'throughput': 
throughput,\n", " **percentiles,\n", " }\n", "\n", " display(metrics)\n", "\n", "\n", "def display(metrics):\n", " \"\"\"\n", " Display the metrics produced by `benchmark` function.\n", "\n", " Args:\n", " metrics: A dictionary of performance statistics.\n", " \"\"\"\n", " pad = max(map(len, metrics)) + 1\n", " for key, value in metrics.items():\n", "\n", " parts = key.split('_')\n", " parts = list(map(str.title, parts))\n", " title = ' '.join(parts) + \":\"\n", "\n", " if isinstance(value, float):\n", " value = f'{value:0.3f}'\n", "\n", " print(f'{title :<{pad}} {value}')\n", "\n", "\n", "# Benchmark ResNet on Neuron\n", "benchmark(filename, image)" ] }, { "cell_type": "markdown", "id": "795d2fca", "metadata": {}, "source": [ "## Finding the optimal batch size\n", "\n", "Batch size has a direct impact on model performance. The NeuronCore architecture is optimized to maximize throughput with relatively small batch sizes. This means that a Neuron compiled model can outperform a GPU model, even if running single digit batch sizes.\n", "\n", "As a general best practice, we recommend optimizing your model’s throughput by compiling the model with a small batch size and gradually increasing it to find the peak throughput on Neuron. To minimize latency, using `batch size = 1` will nearly always be optimal. This batch size configuration is typically used for on-demand inference applications. To maximize throughput, *usually* `1 < batch_size < 10` is optimal. A configuration which uses a larger batch size is generally ideal for batched on-demand inference or offline batch processing.\n", "\n", "In the following section, we compile ResNet for multiple batch size inputs. We then run inference on each batch size and benchmark the performance. Notice that latency increases consistently as the batch size increases. Throughput increases as well, up until a certain point where the input size becomes too large to be efficient." 
] }, { "cell_type": "code", "execution_count": null, "id": "fdef1805", "metadata": {}, "outputs": [], "source": [ "# Compile ResNet for different batch sizes\n", "for batch_size in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:\n", " model = models.resnet50(pretrained=True)\n", " model.eval()\n", " example = get_image(batch_size=batch_size)\n", " model_neuron = torch_neuronx.trace(model, example)\n", " filename = f'model_batch_size_{batch_size}.pt'\n", " torch.jit.save(model_neuron, filename)" ] }, { "cell_type": "code", "execution_count": null, "id": "ec244d4e", "metadata": {}, "outputs": [], "source": [ "# Benchmark ResNet for different batch sizes\n", "for batch_size in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:\n", " print('-'*50)\n", " example = get_image(batch_size=batch_size)\n", " filename = f'model_batch_size_{batch_size}.pt'\n", " benchmark(filename, example)\n", " print()" ] } ], "metadata": { "kernelspec": { "display_name": "Python (Neuron PyTorch)", "language": "python", "name": "pytorch_venv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.16" } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: src/examples/pytorch/torch-neuronx/t5-inference-tutorial.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# T5 model inference on Trn1 or Inf2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "In this tutorial we will compile and deploy a pretrained T5 model for accelerated inference on Neuron. \n", "\n", "This tutorial will use the [t5-large](https://huggingface.co/t5-large) model. The T5 model can be used for machine translation, document summarization, question answering, and classification tasks. \n", "\n", "This tutorial has the following main sections:\n", "\n", "1. Install dependencies\n", "1. Compile the T5 model\n", "1. Run inference with greedy decoding on Neuron\n", "1. Run infernece with beam search on Neuron\n", "\n", "This Jupyter notebook should be run on a Trn1 instance (`trn1.2xlarge` or larger.) or Inf2 instance (`inf2.xlarge` or larger.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install dependencies\n", "\n", "The code in this tutorial is written for Jupyter Notebooks. To use Jupyter Notebook on the Neuron instance, you\n", "can use this [guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html).\n", "\n", "This tutorial requires the following pip packages:\n", "\n", "- `torch-neuronx`\n", "- `neuronx-cc`\n", "- `transformers`\n", "- `optimum-neuron`\n", "\n", "Most of these packages will be installed when configuring your environment using the Trn1/Inf2 setup guide. The additional dependencies must be installed here:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%env HF_HUB_DISABLE_PROGRESS_BARS=1 # Avoids xet progress bar model download error\n", "!pip install --upgrade transformers==4.31.0 optimum-neuron==0.0.8 sentencepiece" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "🤗 Optimum Neuron is the interface between the 🤗 Transformers library and AWS Accelerators including AWS Trainium and AWS Inferentia. 
It provides a set of tools enabling easy model loading, training and inference on single- and multi-Accelerator settings for different downstream tasks. In this tutorial we use 🤗 HuggingFace Optimum Neuron's generate() method instead of 🤗 [transformers's generate()](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.generate) to perform greedy decoding. Optimum Neuron takes care of padding the inputs, which is necessary for inference on Neuron.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compile the model into an AWS Neuron optimized TorchScript\n", "\n", "In the following section, we load the T5 model, compile the model's encoder and decoder for Neuron using `torch_neuronx.trace()`, and save the optimized encoder and decoder as `TorchScript`. \n", "\n", "Before we trace the model, we need to make a couple of changes. \n", "\n", "1. We need to write encoder and decoder wrappers - `torch_neuronx` can only trace functions with positional arguments, but the T5 encoder and decoder both use keyword arguments. So, in order to trace them, we have to write wrappers that convert keyword arguments to positional arguments. \n", "2. We modify the T5 code to maximize the computation on the Neuron device - having sections of code running on the CPU will reduce the performance. Moreover, we do not want to move data between the Neuron device and the CPU during inference. The code we trace with `torch_neuronx` is the code that runs on the Neuron device, so we refactor the T5 code to run the computationally heavy operations within the wrappers. \n", "\n", "Let us start with the EncoderWrapper. \n", "\n", "In the HuggingFace T5 implementation, the encoder block takes in the input ids and returns the encoder hidden states. These hidden states are then used to initialize the KV cache in the decoder blocks during the first decoder invocation. We could trace both the encoder and the cache initialization step separately, but there is a better way: we can compute the initial KV cache state within the encoder wrapper. This way, we remove the overhead of moving the hidden states from the Neuron device to the CPU and back. This also allows the Neuron compiler to optimize execution across both the encoder and the cache initialization. \n", "\n", "*Why don't we just initialize the cache on the first decoder run?* \n", "\n", "This is harder to do on Neuron. Similar to `torch.jit.trace()`, `torch_neuronx.trace()` produces a function that has a fixed control flow, i.e. there are no conditional executions. So we cannot choose to conditionally initialize the cache in the first decoder iteration. Instead, we can compute the initial cache state outside the generation flow and pass the cache to it. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "\n", "from transformers.models.t5.modeling_t5 import T5Stack, T5LayerCrossAttention\n", "\n", "class EncoderWrapper(torch.nn.Module):\n", "    '''\n", "    We will trace an instance of the EncoderWrapper. \n", "    This wrapper just converts positional args to kwargs. 
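It also precomputes the\n", "    cross-attention KV cache from the encoder output inside the trace, so the\n", "    Neuron compiler can optimize the encoder and the cache initialization together.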
\n", " '''\n", "\n", " def __init__(self, \n", " encoder,\n", " decoder, \n", " model_config, \n", " batch_size, \n", " max_length, \n", " device, \n", " num_beams,\n", " tp_degree=None):\n", " \n", " super().__init__()\n", " self.encoder = encoder\n", " self.decoder = decoder\n", " self.batch_size = batch_size\n", " self.max_length = max_length\n", " self.model_config = model_config\n", " self.device = device\n", " self.num_beams = num_beams\n", " self.num_attention_heads_per_partition = model_config.num_heads\n", " self.tp_degree = tp_degree\n", "\n", " def forward(self, input_ids, attention_mask):\n", " '''\n", " This is the core functionality we want to trace. \n", " '''\n", " encoder_output = self.encoder(input_ids=input_ids,\n", " attention_mask=attention_mask,\n", " output_attentions=False,\n", " output_hidden_states=False)\n", "\n", " last_hidden_state = encoder_output[\"last_hidden_state\"]\n", " encoder_hidden_states = torch.concat([tensor.unsqueeze(0).repeat(self.num_beams, 1, 1) for tensor in last_hidden_state])\n", "\n", " decoder_blocks = self.decoder.block\n", " present_key_value_states_sa = []\n", " present_key_value_states_ca = []\n", "\n", " for i, block in enumerate(decoder_blocks):\n", "\n", " # Cross attention has to be initialized with the encoder hidden state\n", " cross_attention: T5LayerCrossAttention = block.layer[1]\n", " attention = cross_attention.EncDecAttention\n", "\n", " def shape(states):\n", " \"\"\"projection\"\"\"\n", " return states.view(self.batch_size, -1, self.num_attention_heads_per_partition, attention.key_value_proj_dim).transpose(1, 2)\n", "\n", " key_states = shape(attention.k(encoder_hidden_states))\n", " value_states = shape(attention.v(encoder_hidden_states))\n", "\n", " # cross_attn_kv_state\n", " present_key_value_states_ca.append(key_states) \n", " present_key_value_states_ca.append(value_states) \n", " \n", " # Self attention kv states are initialized to zeros. This is done to keep the size of the kv cache tensor constant. \n", " # The kv cache will be an input to the decoder trace. Any traced function will have a fixed control flow. What this means \n", " # is that the trace performs the exact same computations on inputs of the same shape in each invocation. So the attention \n", " # kv cache is padded here to keep a fixed shape. \n", " present_key_value_states_sa.append(torch.zeros((self.batch_size, # key states\n", " self.model_config.num_heads, \n", " self.max_length-1, \n", " self.model_config.d_kv), dtype=torch.float32, device=self.device)) \n", " present_key_value_states_sa.append(torch.zeros((self.batch_size, # value states\n", " self.model_config.num_heads, \n", " self.max_length-1, \n", " self.model_config.d_kv), dtype=torch.float32, device=self.device))\n", "\n", " return present_key_value_states_sa + present_key_value_states_ca\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "In the decoder wrapper, in addition to converting keyword arguments to positional arguments we add support for attention caching. Generating text from the encoder decoder models is an autoregressive process. For each invocation, we have to compute the key and value states of the attention heads repeatedly. To improve the performance, we cache the key and value states. This cache is what HuggingFace transformers code refers to as `past_key_values`.\n", "\n", "In HuggingFace transformers, the `past_key_values` are updated outside the decoder. 
This works for training and evaluation, but for inference we want to perform the decoder execution and the cache update within a single trace. This way, we can optimize across both the decoder execution and the cache update. So, we move the cache update within the decoder wrapper." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "class DecoderWrapper(torch.nn.Module):\n", "\n", "    def __init__(self, \n", "                 decoder: T5Stack, \n", "                 lm_head: torch.nn.Linear,\n", "                 model_config,\n", "                 num_beams: int, \n", "                 max_length: int,\n", "                 device: str,\n", "                 tp_degree=None):\n", "        super().__init__()\n", "        self.decoder = decoder\n", "        self.lm_head = lm_head\n", "        self.model_dim=model_config.d_model\n", "        self.device = device\n", "        self.num_beams = num_beams\n", "        self.batch_size = 1\n", "        self.config = model_config\n", "\n", "        num_heads=model_config.num_heads\n", "        num_decoder_layers=model_config.num_decoder_layers\n", "\n", "        self.num_attention_heads_per_partition = num_heads\n", "\n", "        # (num_beams, n_heads, seq_length, dim_per_head)\n", "        if device == \"cpu\":\n", "            self.past_key_values_sa = [torch.ones((num_beams,num_heads,max_length-1,model_config.d_kv), dtype=torch.float32) for _ in range(num_decoder_layers * 2)]\n", "            self.past_key_values_ca = [torch.ones((num_beams,num_heads,max_length,model_config.d_kv), dtype=torch.float32) for _ in range(num_decoder_layers * 2)]\n", "        elif device == \"xla\":\n", "            self.past_key_values_sa = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((num_beams,self.num_attention_heads_per_partition,max_length-1,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(num_decoder_layers * 2)])\n", "            self.past_key_values_ca = torch.nn.ParameterList([torch.nn.Parameter(torch.ones((num_beams,self.num_attention_heads_per_partition,max_length,model_config.d_kv), dtype=torch.float32), requires_grad=False) for _ in range(num_decoder_layers * 2)])\n", "\n", "    def update_past(self, past_key_values):\n", "        new_past_sa = []\n", "        new_past_ca = []\n", "        for past_layer in past_key_values:\n", "            new_past_layer = list(past_layer)\n", "            for i in range(len(new_past_layer[:2])):\n", "                new_past_layer[i] = past_layer[i][:, :, 1:]\n", "            new_past_sa += [new_past_layer[:2],]\n", "            new_past_ca += [new_past_layer[2:],]\n", "        return new_past_sa, new_past_ca\n", "\n", "    def reorder_cache(self, past_key_values, beam_idx):\n", "        for i in range(len(past_key_values)):\n", "            gather_index = beam_idx.view([beam_idx.shape[0],1,1,1]).expand_as(past_key_values[i])\n", "            past_key_values[i] = torch.gather(past_key_values[i], dim = 0, index=gather_index)\n", "        return past_key_values\n", "\n", "    def forward(self,\n", "                input_ids,\n", "                decoder_attention_mask,\n", "                encoder_hidden_states,\n", "                encoder_attention_mask,\n", "                beam_idx,\n", "                beam_scores,\n", "                **kwargs):\n", "\n", "        if self.num_beams > 1:\n", "            # We reorder the cache based on the beams selected in each iteration. Required step for beam search.\n", "            past_key_values_sa = self.reorder_cache(self.past_key_values_sa, beam_idx)\n", "            past_key_values_ca = self.reorder_cache(self.past_key_values_ca, beam_idx)\n", "        else:\n", "            # We do not need to reorder for greedy sampling\n", "            past_key_values_sa = self.past_key_values_sa\n", "            past_key_values_ca = self.past_key_values_ca\n", "\n", "        # The cache is stored in a flattened form. We order the cache per layer before passing it to the decoder. \n", "        # Each layer has 4 tensors, so we group by 4. 
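A hypothetical 2-layer illustration:\n", "        #   flattened: [sa_k0, sa_v0, sa_k1, sa_v1] + [ca_k0, ca_v0, ca_k1, ca_v1]\n", "        #   grouped:   [[sa_k0, sa_v0, ca_k0, ca_v0], [sa_k1, sa_v1, ca_k1, ca_v1]]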
\n", " past_key_values = [[*past_key_values_sa[i*2:i*2+2], *past_key_values_ca[i*2:i*2+2]] for i in range(0, int(len(past_key_values_ca)/2))]\n", "\n", " decoder_output = self.decoder(\n", " input_ids=input_ids,\n", " attention_mask=decoder_attention_mask,\n", " past_key_values=past_key_values,\n", " encoder_hidden_states=encoder_hidden_states,\n", " encoder_attention_mask=encoder_attention_mask,\n", " use_cache=True,\n", " output_attentions=False,\n", " output_hidden_states=False)\n", "\n", " last_hidden_state = decoder_output['last_hidden_state']\n", " past_key_values = decoder_output['past_key_values']\n", "\n", " if self.config.tie_word_embeddings:\n", " # Rescale output before projecting on vocab\n", " # See https://github.com/tensorflow/mesh/blob/fa19d69eafc9a482aff0b59ddd96b025c0cb207d/mesh_tensorflow/transformer/transformer.py#L586\n", " last_hidden_state = last_hidden_state * (self.model_dim**-0.5)\n", " \n", " lm_logits = self.lm_head(last_hidden_state)\n", "\n", " past_key_values_sa, past_key_values_ca = self.update_past(past_key_values)\n", "\n", " # We flatten the cache to a single array. This is required for the input output aliasing to work\n", " past_key_values_sa = [vec for kv_per_layer in past_key_values_sa for vec in kv_per_layer]\n", " past_key_values_ca = [vec for kv_per_layer in past_key_values_ca for vec in kv_per_layer]\n", "\n", " if self.device == \"cpu\":\n", " self.past_key_values_sa = past_key_values_sa\n", " self.past_key_values_ca = past_key_values_ca\n", "\n", " # We calculate topk inside the wrapper\n", " next_token_logits = lm_logits[:, -1, :]\n", "\n", " if self.num_beams > 1:\n", " # This section of beam search is run outside the decoder in the huggingface t5 implementation. \n", " # To maximize the computation within the neuron device, we move this within the wrapper\n", " logit_max, _ = torch.max(next_token_logits, dim=-1, keepdim=True)\n", " logsumexp = torch.log(torch.exp(next_token_logits - logit_max).sum(dim=-1, keepdim=True))\n", " next_token_scores = next_token_logits - logit_max - logsumexp\n", " next_token_scores = next_token_scores + beam_scores[:, None].expand_as(next_token_scores)\n", "\n", " # reshape for beam search\n", " vocab_size = next_token_scores.shape[-1]\n", " next_token_scores = next_token_scores.view(self.batch_size, self.num_beams * vocab_size)\n", " next_token_scores = next_token_scores * 1\n", "\n", " # Sample 2 next tokens for each beam (so we have some spare tokens and match output of beam search)\n", " next_token_scores, next_tokens = torch.topk(\n", " next_token_scores, 2 * self.num_beams, dim=1, largest=True, sorted=True\n", " ) \n", "\n", " next_indices = torch.div(next_tokens, vocab_size, rounding_mode=\"floor\")\n", " next_tokens = next_tokens % vocab_size\n", "\n", " return [next_token_scores, next_tokens, next_indices] + past_key_values_sa + past_key_values_ca\n", " else:\n", " # Greedy \n", " next_tokens = torch.argmax(next_token_logits, dim=-1)\n", " return [next_tokens] + past_key_values_sa + past_key_values_ca\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's create a T5 model wrapper to make it compatible with our traced encoder and decoder. \n", "\n", "There are two reasons for having this wrapper, \n", "\n", "1. The encoder and decoder traces can only be invoked with positional arguments. But the HuggingFace transformers code is written with keyword arguments. So we override the functions that invoke encoder and decoder to call with positional arguments. \n", "1. 
The generate() function in the NeuronGenerationMixin performs the cache update on the CPU. As we are handling the cache within the DecoderWrapper, we disable the cache update on the CPU. \n", "1. The topK computation to determine the next tokens for beam search was moved into the decoder wrapper. So, we need to override HuggingFace's beam search implementation to accept the next tokens and the beam scores from the decoder. \n", "\n", "Let's also override the `generate()` function so that it will initialize the cache using the cache initializer before starting the greedy decoding." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch_xla.core.xla_model as xm\n", "\n", "from transformers import T5Tokenizer, T5ForConditionalGeneration\n", "from transformers.modeling_outputs import BaseModelOutput, Seq2SeqLMOutput\n", "from transformers.models.t5.modeling_t5 import T5Stack, T5LayerCrossAttention\n", "from transformers.generation.utils import ModelOutput\n", "from typing import Any, Dict, List, Optional, Tuple, Union\n", "from transformers.generation.beam_search import BeamScorer, BeamSearchScorer\n", "\n", "from optimum.neuron.generation import NeuronGenerationMixin\n", "\n", "from transformers.generation.logits_process import (\n", "    LogitsProcessorList,\n", ")\n", "from transformers.generation.stopping_criteria import (\n", "    MaxLengthCriteria,\n", "    MaxTimeCriteria,\n", "    StoppingCriteriaList,\n", "    validate_stopping_criteria,\n", ")\n", "\n", "from transformers.generation.utils import (\n", "    BeamSearchOutput,\n", "    GreedySearchOutput,\n", ")\n", "\n", "class T5Wrapper(T5ForConditionalGeneration, NeuronGenerationMixin):\n", "\n", "    def _prepare_encoder_decoder_kwargs_for_generation(\n", "        self, \n", "        inputs_tensor: torch.Tensor, \n", "        model_kwargs, \n", "        model_input_name: Optional[str] = None\n", "    ) -> Dict[str, Any]:\n", "        encoder = self.get_encoder()\n", "        model_kwargs[\"encoder_outputs\"]: ModelOutput = encoder(inputs_tensor, model_kwargs[\"attention_mask\"])\n", "        return model_kwargs\n", "\n", "    # Override to cut the input_ids to just the last token\n", "    def prepare_inputs_for_generation(\n", "        self,\n", "        input_ids,\n", "        past_key_values=None,\n", "        attention_mask=None,\n", "        head_mask=None,\n", "        decoder_head_mask=None,\n", "        decoder_attention_mask=None,\n", "        cross_attn_head_mask=None,\n", "        use_cache=None,\n", "        encoder_outputs=None,\n", "        **kwargs,\n", "    ):\n", "        # cut decoder_input_ids as past is cached\n", "        input_ids = input_ids[:, -1:]\n", "\n", "        return {\n", "            \"decoder_input_ids\": input_ids,\n", "            \"past_key_values\": past_key_values,\n", "            \"encoder_outputs\": encoder_outputs,\n", "            \"attention_mask\": attention_mask,\n", "            \"head_mask\": head_mask,\n", "            \"decoder_head_mask\": decoder_head_mask,\n", "            \"decoder_attention_mask\": decoder_attention_mask,\n", "            \"cross_attn_head_mask\": cross_attn_head_mask,\n", "            \"use_cache\": use_cache,\n", "        }\n", "\n", "    '''\n", "    We update the cache in the decoder trace, so let's override the _update_model_kwargs_for_xla_generation in NeuronGenerationMixin\n", "    '''\n", "    def _update_model_kwargs_for_xla_generation(\n", "        self,\n", "        model_kwargs: Dict[str, Any],\n", "        batch_size: int,\n", "        is_encoder_decoder: bool = False,\n", "        standardize_cache_format: bool = False,\n", "        max_length: Optional[int] = None,\n", "        seq_length: Optional[int] = None,\n", "        use_cache: bool = True,\n", "    ) -> Dict[str, Any]:\n", "\n", "        def _update_attention(model_kwargs, 
is_encoder_decoder):\n", " \"\"\"Updates the appropriate attention mask -- encoder-decoder models use `decoder_attention_mask`\"\"\"\n", "\n", " attention_mask_name = \"decoder_attention_mask\" if is_encoder_decoder else \"attention_mask\"\n", " attention_mask = model_kwargs.pop(attention_mask_name)\n", " attention_mask_update_slice = torch.ones(\n", " (batch_size, 1), dtype=attention_mask.dtype, device=attention_mask.device\n", " )\n", " attention_mask = torch.cat([attention_mask[:, 1:], attention_mask_update_slice], dim=-1)\n", " mask = {attention_mask_name: attention_mask}\n", " return mask\n", "\n", " mask = _update_attention(model_kwargs, is_encoder_decoder)\n", " # sets the updated variables (mask and past_key_values)\n", " model_kwargs.update(mask)\n", "\n", " # Set a mock cache tensor\n", " model_kwargs[\"past_key_values\"] = torch.tensor([])\n", "\n", " return model_kwargs\n", " \n", " def _reorder_cache(self, past_key_values, beam_idx):\n", " '''\n", " This is needed for beam search and not greedy sampling\n", " We reorder the cache within the trace so we can skip it in modelling_t5.py. So we override the _reorder_cache\n", " '''\n", " self.beam_idx = beam_idx\n", " return past_key_values\n", "\n", " def generate(self,\n", " tokenizer: T5Tokenizer,\n", " prompt: str,\n", " max_length: int,\n", " num_beams: int,\n", " num_return_sequences: int,\n", " device: str):\n", "\n", " batch_encoding = tokenizer(prompt, max_length=max_length, truncation=True, padding='max_length',\n", " return_tensors=\"pt\")\n", "\n", " past_key_values = self.encoder(batch_encoding['input_ids'],batch_encoding['attention_mask'])\n", " \n", " decoder_attention_mask = torch.cat([torch.zeros((1, max_length-1), dtype=torch.int32),\n", " torch.ones((1, 1), dtype=torch.int32)], axis=1)\n", "\n", " # copy the new cache state to the decoder\n", " if device == \"xla\":\n", " for state, tensor in zip(self.decoder.parameters(), past_key_values):\n", " state.copy_(tensor)\n", " else:\n", " # First half of the cache is self attention and the rest is cross attention\n", " self.decoder.past_key_values_sa = past_key_values[:len(past_key_values)//2]\n", " self.decoder.past_key_values_ca = past_key_values[len(past_key_values)//2:]\n", " \n", " output = super().generate(**batch_encoding,\n", " max_length=max_length,\n", " num_beams=num_beams,\n", " num_return_sequences=num_return_sequences,\n", " do_sample=False,\n", " use_cache=True,\n", " decoder_attention_mask=decoder_attention_mask, \n", " encoder_outputs={\"last_hidden_state\": torch.ones((1,128,1))}) # Pass fake encoder_outputs so the transfomers code will not invoke the encoder\n", " return output\n", "\n", " def forward(\n", " self,\n", " attention_mask: Optional[torch.FloatTensor] = None,\n", " decoder_input_ids: Optional[torch.LongTensor] = None,\n", " decoder_attention_mask: Optional[torch.BoolTensor] = None,\n", " encoder_outputs: Optional[Tuple[Tuple[torch.Tensor]]] = None,\n", " beam_scores = None,\n", " **kwargs\n", " ) -> Union[Tuple[torch.FloatTensor], Seq2SeqLMOutput]:\n", "\n", " hidden_states = encoder_outputs[\"last_hidden_state\"]\n", "\n", " if not hasattr(self, 'beam_idx'):\n", " # Infering the number of beams from the attention mask\n", " num_beams = attention_mask.shape[0]\n", " self.beam_idx = torch.arange(0, num_beams, dtype=torch.int64)\n", "\n", " decoder_outputs = self.decoder(\n", " decoder_input_ids,\n", " decoder_attention_mask,\n", " hidden_states,\n", " attention_mask,\n", " self.beam_idx,\n", " beam_scores\n", " )\n", "\n", " # lm_logits = 
decoder_outputs[0]\n", " next_token_scores = decoder_outputs[0]\n", " next_tokens = decoder_outputs[1]\n", " next_indices = decoder_outputs[2]\n", "\n", " return next_token_scores, next_tokens, next_indices\n", "\n", " def beam_search(\n", " self,\n", " input_ids: torch.LongTensor,\n", " beam_scorer: BeamScorer,\n", " logits_processor: Optional[LogitsProcessorList] = None,\n", " stopping_criteria: Optional[StoppingCriteriaList] = None,\n", " max_length: Optional[int] = None,\n", " pad_token_id: Optional[int] = None,\n", " eos_token_id: Optional[Union[int, List[int]]] = None,\n", " output_attentions: Optional[bool] = None,\n", " output_hidden_states: Optional[bool] = None,\n", " output_scores: Optional[bool] = None,\n", " return_dict_in_generate: Optional[bool] = None,\n", " synced_gpus: Optional[bool] = False,\n", " seq_length: Optional[int] = None,\n", " **model_kwargs,\n", " ) -> Union[BeamSearchOutput, torch.LongTensor]:\n", "\n", " logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()\n", " stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()\n", " pad_token_id = pad_token_id if pad_token_id is not None else self.generation_config.pad_token_id\n", " eos_token_id = eos_token_id if eos_token_id is not None else self.generation_config.eos_token_id\n", " if isinstance(eos_token_id, int):\n", " eos_token_id = [eos_token_id]\n", " output_scores = output_scores if output_scores is not None else self.generation_config.output_scores\n", " output_attentions = (\n", " output_attentions if output_attentions is not None else self.generation_config.output_attentions\n", " )\n", " output_hidden_states = (\n", " output_hidden_states if output_hidden_states is not None else self.generation_config.output_hidden_states\n", " )\n", "\n", " batch_size = len(beam_scorer._beam_hyps)\n", " num_beams = beam_scorer.num_beams\n", "\n", " batch_beam_size, cur_len = input_ids.shape\n", "\n", " # Overwrite cur_len\n", " cur_len = seq_length\n", "\n", " if num_beams * batch_size != batch_beam_size:\n", " raise ValueError(\n", " f\"Batch dimension of `input_ids` should be {num_beams * batch_size}, but is {batch_beam_size}.\"\n", " )\n", "\n", " # init attention / hidden states / scores tuples\n", " scores = () if (return_dict_in_generate and output_scores) else None\n", " beam_indices = (\n", " tuple(() for _ in range(batch_beam_size)) if (return_dict_in_generate and output_scores) else None\n", " )\n", "\n", " # initialise score of first beam with 0 and the rest with -1e9. 
This makes sure that only tokens\n", " # of the first beam are considered to avoid sampling the exact same tokens across all beams.\n", " # beam_scores = torch.zeros((batch_size, num_beams), dtype=torch.float, device=input_ids.device)\n", " beam_scores_device = \"cpu\"\n", " beam_scores = torch.zeros((batch_size, num_beams), dtype=torch.float, device=beam_scores_device)\n", " beam_scores[:, 1:] = -1e9\n", " beam_scores = beam_scores.view((batch_size * num_beams,))\n", "\n", " while True:\n", " # prepare model inputs\n", " # From max_length-sized input_ids, select first\n", " # cur_len - 1 values.\n", " update_indices = torch.stack(\n", " [torch.arange(input_ids.size(0)), torch.tensor(cur_len - 1).repeat(input_ids.size(0))], dim=-1\n", " )\n", " input_ids_ = input_ids[update_indices[:, 0], update_indices[:, 1], None]\n", " model_inputs = self.prepare_inputs_for_generation(input_ids_, **model_kwargs)\n", "\n", " next_token_scores, next_tokens, next_indices = self(\n", " **model_inputs,\n", " return_dict=True,\n", " output_attentions=output_attentions,\n", " output_hidden_states=output_hidden_states,\n", " beam_scores=beam_scores\n", " )\n", "\n", " # stateless\n", " beam_outputs = beam_scorer.process(\n", " input_ids.to(\"cpu\")[:, :cur_len],\n", " next_token_scores.to(\"cpu\"),\n", " next_tokens.to(\"cpu\"),\n", " next_indices.to(\"cpu\"),\n", " pad_token_id=pad_token_id,\n", " eos_token_id=eos_token_id,\n", " beam_indices=beam_indices,\n", " )\n", "\n", " beam_scores = beam_outputs[\"next_beam_scores\"]\n", " beam_next_tokens = beam_outputs[\"next_beam_tokens\"]\n", " beam_idx = beam_outputs[\"next_beam_indices\"]\n", "\n", " update_indices = torch.stack(\n", " [torch.arange(batch_beam_size), torch.tensor(cur_len - 1).repeat(batch_beam_size)], dim=-1\n", " )\n", " update_indices_2 = torch.stack(\n", " [torch.arange(batch_beam_size), torch.tensor(cur_len).repeat(batch_beam_size)], dim=-1\n", " )\n", " # First select beam_indices\n", " device = input_ids.device\n", " beam_idx_device = beam_idx.to(device=input_ids.device)\n", " input_ids[:, :] = input_ids[beam_idx_device.long(), :]\n", "\n", " # Then append new tokens\n", " input_ids[update_indices_2[:, 0], update_indices_2[:, 1], None] = beam_next_tokens.unsqueeze(-1).to(device).to(torch.long)\n", " input_ids = input_ids * 1 # Hack to materialize tensor\n", "\n", " # update generated ids, model inputs, and length for next step\n", " model_kwargs = self._update_model_kwargs_for_xla_generation(\n", " model_kwargs,\n", " batch_size=batch_beam_size,\n", " is_encoder_decoder=self.config.is_encoder_decoder,\n", " max_length=stopping_criteria.max_length,\n", " seq_length=cur_len,\n", " use_cache=model_kwargs[\"use_cache\"],\n", " )\n", " if model_kwargs[\"past_key_values\"] is not None:\n", " model_kwargs[\"past_key_values\"] = self._reorder_cache(model_kwargs[\"past_key_values\"], beam_idx.to(torch.int64))\n", "\n", " if return_dict_in_generate and output_scores:\n", " beam_indices = tuple((beam_indices[beam_idx[i]] + (beam_idx[i],) for i in range(len(beam_indices))))\n", "\n", " # increase cur_len\n", " cur_len = cur_len + 1\n", "\n", " # stop when each sentence is finished, or if we exceed the maximum length\n", " stop_criterion_1 = beam_scorer.is_done\n", " if isinstance(stopping_criteria, list):\n", " if len(stopping_criteria) == 1:\n", " stopping_criteria = stopping_criteria[0]\n", "\n", " # Cases that can be handled in XLA without requiring\n", " # non-padded input_ids\n", " if isinstance(stopping_criteria, MaxLengthCriteria):\n", " 
stop_criterion_2 = cur_len >= stopping_criteria.max_length\n", " elif isinstance(stopping_criteria, MaxTimeCriteria):\n", " stop_criterion_2 = stopping_criteria(input_ids, scores)\n", " else:\n", " # Other cases will be handled on CPU\n", " batch_size, _ = input_ids.shape\n", " input_ids_cpu = input_ids.to(\"cpu\")\n", " mask = torch.cat(\n", " [torch.ones(batch_size, cur_len), torch.zeros(batch_size, input_ids.shape[1] - cur_len)], dim=1\n", " ).bool()\n", " input_ids_cpu = torch.masked_select(input_ids_cpu, mask).reshape((batch_size, cur_len))\n", " scores_cpu = scores.to(\"cpu\") if torch.is_tensor(scores) else scores\n", " stop_criterion_2 = stopping_criteria(input_ids_cpu, scores_cpu)\n", "\n", " if stop_criterion_1 or stop_criterion_2:\n", " if not synced_gpus:\n", " break\n", " else:\n", " this_peer_finished = True\n", "\n", " sequence_outputs = beam_scorer.finalize(\n", " input_ids.to(\"cpu\"),\n", " beam_scores.to(\"cpu\"),\n", " next_tokens.to(\"cpu\"),\n", " next_indices.to(\"cpu\"),\n", " pad_token_id=pad_token_id,\n", " eos_token_id=eos_token_id,\n", " max_length=stopping_criteria.max_length,\n", " beam_indices=beam_indices,\n", " )\n", "\n", " for k, v in sequence_outputs.items():\n", " if type(v) == torch.Tensor:\n", " sequence_outputs[k] = sequence_outputs[k].to(input_ids.device)\n", "\n", " return sequence_outputs[\"sequences\"]\n", "\n", "\n", " def greedy_search(\n", " self,\n", " input_ids: torch.LongTensor,\n", " logits_processor: Optional[LogitsProcessorList] = None,\n", " stopping_criteria: Optional[StoppingCriteriaList] = None,\n", " max_length: Optional[int] = None,\n", " pad_token_id: Optional[int] = None,\n", " eos_token_id: Optional[Union[int, List[int]]] = None,\n", " output_attentions: Optional[bool] = None,\n", " output_hidden_states: Optional[bool] = None,\n", " output_scores: Optional[bool] = None,\n", " return_dict_in_generate: Optional[bool] = None,\n", " seq_length: Optional[int] = int,\n", " streamer: Optional[\"BaseStreamer\"] = None,\n", " **model_kwargs,\n", " ) -> Union[GreedySearchOutput, torch.LongTensor]:\n", " \"\"\"\n", " Overriding greedy sampling to use next tokens returned from neuron device instead of logits.\n", " \"\"\"\n", " # init values\n", " logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()\n", " use_cache = model_kwargs[\"use_cache\"] if \"use_cache\" in model_kwargs else False\n", " stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()\n", " pad_token_id = pad_token_id if pad_token_id is not None else self.generation_config.pad_token_id\n", " eos_token_id = eos_token_id if eos_token_id is not None else self.generation_config.eos_token_id\n", " if isinstance(eos_token_id, int):\n", " eos_token_id = [eos_token_id]\n", " eos_token_id_tensor = torch.tensor(eos_token_id).to(input_ids.device) if eos_token_id is not None else None\n", " output_scores = output_scores if output_scores is not None else self.generation_config.output_scores\n", " output_attentions = (\n", " output_attentions if output_attentions is not None else self.generation_config.output_attentions\n", " )\n", " output_hidden_states = (\n", " output_hidden_states if output_hidden_states is not None else self.generation_config.output_hidden_states\n", " )\n", "\n", " # init attention / hidden states / scores tuples\n", " scores = () if (return_dict_in_generate and output_scores) else None\n", " decoder_attentions = () if (return_dict_in_generate and output_attentions) else None\n", " 
cross_attentions = () if (return_dict_in_generate and output_attentions) else None\n", " decoder_hidden_states = () if (return_dict_in_generate and output_hidden_states) else None\n", "\n", "\n", " # keep track of which sequences are already finished\n", " unfinished_sequences = torch.ones(input_ids.shape[0], dtype=torch.long, device=input_ids.device)\n", "\n", " this_peer_finished = False # used by synced_gpus only\n", " while True:\n", "\n", " # prepare model inputs\n", " # From max_length-sized input_ids, select first\n", " # seq_length - 1 values.\n", "\n", " if model_kwargs.get(\"past_key_values\") is None:\n", " input_ids_ = input_ids[:, :seq_length]\n", " else:\n", " update_indices = torch.stack(\n", " [torch.arange(input_ids.size(0)), torch.tensor(seq_length - 1).repeat(input_ids.size(0))],\n", " dim=-1,\n", " )\n", " input_ids_ = input_ids[update_indices[:, 0], update_indices[:, 1], None]\n", "\n", " model_inputs = self.prepare_inputs_for_generation(input_ids_, **model_kwargs)\n", " \n", " # forward pass to get next token\n", " output = self(\n", " **model_inputs,\n", " return_dict=True,\n", " output_attentions=output_attentions,\n", " output_hidden_states=output_hidden_states,\n", " )\n", " next_tokens = output[0]\n", "\n", " # finished sentences should have their next token be a padding token\n", " if eos_token_id is not None:\n", " if pad_token_id is None:\n", " raise ValueError(\"If `eos_token_id` is defined, make sure that `pad_token_id` is defined.\")\n", " next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)\n", "\n", " # update generated ids, model inputs, and length for next step\n", "\n", " batch_size, _ = input_ids.shape\n", " update_indices = torch.stack(\n", " [torch.arange(batch_size), torch.tensor(seq_length).repeat(batch_size)], dim=-1\n", " )\n", " input_ids[update_indices[:, 0], update_indices[:, 1]] = next_tokens[:]\n", " model_kwargs = self._update_model_kwargs_for_xla_generation(\n", " model_kwargs,\n", " batch_size=batch_size,\n", " is_encoder_decoder=self.config.is_encoder_decoder,\n", " max_length=stopping_criteria.max_length,\n", " seq_length=seq_length,\n", " use_cache=use_cache,\n", " )\n", "\n", " seq_length += 1\n", "\n", " # if eos_token was found in one sentence, set sentence to finished\n", " if eos_token_id_tensor is not None:\n", " unfinished_sequences = unfinished_sequences.mul(\n", " next_tokens.tile(eos_token_id_tensor.shape[0], 1).ne(eos_token_id_tensor.unsqueeze(1)).prod(dim=0)\n", " )\n", "\n", " # stop when each sentence is finished, or if we exceed the maximum length\n", " stop_criterion_1 = unfinished_sequences.max() == 0\n", "\n", " if isinstance(stopping_criteria, list):\n", " if len(stopping_criteria) == 1:\n", " stopping_criteria = stopping_criteria[0]\n", "\n", " # Cases that can be handled in XLA without requiring\n", " # non-padded input_ids\n", " if isinstance(stopping_criteria, MaxLengthCriteria):\n", " stop_criterion_2 = seq_length >= stopping_criteria.max_length\n", " elif isinstance(stopping_criteria, MaxTimeCriteria):\n", " stop_criterion_2 = stopping_criteria(input_ids, scores)\n", " else:\n", " # Other cases will be handled on CPU\n", " batch_size, _ = input_ids.shape\n", " mask = torch.cat(\n", " [torch.ones(batch_size, seq_length), torch.zeros(batch_size, input_ids.shape[1] - seq_length)],\n", " dim=1,\n", " ).bool()\n", " input_ids_cpu = torch.masked_select(input_ids, mask).reshape((batch_size, seq_length)).to(\"cpu\")\n", " scores_cpu = scores.to(\"cpu\") if 
torch.is_tensor(scores) else scores\n", " stop_criterion_2 = stopping_criteria(input_ids_cpu, scores_cpu)\n", "\n", " if stop_criterion_1 or stop_criterion_2:\n", " this_peer_finished = True\n", "\n", " if this_peer_finished:\n", " break\n", "\n", " if streamer is not None:\n", " streamer.end()\n", "\n", " return input_ids\n", " \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's test inference on CPU with all the wrappers before tracing." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Let's set some run parameters\n", "\n", "model_name = \"t5-large\"\n", "num_beams = 1\n", "num_return_sequences = 1\n", "max_length = 128" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Results:\n", "1 Lassen Sie uns gutes Essen essen.\n" ] } ], "source": [ "from transformers import T5Tokenizer\n", "\n", "\n", "prompt=\"translate English to German: Lets eat good food.\"\n", " \n", "tokenizer = T5Tokenizer.from_pretrained(model_name, model_max_length=max_length)\n", "model = T5Wrapper.from_pretrained(model_name)\n", "\n", "model.encoder = EncoderWrapper(model.encoder, model.decoder, model.config, num_beams, max_length, \"cpu\", num_beams)\n", "setattr(model.encoder, 'main_input_name', 'input_ids') # Attribute required by beam search\n", "\n", "model.decoder = DecoderWrapper(decoder=model.decoder,\n", " lm_head=model.lm_head,\n", " model_config=model.config,\n", " num_beams=num_beams,\n", " max_length=max_length,\n", " device=\"cpu\")\n", "\n", "output = model.generate(tokenizer=tokenizer,\n", " prompt=prompt,\n", " max_length=max_length,\n", " num_beams=num_beams,\n", " num_return_sequences=num_return_sequences,\n", " device=\"cpu\")\n", "\n", "results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]\n", "\n", "print('Results:')\n", "for i, summary in enumerate(results):\n", " print(i + 1, summary)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the wrappers are running as expected, let's trace the encoder and decoder. To trace these modules, we pass each module and a sample input to the trace function. The result of the trace stage will be a static executable where the operations to be run upon inference are determined during compilation. This means that when inferring, the resulting Neuron model must be executed with tensors that are the exact same shape as those provided at compilation time. If a model is given a tensor at inference time whose shape does not match the tensor given at compilation time, an error will occur.\n", "\n", "The decoder wrapper returns the new state of the cache as an output which is copied back to the CPU. As the cache is a large tensor, copying it to and from the XLA device for each decoder invocation will significantly slow down the inference. Instead, we can use input output aliasing, a feature of `torch_neuronx`, to keep these tensors on device rather than copying back to the CPU. To use input output aliasing, we need to map the outputs to input parameters while tracing. 
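", "\n", "As a rough sketch (hypothetical module and tensor names, not the exact call used below), the aliasing dictionary maps a tensor held by the module to the positional index of the traced output that overwrites it:\n", "\n", "```python\n", "# Output index 1 of the traced forward pass overwrites module.cache in place,\n", "# so the updated cache stays on the Neuron device between invocations.\n", "traced = torch_neuronx.trace(module, example_inputs, input_output_aliases={module.cache: 1})\n", "```\n", "\n", "The cell below builds this kind of dictionary for every self-attention and cross-attention cache tensor of the decoder. 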
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch_neuronx\n", "\n", "from transformers import T5Tokenizer, T5ForConditionalGeneration\n", "\n", "def trace_encoder(model: T5ForConditionalGeneration,\n", " tokenizer: T5Tokenizer,\n", " max_length: int,\n", " num_beams: int):\n", " \n", " # Trace encoder\n", " batch_encoding = tokenizer(\"translate English to German: Lets go home now\",\n", " max_length=max_length, truncation=True, padding='max_length', return_tensors=\"pt\")\n", " input_ids = batch_encoding['input_ids']\n", " attention_mask = batch_encoding['attention_mask']\n", "\n", " encoder = EncoderWrapper(model.encoder, model.decoder, model.config, num_beams, max_length, \"xla\", num_beams)\n", " traced_encoder = torch_neuronx.trace(encoder, (input_ids, attention_mask), compiler_workdir=\"/tmp/encoder/\")\n", " setattr(traced_encoder, 'main_input_name', 'input_ids') # Attribute required by beam search\n", "\n", " return traced_encoder\n", "\n", "def trace_decoder(model: T5ForConditionalGeneration,\n", " num_beams: int,\n", " max_length: int):\n", "\n", " decoder = DecoderWrapper(decoder=model.decoder,\n", " lm_head=model.lm_head,\n", " model_config=model.config,\n", " num_beams=num_beams,\n", " max_length=max_length,\n", " device=\"xla\")\n", "\n", " # We create mock inputs so we can trace the decoder\n", " decoder_input_ids = torch.ones((num_beams, 1), dtype=torch.int64)\n", " decoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int32)\n", " encoder_attention_mask = torch.ones((num_beams, max_length), dtype=torch.int64)\n", " encoder_hidden_states = torch.ones((num_beams, max_length, model.config.d_model), dtype=torch.float32)\n", "\n", " beam_idx = torch.arange(0, num_beams, dtype=torch.int64)\n", " beam_scores = torch.zeros((num_beams,), dtype=torch.float)\n", "\n", " num_outputs_from_trace = 3 if num_beams > 1 else 1\n", "\n", " aliases = {}\n", " for i in range(len(decoder.past_key_values_sa)):\n", " aliases[decoder.past_key_values_sa[i]] = i + num_outputs_from_trace\n", " for i in range(len(decoder.past_key_values_ca)):\n", " aliases[decoder.past_key_values_ca[i]] = len(decoder.past_key_values_sa) + i + num_outputs_from_trace\n", "\n", " traced_decoder = torch_neuronx.trace(decoder, (\n", " decoder_input_ids,\n", " decoder_attention_mask,\n", " encoder_hidden_states,\n", " encoder_attention_mask,\n", " beam_idx,\n", " beam_scores,\n", " ), input_output_aliases=aliases, compiler_workdir=\"/tmp/decoder/\")\n", "\n", " return traced_decoder\n", "\n", "\n", "tokenizer = T5Tokenizer.from_pretrained(model_name, model_max_length=max_length)\n", "model = T5ForConditionalGeneration.from_pretrained(model_name)\n", "\n", "# We enable this flag to ensure model uses attention key value caching\n", "model.config.use_cache = True\n", "\n", "traced_encoder = trace_encoder(model, tokenizer, max_length, num_beams)\n", "traced_decoder = trace_decoder(model, num_beams, max_length)\n", "\n", "torch.jit.save(traced_encoder, \"TracedEncoder.pt\")\n", "torch.jit.save(traced_decoder, \"TracedDecoder.pt\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run inference with greedy decoding\n", "Now that we have the traced model, let's use it for inference. 
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Results:\n", "1 Lassen Sie uns gutes Essen essen.\n" ] } ], "source": [ "runtime = torch.classes.neuron.Runtime()\n", "runtime.initialize()\n", "runtime.set_default_neuron_cores(0, 1)\n", "\n", "tokenizer = T5Tokenizer.from_pretrained(model_name)\n", "model = T5Wrapper.from_pretrained(model_name)\n", "\n", "model.encoder = torch.jit.load(\"TracedEncoder.pt\")\n", "# Attribute required by beam search\n", "setattr(model.encoder, 'main_input_name', 'input_ids') \n", "\n", "model.decoder = torch.jit.load(\"TracedDecoder.pt\")\n", "torch_neuronx.move_trace_to_device(model.decoder, 0)\n", "\n", "\n", "output = model.generate(tokenizer=tokenizer,\n", " prompt=\"translate English to German: Lets eat good food.\",\n", " max_length=max_length,\n", " num_beams=num_beams,\n", " num_return_sequences=num_return_sequences,\n", " device=\"xla\")\n", "\n", "results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]\n", "\n", "print('Results:')\n", "for i, summary in enumerate(results):\n", " print(i + 1, summary)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run inference with beam search" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's set some run parameters for beam search\n", "\n", "model_name = \"t5-large\"\n", "num_beams = 4\n", "num_return_sequences = 4\n", "max_length = 128\n", "\n", "tokenizer = T5Tokenizer.from_pretrained(model_name, model_max_length=max_length)\n", "model = T5ForConditionalGeneration.from_pretrained(model_name)\n", "model.config.use_cache = True\n", "\n", "traced_encoder = trace_encoder(model, tokenizer, max_length, num_beams)\n", "traced_decoder = trace_decoder(model, num_beams, max_length)\n", "\n", "torch.jit.save(traced_encoder, \"TracedEncoder.pt\")\n", "torch.jit.save(traced_decoder, \"TracedDecoder.pt\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Results:\n", "1 Lassen Sie uns gutes Essen essen.\n", "2 Lassen Sie uns gutes Essen zu essen.\n", "3 Lassen Sie uns essen gutes Essen.\n", "4 Lassen Sie uns gutes Essen.\n" ] } ], "source": [ "tokenizer = T5Tokenizer.from_pretrained(model_name)\n", "model = T5Wrapper.from_pretrained(model_name)\n", "\n", "model.encoder = torch.jit.load(\"TracedEncoder.pt\")\n", "# Attribute required by beam search\n", "setattr(model.encoder, 'main_input_name', 'input_ids') \n", "\n", "model.decoder = torch.jit.load(\"TracedDecoder.pt\")\n", "torch_neuronx.move_trace_to_device(model.decoder, 0)\n", "\n", "\n", "output = model.generate(tokenizer=tokenizer,\n", " prompt=\"translate English to German: Lets eat good food.\",\n", " max_length=max_length,\n", " num_beams=num_beams,\n", " num_return_sequences=num_return_sequences,\n", " device=\"xla\")\n", "\n", "results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]\n", "\n", "print('Results:')\n", "for i, summary in enumerate(results):\n", " print(i + 1, summary)" ] } ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 } 
================================================ FILE: src/examples/pytorch/torchserve/benchmark_bert.py ================================================ import os import argparse import time import numpy as np import requests import sys from concurrent import futures import torch parser = argparse.ArgumentParser() parser.add_argument('--url', help='Torchserve model URL', type=str, default=f'http://127.0.0.1:8080/predictions/bert-max_length128-batch_size6') parser.add_argument('--num_thread', type=int, default=64, help='Number of threads invoking the model URL') parser.add_argument('--batch_size', type=int, default=6) parser.add_argument('--sequence_length', type=int, default=128) parser.add_argument('--latency_window_size', type=int, default=1000) parser.add_argument('--throughput_time', type=int, default=300) parser.add_argument('--throughput_interval', type=int, default=10) args = parser.parse_args() data = { 'seq_0': 'A completely made up sentence.', 'seq_1': 'Well, I suppose they are all made up.' } live = True num_infer = 0 latency_list = [] def one_thread(pred, feed_data): global latency_list global num_infer global live session = requests.Session() while True: start = time.time() result = session.post(pred, data=feed_data) latency = time.time() - start latency_list.append(latency) num_infer += 1 if not live: break def current_performance(): last_num_infer = num_infer for _ in range(args.throughput_time // args.throughput_interval): current_num_infer = num_infer throughput = (current_num_infer - last_num_infer) / args.throughput_interval p50 = 0.0 p90 = 0.0 if latency_list: p50 = np.percentile(latency_list[-args.latency_window_size:], 50) p90 = np.percentile(latency_list[-args.latency_window_size:], 90) print('pid {}: current throughput {}, latency p50={:.3f} p90={:.3f}'.format(os.getpid(), throughput, p50, p90)) sys.stdout.flush() last_num_infer = current_num_infer time.sleep(args.throughput_interval) global live live = False with futures.ThreadPoolExecutor(max_workers=args.num_thread+1) as executor: executor.submit(current_performance) for _ in range(args.num_thread): executor.submit(one_thread, args.url, data) ================================================ FILE: src/examples/pytorch/torchserve/config.json ================================================ { "model_name": "bert-base-cased-finetuned-mrpc", "max_length": 128, "batch_size": 6 } ================================================ FILE: src/examples/pytorch/torchserve/handler_bert.py ================================================ import os import json import sys import logging from abc import ABC import torch import torch_neuron from transformers import AutoTokenizer from ts.torch_handler.base_handler import BaseHandler # one core per worker os.environ['NEURON_RT_NUM_CORES'] = '1' logger = logging.getLogger(__name__) class BertEmbeddingHandler(BaseHandler, ABC): """ Handler class for Bert Embedding computations. 
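Tokenizes paired text inputs, pads partial batches up to the compiled batch size, and runs the pre-compiled TorchScript model on a single NeuronCore per worker.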
""" def __init__(self): super(BertEmbeddingHandler, self).__init__() self.initialized = False def initialize(self, ctx): self.manifest = ctx.manifest properties = ctx.system_properties self.device = 'cpu' model_dir = properties.get('model_dir') serialized_file = self.manifest['model']['serializedFile'] model_pt_path = os.path.join(model_dir, serialized_file) # point sys.path to our config file with open('config.json') as fp: config = json.load(fp) self.max_length = config['max_length'] self.batch_size = config['batch_size'] self.classes = ['not paraphrase', 'paraphrase'] self.model = torch.jit.load(model_pt_path) logger.debug(f'Model loaded from {model_dir}') self.model.to(self.device) self.model.eval() self.tokenizer = AutoTokenizer.from_pretrained(config['model_name']) self.initialized = True def preprocess(self, input_data): """ Tokenization pre-processing """ input_ids = [] attention_masks = [] token_type_ids = [] for row in input_data: seq_0 = row['seq_0'].decode('utf-8') seq_1 = row['seq_1'].decode('utf-8') logger.debug(f'Received text: "{seq_0}", "{seq_1}"') inputs = self.tokenizer.encode_plus( seq_0, seq_1, max_length=self.max_length, padding='max_length', truncation=True, return_tensors='pt' ) input_ids.append(inputs['input_ids']) attention_masks.append(inputs['attention_mask']) token_type_ids.append(inputs['token_type_ids']) batch = (torch.cat(input_ids, 0), torch.cat(attention_masks, 0), torch.cat(token_type_ids, 0)) return batch def inference(self, inputs): """ Predict the class of a text using a trained transformer model. """ # sanity check dimensions assert(len(inputs) == 3) num_inferences = len(inputs[0]) assert(num_inferences <= self.batch_size) # insert padding if we received a partial batch padding = self.batch_size - num_inferences if padding > 0: pad = torch.nn.ConstantPad1d((0, 0, 0, padding), value=0) inputs = [pad(x) for x in inputs] outputs = self.model(*inputs)[0] predictions = [] for i in range(num_inferences): prediction = self.classes[outputs[i].argmax().item()] predictions.append([prediction]) logger.debug("Model predicted: '%s'", prediction) return predictions def postprocess(self, inference_output): return inference_output ================================================ FILE: src/examples/pytorch/torchserve/handler_bert_neuronx.py ================================================ import os import json import sys import logging from abc import ABC import torch import torch_neuronx from transformers import AutoTokenizer from ts.torch_handler.base_handler import BaseHandler # one core per worker os.environ['NEURON_RT_NUM_CORES'] = '1' logger = logging.getLogger(__name__) class BertEmbeddingHandler(BaseHandler, ABC): """ Handler class for Bert Embedding computations. 
""" def __init__(self): super(BertEmbeddingHandler, self).__init__() self.initialized = False def initialize(self, ctx): self.manifest = ctx.manifest properties = ctx.system_properties self.device = 'cpu' model_dir = properties.get('model_dir') serialized_file = self.manifest['model']['serializedFile'] model_pt_path = os.path.join(model_dir, serialized_file) # point sys.path to our config file with open('config.json') as fp: config = json.load(fp) self.max_length = config['max_length'] self.batch_size = config['batch_size'] self.classes = ['not paraphrase', 'paraphrase'] self.model = torch.jit.load(model_pt_path) logger.debug(f'Model loaded from {model_dir}') self.model.to(self.device) self.model.eval() self.tokenizer = AutoTokenizer.from_pretrained(config['model_name']) self.initialized = True def preprocess(self, input_data): """ Tokenization pre-processing """ input_ids = [] attention_masks = [] token_type_ids = [] for row in input_data: seq_0 = row['seq_0'].decode('utf-8') seq_1 = row['seq_1'].decode('utf-8') logger.debug(f'Received text: "{seq_0}", "{seq_1}"') inputs = self.tokenizer.encode_plus( seq_0, seq_1, max_length=self.max_length, padding='max_length', truncation=True, return_tensors='pt' ) input_ids.append(inputs['input_ids']) attention_masks.append(inputs['attention_mask']) token_type_ids.append(inputs['token_type_ids']) batch = (torch.cat(input_ids, 0), torch.cat(attention_masks, 0), torch.cat(token_type_ids, 0)) return batch def inference(self, inputs): """ Predict the class of a text using a trained transformer model. """ # sanity check dimensions assert(len(inputs) == 3) num_inferences = len(inputs[0]) assert(num_inferences <= self.batch_size) # insert padding if we received a partial batch padding = self.batch_size - num_inferences if padding > 0: pad = torch.nn.ConstantPad1d((0, 0, 0, padding), value=0) inputs = [pad(x) for x in inputs] outputs = self.model(*inputs)[0] predictions = [] for i in range(num_inferences): prediction = self.classes[outputs[i].argmax(dim=-1).item()] predictions.append([prediction]) logger.debug("Model predicted: '%s'", prediction) return predictions def postprocess(self, inference_output): return inference_output ================================================ FILE: src/examples/pytorch/torchserve/infer_bert.py ================================================ import json import concurrent.futures import requests with open('config.json') as fp: config = json.load(fp) max_length = config['max_length'] batch_size = config['batch_size'] name = f'bert-max_length{max_length}-batch_size{batch_size}' # dispatch requests in parallel url = f'http://localhost:8080/predictions/{name}' paraphrase = {'seq_0': "HuggingFace's headquarters are situated in Manhattan", 'seq_1': "The company HuggingFace is based in New York City"} not_paraphrase = {'seq_0': paraphrase['seq_0'], 'seq_1': 'This is total nonsense.'} with concurrent.futures.ThreadPoolExecutor(max_workers=batch_size) as executor: def worker_thread(worker_index): # we'll send half the requests as not_paraphrase examples for sanity data = paraphrase if worker_index < batch_size//2 else not_paraphrase try: response = requests.post(url, data=data) # Check if the response status code indicates success if response.status_code == 200: print(worker_index, response.json()) else: # If the response is not successful, raise an exception with the status code and error message error_message = response.json().get('message', 'Unknown Error') raise Exception(f"Failed request with status code {response.status_code}: 
{error_message}") except Exception as e: # Catch all other exceptions that may be raised print(f"An unexpected error occurred: {e}") raise for worker_index in range(batch_size): executor.submit(worker_thread, worker_index) ================================================ FILE: src/examples/pytorch/torchserve/torchserve.config ================================================ # bind inference API to all network interfaces with SSL enabled inference_address=http://0.0.0.0:8080 default_workers_per_model=1 ================================================ FILE: src/examples/pytorch/torchserve/trace_bert_neuron.py ================================================ import torch import torch_neuron from transformers import AutoTokenizer, AutoModelForSequenceClassification # Build tokenizer and model tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc") model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False) # Setup some example inputs sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" max_length = 128 batch_size = 6 paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt") example_inputs_paraphrase = ( torch.cat([paraphrase['input_ids']] * batch_size, 0), torch.cat([paraphrase['attention_mask']] * batch_size, 0), torch.cat([paraphrase['token_type_ids']] * batch_size, 0) ) # Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron model_neuron_batch = torch_neuron.trace(model, example_inputs_paraphrase) # Save the batched model model_neuron_batch.save('bert_neuron_b{}.pt'.format(batch_size)) ================================================ FILE: src/examples/pytorch/torchserve/trace_bert_neuronx.py ================================================ import torch import torch_neuronx from transformers import AutoTokenizer, AutoModelForSequenceClassification # Build tokenizer and model tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc") model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False) # Setup some example inputs sequence_0 = "The company HuggingFace is based in New York City" sequence_1 = "HuggingFace's headquarters are situated in Manhattan" max_length = 128 batch_size = 6 paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt") example_inputs_paraphrase = ( torch.cat([paraphrase['input_ids']] * batch_size, 0), torch.cat([paraphrase['attention_mask']] * batch_size, 0), torch.cat([paraphrase['token_type_ids']] * batch_size, 0) ) # Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron model_neuron_batch = torch_neuronx.trace(model, example_inputs_paraphrase) # Save the batched model model_neuron_batch.save('bert_neuron_b{}.pt'.format(batch_size)) ================================================ FILE: src/examples/pytorch/transformers-marianmt.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Transformers MarianMT Tutorial\n", "\n", "In this tutorial, you will deploy the [HuggingFace MarianMT](https://huggingface.co/transformers/v4.0.1/model_doc/marian.html) model for text translation.\n", "\n", "This Jupyter notebook should be run on an inf1.6xlarge instance since you will 
be loading and compiling several large models.\n", "\n", "Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \"Kernel -> Change Kernel\" option on the top of this Jupyter notebook page.\n", "\n", "To generate text, you will be using the beam search algorithm to incrementally generate token candidates until the full output text has been created. Unlike simple single-pass models, this algorithm divides the work into two distinct phases:\n", "\n", "- **Encoder**: Convert the input text into an encoded representation. (Executed once)\n", "- **Decoder**: Use the encoded representation of the input text and the current output tokens to incrementally generate the set of next best candidate tokens. (Executed many times)\n", "\n", "In this tutorial you will perform the following steps:\n", "\n", "- **Compile**: Compile both the Encoder and Decoder for Neuron using simplified interfaces for inference.\n", "- **Infer**: Run on CPU and Neuron and compare results.\n", "\n", "Finally, a completely unrolled decoder will be built which simplifies the implementation at the cost of performing fixed-length inferences." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Dependencies:\n", "\n", "This tutorial has the following dependencies:\n", "\n", "- `transformers==4.26.1`\n", "- `torch-neuron`\n", "- `sentencepiece`\n", "- `neuron-cc[tensorflow]`\n", "\n", "The following will install the required `transformers` version. Note that encoder/decoder API changes across different minor versions require that you are specific about the version used. Also note that the `torch-neuron` version is pinned due to `transformers` compatibility issues." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install sentencepiece transformers==4.26.1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parameters\n", "\n", "The parameters of a generative model can be tuned for different use-cases. In this example, you'll tailor the parameters to a single inference beam search for an on-demand inference use-case. See the [MarianConfig](https://huggingface.co/transformers/v4.0.1/model_doc/marian.html#marianconfig) for parameter details.\n", "\n", "Rather than varying the encoder/decoder token sizes at runtime, you must define these parameters prior to compilation. The encoder/decoder token sizes are important tunable parameters as a large token sequence will offer greater sentence length flexibility but perform worse than a small token sequence.\n", "\n", "To maximize performance on Neuron, the `num_beams`, `max_encoder_length` and `max_decoder_length` should be made as small as possible for the use-case.\n", "\n", "For this tutorial you will use a model that translates sentences of up to 32 tokens from English to German." 
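, "\n", "For instance, with `max_encoder_length = 32`, every input sentence is padded or truncated to exactly 32 tokens before it reaches the traced encoder, so a short input pays the same encoder cost as a full-length one. A rough illustration (shapes only, using the tokenizer defined later in this tutorial):\n", "\n", "```python\n", "batch = tokenizer(\"I am a small frog.\", max_length=max_encoder_length,\n", " truncation=True, padding='max_length', return_tensors=\"pt\")\n", "assert batch['input_ids'].shape == (1, max_encoder_length)\n", "```"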
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%env TOKENIZERS_PARALLELISM=True #Supresses tokenizer warnings making errors easier to detect\n", "model_name = \"Helsinki-NLP/opus-mt-en-de\" # English -> German model\n", "num_texts = 1 # Number of input texts to decode\n", "num_beams = 4 # Number of beams per input text\n", "max_encoder_length = 32 # Maximum input token length\n", "max_decoder_length = 32 # Maximum output token length" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CPU Model Inference\n", "\n", "Start by executing the model on CPU to test its execution.\n", "\n", "The following defines the inference function which will be used to compare the Neuron and CPU output. In this example you will display all beam search sequences that were generated. For a real on-demand use case, set the `num_beams` to `1` to return only the top result." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def infer(model, tokenizer, text):\n", "\n", " # Truncate and pad the max length to ensure that the token size is compatible with fixed-sized encoder (Not necessary for pure CPU execution)\n", " batch = tokenizer(text, max_length=max_decoder_length, truncation=True, padding='max_length', return_tensors=\"pt\")\n", " output = model.generate(**batch, max_length=max_decoder_length, num_beams=num_beams, num_return_sequences=num_beams)\n", " results = [tokenizer.decode(t, skip_special_tokens=True) for t in output]\n", "\n", " print('Texts:')\n", " for i, summary in enumerate(results):\n", " print(i + 1, summary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that after loading the model, we also set the maximum length. This will later be used to limit the size of the compiled model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import MarianMTModel, MarianTokenizer\n", "\n", "model_cpu = MarianMTModel.from_pretrained(model_name)\n", "model_cpu.config.max_length = max_decoder_length\n", "model_cpu.eval()\n", "\n", "tokenizer = MarianTokenizer.from_pretrained(model_name)\n", "\n", "sample_text = \"I am a small frog.\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "infer(model_cpu, tokenizer, sample_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Padded Model\n", "In order to perform inference on Neuron, the model must be changed in a way that it supports tracing and fixed-sized inputs. One way in which this is possible is to use a pad the model inputs to the maximum possible tensor sizes. The benefit of using a padded model is that it supports variable length text generation up to a specified length `max_decoder_length`. A consequence of padding is that it can negatively impact performance due to large data transfers.\n", "\n", "### PaddedEncoder & PaddedDecoder Modules\n", "Here you will define wrappers around the encoder and decoder portions of the generation model that are compatible with `torch.jit.trace` as well as fixed-sized inputs.\n", "\n", "The following are important features which are distinct from the default configuration:\n", "\n", "1. Disabled `return_dict`. When this is enabled, the network uses `dataclass` type outputs which are not compatible with `torch.jit.trace`.\n", "2. Disabled `use_cache`. When this option is enabled, the network expects a collection of cache tensors which grow upon each iteration. 
Since Neuron requires fixed sized inputs, this must be disabled.\n", "3. The `GenerationMixin:beam_search` implementation uses only the logits for the current iteration index from the original decoder layer output. Since inputs must be padded, performance can be improved by selecting only a subset of the hidden state prior to the final linear layer. For efficiency on Neuron, this reduction uses an elementwise-multiply to mask out the unused hidden values and then sums along an axis.\n", "4. Since a reduction step is inserted between the decoder output and the final logit calculation, the original `model` attribute is not used. Instead the `PaddedDecoder` class combines the decoder, reducer, and linear layers into a combined forward pass. In the original model there is a clear distinction between the decoder layer and the final linear layer. These layers are fused together to get one large fully optimized graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from torch.nn import functional as F\n", "\n", "\n", "class PaddedEncoder(torch.nn.Module):\n", "\n", " def __init__(self, model):\n", " super().__init__()\n", " self.encoder = model.model.encoder\n", " self.main_input_name = 'input_ids'\n", " \n", " def forward(self, input_ids, attention_mask):\n", " return self.encoder(input_ids, attention_mask=attention_mask, return_dict=False)\n", "\n", "\n", "class PaddedDecoder(torch.nn.Module):\n", "\n", " def __init__(self, model):\n", " super().__init__()\n", " self.weight = model.model.shared.weight.clone().detach()\n", " self.bias = model.final_logits_bias.clone().detach()\n", " self.decoder = model.model.decoder\n", "\n", " def forward(self, input_ids, attention_mask, encoder_outputs, index):\n", "\n", " # Invoke the decoder\n", " hidden, = self.decoder(\n", " input_ids=input_ids,\n", " encoder_hidden_states=encoder_outputs,\n", " encoder_attention_mask=attention_mask,\n", " return_dict=False,\n", " use_cache=False,\n", " )\n", "\n", " _, n_length, _ = hidden.shape\n", "\n", " # Create selection mask\n", " mask = torch.arange(n_length, dtype=torch.float32) == index\n", " mask = mask.view(1, -1, 1)\n", "\n", " # Broadcast mask\n", " masked = torch.multiply(hidden, mask)\n", "\n", " # Reduce along 1st dimension\n", " hidden = torch.sum(masked, 1, keepdims=True)\n", "\n", " # Compute final linear layer for token probabilities\n", " logits = F.linear(\n", " hidden,\n", " self.weight,\n", " bias=self.bias\n", " )\n", " return logits\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### PaddedGenerator - GenerationMixin Class\n", "\n", "\n", "On text generation tasks, HuggingFace Transformers defines a [GenerationMixin](https://huggingface.co/transformers/v4.0.1/main_classes/model.html?highlight=generate#transformers.generation_utils.GenerationMixin) base class which provides standard methods and algorithms to generate text. For this tutorial, you will be using the beam search algorithm on encoder/decoder architectures.\n", "\n", "To be able to use these methods, you will be defining your own class derived from the GenerationMixin class to run a beam search. This will invoke the encoder and decoder layers in a way that is compatible with fixed sized inputs and traced modules. 
This means you must import the base class and the output objects ([Seq2SeqLMOutput](https://huggingface.co/transformers/v4.0.1/main_classes/output.html#transformers.modeling_outputs.Seq2SeqLMOutput), [BaseModelOutput](https://huggingface.co/transformers/v4.0.1/main_classes/output.html#transformers.modeling_outputs.BaseModelOutput)) used by the [beam_search](https://huggingface.co/transformers/v4.0.1/main_classes/model.html?highlight=generate#transformers.generation_utils.GenerationMixin.beam_search) algorithm.\n", "\n", "The `GenerationMixin:generate` method will use `GenerationMixin:beam_search` which requires that you define your own class implementation that invokes the `PaddedEncoder` and `PaddedDecoder` modules using padded inputs. The standard generator model implementation will not work by default because it is intended to infer with variable-sized (growing) input tensors. \n", "\n", "The `from_model` method is defined to create the `PaddedGenerator` from an existing pretrained generator class.\n", "\n", "To invoke the Encoder and Decoder traced modules in a way that is compatible with the `GenerationMixin:beam_search` implementation, the `get_encoder`, `__call__`, and `prepare_inputs_for_generation` methods are overridden.\n", "\n", "Lastly, the class defines methods for serialization so that the model can be easily saved and loaded." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "from transformers import GenerationMixin, AutoConfig\n", "from transformers.modeling_outputs import Seq2SeqLMOutput, BaseModelOutput\n", "from transformers.modeling_utils import PreTrainedModel\n", "\n", "\n", "class PaddedGenerator(PreTrainedModel, GenerationMixin):\n", "\n", " @classmethod\n", " def from_model(cls, model):\n", " generator = cls(model.config)\n", " generator.encoder = PaddedEncoder(model)\n", " generator.decoder = PaddedDecoder(model)\n", " return generator\n", " \n", " def prepare_inputs_for_generation(\n", " self,\n", " input_ids,\n", " encoder_outputs=None,\n", " attention_mask=None,\n", " **kwargs,\n", " ):\n", " # Pad the inputs for Neuron\n", " current_length = input_ids.shape[1]\n", " pad_size = self.config.max_length - current_length\n", " return dict(\n", " input_ids=F.pad(input_ids, (0, pad_size)),\n", " attention_mask=attention_mask,\n", " encoder_outputs=encoder_outputs.last_hidden_state,\n", " current_length=torch.tensor(current_length - 1),\n", " )\n", "\n", " def get_encoder(self):\n", " def encode(input_ids, attention_mask, **kwargs): \n", " output, = self.encoder(input_ids, attention_mask)\n", " return BaseModelOutput(\n", " last_hidden_state=output,\n", " )\n", " return encode\n", "\n", " def forward(self, input_ids, attention_mask, encoder_outputs, current_length, **kwargs):\n", " logits = self.decoder(input_ids, attention_mask, encoder_outputs, current_length)\n", " return Seq2SeqLMOutput(logits=logits)\n", "\n", " @property\n", " def device(self): # Attribute required by beam search\n", " return torch.device('cpu')\n", " \n", " def save_pretrained(self, directory):\n", " if os.path.isfile(directory):\n", " print(f\"Provided path ({directory}) should be a directory, not a file\")\n", " return\n", " os.makedirs(directory, exist_ok=True)\n", " torch.jit.save(self.encoder, os.path.join(directory, 'encoder.pt'))\n", " torch.jit.save(self.decoder, os.path.join(directory, 'decoder.pt'))\n", " self.config.save_pretrained(directory)\n", "\n", " @classmethod\n", " def from_pretrained(cls, directory):\n", 
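" # Reload the artifacts written by save_pretrained(): the model config plus the two\n", " # serialized TorchScript programs, then restore the attribute that beam search expects.\n",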
" config = AutoConfig.from_pretrained(directory)\n", " obj = cls(config)\n", " obj.encoder = torch.jit.load(os.path.join(directory, 'encoder.pt'))\n", " obj.decoder = torch.jit.load(os.path.join(directory, 'decoder.pt'))\n", " setattr(obj.encoder, 'main_input_name', 'input_ids') # Attribute required by beam search\n", " return obj\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Padded CPU Inference\n", "To start, it is important to ensure that the transformations we have made to the model were successful. Using the classes defined above we can test that the padded model execution on CPU is identical to the original output also running on CPU." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "padded_model_cpu = PaddedGenerator.from_model(model_cpu)\n", "infer(padded_model_cpu, tokenizer, sample_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Padded Neuron Tracing & Inference\n", "\n", "Now that the padded version of model is confirmed to produce the same outputs as the non-padded version, the model can be compiled for Neuron." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch_neuron\n", "\n", "\n", "def trace(model, num_texts, num_beams, max_decoder_length, max_encoder_length):\n", " \"\"\"\n", " Traces the encoder and decoder modules for use on Neuron.\n", "\n", " This function fixes the network to the given sizes. Once the model has been\n", " compiled to a given size, the inputs to these networks must always be of\n", " fixed size.\n", "\n", " Args:\n", " model (PaddedGenerator): The padded generator to compile for Neuron\n", " num_texts (int): The number of input texts to translate at once\n", " num_beams (int): The number of beams to compute per text\n", " max_decoder_length (int): The maximum number of tokens to be generated\n", " max_encoder_length (int): The maximum number of input tokens that will be encoded\n", " \"\"\"\n", "\n", " # Trace the encoder\n", " inputs = (\n", " torch.ones((num_texts, max_encoder_length), dtype=torch.long),\n", " torch.ones((num_texts, max_encoder_length), dtype=torch.long),\n", " )\n", " encoder = torch_neuron.trace(model.encoder, inputs)\n", "\n", " # Trace the decoder (with expanded inputs)\n", " batch_size = num_texts * num_beams\n", " inputs = (\n", " torch.ones((batch_size, max_decoder_length), dtype=torch.long),\n", " torch.ones((batch_size, max_encoder_length), dtype=torch.long),\n", " torch.ones((batch_size, max_encoder_length, model.config.d_model), dtype=torch.float),\n", " torch.tensor(0),\n", " )\n", " decoder = torch_neuron.trace(model.decoder, inputs)\n", " \n", " traced = PaddedGenerator(model.config)\n", " traced.encoder = encoder\n", " traced.decoder = decoder\n", " setattr(encoder, 'main_input_name', 'input_ids') # Attribute required by beam search\n", " return traced" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "padded_model_neuron = trace(padded_model_cpu, num_texts, num_beams, max_decoder_length, max_encoder_length)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the Neuron execution to the original CPU implementation, you will see the exact same generated text.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# CPU execution for comparison\n", "infer(padded_model_neuron, tokenizer, sample_text)" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "### Padded Neuron Serialization\n", "Finally, we can test that we can serialize and reload the model so that it can be used later in its precompiled format." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "padded_model_neuron.save_pretrained('NeuronPaddedMarianMT')\n", "padded_model_loaded = PaddedGenerator.from_pretrained('NeuronPaddedMarianMT')\n", "infer(padded_model_loaded, tokenizer, sample_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Greedy Unrolled Model\n", "An unrolled version of the model can achieve better performance in some cases since all operations will be executed on the Neuron hardware without returning to CPU. The consequence of this type of model is that since the generation loop execution never returns to CPU, the entire sequence up to `max_decoder_length` is performed in a single forward pass.\n", "\n", "The following module performs greedy text generation. Unlike the original beam search text generation, this implementation always selects the most probable token and does not generate multiple result texts.\n", "\n", "### GreedyUnrolledGenerator Module" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class GreedyUnrolledGenerator(torch.nn.Module):\n", " \n", " def __init__(self, model):\n", " super().__init__()\n", " self.config = model.config\n", " self.model = model\n", " \n", " def forward(self, input_ids, attention_mask):\n", " \n", " # Generate the encoder state for the input tokens. This is only done once and the state is reused.\n", " encoder_outputs, = self.model.model.encoder(input_ids, attention_mask=attention_mask, return_dict=False)\n", " \n", " # Set the intial state for the decode loop. This will grow per decoder iteration\n", " tokens = torch.full((input_ids.size(0), 2), self.config.decoder_start_token_id)\n", " \n", " # Iteratively invoke the decoder on incrementally generated `tokens` to generate a `next_token`.\n", " # Note that unlike the GeneratorMixin.generate function, there is no early-exit if the stop token \n", " # has been reached. This will always run a fixed number of iterations.\n", " for i in range(self.config.max_length):\n", " \n", " hidden, = self.model.model.decoder(\n", " input_ids=tokens,\n", " encoder_hidden_states=encoder_outputs,\n", " encoder_attention_mask=attention_mask,\n", " return_dict=False,\n", " use_cache=False,\n", " ) # size: [batch, current_length, vocab_size]\n", " \n", " logits = F.linear(\n", " hidden[:, -1, :],\n", " self.model.model.shared.weight,\n", " bias=self.model.final_logits_bias\n", " )\n", " next_tokens = torch.argmax(logits, dim=1, keepdims=True)\n", " tokens = torch.cat([tokens, next_tokens], dim=1)\n", " \n", " return tokens" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Greedy CPU Inference\n", "The inference code must be updated since the `generate` method is no longer used. This is because the entire generative inference loop occurs within the `GreedyUnrolledGenerator.forward` method." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def infer_greedy(model, tokenizer, text):\n", " batch = tokenizer(text, max_length=max_decoder_length, truncation=True, padding='max_length', return_tensors=\"pt\")\n", " inputs = batch['input_ids'], batch['attention_mask']\n", " tokens = greedy_cpu(*inputs)\n", " print('Texts:')\n", " for i, t in enumerate(tokens):\n", " result = tokenizer.decode(t, skip_special_tokens=True)\n", " print(i + 1, result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like in previous section of this tutorial, first the greedy model is executed on CPU to validate that the correct results were produced. In this example, the generated text matches the first result of the original beam search." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_cpu.config.max_length = 8 # This controls the number of decoder loops. Reduced to improve compilation speed.\n", "greedy_cpu = GreedyUnrolledGenerator(model_cpu)\n", "infer_greedy(greedy_cpu, tokenizer, sample_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Greedy Neuron Tracing & Inference\n", "Similarly the tracing is simplified since the now the `GreedyUnrolledGenerator.forward` can be compiled as a single unit. \n", "\n", "For compilation efficiency, two changes will be made compared to normal compilaition:\n", "- `torch.jit.freeze` is used because it can *sometimes* speed up compilation by in the case where a module is re-used multiple times. In this case, it is more efficient because the `self.model.model.decoder` is used in a loop. \n", "- The `torch_neuron.trace` option `fallback` is set to `False`. This forces all operations to execute on Neuron. Most of the time this is not recommended or efficient. In this case, it is more efficient because it means a single subgraph is produced rather than many. Usually one subgraph would be produced per decoder iteration since `aten::embedding` is executed in a loop. The `aten::embedding` operation is otherwise exected on CPU by default since this is usually more efficient than executing on Neuron.\n", "\n", "You may notice that compilation will take significantly longer with the unrolled model since the model inserts new operations into the compute graph for every single decoder iteration. This creates a much larger model graph even though the weights are re-used." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "example = (\n", " torch.ones((num_texts, max_encoder_length), dtype=torch.long),\n", " torch.ones((num_texts, max_encoder_length), dtype=torch.long),\n", ")\n", "greedy_cpu.eval()\n", "greedy_trace = torch.jit.trace(greedy_cpu, example)\n", "greedy_frozen = torch.jit.freeze(greedy_trace)\n", "greedy_neuron = torch_neuron.trace(greedy_frozen, example, fallback=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "infer_greedy(greedy_neuron, tokenizer, sample_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Greedy Neuron Serialization\n", "Unlike the previous version of the model that used the `GenerationMixin` base class. This greedy version of the model can be serialized using the regular `torch.jit.save` and `torch.jit.load` utilities since it is a pure torchscript module." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "torch.jit.save(greedy_neuron, 'greedy_neuron.pt')\n", "loaded_greedy_neuron = torch.jit.load('greedy_neuron.pt')\n", "infer_greedy(loaded_greedy_neuron, tokenizer, sample_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Appendix\n", "### BART (Mask Filling Task)\n", "\n", "These `PaddedGenerator` class can be applied to the BART model for the task of filling in mask tokens.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from transformers import BartForConditionalGeneration, BartTokenizer\n", "bart_name = \"facebook/bart-large\"\n", "bart_model = BartForConditionalGeneration.from_pretrained(bart_name)\n", "bart_model.config.max_length = max_decoder_length\n", "bart_tokenizer = BartTokenizer.from_pretrained(bart_name)\n", "bart_text = \"UN Chief Says There Is No in Syria\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# CPU Execution\n", "infer(bart_model, bart_tokenizer, bart_text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# Neuron Execution\n", "paddded_bart = PaddedGenerator.from_model(bart_model)\n", "bart_neuron = trace(paddded_bart, num_texts, num_beams, max_decoder_length, max_encoder_length)\n", "infer(bart_neuron, bart_tokenizer, bart_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pegasus (Summarization Task)\n", "\n", "These `PaddedGenerator` class can be applied to the Pegasus model for summarization.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from transformers import PegasusForConditionalGeneration, PegasusTokenizer\n", "pegasus_name = 'google/pegasus-xsum'\n", "pegasus_model = PegasusForConditionalGeneration.from_pretrained(pegasus_name)\n", "pegasus_model.config.max_length = max_decoder_length\n", "pegasus_tokenizer = PegasusTokenizer.from_pretrained(pegasus_name)\n", "pegasus_text = \"PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. 
The aim is to reduce the risk of wildfires.\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# CPU Execution\n", "infer(pegasus_model, pegasus_tokenizer, pegasus_text)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "# Neuron Execution\n", "paddded_pegasus = PaddedGenerator.from_model(pegasus_model)\n", "pegasus_neuron = trace(paddded_pegasus, num_texts, num_beams, max_decoder_length, max_encoder_length)\n", "infer(pegasus_neuron, pegasus_tokenizer, pegasus_text)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: src/examples/pytorch/yolo_v4.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluate YOLO v4 on Inferentia" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "This tutorial walks through compiling and evaluating YOLO v4 model implemented in PyTorch on Inferentia. \n", "\n", "The tutorial has five main sections:\n", "\n", "1. Define YOLO v4 model in PyTorch\n", "2. Download the COCO 2017 evaluation dataset and define the data loader function\n", "3. Build, Compile, and Save Neuron-Optimized YOLO v4 TorchScript\n", "4. Evaluate Accuracy on the COCO 2017 Dataset\n", "5. Benchmark COCO Dataset Performance of the Neuron-Optimized TorchScript\n", "\n", "Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [PyTorch Installation Guide](../../../frameworks/torch/torch-neuron/setup/pytorch-install.html). You can select the kernel from the \"Kernel -> Change Kernel\" option on the top of this Jupyter notebook page." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Install Dependencies:\n", "This tutorial requires the following pip packages:\n", "\n", "- `torch-neuron`\n", "- `torchvision`\n", "- `pillow`\n", "- `pycocotools`\n", "- `neuron-cc[tensorflow]`\n", "\n", "Many of these packages will be installed by default when configuring your environment using the Neuron PyTorch setup guide. The additional dependencies must be installed here." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install --upgrade pillow pycocotools " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Define YOLO v4 model in PyTorch \n", "The following PyTorch model definition is from https://github.com/Tianxiaomo/pytorch-YOLOv4/." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import torch\n", "import torch.neuron\n", "from torch import nn\n", "import torch.nn.functional as F\n", "import os\n", "import warnings\n", "\n", "# Setting up NeuronCore groups for inf1.6xlarge with 16 cores\n", "n_cores = 16 # This value should be 4 on inf1.xlarge and inf1.2xlarge\n", "os.environ['NEURON_RT_NUM_CORES'] = str(n_cores)\n", "\n", "\n", "class Mish(torch.nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", "\n", " def forward(self, x):\n", " x = x * (torch.tanh(torch.nn.functional.softplus(x)))\n", " return x\n", "\n", "\n", "class Upsample(nn.Module):\n", " def __init__(self):\n", " super(Upsample, self).__init__()\n", "\n", " def forward(self, x, target_size, inference=False):\n", " assert (x.data.dim() == 4)\n", "\n", " if inference:\n", "\n", " return x.view(x.size(0), x.size(1), x.size(2), 1, x.size(3), 1).\\\n", " expand(x.size(0), x.size(1), x.size(2), target_size[2] // x.size(2), x.size(3), target_size[3] // x.size(3)).\\\n", " contiguous().view(x.size(0), x.size(1), target_size[2], target_size[3])\n", " else:\n", " return F.interpolate(x, size=(target_size[2], target_size[3]), mode='nearest')\n", "\n", "\n", "class Conv_Bn_Activation(nn.Module):\n", " def __init__(self, in_channels, out_channels, kernel_size, stride, activation, bn=True, bias=False):\n", " super().__init__()\n", " pad = (kernel_size - 1) // 2\n", "\n", " self.conv = nn.ModuleList()\n", " if bias:\n", " self.conv.append(nn.Conv2d(in_channels, out_channels, kernel_size, stride, pad))\n", " else:\n", " self.conv.append(nn.Conv2d(in_channels, out_channels, kernel_size, stride, pad, bias=False))\n", " if bn:\n", " self.conv.append(nn.BatchNorm2d(out_channels))\n", " if activation == \"mish\":\n", " self.conv.append(Mish())\n", " elif activation == \"relu\":\n", " self.conv.append(nn.ReLU(inplace=True))\n", " elif activation == \"leaky\":\n", " self.conv.append(nn.LeakyReLU(0.1, inplace=True))\n", " elif activation == \"linear\":\n", " pass\n", " else:\n", " print(\"activate error !!! 
{} {} {}\".format(sys._getframe().f_code.co_filename,\n", " sys._getframe().f_code.co_name, sys._getframe().f_lineno))\n", "\n", " def forward(self, x):\n", " for l in self.conv:\n", " x = l(x)\n", " return x\n", "\n", "\n", "class ResBlock(nn.Module):\n", " \"\"\"\n", " Sequential residual blocks each of which consists of \\\n", " two convolution layers.\n", " Args:\n", " ch (int): number of input and output channels.\n", " nblocks (int): number of residual blocks.\n", " shortcut (bool): if True, residual tensor addition is enabled.\n", " \"\"\"\n", "\n", " def __init__(self, ch, nblocks=1, shortcut=True):\n", " super().__init__()\n", " self.shortcut = shortcut\n", " self.module_list = nn.ModuleList()\n", " for i in range(nblocks):\n", " resblock_one = nn.ModuleList()\n", " resblock_one.append(Conv_Bn_Activation(ch, ch, 1, 1, 'mish'))\n", " resblock_one.append(Conv_Bn_Activation(ch, ch, 3, 1, 'mish'))\n", " self.module_list.append(resblock_one)\n", "\n", " def forward(self, x):\n", " for module in self.module_list:\n", " h = x\n", " for res in module:\n", " h = res(h)\n", " x = x + h if self.shortcut else h\n", " return x\n", "\n", "\n", "class DownSample1(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", " self.conv1 = Conv_Bn_Activation(3, 32, 3, 1, 'mish')\n", "\n", " self.conv2 = Conv_Bn_Activation(32, 64, 3, 2, 'mish')\n", " self.conv3 = Conv_Bn_Activation(64, 64, 1, 1, 'mish')\n", " # [route]\n", " # layers = -2\n", " self.conv4 = Conv_Bn_Activation(64, 64, 1, 1, 'mish')\n", "\n", " self.conv5 = Conv_Bn_Activation(64, 32, 1, 1, 'mish')\n", " self.conv6 = Conv_Bn_Activation(32, 64, 3, 1, 'mish')\n", " # [shortcut]\n", " # from=-3\n", " # activation = linear\n", "\n", " self.conv7 = Conv_Bn_Activation(64, 64, 1, 1, 'mish')\n", " # [route]\n", " # layers = -1, -7\n", " self.conv8 = Conv_Bn_Activation(128, 64, 1, 1, 'mish')\n", "\n", " def forward(self, input):\n", " x1 = self.conv1(input)\n", " x2 = self.conv2(x1)\n", " x3 = self.conv3(x2)\n", " # route -2\n", " x4 = self.conv4(x2)\n", " x5 = self.conv5(x4)\n", " x6 = self.conv6(x5)\n", " # shortcut -3\n", " x6 = x6 + x4\n", "\n", " x7 = self.conv7(x6)\n", " # [route]\n", " # layers = -1, -7\n", " x7 = torch.cat([x7, x3], dim=1)\n", " x8 = self.conv8(x7)\n", " return x8\n", "\n", "\n", "class DownSample2(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", " self.conv1 = Conv_Bn_Activation(64, 128, 3, 2, 'mish')\n", " self.conv2 = Conv_Bn_Activation(128, 64, 1, 1, 'mish')\n", " # r -2\n", " self.conv3 = Conv_Bn_Activation(128, 64, 1, 1, 'mish')\n", "\n", " self.resblock = ResBlock(ch=64, nblocks=2)\n", "\n", " # s -3\n", " self.conv4 = Conv_Bn_Activation(64, 64, 1, 1, 'mish')\n", " # r -1 -10\n", " self.conv5 = Conv_Bn_Activation(128, 128, 1, 1, 'mish')\n", "\n", " def forward(self, input):\n", " x1 = self.conv1(input)\n", " x2 = self.conv2(x1)\n", " x3 = self.conv3(x1)\n", "\n", " r = self.resblock(x3)\n", " x4 = self.conv4(r)\n", "\n", " x4 = torch.cat([x4, x2], dim=1)\n", " x5 = self.conv5(x4)\n", " return x5\n", "\n", "\n", "class DownSample3(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", " self.conv1 = Conv_Bn_Activation(128, 256, 3, 2, 'mish')\n", " self.conv2 = Conv_Bn_Activation(256, 128, 1, 1, 'mish')\n", " self.conv3 = Conv_Bn_Activation(256, 128, 1, 1, 'mish')\n", "\n", " self.resblock = ResBlock(ch=128, nblocks=8)\n", " self.conv4 = Conv_Bn_Activation(128, 128, 1, 1, 'mish')\n", " self.conv5 = Conv_Bn_Activation(256, 256, 1, 1, 'mish')\n", "\n", " def forward(self, 
input):\n", " x1 = self.conv1(input)\n", " x2 = self.conv2(x1)\n", " x3 = self.conv3(x1)\n", "\n", " r = self.resblock(x3)\n", " x4 = self.conv4(r)\n", "\n", " x4 = torch.cat([x4, x2], dim=1)\n", " x5 = self.conv5(x4)\n", " return x5\n", "\n", "\n", "class DownSample4(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", " self.conv1 = Conv_Bn_Activation(256, 512, 3, 2, 'mish')\n", " self.conv2 = Conv_Bn_Activation(512, 256, 1, 1, 'mish')\n", " self.conv3 = Conv_Bn_Activation(512, 256, 1, 1, 'mish')\n", "\n", " self.resblock = ResBlock(ch=256, nblocks=8)\n", " self.conv4 = Conv_Bn_Activation(256, 256, 1, 1, 'mish')\n", " self.conv5 = Conv_Bn_Activation(512, 512, 1, 1, 'mish')\n", "\n", " def forward(self, input):\n", " x1 = self.conv1(input)\n", " x2 = self.conv2(x1)\n", " x3 = self.conv3(x1)\n", "\n", " r = self.resblock(x3)\n", " x4 = self.conv4(r)\n", "\n", " x4 = torch.cat([x4, x2], dim=1)\n", " x5 = self.conv5(x4)\n", " return x5\n", "\n", "\n", "class DownSample5(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", " self.conv1 = Conv_Bn_Activation(512, 1024, 3, 2, 'mish')\n", " self.conv2 = Conv_Bn_Activation(1024, 512, 1, 1, 'mish')\n", " self.conv3 = Conv_Bn_Activation(1024, 512, 1, 1, 'mish')\n", "\n", " self.resblock = ResBlock(ch=512, nblocks=4)\n", " self.conv4 = Conv_Bn_Activation(512, 512, 1, 1, 'mish')\n", " self.conv5 = Conv_Bn_Activation(1024, 1024, 1, 1, 'mish')\n", "\n", " def forward(self, input):\n", " x1 = self.conv1(input)\n", " x2 = self.conv2(x1)\n", " x3 = self.conv3(x1)\n", "\n", " r = self.resblock(x3)\n", " x4 = self.conv4(r)\n", "\n", " x4 = torch.cat([x4, x2], dim=1)\n", " x5 = self.conv5(x4)\n", " return x5\n", "\n", "\n", "class Neck(nn.Module):\n", " def __init__(self, inference=False):\n", " super().__init__()\n", " self.inference = inference\n", "\n", " self.conv1 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\n", " self.conv2 = Conv_Bn_Activation(512, 1024, 3, 1, 'leaky')\n", " self.conv3 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\n", " # SPP\n", " self.maxpool1 = nn.MaxPool2d(kernel_size=5, stride=1, padding=5 // 2)\n", " self.maxpool2 = nn.MaxPool2d(kernel_size=9, stride=1, padding=9 // 2)\n", " self.maxpool3 = nn.MaxPool2d(kernel_size=13, stride=1, padding=13 // 2)\n", "\n", " # R -1 -3 -5 -6\n", " # SPP\n", " self.conv4 = Conv_Bn_Activation(2048, 512, 1, 1, 'leaky')\n", " self.conv5 = Conv_Bn_Activation(512, 1024, 3, 1, 'leaky')\n", " self.conv6 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\n", " self.conv7 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\n", " # UP\n", " self.upsample1 = Upsample()\n", " # R 85\n", " self.conv8 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\n", " # R -1 -3\n", " self.conv9 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\n", " self.conv10 = Conv_Bn_Activation(256, 512, 3, 1, 'leaky')\n", " self.conv11 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\n", " self.conv12 = Conv_Bn_Activation(256, 512, 3, 1, 'leaky')\n", " self.conv13 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\n", " self.conv14 = Conv_Bn_Activation(256, 128, 1, 1, 'leaky')\n", " # UP\n", " self.upsample2 = Upsample()\n", " # R 54\n", " self.conv15 = Conv_Bn_Activation(256, 128, 1, 1, 'leaky')\n", " # R -1 -3\n", " self.conv16 = Conv_Bn_Activation(256, 128, 1, 1, 'leaky')\n", " self.conv17 = Conv_Bn_Activation(128, 256, 3, 1, 'leaky')\n", " self.conv18 = Conv_Bn_Activation(256, 128, 1, 1, 'leaky')\n", " self.conv19 = Conv_Bn_Activation(128, 256, 3, 1, 'leaky')\n", " self.conv20 = Conv_Bn_Activation(256, 128, 1, 1, 'leaky')\n", 
"\n", " def forward(self, input, downsample4, downsample3, inference=False):\n", " x1 = self.conv1(input)\n", " x2 = self.conv2(x1)\n", " x3 = self.conv3(x2)\n", " # SPP\n", " m1 = self.maxpool1(x3)\n", " m2 = self.maxpool2(x3)\n", " m3 = self.maxpool3(x3)\n", " spp = torch.cat([m3, m2, m1, x3], dim=1)\n", " # SPP end\n", " x4 = self.conv4(spp)\n", " x5 = self.conv5(x4)\n", " x6 = self.conv6(x5)\n", " x7 = self.conv7(x6)\n", " # UP\n", " up = self.upsample1(x7, downsample4.size(), self.inference)\n", " # R 85\n", " x8 = self.conv8(downsample4)\n", " # R -1 -3\n", " x8 = torch.cat([x8, up], dim=1)\n", "\n", " x9 = self.conv9(x8)\n", " x10 = self.conv10(x9)\n", " x11 = self.conv11(x10)\n", " x12 = self.conv12(x11)\n", " x13 = self.conv13(x12)\n", " x14 = self.conv14(x13)\n", "\n", " # UP\n", " up = self.upsample2(x14, downsample3.size(), self.inference)\n", " # R 54\n", " x15 = self.conv15(downsample3)\n", " # R -1 -3\n", " x15 = torch.cat([x15, up], dim=1)\n", "\n", " x16 = self.conv16(x15)\n", " x17 = self.conv17(x16)\n", " x18 = self.conv18(x17)\n", " x19 = self.conv19(x18)\n", " x20 = self.conv20(x19)\n", " return x20, x13, x6\n", "\n", "\n", "class Yolov4Head(nn.Module):\n", " def __init__(self, output_ch, n_classes, inference=False):\n", " super().__init__()\n", " self.inference = inference\n", "\n", " self.conv1 = Conv_Bn_Activation(128, 256, 3, 1, 'leaky')\n", " self.conv2 = Conv_Bn_Activation(256, output_ch, 1, 1, 'linear', bn=False, bias=True)\n", "\n", " self.yolo1 = YoloLayer(\n", " anchor_mask=[0, 1, 2], num_classes=n_classes,\n", " anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],\n", " num_anchors=9, stride=8)\n", "\n", " # R -4\n", " self.conv3 = Conv_Bn_Activation(128, 256, 3, 2, 'leaky')\n", "\n", " # R -1 -16\n", " self.conv4 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\n", " self.conv5 = Conv_Bn_Activation(256, 512, 3, 1, 'leaky')\n", " self.conv6 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\n", " self.conv7 = Conv_Bn_Activation(256, 512, 3, 1, 'leaky')\n", " self.conv8 = Conv_Bn_Activation(512, 256, 1, 1, 'leaky')\n", " self.conv9 = Conv_Bn_Activation(256, 512, 3, 1, 'leaky')\n", " self.conv10 = Conv_Bn_Activation(512, output_ch, 1, 1, 'linear', bn=False, bias=True)\n", " \n", " self.yolo2 = YoloLayer(\n", " anchor_mask=[3, 4, 5], num_classes=n_classes,\n", " anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],\n", " num_anchors=9, stride=16)\n", "\n", " # R -4\n", " self.conv11 = Conv_Bn_Activation(256, 512, 3, 2, 'leaky')\n", "\n", " # R -1 -37\n", " self.conv12 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\n", " self.conv13 = Conv_Bn_Activation(512, 1024, 3, 1, 'leaky')\n", " self.conv14 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\n", " self.conv15 = Conv_Bn_Activation(512, 1024, 3, 1, 'leaky')\n", " self.conv16 = Conv_Bn_Activation(1024, 512, 1, 1, 'leaky')\n", " self.conv17 = Conv_Bn_Activation(512, 1024, 3, 1, 'leaky')\n", " self.conv18 = Conv_Bn_Activation(1024, output_ch, 1, 1, 'linear', bn=False, bias=True)\n", " \n", " self.yolo3 = YoloLayer(\n", " anchor_mask=[6, 7, 8], num_classes=n_classes,\n", " anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],\n", " num_anchors=9, stride=32)\n", "\n", " def forward(self, input1, input2, input3):\n", " x1 = self.conv1(input1)\n", " x2 = self.conv2(x1)\n", "\n", " x3 = self.conv3(input1)\n", " # R -1 -16\n", " x3 = torch.cat([x3, input2], dim=1)\n", " x4 = self.conv4(x3)\n", " x5 = self.conv5(x4)\n", " x6 = 
self.conv6(x5)\n", " x7 = self.conv7(x6)\n", " x8 = self.conv8(x7)\n", " x9 = self.conv9(x8)\n", " x10 = self.conv10(x9)\n", "\n", " # R -4\n", " x11 = self.conv11(x8)\n", " # R -1 -37\n", " x11 = torch.cat([x11, input3], dim=1)\n", "\n", " x12 = self.conv12(x11)\n", " x13 = self.conv13(x12)\n", " x14 = self.conv14(x13)\n", " x15 = self.conv15(x14)\n", " x16 = self.conv16(x15)\n", " x17 = self.conv17(x16)\n", " x18 = self.conv18(x17)\n", " \n", " if self.inference:\n", " y1 = self.yolo1(x2)\n", " y2 = self.yolo2(x10)\n", " y3 = self.yolo3(x18)\n", "\n", " return get_region_boxes([y1, y2, y3])\n", " \n", " else:\n", " return [x2, x10, x18]\n", "\n", "\n", "class Yolov4(nn.Module):\n", " def __init__(self, yolov4conv137weight=None, n_classes=80, inference=False):\n", " super().__init__()\n", "\n", " output_ch = (4 + 1 + n_classes) * 3\n", "\n", " # backbone\n", " self.down1 = DownSample1()\n", " self.down2 = DownSample2()\n", " self.down3 = DownSample3()\n", " self.down4 = DownSample4()\n", " self.down5 = DownSample5()\n", " # neck\n", " self.neek = Neck(inference)\n", " # yolov4conv137\n", " if yolov4conv137weight:\n", " _model = nn.Sequential(self.down1, self.down2, self.down3, self.down4, self.down5, self.neek)\n", " pretrained_dict = torch.load(yolov4conv137weight)\n", "\n", " model_dict = _model.state_dict()\n", " # 1. filter out unnecessary keys\n", " pretrained_dict = {k1: v for (k, v), k1 in zip(pretrained_dict.items(), model_dict)}\n", " # 2. overwrite entries in the existing state dict\n", " model_dict.update(pretrained_dict)\n", " _model.load_state_dict(model_dict)\n", " \n", " # head\n", " self.head = Yolov4Head(output_ch, n_classes, inference)\n", "\n", "\n", " def forward(self, input):\n", " d1 = self.down1(input)\n", " d2 = self.down2(d1)\n", " d3 = self.down3(d2)\n", " d4 = self.down4(d3)\n", " d5 = self.down5(d4)\n", "\n", " x20, x13, x6 = self.neek(d5, d4, d3)\n", "\n", " output = self.head(x20, x13, x6)\n", " return output\n", "\n", "\n", "def yolo_forward_dynamic(output, conf_thresh, num_classes, anchors, num_anchors, scale_x_y, only_objectness=1,\n", " validation=False):\n", " # Output would be invalid if it does not satisfy this assert\n", " # assert (output.size(1) == (5 + num_classes) * num_anchors)\n", "\n", " # print(output.size())\n", "\n", " # Slice the second dimension (channel) of output into:\n", " # [ 2, 2, 1, num_classes, 2, 2, 1, num_classes, 2, 2, 1, num_classes ]\n", " # And then into\n", " # bxy = [ 6 ] bwh = [ 6 ] det_conf = [ 3 ] cls_conf = [ num_classes * 3 ]\n", " # batch = output.size(0)\n", " # H = output.size(2)\n", " # W = output.size(3)\n", "\n", " bxy_list = []\n", " bwh_list = []\n", " det_confs_list = []\n", " cls_confs_list = []\n", "\n", " for i in range(num_anchors):\n", " begin = i * (5 + num_classes)\n", " end = (i + 1) * (5 + num_classes)\n", " \n", " bxy_list.append(output[:, begin : begin + 2])\n", " bwh_list.append(output[:, begin + 2 : begin + 4])\n", " det_confs_list.append(output[:, begin + 4 : begin + 5])\n", " cls_confs_list.append(output[:, begin + 5 : end])\n", "\n", " # Shape: [batch, num_anchors * 2, H, W]\n", " bxy = torch.cat(bxy_list, dim=1)\n", " # Shape: [batch, num_anchors * 2, H, W]\n", " bwh = torch.cat(bwh_list, dim=1)\n", "\n", " # Shape: [batch, num_anchors, H, W]\n", " det_confs = torch.cat(det_confs_list, dim=1)\n", " # Shape: [batch, num_anchors * H * W]\n", " det_confs = det_confs.view(output.size(0), num_anchors * output.size(2) * output.size(3))\n", "\n", " # Shape: [batch, num_anchors * num_classes, H, 
W]\n", " cls_confs = torch.cat(cls_confs_list, dim=1)\n", " # Shape: [batch, num_anchors, num_classes, H * W]\n", " cls_confs = cls_confs.view(output.size(0), num_anchors, num_classes, output.size(2) * output.size(3))\n", " # Shape: [batch, num_anchors, num_classes, H * W] --> [batch, num_anchors * H * W, num_classes] \n", " cls_confs = cls_confs.permute(0, 1, 3, 2).reshape(output.size(0), num_anchors * output.size(2) * output.size(3), num_classes)\n", "\n", " # Apply sigmoid(), exp() and softmax() to slices\n", " #\n", " bxy = torch.sigmoid(bxy) * scale_x_y - 0.5 * (scale_x_y - 1)\n", " bwh = torch.exp(bwh)\n", " det_confs = torch.sigmoid(det_confs)\n", " cls_confs = torch.sigmoid(cls_confs)\n", "\n", " # Prepare C-x, C-y, P-w, P-h (None of them are torch related)\n", " grid_x = np.expand_dims(np.expand_dims(np.expand_dims(np.linspace(0, output.size(3) - 1, output.size(3)), axis=0).repeat(output.size(2), 0), axis=0), axis=0)\n", " grid_y = np.expand_dims(np.expand_dims(np.expand_dims(np.linspace(0, output.size(2) - 1, output.size(2)), axis=1).repeat(output.size(3), 1), axis=0), axis=0)\n", " # grid_x = torch.linspace(0, W - 1, W).reshape(1, 1, 1, W).repeat(1, 1, H, 1)\n", " # grid_y = torch.linspace(0, H - 1, H).reshape(1, 1, H, 1).repeat(1, 1, 1, W)\n", "\n", " anchor_w = []\n", " anchor_h = []\n", " for i in range(num_anchors):\n", " anchor_w.append(anchors[i * 2])\n", " anchor_h.append(anchors[i * 2 + 1])\n", "\n", " device = None\n", " cuda_check = output.is_cuda\n", " if cuda_check:\n", " device = output.get_device()\n", "\n", " bx_list = []\n", " by_list = []\n", " bw_list = []\n", " bh_list = []\n", "\n", " # Apply C-x, C-y, P-w, P-h\n", " for i in range(num_anchors):\n", " ii = i * 2\n", " # Shape: [batch, 1, H, W]\n", " bx = bxy[:, ii : ii + 1] + torch.tensor(grid_x, device=device, dtype=torch.float32) # grid_x.to(device=device, dtype=torch.float32)\n", " # Shape: [batch, 1, H, W]\n", " by = bxy[:, ii + 1 : ii + 2] + torch.tensor(grid_y, device=device, dtype=torch.float32) # grid_y.to(device=device, dtype=torch.float32)\n", " # Shape: [batch, 1, H, W]\n", " bw = bwh[:, ii : ii + 1] * anchor_w[i]\n", " # Shape: [batch, 1, H, W]\n", " bh = bwh[:, ii + 1 : ii + 2] * anchor_h[i]\n", "\n", " bx_list.append(bx)\n", " by_list.append(by)\n", " bw_list.append(bw)\n", " bh_list.append(bh)\n", "\n", "\n", " ########################################\n", " # Figure out bboxes from slices #\n", " ########################################\n", " \n", " # Shape: [batch, num_anchors, H, W]\n", " bx = torch.cat(bx_list, dim=1)\n", " # Shape: [batch, num_anchors, H, W]\n", " by = torch.cat(by_list, dim=1)\n", " # Shape: [batch, num_anchors, H, W]\n", " bw = torch.cat(bw_list, dim=1)\n", " # Shape: [batch, num_anchors, H, W]\n", " bh = torch.cat(bh_list, dim=1)\n", "\n", " # Shape: [batch, 2 * num_anchors, H, W]\n", " bx_bw = torch.cat((bx, bw), dim=1)\n", " # Shape: [batch, 2 * num_anchors, H, W]\n", " by_bh = torch.cat((by, bh), dim=1)\n", "\n", " # normalize coordinates to [0, 1]\n", " bx_bw /= output.size(3)\n", " by_bh /= output.size(2)\n", "\n", " # Shape: [batch, num_anchors * H * W, 1]\n", " bx = bx_bw[:, :num_anchors].view(output.size(0), num_anchors * output.size(2) * output.size(3), 1)\n", " by = by_bh[:, :num_anchors].view(output.size(0), num_anchors * output.size(2) * output.size(3), 1)\n", " bw = bx_bw[:, num_anchors:].view(output.size(0), num_anchors * output.size(2) * output.size(3), 1)\n", " bh = by_bh[:, num_anchors:].view(output.size(0), num_anchors * output.size(2) * output.size(3), 
1)\n", "\n", " bx1 = bx - bw * 0.5\n", " by1 = by - bh * 0.5\n", " bx2 = bx1 + bw\n", " by2 = by1 + bh\n", "\n", " # Shape: [batch, num_anchors * h * w, 4] -> [batch, num_anchors * h * w, 1, 4]\n", " boxes = torch.cat((bx1, by1, bx2, by2), dim=2).view(output.size(0), num_anchors * output.size(2) * output.size(3), 1, 4)\n", " # boxes = boxes.repeat(1, 1, num_classes, 1)\n", "\n", " # boxes: [batch, num_anchors * H * W, 1, 4]\n", " # cls_confs: [batch, num_anchors * H * W, num_classes]\n", " # det_confs: [batch, num_anchors * H * W]\n", "\n", " det_confs = det_confs.view(output.size(0), num_anchors * output.size(2) * output.size(3), 1)\n", " confs = cls_confs * det_confs\n", "\n", " # boxes: [batch, num_anchors * H * W, 1, 4]\n", " # confs: [batch, num_anchors * H * W, num_classes]\n", "\n", " return boxes, confs\n", "\n", "class YoloLayer(nn.Module):\n", " \"\"\"\n", " Yolo layer\n", " model_out: while inference,is post-processing inside or outside the model\n", " true:outside\n", " \"\"\"\n", " def __init__(self, anchor_mask=[], num_classes=0, anchors=[], num_anchors=1, stride=32, model_out=False):\n", " super(YoloLayer, self).__init__()\n", " self.anchor_mask = anchor_mask\n", " self.num_classes = num_classes\n", " self.anchors = anchors\n", " self.num_anchors = num_anchors\n", " self.anchor_step = len(anchors) // num_anchors\n", " self.coord_scale = 1\n", " self.noobject_scale = 1\n", " self.object_scale = 5\n", " self.class_scale = 1\n", " self.thresh = 0.6\n", " self.stride = stride\n", " self.seen = 0\n", " self.scale_x_y = 1\n", "\n", " self.model_out = model_out\n", "\n", " def forward(self, output, target=None):\n", " if self.training:\n", " return output\n", " masked_anchors = []\n", " for m in self.anchor_mask:\n", " masked_anchors += self.anchors[m * self.anchor_step:(m + 1) * self.anchor_step]\n", " masked_anchors = [anchor / self.stride for anchor in masked_anchors]\n", "\n", " return yolo_forward_dynamic(output, self.thresh, self.num_classes, masked_anchors, len(self.anchor_mask),scale_x_y=self.scale_x_y)\n", "\n", "\n", "def get_region_boxes(boxes_and_confs):\n", "\n", " # print('Getting boxes from boxes and confs ...')\n", "\n", " boxes_list = []\n", " confs_list = []\n", "\n", " for item in boxes_and_confs:\n", " boxes_list.append(item[0])\n", " confs_list.append(item[1])\n", "\n", " # boxes: [batch, num1 + num2 + num3, 1, 4]\n", " # confs: [batch, num1 + num2 + num3, num_classes]\n", " boxes = torch.cat(boxes_list, dim=1)\n", " confs = torch.cat(confs_list, dim=1)\n", " \n", " return boxes, confs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Download the COCO 2017 evaluation dataset and define the data loader function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "!curl -LO http://images.cocodataset.org/zips/val2017.zip\n", "!curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip\n", "!unzip -q val2017.zip\n", "!unzip annotations_trainval2017.zip" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define data loader" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "import time\n", "import torchvision\n", "import torchvision.transforms as transforms\n", "import 
torchvision.datasets as dset\n", "from pycocotools.coco import COCO\n", "\n", "\n", "def get_image_filenames(root=os.getcwd()):\n", " \"\"\"\n", " Generate paths to the coco dataset image files.\n", " \n", " Args:\n", " root (str): The root folder contains.\n", " \n", " Yields:\n", " filename (str): The path to an image file.\n", " \"\"\"\n", " image_path = os.path.join(root, 'val2017')\n", " for root, dirs, files in os.walk(image_path):\n", " for filename in files:\n", " yield os.path.join(image_path, filename)\n", "\n", " \n", "def get_coco_dataloader(coco2017_root, transform, subset_indices=None):\n", " \"\"\"\n", " Create the dataset loader and ground truth coco dataset.\n", " \n", " Arguments:\n", " coco2017_root (str): The root directory to load the data/labels from.\n", " transform (torchvision.Transform): A transform to apply to the images.\n", " subset_indices (list): Indices used to create a subset of the dataset.\n", "\n", " Returns: \n", " loader (iterable): Produces transformed images and labels.\n", " cocoGt (pycocotools.coco.COCO): Contains the ground truth in coco \n", " format.\n", " label_info (dict): A mapping from label id to the human-readable name.\n", " \"\"\"\n", "\n", " # Create the dataset\n", " coco2017_img_path = os.path.join(coco2017_root, 'val2017')\n", " coco2017_ann_path = os.path.join(\n", " coco2017_root, 'annotations/instances_val2017.json')\n", "\n", " # check the number of images in val2017 - Should be 5000\n", " num_files = len(list(get_image_filenames(coco2017_root)))\n", " print('\\nNumber of images in val2017 = {}\\n'.format(num_files))\n", "\n", " # load annotations to decode classification results\n", " with open(coco2017_ann_path) as f:\n", " annotate_json = json.load(f)\n", " label_info = {label[\"id\"]: label[\"name\"]\n", " for label in annotate_json['categories']}\n", "\n", " # initialize COCO ground truth dataset\n", " cocoGt = COCO(coco2017_ann_path)\n", "\n", " # create the dataset using torchvision's coco detection dataset\n", " coco_val_data = dset.CocoDetection(\n", " root=coco2017_img_path, \n", " annFile=coco2017_ann_path, \n", " transform=transform\n", " )\n", "\n", " if subset_indices is not None:\n", " # Create a smaller subset of the data for testing - e.g. 
to pinpoint error at image 516\n", " coco_val_data = torch.utils.data.Subset(coco_val_data, subset_indices)\n", "\n", " # create the dataloader using torch dataloader\n", " loader = torch.utils.data.DataLoader(coco_val_data, batch_size=1, shuffle=False)\n", "\n", " return loader, cocoGt, label_info\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load dataset\n", "Here 2 dataset loaders are created and the resulting data is displayed\n", "- `orig_coco_val_data_loader`: Contains the original unmodified image\n", "- `coco_val_data_loader`: Contains images of a standardized size of 608x608 pixels " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "coco2017_root = './'\n", "orig_coco_val_data_loader, *_ = get_coco_dataloader(coco2017_root, transforms.ToTensor())\n", "transform = transforms.Compose([transforms.Resize([608, 608]), transforms.ToTensor()])\n", "coco_val_data_loader, cocoGt, label_info = get_coco_dataloader(coco2017_root, transform)\n", "image_orig, _ = next(iter(orig_coco_val_data_loader))\n", "print(image_orig.shape)\n", "image, image_info = next(iter(coco_val_data_loader))\n", "image_id = image_info[0][\"image_id\"].item()\n", "print(image.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define some helper functions for deployment (inference)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def postprocess(boxes, scores, score_threshold=0.05, iou_threshold=0.5):\n", " \"\"\"\n", " Classifies and filters bounding boxes from Yolo V4 output.\n", " \n", " Performs classification, filtering, and non-maximum suppression to remove\n", " boxes that are irrelevant. The result is the filtered set of boxes, the \n", " associated label confidence score, and the predicted label.\n", " \n", " See: https://pytorch.org/docs/stable/torchvision/ops.html#torchvision.ops.nms\n", " \n", " Args:\n", " boxes (torch.Tensor): The Yolo V4 bounding boxes.\n", " scores (torch.Tensor): The categories scores for each box.\n", " score_threshold (float): Ignore boxes with scores below threshold.\n", " iou_threshold (float): Discards boxes with intersection above threshold. \n", " \n", " Returns:\n", " boxes (torch.Tensor): The filtered Yolo V4 bounding boxes.\n", " scores (torch.Tensor): The label score for each box.\n", " labels (torch.Tensor): The label for each box.\n", " \"\"\"\n", " \n", " # shape: [n_batch, n_boxes, 1, 4] => [n_boxes, 4] # Assumes n_batch size is 1\n", " boxes = boxes.squeeze()\n", "\n", " # shape: [n_batch, n_boxes, 80] => [n_boxes, 80] # Assumes n_batch size is 1\n", " scores = scores.squeeze()\n", "\n", " # Classify each box according to the maximum category score\n", " score, column = torch.max(scores, dim=1)\n", "\n", " # Filter out rows for scores which are below threshold\n", " mask = score > score_threshold\n", "\n", " # Filter model output data\n", " boxes = boxes[mask]\n", " score = score[mask]\n", " idxs = column[mask]\n", "\n", " # Perform non-max suppression on all categories at once. 
shape: [n_keep,]\n", " keep = torchvision.ops.batched_nms(\n", " boxes=boxes, \n", " scores=score, \n", " idxs=idxs,\n", " iou_threshold=iou_threshold,\n", " )\n", "\n", " # The image category id associated with each column\n", " categories = torch.tensor([\n", " 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16,\n", " 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31,\n", " 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,\n", " 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,\n", " 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72,\n", " 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85,\n", " 86, 87, 88, 89, 90\n", " ])\n", " \n", " boxes = boxes[keep] # shape: [n_keep, 4]\n", " score = score[keep] # shape: [n_keep,]\n", " idxs = idxs[keep]\n", " label = categories[idxs] # shape: [n_keep,]\n", " \n", " return boxes, score, label\n", "\n", "\n", "def get_results_as_dict(boxes, scores, labels, image_orig):\n", " \"\"\"\n", " Transforms post-processed output into dictionary output.\n", " \n", " This translates the model coordinate bounding boxes (x1, y1, x2, y2) \n", " into a rectangular description (x, y, width, height) scaled to the \n", " original image size.\n", " \n", " Args:\n", " boxes (torch.Tensor): The Yolo V4 bounding boxes.\n", " scores (torch.Tensor): The label score for each box.\n", " labels (torch.Tensor): The label for each box.\n", " image_orig (torch.Tensor): The image to scale the bounding boxes to.\n", " \n", " Returns:\n", " output (dict): The dictionary of rectangle bounding boxes.\n", " \"\"\"\n", " h_size, w_size = image_orig.shape[-2:]\n", "\n", " x1 = boxes[:, 0] * w_size\n", " y1 = boxes[:, 1] * h_size\n", " x2 = boxes[:, 2] * w_size\n", " y2 = boxes[:, 3] * h_size\n", "\n", " width = x2 - x1\n", " height = y2 - y1\n", "\n", " boxes = torch.stack([x1, y1, width, height]).T\n", " return {\n", " 'boxes': boxes.detach().numpy(),\n", " 'labels': labels.detach().numpy(),\n", " 'scores': scores.detach().numpy(),\n", " }\n", "\n", "\n", "def prepare_for_coco_detection(predictions):\n", " \"\"\"\n", " Convert dictionary model predictions into an expected COCO dataset format.\n", " \n", " Args:\n", " predictions (dict): The list of box coordinates, scores, and labels.\n", " \n", " Returns:\n", " output (list[dict]): The list of bounding boxes.\n", " \"\"\"\n", " coco_results = []\n", " for original_id, prediction in predictions.items():\n", " if len(prediction) == 0:\n", " continue\n", "\n", " boxes = prediction[\"boxes\"].tolist()\n", " scores = prediction[\"scores\"].tolist()\n", " labels = prediction[\"labels\"].tolist()\n", "\n", " coco_results.extend(\n", " [\n", " {\n", " \"image_id\": original_id,\n", " \"category_id\": labels[k],\n", " \"bbox\": box,\n", " \"score\": scores[k],\n", " }\n", " for k, box in enumerate(boxes)\n", " ]\n", " )\n", " return coco_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download pretrained checkpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "def download_file_from_google_drive(id, destination):\n", " response = requests.post('https://drive.google.com/uc?id='+id+'&confirm=t')\n", " save_response_content(response, destination)\n", "\n", "def save_response_content(response, destination):\n", " CHUNK_SIZE = 32768\n", " with open(destination, \"wb\") as f:\n", " for chunk in response.iter_content(CHUNK_SIZE):\n", " if chunk: # filter out keep-alive new chunks\n", " f.write(chunk)" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "download_file_from_google_drive('1wv_LiFeCRYwtpkqREPeI13-gPELBDwuJ', './yolo_v4.pth')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: Build, Compile, and Save Neuron-Optimized YOLO v4 TorchScript\n", "### Construct model and load pretrained checkpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "model = Yolov4(yolov4conv137weight=None, n_classes=80, inference=True)\n", "weightfile = \"./yolo_v4.pth\"\n", "pretrained_dict = torch.load(weightfile, map_location=torch.device('cpu'))\n", "model.load_state_dict(pretrained_dict)\n", "model.eval()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Execute inference for a single image and display output" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import matplotlib.patches as patches\n", "\n", "image_orig, _ = next(iter(orig_coco_val_data_loader))\n", "image, _ = next(iter(coco_val_data_loader))\n", "boxes, scores = model(image)\n", "boxes, scores, labels = postprocess(boxes, scores)\n", "result_dict = get_results_as_dict(boxes, scores, labels, image_orig)\n", "\n", "fig, ax = plt.subplots(figsize=(10, 10))\n", "ax.imshow(image_orig.numpy().squeeze(0).transpose(1, 2, 0))\n", "for xywh, _ in zip(result_dict['boxes'], result_dict['labels']):\n", " x, y, w, h = xywh\n", " rect = patches.Rectangle((x, y), w, h, linewidth=1, edgecolor='g', facecolor='none')\n", " ax.add_patch(rect)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Run compilation with manually specified device placement\n", "\n", "First, inspect the model without running compilation by adding the `skip_compiler=True` argument to the `torch.neuron.trace` call." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "model_neuron_for_inspection = torch.neuron.trace(model, image, skip_compiler=True)\n", "print(model_neuron_for_inspection)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspecting the model, we discover that there are many `aten::slice` operations in some submodules called `YoloLayer`. Although these operations are supported by the neuron-cc compiler, they are not going to run efficiently on the Inferentia hardware. To work it around, we recommend to manually place these operators on CPU." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To manually place `YoloLayer` on CPU, we may make use of the `subgraph_builder_function` argument in `torch.neuron.trace`. It is a callback function that returns `True` or `False` based on information available in `node`. The typical use is a condition based on either `node.name` or `node.type_string`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def subgraph_builder_function(node):\n", " return 'YoloLayer' not in node.name\n", "\n", "model_neuron = torch.neuron.trace(model, image, subgraph_builder_function=subgraph_builder_function)\n", "model_neuron.save('yolo_v4_neuron.pt')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compilation is now finished and the compiled model has been saved to a local file called 'yolo_v4_neuron.pt'. Saving is important due to the slow compilation process." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 4: Evaluate Accuracy on the COCO 2017 Dataset\n", "### Load compiled model and run inference\n", "To validate accuracy of the compiled model, lets run inference on the COCO 2017 validation dataset. We start by defining a helper function `run_inference`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def run_inference(dataloader, dataloader_orig, model, convert=True, modelName=''):\n", " \"\"\"\n", " Run Yolo V4 inference on the COCO dataset.\n", " \n", " Args:\n", " dataloader (iterable): Data loader of input processed images and labels.\n", " dataloader_orig (iterable): Data loader with original images.\n", " model (torch.nn.Module): The torch model to run inference against.\n", " convert (bool): Set to False when using a vanilla torchvision model that \n", " does not need to be transformed into coco format.\n", " \n", " Returns: \n", " imgIds (list): The list of images with predictions.\n", " cocoDt (pycocotools.coco.COCO): Contains the predictions from the model \n", " in coco format.\n", " \"\"\"\n", " print('\\n================ Starting Inference on {} Images using {} model ================\\n'.format(\n", " len(dataloader), modelName))\n", "\n", " modelName = str(modelName).replace(\" \", \"_\")\n", "\n", " # convert predicition to cocoDt\n", " # code from def evaluate in https://github.com/pytorch/vision/blob/master/references/detection/engine.py\n", " imgIds = []\n", " results = []\n", " skippedImages = []\n", "\n", " # time inference\n", " inference_time = 0.0\n", " for idx, ((image, targets), (image_orig, _)) in enumerate(zip(dataloader, dataloader_orig)):\n", " # if target is empty, skip the image because it breaks the scripted model\n", " if not targets:\n", " skippedImages.append(idx)\n", " continue\n", "\n", " # get the predictions\n", " start_time = time.time()\n", " boxes, scores = model(image)\n", " delta = time.time() - start_time\n", " inference_time += delta\n", " boxes, scores, labels = postprocess(boxes, scores)\n", " outputs = get_results_as_dict(boxes, scores, labels, image_orig)\n", "\n", " res = {target[\"image_id\"].item(): output for target,\n", " output in zip(targets, [outputs])}\n", "\n", " # add the image id to imgIds\n", " image_id = targets[0][\"image_id\"].item()\n", " imgIds.append(image_id)\n", "\n", " # convert the predicition into cocoDt results\n", " pred = prepare_for_coco_detection(res)\n", " results.extend(pred)\n", "\n", " print('\\n==================== Performance Measurement ====================')\n", " print('Finished inference on {} images in {:.2f} seconds'.format(\n", " len(dataloader), inference_time))\n", " print('=================================================================\\n')\n", "\n", " # create bbox detections file\n", " # following code in https://github.com/aws/aws-neuron-sdk/blob/master/src/examples/tensorflow/yolo_v4_demo/evaluate.ipynb\n", " resultsfile = modelName + '_bbox_detections.json'\n", " print('Generating json file...')\n", " with open(resultsfile, 'w') as f:\n", " json.dump(results, f)\n", "\n", " # return COCO api object with loadRes\n", " cocoDt = cocoGt.loadRes(resultsfile)\n", "\n", " return imgIds, cocoDt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next step is to simply load the compiled model from disk and then run inference." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_neuron = torch.jit.load('yolo_v4_neuron.pt')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "imgIds, cocoDt = run_inference(coco_val_data_loader, orig_coco_val_data_loader, model_neuron)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then use the standard `pycocotools` routines to generate a report of bounding box precision/recall." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pycocotools.cocoeval import COCOeval\n", "\n", "cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')\n", "cocoEval.params.imgIds = imgIds\n", "cocoEval.evaluate()\n", "cocoEval.accumulate()\n", "cocoEval.summarize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For reference, we may perform the same evaluation on the CPU model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "imgIdsRef, cocoDtRef = run_inference(coco_val_data_loader, orig_coco_val_data_loader, model)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cocoEval = COCOeval(cocoGt, cocoDtRef, 'bbox')\n", "cocoEval.params.imgIds = imgIdsRef\n", "cocoEval.evaluate()\n", "cocoEval.accumulate()\n", "cocoEval.summarize()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 5: Benchmark COCO Dataset Performance of the Neuron-Optimized TorchScript\n", "The following code snippet sets up data parallel on 16 NeuronCores and runs saturated multi-threaded inference on the Inferentia accelerator. Note that the number of cores (`n_cores`) should be set to the number of available NeuronCores on the current instance." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.neuron\n", "import torchvision\n", "import torchvision.transforms as transforms\n", "import torchvision.datasets as dset\n", "import multiprocessing as mp\n", "from concurrent.futures import ThreadPoolExecutor\n", "import PIL\n", "import os\n", "import time\n", "\n", "n_threads = 16\n", "\n", "def get_image_filenames(root=os.getcwd()):\n", " \"\"\"\n", " Generate paths to the coco dataset image files.\n", " \n", " Args:\n", " root (str): The root folder contains.\n", " \n", " Yields:\n", " filename (str): The path to an image file.\n", " \"\"\"\n", " image_path = os.path.join(root, 'val2017')\n", " for root, dirs, files in os.walk(image_path):\n", " for filename in files:\n", " yield os.path.join(image_path, filename)\n", "\n", "def preprocess(path):\n", " \"\"\"\n", " Load an image and convert to the expected Yolo V4 tensor format.\n", " \n", " Args:\n", " path (str): The image file to load from disk. \n", " \n", " Returns:\n", " result (torch.Tensor): The image for prediction. 
Shape: [1, 3, 608, 608]\n", " \"\"\"\n", " image = PIL.Image.open(path).convert('RGB')\n", " resized = torchvision.transforms.functional.resize(image, [608, 608])\n", " tensor = torchvision.transforms.functional.to_tensor(resized)\n", " return tensor.unsqueeze(0).to(torch.float32)\n", "\n", "\n", "def load_model(filename='yolo_v4_neuron.pt'):\n", " \"\"\"\n", " Load and pre-warm the Yolo V4 model.\n", " \n", " Args:\n", " filename (str): The location to load the model from.\n", " \n", " Returns:\n", " model (torch.nn.Module): The torch model.\n", " \"\"\"\n", " \n", " # Load model from disk\n", " model = torch.jit.load(filename)\n", "\n", " # Warm up model on neuron by running a single example image\n", " filename = next(iter(get_image_filenames()))\n", " image = preprocess(filename)\n", " model(image)\n", "\n", " return model\n", "\n", "\n", "def task(model, filename):\n", " \"\"\"\n", " The thread task to perform prediction.\n", " \n", " This does the full end-to-end processing of an image from loading from disk\n", " all the way to classifying and filtering bounding boxes.\n", " \n", " Args:\n", " model (torch.nn.Module): The model to run processing with\n", " filename (str): The image file to load from disk. \n", " \n", " Returns:\n", " boxes (torch.Tensor): The Yolo V4 bounding boxes.\n", " scores (torch.Tensor): The label score for each box.\n", " labels (torch.Tensor): The label for each box. \n", " \"\"\"\n", " image = preprocess(filename)\n", " begin = time.time()\n", " boxes, scores = model(image)\n", " delta = time.time() - begin\n", " return postprocess(boxes, scores), delta\n", "\n", "\n", "def benchmark():\n", " \"\"\"\n", " Run a benchmark on the entire COCO dataset against the neuron model.\n", " \"\"\"\n", " \n", " # Load a model into each NeuronCore\n", " models = [load_model() for _ in range(n_cores)]\n", " \n", " # Create input/output lists\n", " filenames = list(get_image_filenames())\n", " results = list()\n", " latency = list()\n", " \n", " # We want to keep track of average completion time per thread\n", " sum_time = 0.0\n", " \n", " # Submit all tasks and wait for them to finish\n", " with ThreadPoolExecutor(n_threads) as pool:\n", " for i, filename in enumerate(filenames):\n", " result = pool.submit(task, models[i % len(models)], filename)\n", " results.append(result)\n", " for result in results:\n", " results, times = result.result() # Note: Outputs unused for benchmark\n", " latency.append(times)\n", " sum_time += times\n", " \n", " print('Duration: ', sum_time / n_threads)\n", " print('Images Per Second:', len(filenames) / (sum_time / n_threads))\n", " print(\"Latency P50: {:.1f}\".format(np.percentile(latency[1000:], 50)*1000.0))\n", " print(\"Latency P90: {:.1f}\".format(np.percentile(latency[1000:], 90)*1000.0))\n", " print(\"Latency P95: {:.1f}\".format(np.percentile(latency[1000:], 95)*1000.0))\n", " print(\"Latency P99: {:.1f}\".format(np.percentile(latency[1000:], 99)*1000.0))\n", "\n", "benchmark()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.9" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: 
src/examples/tensorflow/bert_demo/LICENSE ================================================ Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: src/examples/tensorflow/bert_demo/README.md ================================================

Please view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** ================================================ FILE: src/examples/tensorflow/bert_demo/bert_client.py ================================================ # coding=utf-8 """ Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0 Program to gather information from a system """ import sys import os import argparse import random import time import grpc import mrpc_pb2 sys.path.append(os.path.dirname(__file__)) import mrpc_pb2_grpc import mrpc_feature latencies = [] def client(): parser = argparse.ArgumentParser() parser.add_argument('--port', default=60061, help='gRPC port') parser.add_argument('--pair', default=None, help='Text pair') parser.add_argument('--cycle', type=int, default=1, help='Number of inference cycles') parser.add_argument('--save-accuracy', default=None, help='Save accuracy to file') args = parser.parse_args() text_pair = mrpc_pb2.TextPair() if args.pair is not None: text_a, text_b = args.pair text_pair.text_a = text_a.encode() text_pair.text_b = text_b.encode() else: eval_data_path = os.path.join(os.path.dirname(__file__), 'glue_mrpc_dev.tsv') tsv = mrpc_feature.read_tsv(eval_data_path) with grpc.insecure_channel('127.0.0.1:{}'.format(args.port)) as channel: stub = mrpc_pb2_grpc.mrpcStub(channel) num_correct = 0 very_start = time.time() for _ in range(args.cycle): if args.pair is None: data = random.choice(tsv[1:]) text_pair.text_a = data[3].encode() text_pair.text_b = data[4].encode() start = time.time() yes_no = stub.paraphrase(text_pair) elapsed = time.time() - start if data is None: evaluation = '' else: if yes_no.prediction.decode() == data[0]: num_correct += 1 evaluation = 'correct, ' if yes_no.prediction.decode() == data[0] else 'incorrect, ' print('{} ({}latency {} s)'.format(yes_no.message.decode(), evaluation, elapsed)) latencies.append(elapsed) if args.cycle > 1: accuracy = num_correct / args.cycle print('took {} s for {} cycles, accuracy {}'.format(time.time() - very_start, args.cycle, accuracy)) if args.save_accuracy is not None: with open(args.save_accuracy, 'w') as f: f.write(str(accuracy)) def write_latencies(): with open('latencies.txt', 'a') as f: for l in latencies: f.write(str(l) + '\n') if __name__ == '__main__': client() write_latencies() ================================================ FILE: src/examples/tensorflow/bert_demo/bert_model.py ================================================ # coding=utf-8 """ Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
================================================
FILE: src/examples/tensorflow/bert_demo/bert_model.py
================================================
# coding=utf-8
"""
Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

Rewrite a BERT MRPC SavedModel into a form that compiles into a single
NeuronOp and runs on Inferentia.
"""
import os
import argparse
import shlex
import numpy as np
import tensorflow as tf
from tensorflow.neuron import fuse
from tensorflow.core.framework import attr_value_pb2


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_saved_model', required=True, help='Original SavedModel')
    parser.add_argument('--output_saved_model', required=True, help='Output SavedModel that runs on Inferentia')
    parser.add_argument('--dtype', default='float16', help='Data type for weights')
    parser.add_argument('--batch_size', type=int, default=4)
    parser.add_argument('--sequence_length', type=int, default=128)
    parser.add_argument('--crude_gelu', action='store_true')
    parser.add_argument('--aggressive_optimizations', action='store_true')
    args = parser.parse_args()
    if os.path.exists(args.output_saved_model):
        raise OSError('output_saved_model {} already exists'.format(args.output_saved_model))
    dtype = tf.float16 if args.dtype == 'float16' else tf.float32
    if args.aggressive_optimizations:
        args.crude_gelu = True
    bert = NeuronBERTMRPC(
        args.input_saved_model,
        dtype=dtype,
        batch_size=args.batch_size,
        seq_len=args.sequence_length,
        crude_gelu=args.crude_gelu,
        aggressive_fp16_cast=args.aggressive_optimizations,
    )
    fuser = fuse(compiler_args=['--fp32-cast', 'matmult'], timeout=360000)
    bert.encoder = fuser(bert.encoder)
    input_ids = bert.input_ids
    input_mask = bert.input_mask
    segment_ids = bert.segment_ids
    with tf.Session(graph=tf.Graph()) as sess:
        input_ids_ph_shape = input_ids.shape.as_list()
        input_ids_ph_shape[0] = None
        input_ids_ph = tf.placeholder(input_ids.dtype, input_ids_ph_shape, name='input_ids')
        input_mask_ph_shape = input_mask.shape.as_list()
        input_mask_ph_shape[0] = None
        input_mask_ph = tf.placeholder(input_mask.dtype, input_mask_ph_shape, name='input_mask')
        segment_ids_ph_shape = segment_ids.shape.as_list()
        segment_ids_ph_shape[0] = None
        segment_ids_ph = tf.placeholder(segment_ids.dtype, segment_ids_ph_shape, name='segment_ids')
        dummy_reshapes = []
        discard_op_names = set()
        with tf.name_scope('bert/embeddings'):
            expand_dims = tf.expand_dims(input_ids_ph, axis=-1)
            batch_size = tf.shape(input_ids_ph)[0]
            reshape = tf.reshape(expand_dims, [batch_size * bert.seq_len])
            gatherv2 = tf.gather(bert.weights_dict['bert/embeddings/word_embeddings:0'], reshape, axis=0)
            reshape_1 = tf.reshape(gatherv2, [batch_size, bert.seq_len, bert.hid_size])
            reshape_2 = tf.reshape(segment_ids_ph, [batch_size * bert.seq_len])
            one_hot = tf.one_hot(reshape_2, depth=2)
            matmul = tf.matmul(one_hot, bert.weights_dict['bert/embeddings/token_type_embeddings:0'])
            reshape_3 = tf.reshape(matmul, [batch_size, bert.seq_len, bert.hid_size])
            slice0 = tf.slice(bert.weights_dict['bert/embeddings/position_embeddings:0'],
                              begin=[0, 0], size=[bert.seq_len, -1])
            add_1 = reshape_1 + reshape_3 + slice0
            input_tensor = tf.reshape(add_1, [batch_size, bert.seq_len, bert.hid_size])
        with tf.name_scope('bert/encoder'):
            reshape = tf.reshape(input_mask_ph, [batch_size, 1, 1, bert.seq_len])
            bias_tensor = tf.cast(reshape, tf.float32)
            bias_tensor = 1.0 - bias_tensor
            bias_tensor = bias_tensor * -10000.0
            bias_tensor = tf.cast(bias_tensor, bert.dtype)
            tensor = bert.layer_norm(input_tensor, 'embeddings', force_float32=True)
            # Reshape to the static batch size the fused encoder expects; these
            # dummy reshapes are bypassed and discarded from the graph below.
            tensor = tf.reshape(tensor, [bert.batch_size, bert.seq_len, bert.hid_size])
            dummy_reshapes.append(tensor)
            discard_op_names.add(tensor.op.name)
            bias_tensor = tf.reshape(bias_tensor, [bert.batch_size, 1, 1, bert.seq_len])
            dummy_reshapes.append(bias_tensor)
            discard_op_names.add(bias_tensor.op.name)
        logits = bert.encoder(tensor, bias_tensor)
        with tf.name_scope('loss'):
            if bert.dtype is not tf.float32:
                logits = tf.cast(logits, tf.float32)
            probabilities = tf.nn.softmax(logits)
        # Rewire each consumer of a dummy reshape to the reshape's input so the
        # NeuronOp consumes the dynamic-batch tensors directly.
        for rts in dummy_reshapes:
            neuron_op = rts.consumers()[0]
            neuron_op._update_input(list(neuron_op.inputs).index(rts), rts.op.inputs[0])
        try:
            sess.run(probabilities)
        except Exception:
            pass
        graph_def = sess.graph.as_graph_def()
        new_graph_def = tf.GraphDef()
        new_graph_def.node.MergeFrom(node for node in graph_def.node if node.name not in discard_op_names)
        neuron_op_node = [node for node in new_graph_def.node if node.op == 'NeuronOp'][0]
        neuron_op_node.attr['input_batch_axis'].list.i[:] = [0, 0]
        neuron_op_node.attr['output_batch_axis'].list.i[:] = [0]
    with tf.Session(graph=tf.Graph()) as sess:
        tf.import_graph_def(new_graph_def, name='')
        inputs = {
            'input_ids': sess.graph.get_tensor_by_name(input_ids_ph.name),
            'input_mask': sess.graph.get_tensor_by_name(input_mask_ph.name),
            'segment_ids': sess.graph.get_tensor_by_name(segment_ids_ph.name),
        }
        outputs = {'probabilities': sess.graph.get_tensor_by_name(probabilities.name)}
        try:
            sess.run(probabilities)
        except Exception:
            pass
        neuron_op = [op for op in sess.graph.get_operations() if op.type == 'NeuronOp'][0]
        if not neuron_op.get_attr('executable'):
            raise AttributeError('Neuron executable (neff) is empty. Please check neuron-cc is installed '
                                 'and working properly (`pip install neuron-cc` to install neuron-cc).')
        tf.saved_model.simple_save(sess, args.output_saved_model, inputs, outputs)


class NeuronBERTMRPC:

    def __init__(self, bert_saved_model, dtype=tf.float16, batch_size=4, seq_len=128,
                 crude_gelu=False, aggressive_fp16_cast=False):
        predictor = tf.contrib.predictor.from_saved_model(bert_saved_model)
        sess = predictor.session
        self.input_ids = predictor.feed_tensors['input_ids']
        self.input_mask = predictor.feed_tensors['input_mask']
        self.segment_ids = predictor.feed_tensors['segment_ids']
        # Pull all constants and variable-read tensors out of the graph so the
        # encoder can be rebuilt with numpy weights at the desired precision.
        weights_dict = {}
        for op in sess.graph.get_operations():
            if op.type == 'Const':
                tensor = op.outputs[0]
                weights_dict[tensor.name] = tensor
            if op.type == 'Identity' and op.name.endswith('read'):
                tensor = op.outputs[0]
                weights_dict[tensor.op.inputs[0].name] = tensor
        self.weights_dict = sess.run(weights_dict)
        self.dtype = dtype
        self.batch_size = batch_size
        self.seq_len = seq_len
        self.hid_size, self.inter_size = self.weights_dict['bert/encoder/layer_0/intermediate/dense/kernel:0'].shape
        self.num_heads = sess.graph.get_tensor_by_name('bert/encoder/layer_0/attention/self/Reshape:0').shape.as_list()[2]
        self.head_size = self.hid_size // self.num_heads
        self.eps = self.weights_dict['bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add/y:0']
        self.crude_gelu = crude_gelu
        self.layer_norm_dtype = tf.float16 if aggressive_fp16_cast else tf.float32
        sess.close()

    def encoder(self, tensor, bias_tensor):
        tensor = tf.reshape(tensor, [self.batch_size * self.seq_len, self.hid_size])
        for layer_id in range(24):
            mid_layer_name = 'layer_{}'.format(layer_id)
            tensor = self.self_attention(tensor, bias_tensor, mid_layer_name)
            tensor = self.layer_norm(tensor, 'encoder/' + mid_layer_name + '/attention/output')
            tensor = self.fully_connected(tensor, mid_layer_name)
            tensor = self.layer_norm(tensor, 'encoder/' + mid_layer_name + '/output')
        logits = self.pooler_loss(tensor)
        return logits

    def fully_connected(self, input_tensor, layer_name):
        inter_kernel = self.weights_dict['bert/encoder/{}/intermediate/dense/kernel:0'.format(layer_name)]
        inter_bias = self.weights_dict['bert/encoder/{}/intermediate/dense/bias:0'.format(layer_name)]
        out_kernel = self.weights_dict['bert/encoder/{}/output/dense/kernel:0'.format(layer_name)]
        out_bias = self.weights_dict['bert/encoder/{}/output/dense/bias:0'.format(layer_name)]
        with tf.name_scope('bert/encoder/{}/fully_connected/intermediate/dense'.format(layer_name)):
            matmul = tf.matmul(input_tensor, inter_kernel.astype(self.dtype.as_numpy_dtype))
            bias_add = tf.nn.bias_add(matmul, inter_bias.astype(self.dtype.as_numpy_dtype))
            gelu = self.gelu_sigmoid(bias_add) if self.crude_gelu else self.gelu_tanh(bias_add)
        with tf.name_scope('bert/encoder/{}/fully_connected/output/dense'.format(layer_name)):
            matmul = tf.matmul(gelu, out_kernel.astype(self.dtype.as_numpy_dtype))
            bias_add = tf.nn.bias_add(matmul, out_bias.astype(self.dtype.as_numpy_dtype))
            output_tensor = bias_add + input_tensor
        return output_tensor

    def self_attention(self, input_tensor, bias_tensor, layer_name):
        # Fold the 1/sqrt(head_size) attention scaling into the query weights.
        query_kernel = self.weights_dict['bert/encoder/{}/attention/self/query/kernel:0'.format(layer_name)] * 0.125
        query_bias = self.weights_dict['bert/encoder/{}/attention/self/query/bias:0'.format(layer_name)] * 0.125
        key_kernel = self.weights_dict['bert/encoder/{}/attention/self/key/kernel:0'.format(layer_name)]
        key_bias = self.weights_dict['bert/encoder/{}/attention/self/key/bias:0'.format(layer_name)]
        value_kernel = self.weights_dict['bert/encoder/{}/attention/self/value/kernel:0'.format(layer_name)]
        value_bias = self.weights_dict['bert/encoder/{}/attention/self/value/bias:0'.format(layer_name)]
        output_kernel = self.weights_dict['bert/encoder/{}/attention/output/dense/kernel:0'.format(layer_name)]
        output_bias = self.weights_dict['bert/encoder/{}/attention/output/dense/bias:0'.format(layer_name)]
        with tf.name_scope('bert/encoder/{}/attention/self'.format(layer_name)):
            matmul = tf.matmul(input_tensor, query_kernel.astype(self.dtype.as_numpy_dtype))
            query = tf.nn.bias_add(matmul, query_bias.astype(self.dtype.as_numpy_dtype))
            query_r = tf.reshape(query, [self.batch_size, self.seq_len, self.num_heads, self.head_size])
            query_rt = tf.transpose(query_r, [0, 2, 1, 3])
            matmul = tf.matmul(input_tensor, key_kernel.astype(self.dtype.as_numpy_dtype))
            key = tf.nn.bias_add(matmul, key_bias.astype(self.dtype.as_numpy_dtype))
            key_r = tf.reshape(key, [self.batch_size, self.seq_len, self.num_heads, self.head_size])
            key_rt = tf.transpose(key_r, [0, 2, 1, 3])  # [b, n, l, h]
            query_key = tf.matmul(query_rt, key_rt, transpose_b=True)  # [b, n, lq, h] @ [b, n, lk, h] -> [b, n, lq, lk]
            bias_query_key = tf.add(query_key, bias_tensor)
            softmax_weights = tf.nn.softmax(bias_query_key)
            matmul = tf.matmul(input_tensor, value_kernel.astype(self.dtype.as_numpy_dtype))
            value = tf.nn.bias_add(matmul, value_bias.astype(self.dtype.as_numpy_dtype))
            value_r = tf.reshape(value, [self.batch_size, self.seq_len, self.num_heads, self.head_size])
            value_rt = tf.transpose(value_r, [0, 2, 3, 1])
            weighted_value_rt = tf.matmul(softmax_weights, value_rt, transpose_b=True)  # [b, n, lq, lk] @ [b, n, h, lv] -> [b, n, lq, h]
            weighted_value_r = tf.transpose(weighted_value_rt, [0, 2, 1, 3])  # [b, lq, n, h]
            weighted_value = tf.reshape(weighted_value_r, [self.batch_size * self.seq_len, self.hid_size])
        with tf.name_scope('bert/encoder/{}/attention/output'.format(layer_name)):
            matmul = tf.matmul(weighted_value, output_kernel.astype(self.dtype.as_numpy_dtype))
            unnorm_output = tf.nn.bias_add(matmul, output_bias.astype(self.dtype.as_numpy_dtype))
            output_tensor = tf.add(input_tensor, unnorm_output)
        return output_tensor

    def layer_norm(self, input_tensor, layer_name, force_float32=False):
        dtype = tf.float32 if force_float32 else self.layer_norm_dtype
        gamma = dtype.as_numpy_dtype(self.weights_dict['bert/{}/LayerNorm/gamma:0'.format(layer_name)])
        beta = dtype.as_numpy_dtype(self.weights_dict['bert/{}/LayerNorm/beta:0'.format(layer_name)])
        with tf.name_scope('bert/{}/LayerNorm'.format(layer_name)):
            input_tensor = tf.cast(input_tensor, dtype)
            mean = tf.reduce_mean(input_tensor, axis=[-1], keepdims=True, name='mean')
            residuals = tf.subtract(input_tensor, mean, name='residuals')
            var = tf.reduce_mean(residuals * residuals, axis=[-1], keepdims=True, name='var')
            rsqrt = tf.rsqrt(var + dtype.as_numpy_dtype(self.eps))
            norm_output = tf.multiply(residuals, rsqrt, name='normalized')
            output_tensor = norm_output * gamma + beta
            output_tensor = tf.cast(output_tensor, self.dtype)
        return output_tensor

    def pooler_loss(self, input_tensor):
        pooler_kernel = self.weights_dict['bert/pooler/dense/kernel:0']
        pooler_bias = self.weights_dict['bert/pooler/dense/bias:0']
        loss_kernel = self.weights_dict['output_weights:0'].T
        loss_bias = self.weights_dict['output_bias:0']
        with tf.name_scope('bert/pooler_loss'):
            reshape = tf.reshape(input_tensor, [self.batch_size, self.seq_len, self.hid_size])
            # Pool by taking the [CLS] token (position 0) of each sequence.
            reshape_1 = tf.reshape(reshape[:, 0:1, :], [self.batch_size, self.hid_size])
            matmul = tf.matmul(reshape_1, pooler_kernel.astype(self.dtype.as_numpy_dtype))
            bias_add = tf.nn.bias_add(matmul, pooler_bias.astype(self.dtype.as_numpy_dtype))
            tanh = tf.tanh(bias_add)
            matmul = tf.matmul(tanh, loss_kernel.astype(self.dtype.as_numpy_dtype))
            output_tensor = tf.nn.bias_add(matmul, loss_bias.astype(self.dtype.as_numpy_dtype))
        return output_tensor

    def gelu_tanh(self, tensor):
        pow3 = 0.044714998453855515 * tensor * tensor * tensor + tensor
        shifted = (tf.tanh(0.7978845834732056 * pow3) + 1.0) * tensor
        return tf.multiply(shifted, 0.5)

    def gelu_sigmoid(self, tensor):
        return tf.sigmoid(1.702 * tensor) * tensor


if __name__ == '__main__':
    main()
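The two GELU variants at the end of bert_model.py trade accuracy for cheaper compute (the sigmoid form is what `--crude_gelu` selects). For reference, a standalone sketch (not part of the repo; helper names are my own) comparing both approximations against the exact erf-based GELU:

import math
import numpy as np

def gelu_exact(x):
    # Exact definition: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # tanh approximation, same constants as bert_model.py (sqrt(2/pi) ~ 0.79788)
    return 0.5 * x * (1.0 + math.tanh(0.7978845834732056 * (x + 0.044714998453855515 * x ** 3)))

def gelu_sigmoid(x):
    # "crude" sigmoid approximation: sigmoid(1.702 * x) * x
    return x / (1.0 + math.exp(-1.702 * x))

for x in np.linspace(-3.0, 3.0, 7):
    print('{:+.1f}  exact={:+.4f}  tanh={:+.4f}  sigmoid={:+.4f}'.format(
        x, gelu_exact(x), gelu_tanh(x), gelu_sigmoid(x)))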
================================================
FILE: src/examples/tensorflow/bert_demo/bert_model_server.py
================================================
# coding=utf-8
"""
Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

Launch tensorflow_model_server processes that serve the compiled BERT SavedModel.
"""
import os
import argparse
import subprocess
import time

_ONE_DAY_IN_SECONDS = 60 * 60 * 24


def serve():
    parser = argparse.ArgumentParser()
    parser.add_argument('--serving', required=True, help='Path to tf-serving binary')
    parser.add_argument('--dir', required=True, help='TensorFlow SavedModel dir')
    parser.add_argument('--port', default=8500, help='gRPC port')
    parser.add_argument('--parallel', type=int, default=8, help='Number of predictors')
    args = parser.parse_args()
    model = os.path.abspath(args.dir)
    # tf-serving expects a version subdirectory; symlink the SavedModel into one.
    model_with_version = os.path.join(model, '1')
    if not os.path.exists(model_with_version):
        os.makedirs(model_with_version)
        os.symlink(os.path.join(model, 'variables'), os.path.join(model_with_version, 'variables'))
        os.symlink(os.path.join(model, 'saved_model.pb'), os.path.join(model_with_version, 'saved_model.pb'))
    process_list = []
    for _ in range(args.parallel):
        proc = subprocess.Popen([
            args.serving,
            '--model_base_path={}'.format(model),
            '--port={}'.format(args.port),
            '--tensorflow_intra_op_parallelism=1',
            '--tensorflow_inter_op_parallelism=1',
        ])
        process_list.append(proc)
    try:
        time.sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        for proc in process_list:
            proc.terminate()
            proc.wait()


if __name__ == '__main__':
    serve()

================================================
FILE: src/examples/tensorflow/bert_demo/bert_no_model.py
================================================
# bert_no_model.py
import argparse
import tensorflow as tf
import tensorflow.neuron as tfn


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--input_saved_model', required=True, help='Original SavedModel')
    parser.add_argument('--output_saved_model', required=True, help='Output SavedModel that runs on Inferentia')
    parser.add_argument('--batch_size', type=int, default=1)
    args = parser.parse_args()
    pred = tf.contrib.predictor.from_saved_model(args.input_saved_model)
    no_fuse_ops = [op.name for op in pred.graph.get_operations()]

    def force_fuse_condition(op_name):
        exclude_scopes = [
            'bert/encoder/strided_slice',
            'bert/encoder/ones',
            'bert/encoder/Reshape',
            'bert/encoder/Shape',
            'bert/encoder/Cast',
        ]
        for scope in exclude_scopes:
            if op_name == scope or op_name.startswith('{}/'.format(scope)):
                return False
        return op_name.startswith('bert/encoder') or op_name.startswith('bert/pooler')

    force_fuse_ops = [op.name for op in pred.graph.get_operations() if force_fuse_condition(op.name)]
    compilation_result = tfn.saved_model.compile(
        args.input_saved_model,
        args.output_saved_model,
        batch_size=args.batch_size,
        no_fuse_ops=no_fuse_ops,
        force_fuse_ops=force_fuse_ops,
    )
    print(compilation_result)


if __name__ == '__main__':
    main()
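After running bert_no_model.py, a quick way to sanity-check the result is to count NeuronOp nodes in the output SavedModel, mirroring the check bert_model.py performs. A hedged sketch (assumes TF 1.x with tensorflow-neuron; the SavedModel path is a placeholder):

import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    # 'bert_saved_model_neuron' is an illustrative path for the compiled model.
    tf.saved_model.loader.load(sess, ['serve'], 'bert_saved_model_neuron')
    ops = sess.graph.get_operations()
    neuron_ops = [op for op in ops if op.type == 'NeuronOp']
    print('{} NeuronOp node(s) out of {} ops total'.format(len(neuron_ops), len(ops)))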
================================================
FILE: src/examples/tensorflow/bert_demo/bert_server.py
================================================
# coding=utf-8
"""
Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

gRPC server that batches MRPC paraphrase requests and runs them through a
compiled BERT SavedModel on Inferentia.
"""
import sys
import os
import collections
import argparse
import time
import csv
import random
from concurrent import futures
import multiprocessing
from multiprocessing.dummy import Pool
from threading import Lock
import pkg_resources
from distutils.version import LooseVersion
import grpc
import numpy as np
import tensorflow as tf

# Make the generated gRPC modules and local helpers importable.
sys.path.append(os.path.dirname(__file__))
import mrpc_feature
import tokenization
import mrpc_pb2
import mrpc_pb2_grpc

_ONE_DAY_IN_SECONDS = 60 * 60 * 24
total_tpt = 0
num_tpt = 0


class BERTService(mrpc_pb2_grpc.mrpcServicer):

    def __init__(self, model_path, parallel, batch_size, bootstrap, vocab_txt, num_thread_per_predictor=2):
        num_queues = parallel * num_thread_per_predictor
        config = tf.ConfigProto(inter_op_parallelism_threads=num_queues, intra_op_parallelism_threads=1)
        tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version)
        if tfn_version >= LooseVersion('1.15.0.1.0.1333.0'):
            # Newer tensorflow-neuron can share one predictor across queues.
            neuroncore_group_sizes = '{}x1'.format(parallel)
            predictor = tf.contrib.predictor.from_saved_model(model_path, config=config)
            self.predictor_list = [predictor for _ in range(num_queues)]
        else:
            neuroncore_group_sizes = ','.join('1' for _ in range(parallel))
            predictor_list = [tf.contrib.predictor.from_saved_model(model_path, config=config)
                              for _ in range(parallel)]
            self.predictor_list = []
            for pred in predictor_list:
                self.predictor_list.extend(pred for _ in range(num_thread_per_predictor))
        os.environ['NEURONCORE_GROUP_SIZES'] = neuroncore_group_sizes
        if self.predictor_list[0].feed_tensors['input_ids'].shape.is_fully_defined():
            self.batch_size = self.predictor_list[0].feed_tensors['input_ids'].shape.as_list()[0]
        else:
            self.batch_size = batch_size
        self.bootstrap = bootstrap
        self.tokenizer = tokenization.FullTokenizer(vocab_file=vocab_txt, do_lower_case=True)
        self.num_infer = 0
        self.num_correct = 0
        self.output_name = list(self.predictor_list[0].fetch_tensors.keys())[0]
        self.iid = 0
        self.throughput_list = []
        self.latency_list = []
        self.max_len_latency_list = 1000
        self.iid_lock = Lock()
        if bootstrap:
            # Pre-build batches from the dev set so the server can drive itself.
            self.request_queue_list = [collections.deque() for _ in self.predictor_list]
            eval_data_path = os.path.join(os.path.dirname(__file__), 'glue_mrpc_dev.tsv')
            tsv = mrpc_feature.read_tsv(eval_data_path)
            for request_queue in self.request_queue_list:
                for _ in range(1024):
                    data_list = random.choices(tsv[1:], k=self.batch_size)
                    model_feed_dict_list = [
                        mrpc_feature.text_pair_to_model_feed_dict(data[3], data[4], self.tokenizer)
                        for data in data_list
                    ]
                    label_list = [int(data[0]) for data in data_list]
                    batch_labels = np.array(label_list)
                    batch_feeds = {
                        key: np.concatenate([feed[key] for feed in model_feed_dict_list], axis=0)
                        for key in model_feed_dict_list[0].keys()
                    }
                    request_queue.append((batch_feeds, batch_labels))
        else:
            self.request_queue_list = [[] for _ in self.predictor_list]
        self.result_map = {}
        self.alive = True
        dummy_feed = {
            'input_ids': np.zeros([1, 128], dtype=np.int32),
            'input_mask': np.zeros([1, 128], dtype=np.int32),
            'segment_ids': np.zeros([1, 128], dtype=np.int32),
        }
        self.dummy_feeds = [(None, dummy_feed) for _ in range(self.batch_size)]
        # Warm up every predictor once so initialization cost is paid up front.
        model_feed_dict_list = [dummy_feed for _ in range(self.batch_size)]
        batch_feeds = {
            key: np.concatenate([feed[key] for feed in model_feed_dict_list], axis=0)
            for key in model_feed_dict_list[0].keys()
        }
        pool = Pool(len(self.predictor_list))
        for pred in self.predictor_list:
            pool.apply_async(pred, (batch_feeds,))
        time.sleep(1)
        pool.close()
        pool.join()

    def cleanup(self):
        for pred in self.predictor_list:
            print(pred)
            pred.session.close()

    def current_throughput(self):
        global total_tpt
        global num_tpt
        last_num_infer = self.num_infer
        while self.alive:
            current_num_infer = self.num_infer
            throughput = current_num_infer - last_num_infer
            self.throughput_list.append(throughput)
            print('current throughput {}'.format(throughput))
            last_num_infer = current_num_infer
            if throughput != 0:
                total_tpt += throughput
                num_tpt += 1
            time.sleep(1)

    def current_throughput_accuracy(self):
        global total_tpt
        global num_tpt
        last_num_infer = self.num_infer
        while self.alive:
            current_num_infer = self.num_infer
            throughput = current_num_infer - last_num_infer
            accuracy = 0.0 if self.num_infer == 0 else self.num_correct / self.num_infer
            print('current throughput {}, accuracy {}'.format(throughput, accuracy))
            last_num_infer = current_num_infer
            if throughput != 0:
                total_tpt += throughput
                num_tpt += 1
            time.sleep(1)

    def paraphrase(self, text_pair, context):
        iid = self.put_input(text_pair.text_a, text_pair.text_b)
        yes_no = mrpc_pb2.YesNo()
        if self.get_output(iid) == 1:
            yes_no.message = b'paraphrase!'
            yes_no.prediction = b'1'
        else:
            yes_no.message = b'not paraphrase!'
            yes_no.prediction = b'0'
        return yes_no

    def put_input(self, text_a, text_b):
        model_feed_dict = mrpc_feature.text_pair_to_model_feed_dict(text_a, text_b, self.tokenizer)
        with self.iid_lock:
            self.iid += 1
            iid = self.iid
        self.request_queue_list[iid % len(self.request_queue_list)].append((iid, model_feed_dict))
        return iid

    def process_input(self, idx):
        print('input processor is waiting')
        request_queue = self.request_queue_list[idx]
        predictor = self.predictor_list[idx]
        while self.alive:
            if len(request_queue) > 0:
                sublist = request_queue[:self.batch_size]
                request_queue[:self.batch_size] = []
                if len(sublist) < self.batch_size:
                    # Pad the partial batch with dummy feeds to the fixed batch size.
                    pad_batch_size = self.batch_size - len(sublist)
                    print('batch with {} garbage entries!'.format(pad_batch_size))
                    sublist.extend(self.dummy_feeds[:pad_batch_size])
                iid_list = [iid for iid, _ in sublist]
                model_feed_dict_list = [feed for _, feed in sublist]
                batch_feeds = {
                    key: np.concatenate([feed[key] for feed in model_feed_dict_list], axis=0)
                    for key in model_feed_dict_list[0].keys()
                }
                start = time.time()
                batch_predictions = predictor(batch_feeds)[self.output_name].argmax(-1)
                latency = time.time() - start
                if len(self.latency_list) < self.max_len_latency_list:
                    self.latency_list.append(latency)
                self.result_map.update({iid: pred for iid, pred in zip(iid_list, batch_predictions)})
            time.sleep(0.001)

    def process_input_bootstrap(self, idx):
        print('input processor is waiting')
        request_queue = self.request_queue_list[idx]
        predictor = self.predictor_list[idx]
        while self.alive:
            if len(request_queue) > 0:
                batch_feeds, batch_labels = request_queue.popleft()
                batch_predictions = predictor(batch_feeds)[self.output_name].argmax(-1)
                self.num_infer += self.batch_size
                self.num_correct += (batch_predictions == batch_labels).sum()
                continue
            time.sleep(0.0001)

    def get_output(self, iid):
        while iid not in self.result_map:
            time.sleep(0.001)
        self.num_infer += 1
        return self.result_map.pop(iid)


def serve():
    parser = argparse.ArgumentParser()
    parser.add_argument('--port', default=60061, help='gRPC port')
    parser.add_argument('--dir', required=True, help='TensorFlow SavedModel dir')
    parser.add_argument('--parallel', type=int, default=4, help='Number of predictors')
    parser.add_argument('--thread', type=int, default=2, help='Number of threads used by each predictor')
    parser.add_argument('--batch', type=int, default=4, help='Batch size')
    parser.add_argument('--bootstrap', action='store_true', help='Server loads a dataset and runs inference itself')
    args = parser.parse_args()
    vocab_txt = os.path.join(os.path.dirname(__file__), 'uncased_L-24_H-1024_A-16.vocab.txt')
    bert_service = BERTService(args.dir, args.parallel, args.batch, args.bootstrap, vocab_txt, args.thread)
    server = grpc.server(
        futures.ThreadPoolExecutor(max_workers=128),
        options=[('grpc.max_send_message_length', -1), ('grpc.max_receive_message_length', -1)])
    mrpc_pb2_grpc.add_mrpcServicer_to_server(bert_service, server)
    server.add_insecure_port('[::]:{}'.format(args.port))
    server.start()
    try:
        pool = Pool(len(bert_service.predictor_list) + 1)  # +1 for the throughput monitor
        if args.bootstrap:
            monitor_func = bert_service.current_throughput_accuracy
            process_func = bert_service.process_input_bootstrap
        else:
            monitor_func = bert_service.current_throughput
            process_func = bert_service.process_input
        pool.apply_async(monitor_func)
        if args.parallel == 1:
            process_func(0)
        else:
            for idx in range(len(bert_service.predictor_list)):
                pool.apply_async(process_func, (idx,))
        pool.close()
        time.sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        pass
    bert_service.alive = False
    bert_service.cleanup()
    server.stop(0)


if __name__ == '__main__':
    serve()
    if num_tpt:
        print(f'Average Throughput: {total_tpt / num_tpt}')
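The core batching pattern in bert_server.py is: pad a partial batch with dummy feeds up to the fixed batch size, concatenate the per-key arrays, then keep only the predictions for real requests. A standalone sketch of that pattern (names and sizes are illustrative, not from the demo):

import numpy as np

BATCH_SIZE = 4
dummy_feed = {'input_ids': np.zeros([1, 128], dtype=np.int32)}

def make_batch(feeds):
    # Pad to the fixed batch size the compiled model expects.
    n_pad = BATCH_SIZE - len(feeds)
    feeds = feeds + [dummy_feed] * n_pad
    batch = {key: np.concatenate([f[key] for f in feeds], axis=0) for key in feeds[0]}
    return batch, n_pad

batch, n_pad = make_batch([dummy_feed, dummy_feed])  # two real requests
print(batch['input_ids'].shape, 'padded entries:', n_pad)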
================================================
FILE: src/examples/tensorflow/bert_demo/download_mrpc_data.py
================================================
import os
import sys
import argparse
import urllib.error
import urllib.request

MRPC_TRAIN = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'
MRPC_TEST = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'


# This function is taken from https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e.
def format_mrpc(data_dir, path_to_data, path_to_dev_tsv):
    print("Processing MRPC...")
    mrpc_dir = os.path.join(data_dir, "MRPC")
    if not os.path.isdir(mrpc_dir):
        os.mkdir(mrpc_dir)
    if path_to_data:
        mrpc_train_file = os.path.join(path_to_data, "msr_paraphrase_train.txt")
        mrpc_test_file = os.path.join(path_to_data, "msr_paraphrase_test.txt")
    else:
        try:
            mrpc_train_file = os.path.join(mrpc_dir, "msr_paraphrase_train.txt")
            mrpc_test_file = os.path.join(mrpc_dir, "msr_paraphrase_test.txt")
            urllib.request.urlretrieve(MRPC_TRAIN, mrpc_train_file)
            urllib.request.urlretrieve(MRPC_TEST, mrpc_test_file)
        except urllib.error.HTTPError:
            print("Error downloading MRPC")
            return
    assert os.path.isfile(mrpc_train_file), "Train data not found at %s" % mrpc_train_file
    assert os.path.isfile(mrpc_test_file), "Test data not found at %s" % mrpc_test_file

    with open(mrpc_test_file, encoding='utf-8') as data_fh, \
            open(os.path.join(mrpc_dir, "test.tsv"), 'w', encoding='utf-8') as test_fh:
        header = data_fh.readline()
        test_fh.write("index\t#1 ID\t#2 ID\t#1 String\t#2 String\n")
        for idx, row in enumerate(data_fh):
            label, id1, id2, s1, s2 = row.strip().split('\t')
            test_fh.write("%d\t%s\t%s\t%s\t%s\n" % (idx, id1, id2, s1, s2))

    dev_ids = []
    with open(path_to_dev_tsv, encoding='utf-8') as dev_fh:
        header = dev_fh.readline()
        for row in dev_fh:
            _, id1, id2, _, _ = row.strip().split('\t')
            dev_ids.append([id1, id2])

    with open(mrpc_train_file, encoding='utf-8') as data_fh, \
            open(os.path.join(mrpc_dir, "train.tsv"), 'w', encoding='utf-8') as train_fh, \
            open(os.path.join(mrpc_dir, "dev.tsv"), 'w', encoding='utf-8') as dev_fh:
        header = data_fh.readline()
        train_fh.write(header)
        dev_fh.write(header)
        for row in data_fh:
            label, id1, id2, s1, s2 = row.strip().split('\t')
            if [id1, id2] in dev_ids:
                dev_fh.write("%s\t%s\t%s\t%s\t%s\n" % (label, id1, id2, s1, s2))
            else:
                train_fh.write("%s\t%s\t%s\t%s\t%s\n" % (label, id1, id2, s1, s2))
    print("\tCompleted!")


def main(arguments):
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', help='directory to save data to', type=str, default='glue_data')
    parser.add_argument('--path_to_mrpc',
                        help='path to directory containing extracted MRPC data, '
                             'msr_paraphrase_train.txt and msr_paraphrase_test.txt',
                        type=str, default='')
    parser.add_argument('--path_to_dev_tsv', help='path to the glue_mrpc_dev.tsv file',
                        type=str, default='glue_mrpc_dev.tsv')
    args = parser.parse_args(arguments)
    if not os.path.isdir(args.data_dir):
        os.mkdir(args.data_dir)
    format_mrpc(args.data_dir, args.path_to_mrpc, args.path_to_dev_tsv)


if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))

================================================
FILE: src/examples/tensorflow/bert_demo/glue_mrpc_dev.tsv
================================================
Quality #1 ID #2 ID #1 String #2 String
1 1355540 1355592 He said the foodservice pie business doesn 't fit the company 's long-term growth strategy . " The foodservice pie business does not fit our long-term growth strategy .
0 2029631 2029565 Magnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war . His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war .
0 487993 487952 The dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat . The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .
1 1989515 1989458 The AFL-CIO is waiting until October to decide if it will endorse a candidate . The AFL-CIO announced Wednesday that it will decide in October whether to endorse a candidate before the primaries . 0 1783137 1782659 No dates have been set for the civil or the criminal trial . No dates have been set for the criminal or civil cases , but Shanley has pleaded not guilty . 1 3039165 3039036 Wal-Mart said it would check all of its million-plus domestic workers to ensure they were legally employed . It has also said it would review all of its domestic employees more than 1 million to ensure they have legal status . 0 1490811 1490840 While dioxin levels in the environment were up last year , they have dropped by 75 percent since the 1970s , said Caswell . The Institute said dioxin levels in the environment have fallen by as much as 76 percent since the 1970s . 1 426112 426210 This integrates with Rational PurifyPlus and allows developers to work in supported versions of Java , Visual C # and Visual Basic .NET. IBM said the Rational products were also integrated with Rational PurifyPlus , which allows developers to work in Java , Visual C # and VisualBasic .Net. 1 1439663 1439808 The top rate will go to 4.45 percent for all residents with taxable incomes above $ 500,000 . For residents with incomes above $ 500,000 , the income-tax rate will increase to 4.45 percent . 1 3147370 3147525 The results appear in the January issue of Cancer , an American Cancer Society journal , being published online today . The results appear in the January issue of Cancer , an American Cancer Society ( news - web sites ) journal , being published online Monday . 1 3300040 3299992 The delegates said raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers . Bin Laden ’ s men pointed out that raising and distributing funds has been complicated by the U.S. crackdown on jihadi charitable foundations , bank accounts of terror-related organizations and money transfers . 0 524136 524119 " Sanitation is poor ... there could be typhoid and cholera , " he said . " Sanitation is poor , drinking water is generally left behind . . . there could be typhoid and cholera . " 0 969512 969295 The broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 . The technology-laced Nasdaq Composite Index was down 25.36 points , or 1.53 percent , at 1,628.26 . 1 1685339 1685429 The only announced Republican to replace Davis is Rep. Darrell Issa of Vista , who has spent $ 1.71 million of his own money to force a recall . So far the only declared major party candidate is Rep. Darrell Issa , a Republican who has spent $ 1.5 million of his own money to fund the recall . 1 1967578 1967664 The decision to issue new guidance has been prompted by intelligence passed to Britain by the FBI in a secret briefing in late July . Scotland Yard 's decision to issue new guidance has been prompted by new intelligence passed to Britain by the FBI in late July . 1 2047034 2046820 Unable to find a home for him , a judge told mental health authorities they needed to find supervised housing and treatment for DeVries somewhere in California . The judge had told the state Department of Mental Health to find supervised housing and treatment for DeVries somewhere in California . 
1 2046630 2046644 The decision came a year after Whipple ended federal oversight of the district 's racial balance , facilities , budget , and busing . The decision came a year after Whipple ended federal oversight of school busing as well as the district 's racial balance , facilities and budget . 0 2221603 2221633 In midafternoon trading , the Nasdaq composite index was up 8.34 , or 0.5 percent , to 1,790.47 . The Nasdaq Composite Index .IXIC dipped 8.59 points , or 0.48 percent , to 1,773.54 . 1 129995 129864 Morgan Stanley raised its rating on the beverage maker to " overweight " from " equal-weight " saying in part that pricing power with its bottlers should improve in 2004 . Morgan Stanley raised its rating on the company to " overweight " from " equal-weight , " saying the beverage maker 's pricing power with bottlers should improve in 2004 . 0 919683 919782 The pound also made progress against the dollar , reached fresh three-year highs at $ 1.6789 . The British pound flexed its muscle against the dollar , last up 1 percent at $ 1.6672 . 0 970740 971209 Friday , Stanford ( 47-15 ) blanked the Gamecocks 8-0 . Stanford ( 46-15 ) has a team full of such players this season . 1 2745055 2745022 Last month Intel raised its revenue guidance for the quarter to between $ 7.6 billion and $ 7.8 billion . At the end of the second quarter , Intel initially predicted sales of between $ 6.9 billion and $ 7.5 billion . 0 2199097 2199072 The driver , Eugene Rogers , helped to remove children from the bus , Wood said . At the accident scene , the driver was " covered in blood " but helped to remove children , Wood said . 1 1609290 1609098 ONG KONG , July 9 Tens of thousands of demonstrators gathered tonight before the legislature building here to call for free elections and the resignation of Hong Kong 's leader . Tens of thousands of demonstrators gathered yesterday evening to stand before this city 's legislature building and call for free elections and the resignation of Hong Kong 's leader . 1 1597193 1597119 Saddam loyalists have been blamed for sabotaging the nation 's infrastructure , as well as frequent attacks on U.S. soldiers . Hussein loyalists have been blamed for sabotaging the nation 's infrastructure and attacking US soldiers . 1 2758944 2758975 Its closest living relatives are a family frogs called sooglossidae that are found only in the Seychelles in the Indian Ocean . Its closest relative is found in the Seychelles Archipelago , near Madagascar in the Indian Ocean . 0 2584416 2584653 Cooley said he expects Muhammad will similarly be called as a witness at a pretrial hearing for Malvo . Lee Boyd Malvo will be called as a witness Wednesday in a pretrial hearing for fellow sniper suspect John Allen Muhammad . 1 86007 86373 " Instead of pursuing the most imminent and real threats - international terrorists , " Graham said , " this Bush administration chose to settle old scores . " " Instead of pursuing the most imminent and real threats - international terrorists - this Bush administration has chosen to settle old scores , " Graham said . 1 1602860 1602844 He said they lied on a sworn affidavit that requires them to list prior marriages . Morgenthau said the women , all U.S. citizens , lied on a sworn affidavit that requires them to list prior marriages . 1 1201306 1201329 The association said 28.2 million DVDs were rented in the week that ended June 15 , compared with 27.3 million VHS cassettes . 
The Video Software Dealers Association said 28.2 million DVDs were rented out last week , compared to 27.3 million VHS cassettes . 0 461779 461815 With these assets , Funny Cide has a solid chance to become the first Triple Crown winner since Affirmed in 1978 . Funny Cide is looking to become horse racing 's first Triple Crown winner in a generation . 1 1438666 1438643 Intel was disappointed and assessing its " options in the event Mr. Hamidi resumes his spamming activity against Intel , " spokesman Chuck Mulloy said . Intel spokesman Chuck Mulloy said the company was disappointed and assessing its " options in the event Mr. Hamidi resumes his spamming activity against Intel . " 1 3261484 3261306 Mr Annan also warned the US should not use the war on terror as an excuse to suppress " long-cherished freedoms " . Annan warned that the dangers of extremism after September 11 should not be used as an excuse to suppress " long-cherished " freedoms . 1 1277539 1277527 At community colleges , tuition will jump to $ 2,800 from $ 2,500 . Community college students will see their tuition rise by $ 300 to $ 2,800 or 12 percent . 1 3035788 3035918 He made a point of saying during Tuesdays debate that the Confederate flag was a racist symbol . Though Dean made a point of saying during the debate that the Confederate flag is a racist symbol . 0 132553 132725 Bush wanted " to see an aircraft landing the same way that the pilots saw an aircraft landing , " White House press secretary Ari Fleischer said yesterday . On Tuesday , before Byrd 's speech , Fleischer said Bush wanted ' ' to see an aircraft landing the same way that the pilots saw an aircraft landing . 0 2259788 2259747 On Monday the Palestinian Prime Minister , Mahmoud Abbas , will report to the Palestinian parliament on his Government 's achievements in its first 100 days in office . Palestinian Prime Minister Mahmoud Abbas must defend the record of his first 100 days in office before Parliament today as the death toll in the occupied territories continues to rise . 0 2307064 2307235 The civilian unemployment rate improved marginally last month -- slipping to 6.1 percent -- even as companies slashed payrolls by 93,000 . The civilian unemployment rate improved marginally last month _ sliding down to 6.1 percent _ as companies slashed payrolls by 93,000 amid continuing mixed signals about the nation 's economic health . 1 3046488 3046824 Per-user pricing is $ 29 for Workplace Messaging , $ 89 for Team Collaboration and $ 35 for Collaborative Learning . Workplace Messaging is $ 29 , Workplace Team Collaboration is $ 89 , and Collaborative Learning is $ 35 . 1 86020 86007 " Instead of pursuing the most imminent and real threats – international terrorism – this Bush administration chose to settle old scores , " Mr. Graham said . " Instead of pursuing the most imminent and real threats - international terrorists , " Graham said , " this Bush administration chose to settle old scores . " 0 1100998 1100441 SARS has killed about 800 people and affected more than 8400 since being detected in China in November . SARS has killed about 800 people and sickened more than 8,400 worldwide , mostly in Asia . 1 2268396 2268480 Authorities had no evidence to suggest the two incidents were connected . There was no immediate evidence that the two incidents were connected , police said . 0 1984039 1983986 " Jeremy 's a good guy , " Barber said , adding : " Jeremy is living the dream life of the New York athlete . 
He also said Shockey is " living the dream life of a New York athlete . 0 2697659 2697747 Ratliff 's daughters , Margaret and Martha Ratliff , were adopted by Peterson after their mother 's death . Peterson helped raise Ratliff 's two daughters , Margaret and Martha Ratliff , who supported him throughout the trial . 0 2175939 2176090 After losing as much as 84.56 earlier , the Dow Jones industrial average closed up 22.81 , or 0.2 percent , at 9,340.45 . In midday trading , the Dow Jones industrial average lost 68.84 , or 0.7 percent , to 9,248.80 . 1 886618 886456 Rumsfeld , who has been feuding for two years with Army leadership , passed over nine active-duty four-star generals . Rumsfeld has been feuding for a long time with Army leadership , and he passed over nine active-duty four-star generals . 1 588637 588864 Consumers who said jobs are difficult to find jumped from 29.4 to 32.6 , while those claiming work was plentiful slipped from 13 to 12.6 . Consumers who said jobs are difficult to find jumped to 32.6 from 29.4 , while those saying work was plentiful slipped to 12.6 from 13 in April . 0 2252795 2252970 He has no immediate plans for television advertising , believing it is unnecessary this early . A Lieberman aide said there were no immediate plans for television advertising . 1 1756329 1756394 " I think it happened very quickly , " Houston Police Department homicide investigator Phil Yochum said of the crime . " I think it happened very quickly , " said Investigator Phil Yochum of the Houston Police Department 's homicide division . 1 1673112 1673068 United issued a statement saying it will " work professionally and cooperatively with all its unions . " Senior vice president Sara Fields said the airline " will work professionally and cooperatively with all our unions . " 1 2357324 2357271 " But they never climb out of the pot of beer again . " It 's just that they never climb out of the beer again . " 1 780408 780363 Chief financial officer Andy Bryant has said that hike had a greater affect volume than officials expected . Bryant has said that hike had a greater effect on demand than officials expected . 1 821523 821385 Robert Liscouski , the Assistant Secretary of Homeland Security for Infrastructure Protection , will oversee NCSD . NCSD 's chief will be Robert Liscouski , the assistant secretary of Homeland Security for Infrastructure Protection . 1 2304696 2304863 HP 's shipments increased 48 percent year-over-year , compared to an increase of 31 percent for Dell . HPs shipments increased 48 per cent year-on-year , compared to an increase of 31 per cent for Dell . 1 2531749 2531607 Chirac , who can pardon a law-breaker , refused Humbert 's request last year but kept in close touch with the family . Chirac , who has the authority to pardon law-breakers , refused Humbert 's request to be allowed to die last year but kept in close touch with the family . 1 3180014 3179967 The charges allege that he was part of the conspiracy to kill and kidnap persons in a foreign country . The government now charges that Sattar conspired with Rahman to kill and kidnap individuals in foreign countries . 1 726966 726945 In the 2002 study , the margin of error ranged from 1.8 to 4.4 percentage points . It has a margin of error of plus or minus three to four percentage points . 1 2638861 2638982 Mr. Clinton 's national security adviser , Sandy Berger , said that the White House wasn 't informed of the FBI activities . 
Clinton ’ s national security adviser , Sandy Berger , said in an interview that the White House was not informed of the FBI activities . 1 2495223 2495307 " This decision is clearly incorrect , " FTC Chairman Timothy Muris said in a written statement . The decision is " clearly incorrect , " FTC Chairman Tim Muris said . 1 55187 54831 Prosecutors allege that Nichols and co-conspirator Timothy McVeigh worked together to prepare a bomb that destroyed the Alfred P. Murrah Federal Building . Prosecutors allege that Nichols and coconspirator Timothy McVeigh worked together to prepare a 4,000-pound fuel-and-fertilizer bomb that destroyed the Murrah building . 0 2763381 2763517 Terri Schiavo , 39 , is expected to die sometime in the next two weeks in the Tampa-area hospice where she has spent the past several years . Terri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler . 1 1990975 1991132 Secretary of State Colin Powell designated the Chechen leader believed responsible for last year 's hostage standoff in a Moscow theater as a threat to U.S. security Friday . U.S. Secretary of State Colin Powell on Friday designated Chechen rebel leader Shamil Basayev a threat to the security of the United States and to U.S. citizens . 1 2204353 2204418 " Today , we are trying to convey this problem to Russian President Vladimir Putin and US President George W Bush . " " Today , we are trying to convey this problem to Russian President Vladimir Putin ( news - web sites ) and President Bush ( news - web sites ) . " 1 60122 60445 That would be a potential setback to Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries . The inquiry may hinder Chief Executive Phil Condit 's strategy of bolstering defense-related sales during a slump in jetliner deliveries . 1 961836 962243 PeopleSoft also said its board had officially rejected Oracle 's offer . Thursday morning , PeopleSoft 's board rejected the Oracle takeover offer . 0 3140260 3140288 The Dow Jones industrial average ended the day down 10.89 at 9,837.94 , after advancing 111.04 Wednesday . The Dow Jones industrial average fell 10.89 points , or 0.11 percent , to 9,837.94 . 1 1720166 1720115 Cortisol levels in the saliva of day care children were highest and rose most steeply in those judged by day care center personnel to be the shyest . Cortisol levels in the saliva of day-care children were highest and rose most steeply in those whom day-care centre staffed judged to be the shyest . 1 2573262 2573319 " The idea that Tony Abbott is in some way a one-dimensional political head-kicker couldn 't be more wrong , " Mr Howard said . " The idea that Tony Abbott is in some way a one-dimensional political head kicker couldn 't be more wrong . " 0 1353356 1353174 " Biotech products , if anything , may be safer than conventional products because of all the testing , " Fraley said , adding that 18 countries have adopted biotechnology . " Biotech products , if anything , may be safer than conventional products because of all the testing , " said Robert Fraley , Monsanto 's executive vice president . 1 2738677 2738741 The rate of skin cancer has tripled since the 1950s in Norway and Sweden , according to the study . The study also found that skin cancer nearly tripled in Norway and Sweden since the 1950s . 
1 1638813 1639087 We acted because we saw the existing evidence in a new light , through the prism of our experience on 11 September , " Rumsfeld said . Rather , the US acted because the administration saw " existing evidence in a new light , through the prism of our experience on September 11 " . 1 1605350 1605425 Trans fat makes up only 1 percent to 3 percent of the total fat Americans consume , compared with 14 percent for saturated fat . Trans fat accounts for 2.5 percent of Americans ' daily calories , compared to 11 percent to 12 percent for saturated fat . 1 2494149 2494073 However , a recent slide in prices and OPEC 's expectations of a surge in oil inventories have compounded its fears about a further softening of the market . A 14 percent slide in crude prices this month and expectations of a build up in oil inventories compounded OPEC 's fears of a further softening of the market . 1 3023029 3023229 Peterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son . Peterson , 31 , is charged with two counts of first-degree murder in the slayings of his wife , Laci , and their unborn son , Conner . 1 1351550 1351155 Carlson on Tuesday said he would not recuse himself from the case . Service officials said Carlson refused to recuse himself from the case . 1 981185 981234 The program will grow to include ports in Dubai , Turkey and Malaysia , among others . The program will be expanded to include areas of the Middle East such as Dubai , Turkey and Malaysia , Mr. Ridge said . 0 2111629 2111786 McCabe said he was considered a witness , not a suspect . " He is not considered a suspect , " McCabe said . 1 655498 655391 The woman was exposed to the SARS virus while in the hospital but was not a health care worker , said Dr. Colin D ’ Cunha , Ontario ’ s commissioner of public health . The woman was exposed to the SARS virus while in the hospital but was not a health-care worker , said Dr Colin D 'Cunha , Ontario 's commissioner of public health . 1 533823 533909 He added that those " are not solely American principles , nor are they exclusively Western . " " These are not solely American principles nor are they exclusively Western , " Rumsfeld said . 1 581592 581570 " If we don 't march into Tehran , I think we will be in pretty good shape , " he said . " As long as we don 't march on Tehran , I think we are going to be in pretty good shape , " he said . 0 1010655 1010430 On Saturday , a 149mph serve against Agassi equalled Rusedski 's world record . On Saturday , Roddick equalled the world record with a 149 m.p.h. serve in beating Andre Agassi . 1 2241925 2242066 Chad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new technologies and methods to communicate more quickly and efficiently . Chad Kolton , emergency management spokesman with the Department of Homeland Security , said the government is open to new ways to communicate . 1 2796978 2797024 " APEC leaders are painfully aware that security and prosperity are inseparable , " Thai Prime Minister Thaksin Shinawatra told business leaders . " APEC leaders are painfully aware that security and prosperity are inseparable , " Thaksin said . 0 101746 101775 Danbury prosecutor Warren Murray could not be reached for comment Monday . Prosecutors could not be reached for comment after the legal papers were obtained late Monday afternoon . 
1 327839 327748 Wittig resigned last year after being indicted on federal bank fraud charges involving a real estate loan unrelated to Westar business . Wittig resigned in late November about two weeks after being indicted on bank fraud charges in a real estate case unrelated to the company . 0 2988297 2988555 Shattered Glass , " starring Hayden Christensen as Stephen Glass , debuted well with $ 80,000 in eight theaters . " Shattered Glass " _ starring Hayden Christensen as Stephen Glass , The New Republic journalist fired for fabricating stories _ debuted well with $ 80,000 in eight theaters . 1 2217613 2217659 He was arrested Friday night at an Alpharetta seafood restaurant while dining with his wife , singer Whitney Houston . He was arrested again Friday night at an Alpharetta restaurant where he was having dinner with his wife . 0 2128530 2128455 However , EPA officials would not confirm the 20 percent figure . Only in the past few weeks have officials settled on the 20 percent figure . 1 2208376 2208198 University of Michigan President Mary Sue Coleman said in a statement on the university 's Web site , " Our fundamental values haven 't changed . " Our fundamental values haven 't changed , " Mary Sue Coleman , president of the university , said in a statement in Ann Arbor . 1 1980654 1980641 The first products are likely to be dongles costing between US $ 100 and US $ 150 that will establish connections between consumer electronics devices and PCs . The first products will likely be dongles costing $ 100 to $ 150 that will establish connections between consumer electronics devices and PCs . 0 589579 589557 However , Lapidus expects foreign brands ' sales to be up 4 percent , driven by strong truck sales at Honda Motor Co . Lapidus expects Ford to be down 5 percent , Chrysler down 10 percent and foreign brands up 4 percent driven by strong truck sales at Honda . 1 1636060 1635946 Michel , who remains in the government , denied that US pressure had provoked the government 's move . Michel , who has stayed in the new government , denied that it was U.S. pressure which had provoked the government 's move . 1 1630585 1630657 Some of the computers also are used to send spam e-mail messages to drum up traffic to the sites . Some are also used to send spam e-mail messages to boost traffic to the sites . 0 447728 447699 Indonesia 's army has often been accused of human rights abuses during GAM 's battle for independence , charges it has generally denied while accusing the separatists of committing rights violations . Indonesia 's army has been accused of human rights abuses during its earlier battles with GAM , charges it has generally denied . 1 1606495 1606619 Bush also hoped to polish his anti-AIDS credentials in Uganda , which has been hailed as an African pioneer in fighting the killer disease . President Bush flies to Uganda Friday hoping to polish his anti- AIDS credentials in a country hailed as an African pioneer in fighting the epidemic . 1 1550897 1550977 Later this year , the command will send trainers with soldiers from four North African nations on patrolling and intelligence gathering missions . This fall the command will send trainers to work with soldiers from four North African nations on patrolling and gathering intelligence . 0 490376 490490 The reports helped overcome investor jitters after the euro briefly hit an all-time high against the dollar Tuesday . Stocks slipped at the open after the euro hit record highs against the dollar . 
1 3084554 3084612 Sales for the quarter beat expectations , rising 37 percent year-on-year to 1.76 billion euros . Sales rose 37 per cent year-on-year to 1.76bn , beating expectations .
1 315647 315778 If the MTA 's appeal to a higher court is successful , the $ 2 bus and subway base fare won 't be rolled back . If the MTA 's appeal is successful , the $ 2 bus and subway base fare won 't change .
1 3428298 3428362 Robert Walsh , 40 , remained in critical but stable condition Friday at Staten Island University Hospital 's north campus . Walsh , also 40 , was in critical but stable condition at Staten Island University Hospital last night .
1 2523564 2523358 The Guru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS ( Basic Input Output System ) update and a troubleshooting-assistance feature called Black Box . The µGuru microcontroller serves four functions : hardware monitoring , overclocking management , BIOS update and a troubleshooting-assistance feature called Black Box .
1 2079200 2079131 U.S. corporate bond yield spreads tightened in spotty trading on Friday as Wall Street labored to get back on its feet after the largest power outage ever in North America . U.S. stocks rose slightly on feather-light volume on Friday , as Wall Street regrouped after the biggest-ever power outage in North America .
1 818091 817811 The company said it would issue revised guidance for the full fiscal year next month when it releases its Q2 results . The company said it would renew its guidance for 2003 when it announces its second quarter results in mid-July .
1 1580638 1580663 " I stand 100 percent by it , and I think our intelligence services gave us the correct information at the time . " I stand 100 percent by it , and I think that our intelligence services gave us the correct intelligence and information at the time , " Blair said .
0 1919740 1919926 " I don 't know if the person I 'm talking to now may end up being someone else at another time that may not follow the rules , " Parrish said . " I don 't know whether the person I 'm talking to now may end up being someone else , " Parrish said .
1 2748287 2748550 " I think it 's going to be a close vote , but I think the grant proposal is going to win , " McConnell said . " I think it 's going to be a close vote , but I think the grant proposal 's going to win , " said Sen. Mitch McConnell , assistant majority leader .
1 3394891 3394775 Twenty-eight people were believed to have been spending Christmas Day with the caretaker of the St Sophia 's camp , when the mudslide smashed into two cabins . Twenty-seven people were believed to have been spending Christmas Day with the caretaker of Saint Sophia Camp , a Greek Orthodox facility , when the mudslide roared through .
0 2963943 2963880 One , Capt. Doug McDonald , remained hospitalized in critical condition on Thursday . Her 20-year-old sister , Allyson , was severely burned and remained hospitalized in critical condition .
0 1865364 1865251 The United States finally relented during President Bush 's visit to Africa earlier this month . During President Bush 's trip to Africa earlier this month , however , Washington said it would support the increase .
1 263690 263819 " There is no conscious policy of the United States , I can assure you of this , to move the dollar at all , " he said . He also said there is no conscious policy by the United States to move the value of the dollar .
1 283751 283290 It 's the first such drill since the September 11 terrorist attacks on New York and Washington . It is the nation 's first large-scale counterterrorism exercise since the Sept . 11 terrorist attacks .
1 2517014 2516995 Myanmar 's pro-democracy leader Aung San Suu Kyi will return home late Friday but will remain in detention after recovering from surgery at a Yangon hospital , her personal physician said . Myanmar 's pro-democracy leader Aung San Suu Kyi will be kept under house arrest following her release from a hospital where she underwent surgery , her personal physician said Friday .
1 1330643 1330622 According to the Merchant Marine Ministry , the 37-year-old ship is registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands . The Baltic Sky is a 37-year-old ship registered to Alpha Shipping Inc. based in the Pacific Ocean nation of Marshall Islands .
1 3111452 3111428 In an unusual move , the U.S. Patent and Trademark Office is reconsidering a patent affecting Internet pages that critics contend could disrupt millions of Web sites . In an unusual move that critics contend could disrupt millions of Web sites , the U.S. Patent and Trademark Office is reconsidering a patent affecting Internet pages .
0 1167835 1167651 Kansas Department of Health and Environment records show there were 88 abortions performed on girls age 14 and younger last year . Statistics from the Kansas Department of Health and Environment show that 11,844 abortions were performed in the state last year .
0 1423836 1423708 A European Union spokesman said the Commission was consulting EU member states " with a view to taking appropriate action if necessary " on the matter . Laos 's second most important export destination - said it was consulting EU member states ' ' with a view to taking appropriate action if necessary ' ' on the matter .
1 2090911 2091154 Waiting crowds filling the streets on both sides overwhelmed the peacekeepers soon after daylight , sweeping past the barbed wire barricades . But waiting crowds filling the streets rushed the bridges soon after daylight , overrunning razor-wire barricades .
1 2265271 2265152 Barry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products not sold in the United States . Barry Callebaut will be able to use Brach 's retail network to sell products made from its German subsidiary Stollwerck , which makes chocolate products unknown to the American market .
1 3062202 3062308 By skirting the FDA 's oversight , Eagan said , the quality of the imported drugs is " less predictable " than for those obtained in the United States . By skirting the FDA 's oversight , Eagan said the quality of the imported drugs is " less predictable " than U.S. drugs .
1 2155514 2155377 He said : " For the first time there is an easy and affordable way of making this treasure trove of BBC content available to all . " " For the first time , there is an easy and affordable way of making this treasure trove of BBC content available to all , " Dyke said .
1 1552068 1551928 Three such vigilante-style attacks forced the hacker organizer , who identified himself only as " Eleonora [ 67 ] , " to extend the contest until 7 p.m. EST Sunday . Three such vigilante-style attacks forced the hacker organiser , who identified himself only as " Eleonora67 ] , " to extend the contest until 8am ( AEST ) today .
1 936978 937500 Eric Gagne pitched a perfect ninth for his 23rd save in as many opportunities . Gagne struck out two in a perfect ninth inning for his 23rd save .
0 985015 984975 One way or another , Harry Potter And The Order Of The Phoenix will be in your hands by Saturday . Just about everything about " Harry Potter and the Order of the Phoenix " will set records .
1 1430357 1430425 " Allison just proves you don 't need to wait until August or September to have a disaster , " said Josh Lichter , a meteorologist with the Houston-Galveston weather office . " Allison just proves you don 't need to wait until August or September to have a disaster , " Lichter said .
1 3039310 3039413 Today , analysts say , UN members can no longer ignore the shifts since the September 11 2001 attacks . On Wednesday , analysts say , UN members can no longer ignore the shifts since the attacks in the US of September 11 2001 .
1 34513 34742 Police say CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the United States . Mr McKinlay said that CIBA was involved in the importation of qat , a narcotic substance legal in Britain but banned in the US .
1 368067 368018 Chiron already has nearly 20 percent acceptances from PowderJect 's shareholders . Chiron has acceptances from holders of nearly 20 percent of PowderJect shares .
0 611663 611716 Ernst & Young has denied any wrongdoing and plans to fight the allegations . Ernst & Young has denied the SEC 's claims , and called its recommendations " irresponsible " .
1 98432 98657 The attack followed several days of disturbances in the city where American soldiers exchanged fire with an unknown number of attackers as civilians carried out demonstrations against the American presence . The attack came after several days of disturbance in the city in which U.S. soldiers exchanged fire with an unknown number of attackers as civilians protested the American presence .
1 3039007 3038845 No company employee has received an individual target letter at this time . She said no company official had received " an individual target letter at this time . "
1 1708040 1708062 Second-quarter results reflected a gain of 10 cents per diluted share , while the 2002 results included a loss of 19 cents per diluted share . The second-quarter results had a non-operating gain of 10 cents a share while the 2002 second-quarter performance had a net non-operating loss of 19 cents a share .
0 1757264 1757375 He allegedly told his ex-wife in an angry phone call that he had no intention of following their new custody agreement . The two had battled over custody and he allegedly told her in an angry phone call that he had no intention of following their new custody agreement .
1 383417 383558 Worldwide , more than 50 million people have seen " Les Miz , " with gross receipts of $ 1.8 billion . Worldwide , Les Misérables has been seen by over 50 million people , with a total gross of over $ 2 billion .
0 2766112 2766084 In fiction : Edward P. Jones ( " The Known World " ) and Scott Spencer ( " A Ship Made of Paper " ) . The fifth nominee for fiction is Scott Spencer , for A Ship Made of Paper .
1 1261116 1261234 " Overwhelmingly the Windows brand really resonated with them . " " Windows was the part of the experience that really resonated with people . "
1 3028143 3028234 The Centers for Medicare and Medicaid Services , the federal agency that runs Medicare , last year began a similar effort for nursing homes . The Centers for Medicare and Medicaid launched a similar consumer tool for nursing homes last year .
0 249699 249623 Vivace was founded in 1999 and has raised over $ 118 million in three rounds of venture financing . During difficult times for technology venture capital , Vivace raised over $ 118 million in three rounds of venture financing .
0 3448488 3448449 The Dow Jones industrial average < .DJI > added 28 points , or 0.27 percent , at 10,557 , hitting its highest level in 21 months . The Dow Jones industrial average < .DJI > rose 49 points , or 0.47 percent , to 10,578 .
1 2749322 2749663 The Democratic candidates also began announcing their fund-raising totals before Wednesday 's deadline to file quarterly reports with the Federal Election Commission . The Democratic candidates also began announcing their fund-raising totals in advance of the deadline today to file quarterly reports with the Federal Election Commission .
0 2204592 2204588 Sun Microsystems Inc. on Thursday said it had added 100 new third-party systems and 100 new components to its Hardware Compatibility List for the Solaris x86 operating system Platform Edition . The vendor has added 100 new third-party systems and 100 new components to the operating system 's Hardware Compatibility List ( HCL ) .
1 2889005 2888954 Prosecutors said PW Marketing violated the state 's 1998 anti-spam law by sending unsolicited e-mail without a toll-free number for recipients to call to stop additional mailings . Prosecutors said PW Marketing violated the 1998 anti-spam law because these unsolicited e-mails were sent without a free call number for recipients to phone to stop additional mailings .
0 1657632 1657619 The Neighbours star and singer spent yesterday resting at her family home in Sydney and will have more tests today . Goodrem spent yesterday resting in her family home in Sydney and will have more tests today to determine her exact treatment .
0 555617 555528 The 3 rd Armored Cavalry Regiment is 5,200 strong and the largest combat unit at Fort Carson . Broomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment .
1 2396937 2396818 " The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , " the Fed said in a statement accompanying the unanimous decision . " The risk of inflation becoming undesirably low remains the predominant concern for the foreseeable future , " the policy-setting Federal Open Market Committee said .
0 2339738 2339771 " It is bad for Symbian , " said Per Lindberg , analyst at Dresdner Kleinwort Wasserstein . " Motorola has displayed clear disloyalty " to Symbian , said Per Lindberg , an analyst at Dresdner Kleinwort Wasserstein in London .
0 1616174 1616206 Bob Richter , a spokesman for House Speaker Tom Craddick , had no comment about the ruling . Bob Richter , spokesman for Craddick , R-Midland , said the speaker had not seen the ruling and could not comment .
1 635783 635802 But Ms Ward said the headroom under its financial covenants was " tight " and that there could be another downgrade if Southcorp breached any of its banking covenants . But Ms Ward said the headroom under its financial covenants was " tight " and that there could be a rating downgrade if Southcorp did breach any banking covenants .
1 3444633 3444733 He added : ``I 've never heard of more reprehensiblebehaviour by a doctor . The Harrisons ’ lawyer Paul LiCalsi said : “ I ’ ve never heard of more reprehensible behaviour by a doctor .
1 555553 555528 Broomhead was assigned to 2nd Squadron , 3rd Armor Cavalry Regiment , based at Fort Carson . Broomhead , 34 , was assigned to the 2nd Squadron , 3rd Armored Cavalry Regiment .
1 1112021 1111925 Other staff members , however , defended the document , saying it would still help policy-makers and the agency improve efforts to address the climate issue . Some E.P.A. staff members defended the document , saying that although pared down it would still help policy makers and the agency address the climate issue .
0 2749410 2749625 President Bush raised a record-breaking $ 49.5 million for his re-election campaign over the last three months , with contributions from 262,000 Americans , the president 's campaign chairman said Tuesday . President Bush has raised $ 83.9 million since beginning his re-election campaign in May , and has $ 70 million of that left to spend , his campaign said Tuesday .
1 1629064 1629043 An episode is declared when the ozone reaches .20 parts per million parts of air for one hour . A Stage 1 episode is declared when ozone levels reach 0.20 parts per million .
1 789691 789665 " He may not have been there , " the defence official said on Thursday . " He may not have been there , " said a defence official speaking on condition of anonymity .
1 844421 844679 The U.N. troops are in Congo to protect U.N. installations and personnel , and they can only fire in self defense and have been unable to stem the violence . The troops - whose mandate is to protect U.N. installations and personnel - can only fire in self-defense and have been unable to stem the violence .
1 58540 58567 North American markets grabbed early gains Monday morning , as earnings season begins to slow and economic indicators take the spotlight . North American futures pointed to a strong start to the first trading session of the week Monday , as earnings season slows and economic indicators take the spotlight .
1 781439 781461 Xerox itself paid a $ 10 million fine last year to settle similar SEC charges . Xerox itself previously paid a $ 10-million penalty to settle the SEC accusations .
1 1909579 1909408 " This deal makes sense for both companies , " said National Chief Executive Brian Halla . " This deal makes sense for both companies , " Halla said in a prepared statement .
0 787432 787464 The blasts killed two people and injured more than 150 others . The Atlanta Olympic Games attack killed one woman and injured more than 100 other people .
0 52758 52343 Morrill 's wife , Ellie , sobbed and hugged Bondeson 's sister-in-law during the service . At the service Morrill 's widow , Ellie , sobbed and hugged Bondeson 's sister-in-law as people consoled her .
1 1675025 1675047 Spansion products are to be available from both AMD and Fujitsu , AMD said . Spansion Flash memory solutions are available worldwide from AMD and Fujitsu .
1 2131318 2131372 About 1,500 police will be deployed for the visit . Around 1,500 police are to be deployed at Niigata for the ferry 's visit .
1 325763 325928 Gamarekian told The News she remembers only the woman 's first name - and refused to reveal it . She told the New York Daily News she remembers only the intern 's first name , which she refused to reveal .
1 2638975 2638855 One of the FBI ’ s key operatives , who had a falling out with the bureau , provided an account of the operation at a friend ’ s closed immigration court proceeding . One of the FBI 's key operatives , who has had a falling-out with the bureau , provided an account of the operation at a friend 's closed immigration court proceeding .
1 2198694 2198937 A nationally board certified teacher with a master 's degree , Kelley makes a salary of $ 65,000 in his 30th year . A nationally board certified teacher with a master 's degree , Kelley , in his 30th year teaching , makes $ 65,000 .
1 1825432 1825301 A man arrested for allegedly threatening to shoot and kill a city councilman from Queens was ordered held on $ 100,000 bail during an early morning court appearance Saturday . The Queens man arrested for allegedly threatening to shoot City Councilman Hiram Monserrate was held on $ 100,000 bail Saturday , a spokesman for the Queens district attorney said .
1 2906104 2906322 They were being held Sunday in the Camden County Jail on $ 100,000 bail . They remained in Camden County Jail on Sunday on $ 100,000 bail .
1 722278 722383 Ms Stewart , the chief executive , was not expected to attend . Ms Stewart , 61 , its chief executive officer and chairwoman , did not attend .
0 101747 101777 Christina 's aunt , Shelley Riling , said the defense 's claims were preposterous . Christina 's aunt , Shelley Riling , said she will address the court .
1 2224884 2224819 The Justice Department Aug. 19 gave pre-clearance for the Oct. 7 date for the election to recall Gov. Gray Davis , saying it would not affect minority voting rights . The Justice Department on Aug. 19 sanctioned the Oct. 7 date for recall election , saying it would not affect voting rights .
0 977938 978162 Lord Falconer hailed the changes as " a new beginning as far as the courts , Crown Prosecution Service and police are concerned " . " It 's a new beginning as far as the courts , Crown Prosecution Service and police are concerned , making the criminal justice system work better . "
0 1015010 1014963 GE stock closed at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange . GE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange .
1 1513190 1513246 At least 27 US troops have been killed in hostile fire since Bush 's statement . At least 26 American troops have been killed in hostile fire since major combat was officially declared over on May 1 .
1 2385348 2385394 A recent poll showed Edwards with a narrow lead in South Carolina , and he plans a rally there later on Tuesday . A recent poll showed Edwards in a virtual four-way tie at the top in South Carolina , and he plans a rally there later on Tuesday .
1 2317018 2317252 November 17 's last victim was British defence attache Stephen Saunders , who was shot on an Athens road in June 2000 . November 17 's last victim was British defense attache Stephen Saunders , who was shot and killed at point-blank range on a busy Athens road in June 2000 .
0 1831696 1831660 The agency charged that one WD Energy worker discussed false reporting with traders at two other energy companies . The agency found further that a WD Energy employee discussed false reporting with traders at two other energy companies , which the CFTC didn 't identify .
1 1528383 1528083 Zulifquar Ali , a worshipper slightly wounded by shrapnel , said the assailants first targeted the mosque 's security guards . Witness Zulfiqar Ali , who was slightly wounded by shrapnel , said the attackers had focused on the mosque 's guards .
1 917965 918315 For the second year in a row , rises in hospital costs accounted for much of the inflation , accounting for 51 percent of the overall cost increase . For the second year in a row , rises in hospital costs dominated the increase , accounting for 51 percent of the overall cost spiral .
0 3218713 3218830 Q : Can I buy coverage for prescription drugs right away ? Congress has added a new benefit - an option to buy insurance coverage for prescription drugs .
1 221079 221003 The airline also said it has the option to buy 380 more airplanes , orders that would be split evenly between the two manufacturers . The airline has the option to buy 380 more , split evenly between the two manufacturers .
1 2546175 2546198 Dr Mark McClean , Jonathan 's family doctor , said if the drug had been administered earlier Jonathan would have retained more of his brain functions . Dr Mark McClean , the family 's GP , said had the drug been administered to Jonathan earlier , he would have retained more of his brain function .
0 799346 799268 The chain operates more than 3,400 stores , and has annual revenue of about $ 15.8 billion . The chain , which has been under new management since late 1999 , has more than 3,400 stores and $ 15.8 billion in annual revenue .
0 2673104 2673130 All patients developed some or all of the symptoms of E. coli food poisoning : bloody diarrhea , vomiting , abdominal cramping and nausea . Symptoms of the E. coli infection include bloody diarrhea , nausea , vomiting and abdominal cramping .
1 1354501 1354476 Federal regulators have turned from sour to sweet on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings Inc. and Dreyer 's Grand Ice Cream Inc . Federal regulators have changed their minds on a proposed $ 2.8 billion merger of ice cream giants Nestle Holdings and Dreyer 's Grand Ice Cream .
1 3070979 3070949 Environmental campaigners are using this weekend ’ s lunar eclipse to highlight the huge increase in light pollution across the UK . Environmental campaigners used the eclipse to highlight the surge in light pollution across Britain .
0 1264509 1264471 Available July 7 , the software supports the Solaris , IBM AIX , Red Hat Linux and Windows operating systems . The OpForce product currently works with Solaris , AIX , Red Hat Linux and Windows servers .
1 103280 103431 Justice Minister Martin Cauchon and Prime Minister Jean Chrétien have both said the Liberal government will introduce legislation soon to decriminalize possession of small amounts of pot for personal use . Justice Minister Martin Cauchon and Prime Minister Jean Chretien both have said the government will introduce legislation to decriminalize possession of small amounts of pot .
0 110731 110648 But Chauncey Billups demonstrated he 's also capable of big games , scoring 77 points over the final two games against the Magic . Billups scored 77 points in the final two games of the first-round series against the Magic .
1 2274844 2274714 Kelly killed himself after being exposed as the source for a BBC report which claimed the government had embellished evidence of Iraq 's banned weapons to justify the war . He killed himself after being exposed as the source for a BBC report which claimed the government exaggerated the case for war against Iraq .
0 1050307 1050144 And it 's going to be a wild ride , " said Allan Hoffenblum , a Republican consultant . Now the rest is just mechanical , " said Allan Hoffenblum , a Republican consultant .
1 2810634 2810670 While the Ibrahims had one separation operation , Goodrich and Dr. David Staffenberg plan about three for the Aguirres , with several weeks between each . Instead of one long operation to separate the twins , Goodrich and Dr. David Staffenberg plan about three , with several weeks between each .
1 3073773 3073779 Lay had contended that turning over the documents would violate his Fifth Amendment right against self-incrimination . Lay had refused to turn over the papers , asserting his Fifth Amendment right against self-incrimination .
0 261202 260995 The WHO experts didn 't say how many cases in Hebei were in rural areas . Hebei has reported 191 cases and eight deaths , though the WHO experts did not say how many were in rural areas .
1 1824224 1824209 Nearly 300 mutinous troops who seized a Manila shopping and apartment complex demanding the government resign gave up and retreated peacefully after some 19 hours . Mutinous troops who seized a Manila shopping and apartment complex demanding the government resign ended a 19-hour standoff late Sunday and returned to barracks without a shot fired .
1 548867 548785 In three years , Lend Lease has slipped from a top-five stock , when its share price was around $ 24 , to 37th . In the space of three years , Lend Lease has slipped from a top-five 5 stock when its share price hovered around $ 24 to 37th on the list .
0 2796658 2796682 About two hours later , his body , wrapped in a blanket , was found dumped a few blocks away . Then his body was dumped a few blocks away , found in a driveway on Argyle Road .
1 1808166 1808434 Columbia broke up over Texas upon re-entry on Feb. 1 . Columbia broke apart in the skies above Texas on Feb. 1 .
1 853475 853342 A year or two later , 259 , or 10 per cent , of the youths reported that they had started to smoke , or had taken just a few puffs . Within two years , 259 , or 10 percent , of the youths reported they had started to smoke or had at least taken a few puffs .
0 977772 977804 The Lord Chancellor was guardian of the Great Seal , used to stamp all official documents from the sovereign . Falconer will hold on , for now , to the Lord Chancellor 's Great Seal , used to sign off instructions from the sovereign .
1 577854 578500 Cindy Yeast , a 50-year-old Washington-area publicist , says she began taking supplements two years ago in part to avoid mild dementia that affects her elderly parents . She started taking supplements two years ago - partly to stave off mild dementia that affects her elderly parents .
1 2829194 2829229 The two are not related , but have referred to each other as father and son . He 's not related to Malvo , but the two have referred to each other as father and son .
1 2074182 2074668 Gibson said last month in a press statement that " neither I nor my film are anti-Semitic . Gibson said in a June statement that he and his film are not anti-Semitic .
0 2758265 2758282 The world 's largest software company said it recognized the difficulty the multiple patches posed for companies , and set out to make it easier for them to apply the updates . The world 's largest software company said it recognized the difficulty the multiple patches posed for companies trying to apply them .
1 1958079 1958143 The Dow Jones industrial average .DJI ended up 64.64 points , or 0.71 percent , at 9,191.09 , according to the latest available data . The blue-chip Dow Jones industrial average .DJI added 38 points , or 0.42 percent , to 9,165 .
1 544217 544325 The vote came just two days after Kurds swept City Council elections , taking the largest single block of votes on the 30-seat council . The vote for mayor followed City Council elections that gave Kurds the largest block of votes on the 30-seat council .
1 2385288 2385256 Large swells and dangerous surf already were being felt along sections of the coast . Already large swells and dangerous surf have arrived along the mid-Atlantic .
0 2324708 2325028 Based on a separate survey of households , the unemployment rate fell in August to 6.1 percent from 6.2 percent . Labor Department analysts discounted a slight improvement in the national unemployment rate , which fell in August to 6.1 percent from 6.2 percent .
1 2139506 2139427 " We will work with the board to ensure a smooth transition . " He said federal regulators would work with the corporation to ensure a " smooth transition . "
1 2965576 2965701 Gasps could be heard in the courtroom when the photo was displayed . Gasps could be heard as the photo was projected onto the screen .
1 2931098 2931144 Gilead had earnings of $ 73.1 million , or 33 cents a share , compared with $ 20.8 million , or 10 cents , in the year-ago quarter . Quarterly profit climbed to $ 73.1 million , or 33 cents a share , from $ 20.8 million , or 10 cents , a year earlier , the company said .
0 644788 644816 " I had one bad stretch of holes that put me out of contention to win , " Woods said . " I had one bad stretch of holes that put me out of contention , " Woods said , referring to his 42 on the front nine Saturday .
0 2551891 2551563 The poll had a margin of error of plus or minus 2 percentage points . It had a margin of sampling error of plus or minus four percentage points and was conducted Thursday through Saturday .
1 1089053 1089297 Sen. Patrick Leahy of Vermont , the committee 's senior Democrat , later said the problem is serious but called Hatch 's suggestion too drastic . Sen. Patrick Leahy , the committee 's senior Democrat , later said the problem is serious but called Hatch 's idea too drastic a remedy to be considered .
1 3435735 3435717 The broad Standard & Poor 's 500 < .SPX > eased 0.37 of a point , or 0.03 percent , at 1,121 . The Standard & Poor 's 500 Index < .SPX > slipped 0.26 point , or 0.02 percent , to 1,121.96 .
0 1954 2142 Watertown , Saugus and Framingham also are going smoke-free Monday , joining a growing number of cities around the country . Along with Boston , Watertown , Saugus and Framingham also are going smoke-free Monday .
1 3400796 3400822 That is evident from their failure , three times in a row , to get a big enough turnout to elect a president . Three times in a row , they failed to get a big _ enough turnout to elect a president .
1 1220668 1220801 We firmly believe we have an absolute right to use the common word ' spike ' as the name of our network . " We firmly believe that we have an absolute right to use the common word ' spike ' to name our network .
1 1889954 1889847 Sources who knew of the bidding said last week that cable TV company Comcast Corp. was also looking at VUE . Late last week , sources told Reuters cable TV company Comcast Corp. CMCSA.O also was looking at buying VUE assets .
1 315785 315653 But MTA officials appropriated the money to the 2003 and 2004 budgets without notifying riders or even the MTA board members considering the 50-cent hike , Hevesi found . MTA officials appropriated the surplus money to later years ' budgets without notifying riders or the MTA board members when the 50-cent hike was being considered , he said .
0 1521034 1520582 White , who had suffered kidney failure from years of high blood pressure , died at Cedars-Sinai Medical Center around 9 : 30 a.m. , said manager Ned Shankman . White , who had kidney failure from years of high blood pressure , had been undergoing dialysis and had been hospitalized since a September stroke .
1 2083598 2083810 About 10 percent of high school and 16 percent of elementary students must be proficient at math . In math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient .
1 1910610 1910455 The legal ruling follows three days of intense speculation Hewlett-Packard Co. may be bidding for the company . The legal ruling follows three days of wild volatility in RIM 's stock over speculation that PC giant Hewlett-Packard Co. may be bidding for the company .
1 3113791 3113782 The European Commission , the EU 's antitrust enforcer , is expected to issue its decision next spring — unless a settlement is reached . The European Commission is expected to issue its decision in the case next spring — unless a settlement is reached .
1 3214517 3214483 " So Sebastian did his best to convincingly confess to a crime that he didn 't commit in order to survive , " she told jurors . " Sebastian did his best to confess convincingly to a crime he didn 't do in order to survive , " Ms. Richardson declared .
0 2083612 2083810 Twenty percent of Latino students and 23 percent of black students performed at proficient or higher . In math , 16 percent of elementary and middle school students and 9.6 percent of high school students must be proficient .
1 661390 661218 He is charged in three bombings in Atlanta including a blast at the 1996 Olympics and one in Alabama . He is charged in three bombings in Atlanta - including a blast at the 1996 Olympics - along with the bombing in Alabama .
1 1269572 1269682 The men were remanded in custody and are due to appear again before court on July 8 . They were remanded in custody and will appear in court again on July 8 .
1 1095780 1095652 " No matter who becomes the sponsor for stock-car racing 's top series , NASCAR will need an all-star event , " Wheeler said in a statement . No matter who becomes the sponsor for stock-car racings top series , NASCAR will need an all-star event , Wheeler said Tuesday .
1 116294 116332 The Phillies were upset that Counsell had stolen second in the sixth inning with Arizona leading 7-1 . The Phillies were apparently upset when Counsell stole during the sixth with the Diamondbacks up 7-1 .
1 941617 941673 He said his hatred for such people grew from these discussions and had helped convince him violence was the answer . His hatred for these people had germinated from these discussions and helped cement his belief that violence was the panacea .
1 2640607 2640576 " There is no need for one deadline for all to create the ASEAN Economic Community , " Thaksin said . Thus , he said , there did not have to one deadline to create the economic community .
1 3310210 3310286 The announcement was made during the recording of a Christmas concert attended by top Vatican cardinals , bishops , and many elite from Italian society , witnesses said . The broadside came during the recording on Saturday night of a Christmas concert attended by top Vatican cardinals , bishops and many elite of Italian society , witnesses said .
1 3376093 3376101 The additional contribution brings total U.S. food aid to North Korea this year to 100,000 tonnes . The donation of 60,000 tons brings the total of U.S. contributions for the year to 100,000 .
1 1549586 1549609 Leon Williams ' body was found inside his third-floor apartment at 196 Bay St. , in Tompkinsville . The dead man , Leon Williams , was found in his third-floor apartment .
1 460211 460445 The player 's eyes were bloodshot and a blood-alcohol test produced a reading of 0.18 - well above Tennessee 's level of presumed intoxication of 0.10 , the report said . He failed a field sobriety test and a blood-alcohol test produced a reading of 0.18 – well above Tennessee 's level of presumed intoxication of 0.10 , the report said .
1 1196962 1197061 But Virgin wants to operate Concorde on routes to New York , Barbados and Dubai . Branson said that his preference would be to operate a fully commercial service on routes to New York , Barbados and Dubai .
0 862804 862715 He tried to fight off officers and was taken to a hospital after a police dog bit him but was later released . Cruz tried to fight off officers and was hospitalized after a police dog bit him , Sgt. Steve Dixon said .
1 1726935 1726879 The announcement , which economists said was not a surprise , may be bittersweet for the millions of Americans without jobs . Economists said the announcement was not a surprise , and politicians said it offered little comfort to the millions of Americans without jobs .
0 331980 332110 Asked if the delegates could leave on Friday , police intelligence chief in Aceh , Surya Dharma , told reporters they could not because they did not have proper permission . Asked if the delegates could leave on Friday , police intelligence chief Surya Dharma told reporters : " Of course they may not go .
1 173879 173832 Dealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid the yen 's rise against the dollar . Dealers said the dollar also drew some downside support as Japanese investors are expected to keep snapping up foreign bonds amid ever-falling domestic interest rates .
0 2834988 2835026 Iran has until the end of the month to satisfy the agency it has no plans for nuclear weapons . The Iranians have until the end of the month to answer all the agency 's questions about their past nuclear activities .
1 2587300 2587243 Her father , Florin Cioaba , the king of Transylvania 's Gypsies , had her brought back and she was married against her will . Her father , Roma King Florin Cioaba , had her brought back and she was promptly married against her will .
0 554905 554627 Claire had advanced to the third round of the 76th annual Scripps Howard National Spelling Bee . One by one they strolled to the microphone , all 251 youngsters in the 76th Scripps Howard National Spelling Bee .
1 1912524 1912648 Citigroup Inc . C.N , the world 's largest financial services company , on Wednesday promoted Marjorie Magner to chairman and chief executive of its global consumer group . Citigroup ( C ) on Wednesday named Marjorie Magner chairman and chief executive of its colossal global consumer business .
1 3255597 3255668 " They 've been in the stores for over six weeks , " says Carney . The quarterlies usually stay in stores for between six to eight weeks , " Carney added .
1 629316 629289 Let me just say this : the evidence that we have of weapons of mass destruction was evidence drawn up and accepted by the joint intelligence community . " The evidence that we had of weapons of mass destruction was drawn up and accepted by the Joint Intelligence Committee , " he said .
1 54181 53570 Ridge said no actual explosives or other harmful substances will be used . Ridge said no real explosives or harmful devices will be used in the exercise .
1 723557 724115 Thus far , Stewart 's company appears ready to stand behind her . For now , the company 's management appears to be standing behind Stewart .
0 2607718 2607708 But late Thursday night , the campaign issued a statement saying there would be no news conference and no big announcement . But late yesterday , the campaign and the state Democratic Party said there would be no news conference .
1 753858 753890 There 's also a flaw that results because IE does not implement an appropriate block on a file download dialog box . The second vulnerability is a result of IE not implementing a block on a file download dialog box .
1 587009 586969 Another $ 100-million in savings will come from management layoffs and pay cuts . The airline expects to save another $ 100-million a year through management layoffs and pay cuts .
1 308567 308525 He called on Prime Minister John Howard to establish a royal commission on child sex abuse . The Senate motion also called on Prime Minister John Howard to hold a royal commission into child sex abuse .
0 665419 665612 " We think that the United States of America should support the free speech of all groups , " Mr. White said , objecting to Mr. Olson 's recommendation . We think that the United States of America should support the free speech of all groups , he said .
1 2763517 2763576 Terri Schiavo , 39 , underwent the procedure at the Tampa Bay area hospice where she has been living for several years , said her father , Bob Schindler . The tube was removed Wednesday from Terri Schiavo , 39 , at the Tampa Bay-area hospice where she has lived for several years .
0 3107118 3107136 After 18 months , Nissen found that Lipitor stopped plaque buildup in the patients ' arteries . After 18 months , the atorvastatin patients had no change in the plaque in their arteries .
1 780604 780466 Toll , Australia 's second-largest transport company , last week offered NZ75 a share for Tranz Rail . Toll last week offered to buy the company for NZ75c a share , or $ NZ158 million .
0 1989213 1989116 " This child was literally neglected to death , " Armstrong County District Attorney Scott Andreassi said . Armstrong County District Attorney Scott Andreassi said the many family photos in the home did not include Kristen .
1 1462409 1462504 Wal-Mart , the nation 's largest private employer , has expanded its antidiscrimination policy to protect gay and lesbian employees , company officials said Tuesday . Wal-Mart Stores Inc . , the nation 's largest private employer , will now include gays and lesbians in its anti-discrimination policy , company officials said Wednesday .
1 260952 260924 Metro , bus and local rail services in France 's four largest towns -- Paris , Lyon , Lille and Marseille -- were severely disrupted , Europe 1 radio reported . Subway , bus and suburban rail services in France 's four largest cities -- Paris , Lyon , Lille and Marseille -- were severely disrupted , transport authorities said .
1 1224743 1225510 In the undergraduate case , Rehnquist said the use of race was not " narrowly tailored " to achieve the university 's asserted interest in diversity . Rehnquist wrote that the system was not narrowly tailored to achieve the interest in educational diversity .
0 3329379 3329416 SP2 is basically about security enhancements to Windows , such as the improved Internet Connection Firewall ( ICF ) . The firewall in the current Windows XP was known as the Internet Connection Firewall ( ICF ) .
1 2362761 2362698 A landslide in central Chungchong province derailed a Seoul-bound train and 28 passengers were injured , television said . In central Chungchong province , a landslide caused a Seoul-bound Saemaeul Express train to derail , injuring 28 people , local television said .
0 1465073 1464854 They will help draft a plan to attack obesity that Kraft will implement over three to four years . The team will help draft a plan by the end of the year to attack obesity .
1 195728 196099 But that amount would probably be impossible to pass in the Senate , where Republican moderates have refused to go above $ 350 billion . Such an amount would probably be unable to summon a majority of the Senate , where Republican moderates have refused to go above $ 350 billion .
1 2587767 2587673 In the clash with police , Lt. Mothana Ali said about 1,000 demonstrators had gone to the station demanding jobs . In Baghdad , police Lieut . Mothana Ali said about 1,000 demonstrators arrived at the station demanding jobs .
0 1490044 1489975 Corixa shares rose 54 cents to $ 7.74 yesterday on the Nasdaq Stock Market . Shares of Corixa rose 54 cents , or about 8 percent , to close at $ 7.74 .
1 958161 957782 Committee approval , expected today , would set the stage for debate on the Senate floor beginning Monday . That would clear the way for debate in the full Senate beginning on Monday .
1 1033204 1033365 O 'Brien was charged with leaving the scene of a fatal accident , a felony . Bishop Thomas O 'Brien , 67 , was booked on a charge of leaving the scene of a fatal accident .
0 2996241 2996734 Tom Hamilton said his daughter was conscious and alert and in stable condition after the attack Friday morning . Bethany , who remained in stable condition after the attack Friday morning , talked of the attack Saturday .
0 2015389 2015410 The Calgary woman , who is in her twenties , donated blood on Aug. 7 . The woman -- who has no symptoms of illness -- donated blood Aug. 7 .
1 221515 221509 Quattrone lawyer John W. Keker said his client is innocent . In a statement Monday , his lawyer John Keker said ``Frank Quattrone is innocent .
0 2283737 2283794 In the weeks leading up to the execution , several Florida officials received anonymous threatening letters . Several Florida officials connected to the case have received threatening letters , accompanied by rifle bullets .
1 2826681 2826474 The disagreement over online music sales was disclosed in documents filed last week with the judge and made available by the court yesterday . The fight over online music sales was disclosed in documents made available Monday by the court .
1 2249237 2249305 Parson was charged with intentionally causing and attempting to cause damage to protected computers . Parson is charged with one count of intentionally causing damage to a protected computer .
1 389239 389299 " The court and the public need to know much more of the details of the defendant 's seemingly massive fraud , " the judge said . " The court and the public need to know more of the defendants ' seemingly massive fraud , " he said .
1 2652187 2652218 The U.S. Supreme Court will hear arguments on Wednesday on whether companies can be sued under the Americans with Disabilities Act for refusing to rehire rehabilitated drug users . The high court will hear arguments today on whether companies can be sued under the ADA for refusing to rehire rehabilitated drug users .
1 2945693 2945847 The IRS said taxpayers can avoid undelivered checks by having refunds deposited directly into their checking or savings accounts . The IRS said taxpayers can avoid problems with lost or stolen refunds by having refunds deposited directly into personal checking or savings accounts .
1 2065523 2065836 " More than 70,000 men and women from bases in Southern California were deployed in Iraq . In all , more than 70,000 troops based in Southern California were deployed to Iraq .
1 2222998 2223097 BP shares slipped 0.8 percent to 433.50 pence ( $ 6.85 ) each in afternoon trading on the London Stock Exchange . BP shares slipped 48 cents to $ 41.72 Friday in trading on the New York Stock Exchange .
1 2561999 2561941 Because of the accounting charge , the company now says it lost $ 1.04 billion , or 32 cents a share , in the quarter ended June 30 . Including the charge , the Santa Clara , Calif.-based company said Monday it lost $ 1.04 billion , or 32 cents per share , in the period ending June 30 .
0 2324704 2325023 Friday 's report raised new worries that a weak job market could shackle the budding economic recovery despite a slight improvement in the overall unemployment rate . U.S. companies slashed payrolls for a seventh straight month in August , raising new worries that a weak jobs market could shackle the budding economic recovery .
1 2336453 2336545 Federal Emergency Management Administration designated $ 20 million to establish the registry . The registry was launched with $ 20 million from the Federal Emergency Management Agency .
1 720572 720486 BREAST cancer cases in the UK have hit an all-time high with more than 40,000 women diagnosed with the disease each year , Cancer Re-search UK revealed yesterday . Cases of breast cancer in Britain have reached a record high , with the number of women diagnosed with the disease passing the 40,000 mark for the first time .
1 1605818 1605806 " It was never our intention to sell the product , " said Health Minister Anne McClellan , a skeptic of medical marijuana use . " It was never the intention of us to sell product , " federal Health Minister Anne McLellan said yesterday in Edmonton .
0 2440680 2440474 GM , the world 's largest automaker , has 115,000 active UAW workers and another 340,000 retirees and spouses . They cover more than 300,000 UAW workers and 500,000 retirees and spouses .
0 726399 726078 Rosenthal is hereby sentenced to custody of the Federal Bureau of prisons for one day with credit for time served , " Breyer said to tumultuous cheers in the courtroom . " Rosenthal is hereby sentenced to custody of the Federal Bureau of Prisons for one day with credit for time served . "
1 533903 533818 " We are committed to helping the Iraqi people get on the path to a free society , " Rumsfeld said in a speech to the Council on Foreign Relations . " We are committed to helping the Iraqi people get on the path to a free society , " he said .
1 1166473 1166857 Mr. Young said he was disappointed that the government didn 't see the severe acute respiratory syndrome crisis as worthy of federal disaster-relief money . Young said he was disappointed the government didn 't see the SARS crisis as worthy of federal disaster relief money .
1 144089 143697 The 12-nation currency has risen by 33 percent against the dollar over the past 15 months . The euro is up 9 percent against the dollar in the past six weeks .
1 3439854 3439874 In February 2000 , the officers — Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy — were acquitted of all charges in the killing . The officers -- Kenneth Boss , Sean Carroll , Edward McMellon and Richard Murphy -- were acquitted in 2000 of state murder charges .
1 3464314 3464302 I was surprised it turned out me talking and the president just listening . " I was surprised it turned out me talking and the president just listening . . . It was mostly a monologue . "
1 2008984 2009175 The state 's House delegation currently consists of 17 Democrats and 15 Republicans . Democrats hold a 17-15 edge in the state 's U.S. House delegation .
0 816867 816831 Freddie also said Leland C. Brendsel will retire as chairman and chief executive and resign from the board . He replaces Leland Brendsel , 61 , who retired as chairman and chief executive .
1 192285 192327 We 'll be listening carefully to the [ IAEA ] director general 's report at the next board meeting . " We 'll be listening carefully to the ( IAEA ) director-general 's report at the next board meeting . "
1 2688145 2688162 In that position , Elias will report to Joe Tucci , president and CEO of EMC . As executive vice president of new ventures , Elias will report to Joe Tucci , EMC 's president and chief executive .
1 3294207 3294290 But with the PM due to leave tomorrow afternoon for personal reasons there was a risk he might not be present when the final decision was made . But with the Prime Minister due to leave tomorrow , a day early , he may not be present when the final decision is made .
0 205100 205145 A pro-independence radical , Miodrag Zivkovic , of the Liberal Alliance , came in second with 31 percent of the vote . Miodrag Zivkovic , of the Liberal Alliance of Montenegro , won 31 percent of the vote while the independent Dragan Hajdukovic got four percent .
0 3242051 3241897 Mr. Kerkorian tried unsuccessfully to take over Chrysler in 1995 , but did win representation on its board . Kerkorian and Tracinda had also tried to take over Chrysler in 1995 .
0 1076861 1077018 Glover spoke at a news conference that included about 20 relatives of the victims . About 20 family members of the victims were invited to the news conference .
1 2095803 2095786 Drax faced a financial crisis late last year after it lost its most lucrative sales contract , held with insolvent utility TXU Europe . Drax ’ s troubles began late last year when it lost its most lucrative sales contract , with the insolvent utility TXU Europe .
1 2112330 2112376 But I would rather be talking about high standards than low standards . " " I would rather be talking about positive numbers rather than negative .
1 3389318 3389271 It was not immediately known how many people were on flight UTA 141 , which could carry 141 passengers and crew . It was still not known exactly how many people were on the plane , which could carry 141 passengers and crew .
1 698948 698933 The market remains pinned in a narrow range after a powerful rally drove the broad Standard & Poor 's 500 index .SPX up more than 20 percent since mid-March . The market remains pinned in a narrow range after a powerful rally pushed the broad S & P 500 index up more than 20 percent since mid-March .
1 539585 539355 Witnesses said they believed the man planned to crash the Launceston-bound Qantas flight 1737 , which was carrying 47 passengers and six crew . Witnesses believe he wanted to crash Flight 1737 , which had 47 passengers and six crew .
1 684848 684557 As Samudra sat down to hear the indictment , he looked over to his nine lawyers and shouted ``God is Great ' ' three times . As he sat down to hear the indictment , Samudra looked over to his nine lawyers and shouted " Takbir ! " , or " Proclaim ! " , a religious rallying cry .
1 347017 347002 In hardest-hit Taipei , traffic has disappeared from once bustling streets , ubiquitous department stores stand mostly empty and restaurants are eerily quiet . In hardest-hit Taipei , traffic has disappeared from once-bustling streets and department stores and restaurants are virtually empty .
1 1592037 1592076 In a statement , Lee said he " no longer believes that Viacom deliberately intended to trade on my name when naming Spike TV . " Spike Lee no longer believes that Viacom deliberately intended to trade on his name by calling its own venture " Spike TV , " according to a statement read in court Tuesday .
0 3013483 3013540 Singapore Prime Minister Goh Chok Tong says China plays an important role in the integration of Asia , including managing the stresses and strains both within and between countries . HAINAN PROVINCE , China : Singapore Prime Minister Goh Chok Tong said China plays an important role in the integration of Asia .
1 2020252 2020081 The worm attacks Windows computers via a hole in the operating system , an issue Microsoft on July 16 had warned about . The worm attacks Windows computers via a hole in the operating system , which Microsoft warned of 16 July .
0 2614947 2614904 The premium edition adds OfficeFront Page 2003 , Acceleration Server 2000 , and SQL Server 2000 . The premium edition adds ISA Server , SQL Server and a specialized edition of BizTalk 2004 .
0 1744257 1744378 In the year-ago quarter , the steelmaker recorded a profit of $ 16.2 million , or 15 cents per share , on sales of $ 1.14 billion . In the second quarter last year , AK Steel reported a profit of $ 16.2 million , or 15 cents a share .
0 1119721 1119714 Sony claimed that the reader 's capacitance sensing technology cannot be fooled by paper copies and does not require cleaning . Its capacitance sensing technology electronically reads a fingerprint ; Sony says it can 't be fooled by paper copies and doesn 't require cleaning .
1 1186754 1187056 Amazon.com shipped out more than a million copies of the new book , making Saturday the largest distribution day of a single item in e-commerce history . Amazon.com shipped more than a million copies by Saturday afternoon , making Saturday the largest distribution day of a single item in e-commerce history .
1 2842562 2842582 The show 's closure affected third-quarter earnings per share by a penny . The company said this impacted earnings by a penny a share .
0 431076 431242 After the two-hour meeting on May 14 , publisher Arthur O. Sulzberger Jr . , executive editor Howell Raines and managing editor Gerald Boyd pledged quick remedies to staff grievances . The committee will make recommendations to Publisher Arthur Sulzberger , Executive Editor Howell Raines and Managing Editor Gerald Boyd .
1 1393764 1393984 It 's been a busy couple of days for security gurus assigned to keep their companies safe and sound . It 's been a busy couple of days for enterprise security gurus tasked with the job of keeping their companies safe and sound .
0 2916199 2916164 Lu reclined in a soft chair wearing a woolly coat near the blackened capsule . " It 's great to be back home , " said Lu , dressed in a woolly coat near the blackened capsule .
1 2530671 2530542 Gov. Bob Riley proposed the budget cuts after Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 . After Alabama voters rejected his $ 1.2 billion tax plan Sept . 9 , Riley forecast significant cuts in state programs .
1 219064 218969 " It is probably not the easiest time to come in and take over the shuttle program , but then again , I look forward to the challenge , " he said . " It 's probably not the easiest time to come in and take over the shuttle program , but I look forward to the challenge , " Parsons told reporters at NASA headquarters .
0 2377289 2377259 Estonia 's place in the European mainstream and safeguard its independence regained in 1991 . Estonia was forcibly incorporated in the Soviet Union in 1940 and regained its independence only in 1991 .
0 2110220 2110199 Franklin County Judge-Executive Teresa Barton said a firefighter was struck by lightning and was taken to the Frankfort Regional Medical Center . A county firefighter , was struck by lightning and was in stable condition at Frankfort Regional Medical Center .
0 1864253 1863810 Police suspected that Shaichat , 20 , had been abducted either by Palestinians or by Israeli Arabs . Nobody claimed responsibility for Schaichat 's death , but police suspect that the 20-year-old soldier was abducted either by Palestinians or Israeli Arabs .
0 3150803 3150839 During this year 's August to October quarter , Lowe 's opened 38 new stores , including two relocations . During the third quarter , Lowe 's opened 38 new stores and now has 932 stores in 45 states .
0 969381 969512 The technology-laced Nasdaq Composite Index < .IXIC > declined 25.78 points , or 1.56 percent , to 1,627.84 . The broader Standard & Poor 's 500 Index .SPX gave up 11.91 points , or 1.19 percent , at 986.60 .
1 271891 271839 Sony said the PSP would also feature a 4.5-inch LCD screen , Memory Stick expansion slots . It also features a 4.5 in back-lit LCD screen and memory expansion facilities .
0 2829648 2829613 Clinton did not mention that two Democratic senators , Charles Robb of Virginia and Wendell Ford of Kentucky , voted to shelve the McCain bill . Two Democrats , Sen. Charles Robb of Virginia and Wendell Ford of Kentucky , voted with the 40 Republicans .
1 886904 887158 Some of the company 's software developers will join Microsoft , but details haven 't been finalized , said Mike Nash , corporate vice president of Microsoft 's security business unit . Some of the companys software developers will join Microsoft , but details havent been finalized , said Mike Nash , corporate vice president of Microsofts security business unit .
0 2632692 2632767 Wal-Mart has said it plans to open at least 40 Supercenters in the state in the coming years ; analysts expect four or more to be in San Diego County . At least 40 of the outlets will be in California , and analysts expect four or more to be in San Diego County .
1 2240399 2240149 Cintas is battling efforts to unionize 17,000 of its workers and to let unions organize the workers by signing cards , rather than by a lengthy election process . Cintas is battling efforts to unionize 17,000 of its workers and labor 's demands to let its workers organize by signing cards , rather than by a lengthy election process .
1 805457 805985 The opposition would resort to rolling mass action " at strategic times of our choice and without warning to the dictatorship , " he said . " From now onwards we will embark on rolling mass action at strategic times of our choice and without any warning to the dictatorship , " he said .
1 2896308 2896334 Federal Agriculture Minister Warren Truss said the Government still did not know the real reason the sheep were rejected at the Saudi port of Jeddah on August 21 . He said the Government still did not know the real reason the original Saudi buyer pulled out on August 21 .
1 2110775 2110924 Tom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said that scenario is one among many that investigators are considering . Tom Kraynak , manager of operations and resources for the Canton , Ohio-based East Central Area Reliability Council , said investigators are considering the scenario .
1 1762569 1762526 Hester said Sanmina was the best fit among several purchase offers the company received from electronics manufacturers and computer makers . Hester said Sanmina 's offer was the best among several Newisys received from electronics manufacturers and computer makers .
0 2706154 2706185 The other inmate fell but Selenski shimmed down the makeshift rope to a second-story roof and used the mattress to scale a razor-wire fence , Fischi said . After the other inmate fell , Selenski used the mattress to scale a 10-foot , razor-wire fence , Fischi said .
1 1057995 1057778 The hearing , expected to last a week , will determine whether Akbar faces a court-martial . The purpose of the hearing is to determine whether Akbar should be court-martialled .
1 1386884 1386857 He said he has begun a court action to seize Beacon Hill 's assets and has frozen more than $ 13 million Beacon Hill had when it closed . He said he has initiated a forfeiture action in court and frozen more than $ 13 million Beacon Hill had when it closed .
1 3093023 3092996 Speaking for the first time yesterday , Brigitte 's maternal aunt said his family was unaware he had was in prison or that he had remarried . Brigitte 's maternal aunt said his family was unaware he had been sent to prison , or that he had remarried in Sydney .
1 1661381 1661317 " Close co-operation between our law enforcement agencies , close co-operation between our intelligence services lie at the heart of the ongoing fight against terrorism . " Close cooperation between regional law enforcement agencies and intelligence services was at the heart of the fight against terrorism , he said .
0 2926039 2925982 The mother of a Briton held by Colombian guerrillasspoke of her relief yesterday after hearing that he might be freed in the next few weeks . The parents of a Briton being held hostage by Colombian rebels spoke yesterday of their optimism that he would be freed in time for his birthday next month .
0 637168 637447 We strongly disagree with Novell 's position and view it as a desperate measure to curry favor with the Linux community . McBride characterized Novell 's move as " a desperate measure to curry favor with the Linux community . "
1 696677 696932 After more than two years ' detention under the State Security Bureau , the four were found guilty of subversion in Beijing 's No. 1 Intermediate Court last Wednesday . After more than two years in detention by the State Security Bureau , the four were found guilty last Wednesday of subversion .
1 3122429 3122305 Mr Russell , 46 , a coal miner from Brisbane , said : " They are obviously hurting , so we are basically going over there to help them . " " They are obviously hurting so we are basically going over there to help them , " Russell , 46 , said .
1 1348909 1348954 The New York Democrat and former first lady has said she will not run for the White House in 2004 , but has not ruled out a race in later years . The former first lady has said she will not run for the White House in 2004 but has not ruled out a race later on .
0 162203 162101 It does not affect the current Windows Media Player 9.0 Series . Windows Media Player has had security problems before .
0 71501 71627 The seizure took place at 4 a.m. on March 18 , just hours before the first American air assault . The time was about 4 a.m. on March 18 , just hours before the first pinpoint missiles rained down on the capital .
1 2907762 2907649 Donations stemming from the Sept . 11 attacks helped push up contributions to human service organizations and large branches of the United Way by 15 percent and 28.6 percent , respectively . Donations stemming from the Sept . 11 attacks helped push up contributions to human service organizations by 15 percent and to large branches of the United Way by 28.6 percent .
1 2167771 2167744 In May , Mr. Hatfill said he was struck by a vehicle being driven by an FBI employee who was tailing him in Georgetown . Last May , Hatfill was struck by a vehicle being driven by an FBI employee who was tailing him in Washington 's Georgetown neighborhood .
1 3320577 3320553 " I will support a constitutional amendment which would honor marriage between a man and a woman , codify that , " he said . " If necessary , I will support a constitutional amendment which would honour marriage between a man and a woman , codify that . "
1 849291 849442 IBM of the US and Infineon Technologies of Germany will today announce a technological development that could threaten multi-billion dollar memory chip markets . IBMof the US andInfineon Technologies of Germany willon Tuesdayannounce a technological development that could threaten multi-billion dollar memory chip markets .
0 763948 763991 Costa 's semifinal opponent is Spaniard Juan Carlos Ferrero , whom he beat in last year 's final . Costa will play Juan Carlos Ferrero next in a rematch of last year 's final .
1 1908763 1908744 A former employee of a local power company pleaded guilty Wednesday to setting off a bomb that knocked out a power substation during the Winter Olympics last year . A former Utah Power meter reader pleaded guilty Wednesday to bombing a power substation during the 2002 Winter Olympics .
0 1876120 1876059 Thyroid hormones are known to help in weight loss by stimulating metabolism - and cutting cholesterol - but come with the unwanted side effect of speeding up the heartbeat . Thyroid hormones are known to help in weight loss by stimulating metabolism , and they can help cut cholesterol too .
1 518089 518133 Judge Craig Doran said it wasn 't his role to determine if Hovan was " an evil man " but maintained that " he has committed an evil act . " Judge Craig Doran said he couldn 't determine if Hovan was " an evil man " but said he " has committed an evil act . "
0 224932 224868 The Hartford shares rose $ 2.88 , or 6.6 percent , to close Monday at $ 46.50 on the New York Stock Exchange . Shares of Hartford rose $ 2.88 to $ 46.50 in New York Stock Exchange composite trading .
1 1771131 1771091 It also offers a built-in NAND flash boot loader so that high-density NAND flash memory can be used without having to install an additional support chip . The S3C2440 has a built-in NAND flash boot loader , for example , so that high-density NAND flash memory can be installed without an additional support chip .
0 2728425 2728251 It decided instead to issue them before the stock market opened Monday after the downgrade of its debt late Friday by Moody 's , the credit rating agency . It decided instead to issue them before the stock market opened Monday to counteract the downgrade of its debt late Friday by Moody 's to one step above junk status .
0 953733 953537 Altria shares fell 2.5 percent or $ 1.11 to $ 42.57 and were the Dow 's biggest percentage loser . Its shares fell $ 9.61 to $ 50.26 , ranking as the NYSE 's most-active issue and its biggest percentage loser .
1 349215 349241 It will be followed in November by a third movie , " The Matrix Revolutions . " The film is the second of a trilogy , which will wrap up in November with " The Matrix Revolutions . "
1 2919853 2919804 Massachusetts regulators and the Securities and Exchange Commission on Tuesday pressed securities fraud charges against Putnam Investments and two of its former portfolio managers for alleged improper mutual fund trading . State and federal securities regulators filed civil charges against Putnam Investments and two portfolio managers in the ever-expanding mutual fund trading scandal .
1 954526 954607 He is blocking them until the Air Force assigns four additional C-130 cargo planes to Gowen Field , an Idaho Air National Guard base in Boise . He is holding them up until the Air Force agrees to assign four additional C-130 cargo planes to the Idaho Air National Guard .
1 69773 69792 Cisco pared spending to compensate for sluggish sales . In response to sluggish sales , Cisco pared spending .
0 2823575 2823513 The study , published Monday in the journal Molecular Brain Research , is likely to also apply to humans , its authors said . The study , conducted on the brains of developing mice , was being published today in the journal Molecular Brain Research .
1 2455942 2455978 My decision today is not based on any one event . " Governor Rowland said his decision was " not based on any one event . "
1 131979 131957 Nelson , 27 , is being retried on civil-rights charges stemming from the disturbance which led to Rosenbaum 's death . Nelson , 27 , is being retried on civil rights charges stemming from the disturbance that led to Rosenbaum 's death .
0 2010705 2010779 " The government elements who have been causing trouble are still in place . The government elements who have been causing trouble are still in place , they are attacking us . "
1 54142 53641 Next Monday at about 2 p.m. ( CST ) , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms . Around the same time , hospital officials in and near Chicago will notice a sudden increase in people complaining of flu-like symptoms .
1 1015249 1015204 Wal-Mart Stores Inc . , Kohl 's Corp. , Family Dollar Stores Inc. and Big Lots Inc. were among the merchants posting May sales that fell below Wall Street 's modest expectations . Wal- Mart , Kohl 's Corp. , Family Dollar Stores Inc . , and Big Lots Inc. posted May sales that fell below Wall Street 's modest expectations .
0 753928 753890 The patch also fixes a vulnerability that results because IE does not implement an appropriate block on a file download dialog box . The second vulnerability is a result of IE not implementing a block on a file download dialog box .
1 3022833 3023029 Peterson , a former fertilizer salesman , is charged with murder in the deaths of his 27-year-old wife and the baby boy she was carrying . Peterson , 31 , is now charged with murder in the deaths of his 27-year-old wife and their unborn son .
0 751520 751373 SPOT products run a Microsoft operating system and the company 's DirectBand radio technology developed with SCA Data Systems . The DirectBand network was developed with the assistance of SCA Data Systems .
0 218848 218851 He replaces Ron Dittemore , who announced his resignation in April . Dittemore announced his plans to resign on April 23 .
1 3181118 3181443 Detectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , of the arrest shortly after Perry was apprehended . Shortly after his arrest , detectives told Deasean 's father , Stelly Chisolm , a college student , and mother , Kimberly Hill , a medical assistant , about the development .
1 515581 515752 They were among about 40 people attending the traditional Jewish ceremony colored by some non-traditional touches . He said about 40 people attended the traditional Jewish ceremony colored by some nontraditional touches .
1 347022 347003 Taiwan had been relatively free of the viral infection until a fiasco at a Taipei hospital in late April caused the number of infections to skyrocket . Taiwan had been relatively free of the viral infection until a severe outbreak at a Taipei hospital in late April .
1 3311600 3311633 Mr. Rowland attended a party in South Windsor for the families of Connecticut National Guard soldiers called to active duty . Rowland was making an appearance at a holiday party for families of Connecticut National Guard soldiers assigned to duty in Iraq and Afghanistan .
0 3439114 3439084 Ross Garber , Rowland 's lawyer , said Tuesday he would attend the meeting and would ask to speak on the issue . Ross Garber , Rowland 's legal counsel , said the governor would have no comment on the condo deal .
0 487951 488007 The euro was at 1.5281 versus the Swiss franc EURCHF = , up 0.2 percent on the session , after hitting its highest since mid-2001 around 1.5292 earlier in the session . The euro was steady versus the Swiss franc after hitting its highest since mid-2001 of 1.5261 earlier in the session .
0 314997 315030 On the stand Wednesday , she said she was referring only to the kissing . On the stand Wednesday , she testified that she was referring to the kissing before the alleged rape .
0 4733 4557 Garner said the group would probably be expanded to include , for example , a Christian and perhaps another Sunni leader . The group has already met several times and Gen. Garner said it probably will be expanded to include a Christian and perhaps another Sunni Muslim leader .
1 2820371 2820525 Blair 's Foreign Secretary Jack Straw was to take his place on Monday to give a statement to parliament on the European Union .
Blair 's office said his Foreign Secretary Jack Straw would take his place on Monday to give a statement to parliament on the EU meeting the prime minister attended last week . 1 801552 801516 " There were more people surrounding the clubhouse than the Unabomber 's house up in the hills , " Baker said . " There are more people surrounding the clubhouse than surrounded the Unabomber 's home in the hills . 1 1704987 1705268 Charles O. Prince , 53 , was named as Mr. Weill 's successor . Mr. Weill 's longtime confidant , Charles O. Prince , 53 , was named as his successor . 1 396041 396188 Officials are also meeting with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world . Canadian officials were also expected to meet yesterday with the International Organization for Epizootics ( OIE ) , which establishes animal-health standards for the world . 0 1014983 1014963 GE stock closed Friday at $ 30.65 a share , down about 42 cents , on the New York Stock Exchange . GE 's shares closed at $ 30.65 on Friday on the New York Stock Exchange . 1 2320654 2320666 The Midwestern research center will focus on the development of diagnostic , therapeutic and vaccine products for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague . The Midwestern center will focus on diagnosis , treatment and vaccines for anthrax , botulism , tularemia , hemorrhagic fever viruses and plague . 1 1057876 1057778 The hearing is to determine whether there is enough evidence to order Akbar to a general court-martial proceeding . The purpose of the hearing is to determine whether Akbar should be court-martialled . 0 2116843 2116883 In the United States , heart attacks kill about 460,000 year , in Canada about 80,000 . In the United States , heart attacks kill about 460,000 yearly , according to the National Institutes of Health . 1 1461629 1461781 Ninety-five percent of international cargo to the United States is carried by ship . Ships carry 95 percent of international cargo to the United States . 0 374015 374162 " It 's a major victory for Maine , and it 's a major victory for other states . The Maine program could be a model for other states . 1 2493369 2493428 News that oil producers were lowering their output starting in November exacerbated a sell-off that was already under way on Wall Street . News that the Organization of Petroleum Exporting Countries was lowering output starting in November exacerbated a stock sell-off already under way yesterday . 1 490355 490378 They note that after several weeks of rallies on upbeat earnings , investors are looking for stronger evidence of a recovery before sending stocks higher . After several weeks of market rallies on upbeat earnings , many investors are looking for more concrete signs of an economic recovery . 1 2691044 2691264 Most economists had expected a more dire report , with many anticipating the fifth month of job losses in six months . Most economists had been expecting a far more dire report , with many expecting to see the fifth month of job losses in six months in September . 1 1831453 1831491 But software license revenues , a measure financial analysts watch closely , decreased 21 percent to $ 107.6 million . License sales , a key measure of demand , fell 21 percent to $ 107.6 million . 1 2380695 2380822 King , brand-name writer , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters . 
Stephen King , master of the horror story and e-book pioneer , is receiving this year 's medal for Distinguished Contributions to American Letters from the National Book Foundation . 1 2577517 2577531 The Denver-based natural gas producer and marketer said the inaccurate reporting was discovered after it received a subpoena from the U.S. Commodity Futures Trading Commission . The natural gas producer and marketer said the inaccurate reporting was discovered in response to a subpoena from the U.S. Commodity Futures Trading Commission , or CFTC . 1 3267026 3266930 The steel tariffs , which the U.S. president imposed in March 2002 , will officially end at midnight , instead of March 2005 as initially planned . The U.S. steel tariffs , which Bush imposed in March 2002 , were to officially end at midnight Thursday ( 0500 GMT ) , instead of March 2005 as initially planned . 1 360875 360943 Business Week 's online edition reported on Friday that WorldCom and the SEC could announce a settlement as early as Monday . BusinessWeek Online has learned that the settlement could come as early as Monday , May 19 . 1 162632 162653 Only one of the five buildings in the Baghdad compound of the United Nations Development Program escaped being burned , the UN said on its Web site . Only one of the five buildings in the compound in Baghdad run by the UN Development Program , escaped being burned , the UN said on its Web site . 1 1128884 1128865 Shares of Salix have rocketed 64 percent since Axcan made its first offer on April 10 . Since the initial takeover offer , Salix shares have risen about 35 percent . 1 3264732 3264648 The jury verdict , reached Wednesday after less than four hours of deliberation , followed a 2 week trial , during which Waagner represented himself . The quick conviction followed a 2 1 / 2 week trial , during which the Venango County man represented himself . 1 1721433 1721267 It 's happened five times in the last 11 years : A disaster puts this Southwestern town in the headlines during the summer tourist season . It 's happened five times in the last decade : A disaster puts this tourist town in the headlines during summer , its busiest season . 0 146112 146127 The broader Standard & Poor 's 500 Index .SPX edged down 9 points , or 0.98 percent , to 921 . The technology-laced Nasdaq Composite Index < .IXIC > shed 15 points , or 0.98 percent , to 1,492 . 1 389117 389052 The company emphasized that McDonald 's USA does not import any raw beef or hamburger patties from Canada for McDonald 's use in the United States . McDonald 's said in a statement that it does not import any raw beef or hamburger patties from Canada for use in the United States . 1 872784 872834 Gregory Parseghian , a former investment banker , was appointed chief executive . Greg Parseghian was appointed the new chief executive . 0 2977500 2977547 Their contract will expire at 12 : 01 a.m. Wednesday instead of 12 : 01 a.m. Sunday , said Rian Wathen , organizing director for United Food and Commercial Workers Local 700 . " It has outraged the membership , " said Rian Wathen , organizing director of United Food and Commercial Workers Local 700 . 1 3107137 3107119 But plaque volume increased by 2.7 percent in pravastatin patients . The volume of plaque in Pravachol patients ' arteries rose by 3 % . 1 1619244 1619274 Today in the US , the book - kept under wraps by its publishers , G. P. Putnam 's Sons , since its inception - will appear in bookstores . Tomorrow the book , kept under wraps by G. P. 
Putnam 's Sons since its inception , will appear in bookstores . 0 3061836 3062031 The S & P / TSX composite rose 87.74 points on the week , while the TSX Venture Exchange composite gained 44.49 points . On the week , the Dow Jones industrial average rose 11.56 points , while the Nasdaq Stock Market gained 39.42 points . 1 485999 486011 Ex-KGB agent Putin added that the Beatles were considered ' propaganda of an alien ideology ' . In Soviet times the Beatles ' music " was considered propaganda of an alien ideology .

================================================
FILE: src/examples/tensorflow/bert_demo/latency_printer.py
================================================
latency_list = []
with open('latencies.txt', 'r') as f:
    for line in f:
        latency_list.append(float(line.rstrip()))
latency_list = sorted(latency_list)
l = len(latency_list)
print(f'p50 latency is {latency_list[int(.5 * l)]} seconds')
print(f'p90 latency is {latency_list[int(.9 * l)]} seconds')
print(f'p95 latency is {latency_list[int(.95 * l)]} seconds')
print(f'p99 latency is {latency_list[int(.99 * l)]} seconds')
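The script above indexes the sorted list directly, a nearest-rank estimate that can be coarse for small samples. As a point of comparison, here is a minimal sketch of the same report using the standard library's statistics.quantiles (Python 3.8+, needs at least two samples); the name report_latencies is illustrative and not part of the demo:

import statistics

def report_latencies(path='latencies.txt'):
    # One latency value (in seconds) per line, as latency_printer.py expects.
    with open(path) as f:
        latencies = [float(line.rstrip()) for line in f]
    # quantiles(n=100) returns the 1st..99th percentile cut points and
    # interpolates between samples instead of picking a single element.
    cuts = statistics.quantiles(latencies, n=100)
    for p in (50, 90, 95, 99):
        print(f'p{p} latency is {cuts[p - 1]} seconds')

if __name__ == '__main__':
    report_latencies()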
================================================
FILE: src/examples/tensorflow/bert_demo/mrpc.proto
================================================
// Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: MIT-0
//
// Protocol definition for the BERT MRPC paraphrase-detection service.

syntax = "proto3";

package mrpc;

service mrpc {
    rpc paraphrase (TextPair) returns (YesNo) {}
}

message TextPair {
    bytes text_a = 1;
    bytes text_b = 2;
}

message YesNo {
    bytes message = 1;
    bytes prediction = 2;
}
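Once protoc.sh (further below) has generated mrpc_pb2.py from this definition, the messages round-trip like any protobuf type. A quick sketch; the sentence strings are made up for illustration:

import mrpc_pb2

# Build a request carrying the two candidate sentences as UTF-8 bytes.
pair = mrpc_pb2.TextPair(
    text_a='The company said sales rose.'.encode('utf-8'),
    text_b='Sales increased, the company said.'.encode('utf-8'),
)

# Wire-format round trip: serialize, then parse back.
wire = pair.SerializeToString()
decoded = mrpc_pb2.TextPair.FromString(wire)
assert decoded.text_a == pair.text_a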
================================================
FILE: src/examples/tensorflow/bert_demo/mrpc_feature.py
================================================
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Extract pre-computed feature vectors from BERT."""

import os
import csv
import time
import numpy as np
import tokenization


class InputExample(object):
    """A single training/test example for simple sequence classification."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        """Constructs an InputExample.

        Args:
            guid: Unique id for the example.
            text_a: string. The untokenized text of the first sequence. For single
                sequence tasks, only this sequence must be specified.
            text_b: (Optional) string. The untokenized text of the second sequence.
                Only must be specified for sequence pair tasks.
            label: (Optional) string. The label of the example. This should be
                specified for train and dev examples, but not for test examples.
        """
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


class PaddingInputExample(object):
    """Fake example so the num input examples is a multiple of the batch size.

    When running eval/predict on the TPU, we need to pad the number of examples
    to be a multiple of the batch size, because the TPU requires a fixed batch
    size. The alternative is to drop the last batch, which is bad because it
    means the entire output data won't be generated.

    We use this class instead of `None` because treating `None` as padding
    batches could cause silent errors.
    """


class InputFeatures(object):
    """A single set of features of data."""

    def __init__(self, input_ids, input_mask, segment_ids, label_id,
                 is_real_example=True):
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids
        self.label_id = label_id
        self.is_real_example = is_real_example


def convert_single_example(ex_index, example, label_list, max_seq_length,
                           tokenizer):
    """Converts a single `InputExample` into a single `InputFeatures`."""
    if isinstance(example, PaddingInputExample):
        return InputFeatures(
            input_ids=[0] * max_seq_length,
            input_mask=[0] * max_seq_length,
            segment_ids=[0] * max_seq_length,
            label_id=0,
            is_real_example=False)

    label_map = {}
    for (i, label) in enumerate(label_list):
        label_map[label] = i

    tokens_a = tokenizer.tokenize(example.text_a)
    tokens_b = None
    if example.text_b:
        tokens_b = tokenizer.tokenize(example.text_b)

    if tokens_b:
        # Modifies `tokens_a` and `tokens_b` in place so that the total
        # length is less than the specified length.
        # Account for [CLS], [SEP], [SEP] with "- 3"
        _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
    else:
        # Account for [CLS] and [SEP] with "- 2"
        if len(tokens_a) > max_seq_length - 2:
            tokens_a = tokens_a[0:(max_seq_length - 2)]

    # The convention in BERT is:
    # (a) For sequence pairs:
    #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]
    #  type_ids: 0     0  0    0    0     0        0 0    1  1  1  1  1 1
    # (b) For single sequences:
    #  tokens:   [CLS] the dog is hairy . [SEP]
    #  type_ids: 0     0   0   0  0     0 0
    #
    # Where "type_ids" are used to indicate whether this is the first
    # sequence or the second sequence. The embedding vectors for `type=0` and
    # `type=1` were learned during pre-training and are added to the wordpiece
    # embedding vector (and position vector). This is not *strictly* necessary
    # since the [SEP] token unambiguously separates the sequences, but it makes
    # it easier for the model to learn the concept of sequences.
    #
    # For classification tasks, the first vector (corresponding to [CLS]) is
    # used as the "sentence vector". Note that this only makes sense because
    # the entire model is fine-tuned.
    tokens = []
    segment_ids = []
    tokens.append("[CLS]")
    segment_ids.append(0)
    for token in tokens_a:
        tokens.append(token)
        segment_ids.append(0)
    tokens.append("[SEP]")
    segment_ids.append(0)

    if tokens_b:
        for token in tokens_b:
            tokens.append(token)
            segment_ids.append(1)
        tokens.append("[SEP]")
        segment_ids.append(1)

    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # The mask has 1 for real tokens and 0 for padding tokens. Only real
    # tokens are attended to.
    input_mask = [1] * len(input_ids)

    # Zero-pad up to the sequence length.
    while len(input_ids) < max_seq_length:
        input_ids.append(0)
        input_mask.append(0)
        segment_ids.append(0)

    assert len(input_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length

    label_id = label_map[example.label]
    feature = InputFeatures(
        input_ids=input_ids,
        input_mask=input_mask,
        segment_ids=segment_ids,
        label_id=label_id,
        is_real_example=True)
    return feature


def read_tsv(input_file, quotechar=None):
    """Reads a tab separated value file."""
    with open(input_file, "r") as f:
        reader = csv.reader(f, delimiter="\t", quotechar=quotechar)
        lines = []
        for line in reader:
            lines.append(line)
        return lines


def create_examples(lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
        if i == 0:
            continue
        guid = "%s-%s" % (set_type, i)
        text_a = tokenization.convert_to_unicode(line[3])
        text_b = tokenization.convert_to_unicode(line[4])
        if set_type == "test":
            label = "0"
        else:
            label = tokenization.convert_to_unicode(line[0])
        examples.append(
            InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples


def _truncate_seq_pair(tokens_a, tokens_b, max_length):
    """Truncates a sequence pair in place to the maximum length."""
    # This is a simple heuristic which will always truncate the longer sequence
    # one token at a time. This makes more sense than truncating an equal percent
    # of tokens from each, since if one sequence is very short then each token
    # that's truncated likely contains more information than a longer sequence.
    while True:
        total_length = len(tokens_a) + len(tokens_b)
        if total_length <= max_length:
            break
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()


def get_eval_model_feed_dict_list(mrpc_tsv, vocab_txt):
    """Converts the MRPC dev TSV into a list of batched feed dictionaries."""
    tsv = read_tsv(mrpc_tsv)
    result = create_examples(tsv, "dev")
    model_feed_dict_list = []
    for example in result:
        tokenizer = tokenization.FullTokenizer(vocab_file=vocab_txt, do_lower_case=True)
        label_list = ['0', '1']
        feature = convert_single_example(ex_index=0, example=example, label_list=label_list,
                                         max_seq_length=128, tokenizer=tokenizer)
        pre_model_feed_dict = {
            'input_ids': feature.input_ids,
            'input_mask': feature.input_mask,
            'segment_ids': feature.segment_ids,
            'label_id': feature.label_id,
            'is_real_example': feature.is_real_example,
        }
        model_feed_dict = {}
        for key, value in pre_model_feed_dict.items():
            if key in {'label_id', 'is_real_example'}:
                value = np.tile(np.int32(value), reps=[1])
            else:
                value = np.tile(np.int32(value), reps=[1, 1])
            model_feed_dict[key] = value
        model_feed_dict_list.append(model_feed_dict)
    return model_feed_dict_list


def text_pair_to_model_feed_dict(text_a, text_b, tokenizer):
    """Converts a raw sentence pair into a batch-of-one feed dictionary."""
    fake_tsv = [['index', '#1 ID', '#2 ID', '#1 String', '#2 String'],
                ['', '', '', text_a, text_b]]
    result = create_examples(fake_tsv, "test")
    example = result[0]
    label_list = ['0', '1']
    feature = convert_single_example(ex_index=0, example=example, label_list=label_list,
                                     max_seq_length=128, tokenizer=tokenizer)
    return {
        'input_ids': np.tile(np.int32(feature.input_ids), reps=[1, 1]),
        'input_mask': np.tile(np.int32(feature.input_mask), reps=[1, 1]),
        'segment_ids': np.tile(np.int32(feature.segment_ids), reps=[1, 1]),
    }
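The longer-sequence-first truncation in _truncate_seq_pair is easy to check in isolation. A small sketch, assuming mrpc_feature.py is importable from the working directory; the token lists are made up:

from mrpc_feature import _truncate_seq_pair

tokens_a = ['the', 'quick', 'brown', 'fox', 'jumps']
tokens_b = ['lazy', 'dog']

# With a budget of 5 tokens total, the longer list gives up tokens first.
_truncate_seq_pair(tokens_a, tokens_b, max_length=5)
print(tokens_a, tokens_b)  # ['the', 'quick', 'brown'] ['lazy', 'dog']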
================================================
FILE: src/examples/tensorflow/bert_demo/mrpc_pb2.py
================================================
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler. DO NOT EDIT!
# source: mrpc.proto

import sys
_b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()


DESCRIPTOR = _descriptor.FileDescriptor(
  name='mrpc.proto',
  package='mrpc',
  syntax='proto3',
  serialized_options=None,
  serialized_pb=_b('\n\nmrpc.proto\x12\x04mrpc\"*\n\x08TextPair\x12\x0e\n\x06text_a\x18\x01 \x01(\x0c\x12\x0e\n\x06text_b\x18\x02 \x01(\x0c\",\n\x05YesNo\x12\x0f\n\x07message\x18\x01 \x01(\x0c\x12\x12\n\nprediction\x18\x02 \x01(\x0c\x32\x33\n\x04mrpc\x12+\n\nparaphrase\x12\x0e.mrpc.TextPair\x1a\x0b.mrpc.YesNo\"\x00\x62\x06proto3')
)


_TEXTPAIR = _descriptor.Descriptor(
  name='TextPair',
  full_name='mrpc.TextPair',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='text_a', full_name='mrpc.TextPair.text_a', index=0,
      number=1, type=12, cpp_type=9, label=1,
      has_default_value=False, default_value=_b(""),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='text_b', full_name='mrpc.TextPair.text_b', index=1,
      number=2, type=12, cpp_type=9, label=1,
      has_default_value=False, default_value=_b(""),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
  ],
  extensions=[
  ],
  nested_types=[],
  enum_types=[
  ],
  serialized_options=None,
  is_extendable=False,
  syntax='proto3',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=20,
  serialized_end=62,
)


_YESNO = _descriptor.Descriptor(
  name='YesNo',
  full_name='mrpc.YesNo',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='message', full_name='mrpc.YesNo.message', index=0,
      number=1, type=12, cpp_type=9, label=1,
      has_default_value=False, default_value=_b(""),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='prediction', full_name='mrpc.YesNo.prediction', index=1,
      number=2, type=12, cpp_type=9, label=1,
      has_default_value=False, default_value=_b(""),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
  ],
  extensions=[
  ],
  nested_types=[],
  enum_types=[
  ],
  serialized_options=None,
  is_extendable=False,
  syntax='proto3',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=64,
  serialized_end=108,
)

DESCRIPTOR.message_types_by_name['TextPair'] = _TEXTPAIR
DESCRIPTOR.message_types_by_name['YesNo'] = _YESNO
_sym_db.RegisterFileDescriptor(DESCRIPTOR)

TextPair = _reflection.GeneratedProtocolMessageType('TextPair', (_message.Message,), {
  'DESCRIPTOR' : _TEXTPAIR,
  '__module__' : 'mrpc_pb2'
  # @@protoc_insertion_point(class_scope:mrpc.TextPair)
  })
_sym_db.RegisterMessage(TextPair)

YesNo = _reflection.GeneratedProtocolMessageType('YesNo', (_message.Message,), {
  'DESCRIPTOR' : _YESNO,
  '__module__' : 'mrpc_pb2'
  # @@protoc_insertion_point(class_scope:mrpc.YesNo)
  })
_sym_db.RegisterMessage(YesNo)


_MRPC = _descriptor.ServiceDescriptor(
  name='mrpc',
  full_name='mrpc.mrpc',
  file=DESCRIPTOR,
  index=0,
  serialized_options=None,
  serialized_start=110,
  serialized_end=161,
  methods=[
  _descriptor.MethodDescriptor(
    name='paraphrase',
    full_name='mrpc.mrpc.paraphrase',
    index=0,
    containing_service=None,
    input_type=_TEXTPAIR,
    output_type=_YESNO,
    serialized_options=None,
  ),
])
_sym_db.RegisterServiceDescriptor(_MRPC)

DESCRIPTOR.services_by_name['mrpc'] = _MRPC

# @@protoc_insertion_point(module_scope)

================================================
FILE: src/examples/tensorflow/bert_demo/mrpc_pb2_grpc.py
================================================
# coding=utf-8
"""
Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0

gRPC client stub and server handlers for the mrpc paraphrase service.
"""
# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
import grpc

import mrpc_pb2 as mrpc__pb2


class mrpcStub(object):
    # missing associated documentation comment in .proto file
    pass

    def __init__(self, channel):
        """Constructor.

        Args:
            channel: A grpc.Channel.
        """
        self.paraphrase = channel.unary_unary(
            '/mrpc.mrpc/paraphrase',
            request_serializer=mrpc__pb2.TextPair.SerializeToString,
            response_deserializer=mrpc__pb2.YesNo.FromString,
        )


class mrpcServicer(object):
    # missing associated documentation comment in .proto file
    pass

    def paraphrase(self, request, context):
        # missing associated documentation comment in .proto file
        context.set_code(grpc.StatusCode.UNIMPLEMENTED)
        context.set_details('Method not implemented!')
        raise NotImplementedError('Method not implemented!')


def add_mrpcServicer_to_server(servicer, server):
    rpc_method_handlers = {
        'paraphrase': grpc.unary_unary_rpc_method_handler(
            servicer.paraphrase,
            request_deserializer=mrpc__pb2.TextPair.FromString,
            response_serializer=mrpc__pb2.YesNo.SerializeToString,
        ),
    }
    generic_handler = grpc.method_handlers_generic_handler(
        'mrpc.mrpc', rpc_method_handlers)
    server.add_generic_rpc_handlers((generic_handler,))
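A minimal client sketch against the stub above, assuming the generated modules are importable; the localhost:50051 endpoint is an illustrative assumption, not a documented default of the demo's server:

import grpc
import mrpc_pb2
import mrpc_pb2_grpc

# Assumed endpoint for illustration; use whatever address the server binds.
channel = grpc.insecure_channel('localhost:50051')
stub = mrpc_pb2_grpc.mrpcStub(channel)

reply = stub.paraphrase(mrpc_pb2.TextPair(
    text_a=b'The storm closed the airport.',
    text_b=b'The airport was shut down by the storm.',
))
print(reply.message, reply.prediction)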
================================================
FILE: src/examples/tensorflow/bert_demo/protoc.sh
================================================
#!/bin/bash
# Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0
#
# Regenerates mrpc_pb2.py and mrpc_pb2_grpc.py from mrpc.proto.

python -m grpc_tools.protoc -I . --python_out=. --grpc_python_out=. mrpc.proto

================================================
FILE: src/examples/tensorflow/bert_demo/setup.py
================================================
import setuptools

setuptools.setup(
    name='bert-demo',
    version='2019.12.13',
    description='BERT Client-Server Demo',
    author='Amazon AWS',
    author_email='aws-neuron-support@amazon.com',
    license='BSD',
    classifiers=[
        'Development Status :: 1 - Planning',
        'Intended Audience :: Developers',
        'Topic :: Scientific/Engineering :: Artificial Intelligence',
        'License :: OSI Approved :: BSD License',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7',
    ],
    keywords='bert',
    include_package_data=True,
    packages=setuptools.PEP420PackageFinder.find(),
    package_data={'': [
        '*',
    ]},
    entry_points={
        'console_scripts': [
            'neuron_bert_model=bert_demo.bert_model:main',
            'bert_server=bert_demo.bert_server:serve',
            'bert_client=bert_demo.bert_client:client',
        ],
    },
    install_requires=[
    ],
)
================================================
FILE: src/examples/tensorflow/bert_demo/tokenization.py
================================================
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import re
import unicodedata
import six


def validate_case_matches_checkpoint(do_lower_case, init_checkpoint):
    """Checks whether the casing config is consistent with the checkpoint name."""

    # The casing has to be passed in by the user and there is no explicit check
    # as to whether it matches the checkpoint. The casing information probably
    # should have been stored in the bert_config.json file, but it's not, so
    # we have to heuristically detect it to validate.

    if not init_checkpoint:
        return

    m = re.match("^.*?([A-Za-z0-9_-]+)/bert_model.ckpt", init_checkpoint)
    if m is None:
        return

    model_name = m.group(1)

    lower_models = [
        "uncased_L-24_H-1024_A-16", "uncased_L-12_H-768_A-12",
        "multilingual_L-12_H-768_A-12", "chinese_L-12_H-768_A-12"
    ]

    cased_models = [
        "cased_L-12_H-768_A-12", "cased_L-24_H-1024_A-16",
        "multi_cased_L-12_H-768_A-12"
    ]

    is_bad_config = False
    if model_name in lower_models and not do_lower_case:
        is_bad_config = True
        actual_flag = "False"
        case_name = "lowercased"
        opposite_flag = "True"

    if model_name in cased_models and do_lower_case:
        is_bad_config = True
        actual_flag = "True"
        case_name = "cased"
        opposite_flag = "False"

    if is_bad_config:
        raise ValueError(
            "You passed in `--do_lower_case=%s` with `--init_checkpoint=%s`. "
            "However, `%s` seems to be a %s model, so you "
            "should pass in `--do_lower_case=%s` so that the fine-tuning matches "
            "how the model was pre-trained. If this error is wrong, please "
            "just comment out this check." % (actual_flag, init_checkpoint,
                                              model_name, case_name, opposite_flag))


def convert_to_unicode(text):
    """Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
    if six.PY3:
        if isinstance(text, str):
            return text
        elif isinstance(text, bytes):
            return text.decode("utf-8", "ignore")
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    elif six.PY2:
        if isinstance(text, str):
            return text.decode("utf-8", "ignore")
        elif isinstance(text, unicode):
            return text
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    else:
        raise ValueError("Not running on Python 2 or Python 3?")


def printable_text(text):
    """Returns text encoded in a way suitable for print or `tf.logging`."""

    # These functions want `str` for both Python2 and Python3, but in one case
    # it's a Unicode string and in the other it's a byte string.
    if six.PY3:
        if isinstance(text, str):
            return text
        elif isinstance(text, bytes):
            return text.decode("utf-8", "ignore")
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    elif six.PY2:
        if isinstance(text, str):
            return text
        elif isinstance(text, unicode):
            return text.encode("utf-8")
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    else:
        raise ValueError("Not running on Python 2 or Python 3?")


def load_vocab(vocab_file):
    """Loads a vocabulary file into a dictionary."""
    vocab = collections.OrderedDict()
    index = 0
    with open(vocab_file, "r") as reader:
        while True:
            token = convert_to_unicode(reader.readline())
            if not token:
                break
            token = token.strip()
            vocab[token] = index
            index += 1
    return vocab


def convert_by_vocab(vocab, items):
    """Converts a sequence of [tokens|ids] using the vocab."""
    output = []
    for item in items:
        output.append(vocab[item])
    return output


def convert_tokens_to_ids(vocab, tokens):
    return convert_by_vocab(vocab, tokens)


def convert_ids_to_tokens(inv_vocab, ids):
    return convert_by_vocab(inv_vocab, ids)


def whitespace_tokenize(text):
    """Runs basic whitespace cleaning and splitting on a piece of text."""
    text = text.strip()
    if not text:
        return []
    tokens = text.split()
    return tokens


class FullTokenizer(object):
    """Runs end-to-end tokenization."""

    def __init__(self, vocab_file, do_lower_case=True):
        self.vocab = load_vocab(vocab_file)
        self.inv_vocab = {v: k for k, v in self.vocab.items()}
        self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case)
        self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)

    def tokenize(self, text):
        split_tokens = []
        for token in self.basic_tokenizer.tokenize(text):
            for sub_token in self.wordpiece_tokenizer.tokenize(token):
                split_tokens.append(sub_token)
        return split_tokens

    def convert_tokens_to_ids(self, tokens):
        return convert_by_vocab(self.vocab, tokens)

    def convert_ids_to_tokens(self, ids):
        return convert_by_vocab(self.inv_vocab, ids)


class BasicTokenizer(object):
    """Runs basic tokenization (punctuation splitting, lower casing, etc.)."""

    def __init__(self, do_lower_case=True):
        """Constructs a BasicTokenizer.

        Args:
            do_lower_case: Whether to lower case the input.
        """
        self.do_lower_case = do_lower_case

    def tokenize(self, text):
        """Tokenizes a piece of text."""
        text = convert_to_unicode(text)
        text = self._clean_text(text)

        # This was added on November 1st, 2018 for the multilingual and Chinese
        # models. This is also applied to the English models now, but it doesn't
        # matter since the English models were not trained on any Chinese data
        # and generally don't have any Chinese data in them (there are Chinese
        # characters in the vocabulary because Wikipedia does have some Chinese
        # words in the English Wikipedia.).
        text = self._tokenize_chinese_chars(text)

        orig_tokens = whitespace_tokenize(text)
        split_tokens = []
        for token in orig_tokens:
            if self.do_lower_case:
                token = token.lower()
                token = self._run_strip_accents(token)
            split_tokens.extend(self._run_split_on_punc(token))

        output_tokens = whitespace_tokenize(" ".join(split_tokens))
        return output_tokens

    def _run_strip_accents(self, text):
        """Strips accents from a piece of text."""
        text = unicodedata.normalize("NFD", text)
        output = []
        for char in text:
            cat = unicodedata.category(char)
            if cat == "Mn":
                continue
            output.append(char)
        return "".join(output)

    def _run_split_on_punc(self, text):
        """Splits punctuation on a piece of text."""
        chars = list(text)
        i = 0
        start_new_word = True
        output = []
        while i < len(chars):
            char = chars[i]
            if _is_punctuation(char):
                output.append([char])
                start_new_word = True
            else:
                if start_new_word:
                    output.append([])
                start_new_word = False
                output[-1].append(char)
            i += 1
        return ["".join(x) for x in output]

    def _tokenize_chinese_chars(self, text):
        """Adds whitespace around any CJK character."""
        output = []
        for char in text:
            cp = ord(char)
            if self._is_chinese_char(cp):
                output.append(" ")
                output.append(char)
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)

    def _is_chinese_char(self, cp):
        """Checks whether CP is the codepoint of a CJK character."""
        # This defines a "chinese character" as anything in the CJK Unicode block:
        #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
        #
        # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
        # despite its name. The modern Korean Hangul alphabet is a different block,
        # as is Japanese Hiragana and Katakana. Those alphabets are used to write
        # space-separated words, so they are not treated specially and handled
        # like all of the other languages.
        if ((cp >= 0x4E00 and cp <= 0x9FFF) or
                (cp >= 0x3400 and cp <= 0x4DBF) or
                (cp >= 0x20000 and cp <= 0x2A6DF) or
                (cp >= 0x2A700 and cp <= 0x2B73F) or
                (cp >= 0x2B740 and cp <= 0x2B81F) or
                (cp >= 0x2B820 and cp <= 0x2CEAF) or
                (cp >= 0xF900 and cp <= 0xFAFF) or
                (cp >= 0x2F800 and cp <= 0x2FA1F)):
            return True

        return False

    def _clean_text(self, text):
        """Performs invalid character removal and whitespace cleanup on text."""
        output = []
        for char in text:
            cp = ord(char)
            if cp == 0 or cp == 0xfffd or _is_control(char):
                continue
            if _is_whitespace(char):
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)


class WordpieceTokenizer(object):
    """Runs WordPiece tokenization."""

    def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=200):
        self.vocab = vocab
        self.unk_token = unk_token
        self.max_input_chars_per_word = max_input_chars_per_word

    def tokenize(self, text):
        """Tokenizes a piece of text into its word pieces.

        This uses a greedy longest-match-first algorithm to perform
        tokenization using the given vocabulary.

        For example:
            input = "unaffable"
            output = ["un", "##aff", "##able"]

        Args:
            text: A single token or whitespace separated tokens. This should
                have already been passed through `BasicTokenizer`.

        Returns:
            A list of wordpiece tokens.
        """
        text = convert_to_unicode(text)

        output_tokens = []
        for token in whitespace_tokenize(text):
            chars = list(token)
            if len(chars) > self.max_input_chars_per_word:
                output_tokens.append(self.unk_token)
                continue

            is_bad = False
            start = 0
            sub_tokens = []
            while start < len(chars):
                end = len(chars)
                cur_substr = None
                while start < end:
                    substr = "".join(chars[start:end])
                    if start > 0:
                        substr = "##" + substr
                    if substr in self.vocab:
                        cur_substr = substr
                        break
                    end -= 1
                if cur_substr is None:
                    is_bad = True
                    break
                sub_tokens.append(cur_substr)
                start = end

            if is_bad:
                output_tokens.append(self.unk_token)
            else:
                output_tokens.extend(sub_tokens)
        return output_tokens


def _is_whitespace(char):
    """Checks whether `chars` is a whitespace character."""
    # \t, \n, and \r are technically control characters but we treat them
    # as whitespace since they are generally considered as such.
    if char == " " or char == "\t" or char == "\n" or char == "\r":
        return True
    cat = unicodedata.category(char)
    if cat == "Zs":
        return True
    return False


def _is_control(char):
    """Checks whether `chars` is a control character."""
    # These are technically control characters but we count them as whitespace
    # characters.
    if char == "\t" or char == "\n" or char == "\r":
        return False
    cat = unicodedata.category(char)
    if cat in ("Cc", "Cf"):
        return True
    return False


def _is_punctuation(char):
    """Checks whether `chars` is a punctuation character."""
    cp = ord(char)
    # We treat all non-letter/number ASCII as punctuation.
    # Characters such as "^", "$", and "`" are not in the Unicode
    # Punctuation class but we treat them as punctuation anyways, for
    # consistency.
    if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
            (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
        return True
    cat = unicodedata.category(char)
    if cat.startswith("P"):
        return True
    return False
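A self-contained sketch of the tokenizer pipeline above, using a made-up six-entry vocabulary written to a temporary file (the real demo uses the full uncased_L-24_H-1024_A-16 vocabulary below):

import tempfile

from tokenization import FullTokenizer

# A miniature vocabulary, one token per line as load_vocab expects.
vocab_tokens = ['[UNK]', '[CLS]', '[SEP]', 'un', '##aff', '##able']
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
    f.write('\n'.join(vocab_tokens) + '\n')
    vocab_path = f.name

tokenizer = FullTokenizer(vocab_file=vocab_path, do_lower_case=True)
tokens = tokenizer.tokenize('unaffable')
print(tokens)                                   # ['un', '##aff', '##able']
print(tokenizer.convert_tokens_to_ids(tokens))  # [3, 4, 5]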
""" text = convert_to_unicode(text) output_tokens = [] for token in whitespace_tokenize(text): chars = list(token) if len(chars) > self.max_input_chars_per_word: output_tokens.append(self.unk_token) continue is_bad = False start = 0 sub_tokens = [] while start < len(chars): end = len(chars) cur_substr = None while start < end: substr = "".join(chars[start:end]) if start > 0: substr = "##" + substr if substr in self.vocab: cur_substr = substr break end -= 1 if cur_substr is None: is_bad = True break sub_tokens.append(cur_substr) start = end if is_bad: output_tokens.append(self.unk_token) else: output_tokens.extend(sub_tokens) return output_tokens def _is_whitespace(char): """Checks whether `chars` is a whitespace character.""" # \t, \n, and \r are technically contorl characters but we treat them # as whitespace since they are generally considered as such. if char == " " or char == "\t" or char == "\n" or char == "\r": return True cat = unicodedata.category(char) if cat == "Zs": return True return False def _is_control(char): """Checks whether `chars` is a control character.""" # These are technically control characters but we count them as whitespace # characters. if char == "\t" or char == "\n" or char == "\r": return False cat = unicodedata.category(char) if cat in ("Cc", "Cf"): return True return False def _is_punctuation(char): """Checks whether `chars` is a punctuation character.""" cp = ord(char) # We treat all non-letter/number ASCII as punctuation. # Characters such as "^", "$", and "`" are not in the Unicode # Punctuation class but we treat them as punctuation anyways, for # consistency. if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)): return True cat = unicodedata.category(char) if cat.startswith("P"): return True return False ================================================ FILE: src/examples/tensorflow/bert_demo/tune_save.sh ================================================ #!/bin/bash pushd $BERT_REPO_DIR python run_classifier.py \ --task_name=MRPC \ --do_train=true \ --do_eval=true \ --do_predict=true \ --data_dir=$GLUE_DIR/MRPC \ --vocab_file=$BERT_BASE_DIR/vocab.txt \ --bert_config_file=$BERT_BASE_DIR/bert_config.json \ --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \ --max_seq_length=128 \ --train_batch_size=32 \ --learning_rate=2e-5 \ --num_train_epochs=3.0 \ --output_dir=$BERT_REPO_DIR/MRPC_finetune python run_classifier.py \ --task_name=MRPC \ --do_predict=true \ --data_dir=$GLUE_DIR/MRPC \ --vocab_file=$BERT_BASE_DIR/vocab.txt \ --bert_config_file=$BERT_BASE_DIR/bert_config.json \ --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \ --max_seq_length=128 \ --output_dir=$BERT_REPO_DIR/MRPC_finetune popd ================================================ FILE: src/examples/tensorflow/bert_demo/uncased_L-24_H-1024_A-16.vocab.txt ================================================ [PAD] [unused0] [unused1] [unused2] [unused3] [unused4] [unused5] [unused6] [unused7] [unused8] [unused9] [unused10] [unused11] [unused12] [unused13] [unused14] [unused15] [unused16] [unused17] [unused18] [unused19] [unused20] [unused21] [unused22] [unused23] [unused24] [unused25] [unused26] [unused27] [unused28] [unused29] [unused30] [unused31] [unused32] [unused33] [unused34] [unused35] [unused36] [unused37] [unused38] [unused39] [unused40] [unused41] [unused42] [unused43] [unused44] [unused45] [unused46] [unused47] [unused48] [unused49] [unused50] [unused51] [unused52] [unused53] [unused54] [unused55] [unused56] [unused57] [unused58] 
[unused59] [unused60] [unused61] [unused62] [unused63] [unused64] [unused65] [unused66] [unused67] [unused68] [unused69] [unused70] [unused71] [unused72] [unused73] [unused74] [unused75] [unused76] [unused77] [unused78] [unused79] [unused80] [unused81] [unused82] [unused83] [unused84] [unused85] [unused86] [unused87] [unused88] [unused89] [unused90] [unused91] [unused92] [unused93] [unused94] [unused95] [unused96] [unused97] [unused98] [UNK] [CLS] [SEP] [MASK] [unused99] [unused100] [unused101] [unused102] [unused103] [unused104] [unused105] [unused106] [unused107] [unused108] [unused109] [unused110] [unused111] [unused112] [unused113] [unused114] [unused115] [unused116] [unused117] [unused118] [unused119] [unused120] [unused121] [unused122] [unused123] [unused124] [unused125] [unused126] [unused127] [unused128] [unused129] [unused130] [unused131] [unused132] [unused133] [unused134] [unused135] [unused136] [unused137] [unused138] [unused139] [unused140] [unused141] [unused142] [unused143] [unused144] [unused145] [unused146] [unused147] [unused148] [unused149] [unused150] [unused151] [unused152] [unused153] [unused154] [unused155] [unused156] [unused157] [unused158] [unused159] [unused160] [unused161] [unused162] [unused163] [unused164] [unused165] [unused166] [unused167] [unused168] [unused169] [unused170] [unused171] [unused172] [unused173] [unused174] [unused175] [unused176] [unused177] [unused178] [unused179] [unused180] [unused181] [unused182] [unused183] [unused184] [unused185] [unused186] [unused187] [unused188] [unused189] [unused190] [unused191] [unused192] [unused193] [unused194] [unused195] [unused196] [unused197] [unused198] [unused199] [unused200] [unused201] [unused202] [unused203] [unused204] [unused205] [unused206] [unused207] [unused208] [unused209] [unused210] [unused211] [unused212] [unused213] [unused214] [unused215] [unused216] [unused217] [unused218] [unused219] [unused220] [unused221] [unused222] [unused223] [unused224] [unused225] [unused226] [unused227] [unused228] [unused229] [unused230] [unused231] [unused232] [unused233] [unused234] [unused235] [unused236] [unused237] [unused238] [unused239] [unused240] [unused241] [unused242] [unused243] [unused244] [unused245] [unused246] [unused247] [unused248] [unused249] [unused250] [unused251] [unused252] [unused253] [unused254] [unused255] [unused256] [unused257] [unused258] [unused259] [unused260] [unused261] [unused262] [unused263] [unused264] [unused265] [unused266] [unused267] [unused268] [unused269] [unused270] [unused271] [unused272] [unused273] [unused274] [unused275] [unused276] [unused277] [unused278] [unused279] [unused280] [unused281] [unused282] [unused283] [unused284] [unused285] [unused286] [unused287] [unused288] [unused289] [unused290] [unused291] [unused292] [unused293] [unused294] [unused295] [unused296] [unused297] [unused298] [unused299] [unused300] [unused301] [unused302] [unused303] [unused304] [unused305] [unused306] [unused307] [unused308] [unused309] [unused310] [unused311] [unused312] [unused313] [unused314] [unused315] [unused316] [unused317] [unused318] [unused319] [unused320] [unused321] [unused322] [unused323] [unused324] [unused325] [unused326] [unused327] [unused328] [unused329] [unused330] [unused331] [unused332] [unused333] [unused334] [unused335] [unused336] [unused337] [unused338] [unused339] [unused340] [unused341] [unused342] [unused343] [unused344] [unused345] [unused346] [unused347] [unused348] [unused349] [unused350] [unused351] [unused352] [unused353] [unused354] [unused355] 
[unused356] [unused357] [unused358] [unused359] [unused360] [unused361] [unused362] [unused363] [unused364] [unused365] [unused366] [unused367] [unused368] [unused369] [unused370] [unused371] [unused372] [unused373] [unused374] [unused375] [unused376] [unused377] [unused378] [unused379] [unused380] [unused381] [unused382] [unused383] [unused384] [unused385] [unused386] [unused387] [unused388] [unused389] [unused390] [unused391] [unused392] [unused393] [unused394] [unused395] [unused396] [unused397] [unused398] [unused399] [unused400] [unused401] [unused402] [unused403] [unused404] [unused405] [unused406] [unused407] [unused408] [unused409] [unused410] [unused411] [unused412] [unused413] [unused414] [unused415] [unused416] [unused417] [unused418] [unused419] [unused420] [unused421] [unused422] [unused423] [unused424] [unused425] [unused426] [unused427] [unused428] [unused429] [unused430] [unused431] [unused432] [unused433] [unused434] [unused435] [unused436] [unused437] [unused438] [unused439] [unused440] [unused441] [unused442] [unused443] [unused444] [unused445] [unused446] [unused447] [unused448] [unused449] [unused450] [unused451] [unused452] [unused453] [unused454] [unused455] [unused456] [unused457] [unused458] [unused459] [unused460] [unused461] [unused462] [unused463] [unused464] [unused465] [unused466] [unused467] [unused468] [unused469] [unused470] [unused471] [unused472] [unused473] [unused474] [unused475] [unused476] [unused477] [unused478] [unused479] [unused480] [unused481] [unused482] [unused483] [unused484] [unused485] [unused486] [unused487] [unused488] [unused489] [unused490] [unused491] [unused492] [unused493] [unused494] [unused495] [unused496] [unused497] [unused498] [unused499] [unused500] [unused501] [unused502] [unused503] [unused504] [unused505] [unused506] [unused507] [unused508] [unused509] [unused510] [unused511] [unused512] [unused513] [unused514] [unused515] [unused516] [unused517] [unused518] [unused519] [unused520] [unused521] [unused522] [unused523] [unused524] [unused525] [unused526] [unused527] [unused528] [unused529] [unused530] [unused531] [unused532] [unused533] [unused534] [unused535] [unused536] [unused537] [unused538] [unused539] [unused540] [unused541] [unused542] [unused543] [unused544] [unused545] [unused546] [unused547] [unused548] [unused549] [unused550] [unused551] [unused552] [unused553] [unused554] [unused555] [unused556] [unused557] [unused558] [unused559] [unused560] [unused561] [unused562] [unused563] [unused564] [unused565] [unused566] [unused567] [unused568] [unused569] [unused570] [unused571] [unused572] [unused573] [unused574] [unused575] [unused576] [unused577] [unused578] [unused579] [unused580] [unused581] [unused582] [unused583] [unused584] [unused585] [unused586] [unused587] [unused588] [unused589] [unused590] [unused591] [unused592] [unused593] [unused594] [unused595] [unused596] [unused597] [unused598] [unused599] [unused600] [unused601] [unused602] [unused603] [unused604] [unused605] [unused606] [unused607] [unused608] [unused609] [unused610] [unused611] [unused612] [unused613] [unused614] [unused615] [unused616] [unused617] [unused618] [unused619] [unused620] [unused621] [unused622] [unused623] [unused624] [unused625] [unused626] [unused627] [unused628] [unused629] [unused630] [unused631] [unused632] [unused633] [unused634] [unused635] [unused636] [unused637] [unused638] [unused639] [unused640] [unused641] [unused642] [unused643] [unused644] [unused645] [unused646] [unused647] [unused648] [unused649] [unused650] [unused651] 
[BERT-style WordPiece vocabulary listing elided: reserved ``[unusedNNN]`` placeholder tokens, followed by single ASCII and Unicode characters, then frequency-ordered whole-word tokens and ``##``-prefixed subword pieces, one token per vocabulary id.]
shirley fancy dominic ##bie madonna ##rick bark buttons gymnasium ashes liver toby oath providence doyle evangelical nixon cement carnegie embarked hatch surroundings guarantee needing pirate essence ##bee filter crane hammond projected immune percy twelfth ##ult regent doctoral damon mikhail ##ichi lu critically elect realised abortion acute screening mythology steadily ##fc frown nottingham kirk wa minneapolis ##rra module algeria mc nautical encounters surprising statues availability shirts pie alma brows munster mack soup crater tornado sanskrit cedar explosive bordered dixon planets stamp exam happily ##bble carriers kidnapped ##vis accommodation emigrated ##met knockout correspondent violation profits peaks lang specimen agenda ancestry pottery spelling equations obtaining ki linking 1825 debris asylum ##20 buddhism teddy ##ants gazette ##nger ##sse dental eligibility utc fathers averaged zimbabwe francesco coloured hissed translator lynch mandate humanities mackenzie uniforms lin ##iana ##gio asset mhz fitting samantha genera wei rim beloved shark riot entities expressions indo carmen slipping owing abbot neighbor sidney ##av rats recommendations encouraging squadrons anticipated commanders conquered ##oto donations diagnosed ##mond divide ##iva guessed decoration vernon auditorium revelation conversations ##kers ##power herzegovina dash alike protested lateral herman accredited mg ##gent freeman mel fiji crow crimson ##rine livestock ##pped humanitarian bored oz whip ##lene ##ali legitimate alter grinning spelled anxious oriental wesley ##nin ##hole carnival controller detect ##ssa bowed educator kosovo macedonia ##sin occupy mastering stephanie janeiro para unaware nurses noon 135 cam hopefully ranger combine sociology polar rica ##eer neill ##sman holocaust ##ip doubled lust 1828 109 decent cooling unveiled ##card 1829 nsw homer chapman meyer ##gin dive mae reagan expertise ##gled darwin brooke sided prosecution investigating comprised petroleum genres reluctant differently trilogy johns vegetables corpse highlighted lounge pension unsuccessfully elegant aided ivory beatles amelia cain dubai sunny immigrant babe click ##nder underwater pepper combining mumbled atlas horns accessed ballad physicians homeless gestured rpm freak louisville corporations patriots prizes rational warn modes decorative overnight din troubled phantom ##ort monarch sheer ##dorf generals guidelines organs addresses ##zon enhance curling parishes cord ##kie linux caesar deutsche bavaria ##bia coleman cyclone ##eria bacon petty ##yama ##old hampton diagnosis 1824 throws complexity rita disputed ##₃ pablo ##sch marketed trafficking ##ulus examine plague formats ##oh vault faithful ##bourne webster ##ox highlights ##ient ##ann phones vacuum sandwich modeling ##gated bolivia clergy qualities isabel ##nas ##ars wears screams reunited annoyed bra ##ancy ##rate differential transmitter tattoo container poker ##och excessive resides cowboys ##tum augustus trash providers statute retreated balcony reversed void storey preceded masses leap laughs neighborhoods wards schemes falcon santo battlefield pad ronnie thread lesbian venus ##dian beg sandstone daylight punched gwen analog stroked wwe acceptable measurements dec toxic ##kel adequate surgical economist parameters varsity ##sberg quantity ella ##chy ##rton countess generating precision diamonds expressway ga ##ı 1821 uruguay talents galleries expenses scanned colleague outlets ryder lucien ##ila paramount ##bon syracuse dim fangs gown sweep ##sie toyota 
missionaries websites ##nsis sentences adviser val trademark spells ##plane patience starter slim ##borg toe incredibly shoots elliot nobility ##wyn cowboy endorsed gardner tendency persuaded organisms emissions kazakhstan amused boring chips themed ##hand llc constantinople chasing systematic guatemala borrowed erin carey ##hard highlands struggles 1810 ##ifying ##ced wong exceptions develops enlarged kindergarten castro ##ern ##rina leigh zombie juvenile ##most consul ##nar sailor hyde clarence intensive pinned nasty useless jung clayton stuffed exceptional ix apostolic 230 transactions ##dge exempt swinging cove religions ##ash shields dairy bypass 190 pursuing bug joyce bombay chassis southampton chat interact redesignated ##pen nascar pray salmon rigid regained malaysian grim publicity constituted capturing toilet delegate purely tray drift loosely striker weakened trinidad mitch itv defines transmitted ming scarlet nodding fitzgerald fu narrowly sp tooth standings virtue ##₁ ##wara ##cting chateau gloves lid ##nel hurting conservatory ##pel sinclair reopened sympathy nigerian strode advocated optional chronic discharge ##rc suck compatible laurel stella shi fails wage dodge 128 informal sorts levi buddha villagers ##aka chronicles heavier summoned gateway 3000 eleventh jewelry translations accordingly seas ##ency fiber pyramid cubic dragging ##ista caring ##ops android contacted lunar ##dt kai lisbon patted 1826 sacramento theft madagascar subtropical disputes ta holidays piper willow mare cane itunes newfoundland benny companions dong raj observe roar charming plaque tibetan fossils enacted manning bubble tina tanzania ##eda ##hir funk swamp deputies cloak ufc scenario par scratch metals anthem guru engaging specially ##boat dialects nineteen cecil duet disability messenger unofficial ##lies defunct eds moonlight drainage surname puzzle honda switching conservatives mammals knox broadcaster sidewalk cope ##ried benson princes peterson ##sal bedford sharks eli wreck alberto gasp archaeology lgbt teaches securities madness compromise waving coordination davidson visions leased possibilities eighty jun fernandez enthusiasm assassin sponsorship reviewer kingdoms estonian laboratories ##fy ##nal applies verb celebrations ##zzo rowing lightweight sadness submit mvp balanced dude ##vas explicitly metric magnificent mound brett mohammad mistakes irregular ##hing ##ass sanders betrayed shipped surge ##enburg reporters termed georg pity verbal bulls abbreviated enabling appealed ##are ##atic sicily sting heel sweetheart bart spacecraft brutal monarchy ##tter aberdeen cameo diane ##ub survivor clyde ##aries complaint ##makers clarinet delicious chilean karnataka coordinates 1818 panties ##rst pretending ar dramatically kiev bella tends distances 113 catalog launching instances telecommunications portable lindsay vatican ##eim angles aliens marker stint screens bolton ##rne judy wool benedict plasma europa spark imaging filmmaker swiftly ##een contributor ##nor opted stamps apologize financing butter gideon sophisticated alignment avery chemicals yearly speculation prominence professionally ##ils immortal institutional inception wrists identifying tribunal derives gains ##wo papal preference linguistic vince operative brewery ##ont unemployment boyd ##ured ##outs albeit prophet 1813 bi ##rr ##face ##rad quarterly asteroid cleaned radius temper ##llen telugu jerk viscount menu ##ote glimpse ##aya yacht hawaiian baden ##rl laptop readily ##gu monetary offshore scots watches ##yang ##arian upgrade 
needle xbox lea encyclopedia flank fingertips ##pus delight teachings confirm roth beaches midway winters ##iah teasing daytime beverly gambling bonnie ##backs regulated clement hermann tricks knot ##shing ##uring ##vre detached ecological owed specialty byron inventor bats stays screened unesco midland trim affection ##ander ##rry jess thoroughly feedback ##uma chennai strained heartbeat wrapping overtime pleaded ##sworth mon leisure oclc ##tate ##ele feathers angelo thirds nuts surveys clever gill commentator ##dos darren rides gibraltar ##nc ##mu dissolution dedication shin meals saddle elvis reds chaired taller appreciation functioning niece favored advocacy robbie criminals suffolk yugoslav passport constable congressman hastings vera ##rov consecrated sparks ecclesiastical confined ##ovich muller floyd nora 1822 paved 1827 cumberland ned saga spiral ##flow appreciated yi collaborative treating similarities feminine finishes ##ib jade import ##nse ##hot champagne mice securing celebrities helsinki attributes ##gos cousins phases ache lucia gandhi submission vicar spear shine tasmania biting detention constitute tighter seasonal ##gus terrestrial matthews ##oka effectiveness parody philharmonic ##onic 1816 strangers encoded consortium guaranteed regards shifts tortured collision supervisor inform broader insight theaters armour emeritus blink incorporates mapping ##50 ##ein handball flexible ##nta substantially generous thief ##own carr loses 1793 prose ucla romeo generic metallic realization damages mk commissioners zach default ##ther helicopters lengthy stems spa partnered spectators rogue indication penalties teresa 1801 sen ##tric dalton ##wich irving photographic ##vey dell deaf peters excluded unsure ##vable patterson crawled ##zio resided whipped latvia slower ecole pipes employers maharashtra comparable va textile pageant ##gel alphabet binary irrigation chartered choked antoine offs waking supplement ##wen quantities demolition regain locate urdu folks alt 114 ##mc scary andreas whites ##ava classrooms mw aesthetic publishes valleys guides cubs johannes bryant conventions affecting ##itt drain awesome isolation prosecutor ambitious apology captive downs atmospheric lorenzo aisle beef foul ##onia kidding composite disturbed illusion natives ##ffer emi rockets riverside wartime painters adolf melted ##ail uncertainty simulation hawks progressed meantime builder spray breach unhappy regina russians ##urg determining ##tation tram 1806 ##quin aging ##12 1823 garion rented mister diaz terminated clip 1817 depend nervously disco owe defenders shiva notorious disbelief shiny worcester ##gation ##yr trailing undertook islander belarus limitations watershed fuller overlooking utilized raphael 1819 synthetic breakdown klein ##nate moaned memoir lamb practicing ##erly cellular arrows exotic ##graphy witches 117 charted rey hut hierarchy subdivision freshwater giuseppe aloud reyes qatar marty sideways utterly sexually jude prayers mccarthy softball blend damien ##gging ##metric wholly erupted lebanese negro revenues tasted comparative teamed transaction labeled maori sovereignty parkway trauma gran malay 121 advancement descendant 2020 buzz salvation inventory symbolic ##making antarctica mps ##gas ##bro mohammed myanmar holt submarines tones ##lman locker patriarch bangkok emerson remarks predators kin afghan confession norwich rental emerge advantages ##zel rca ##hold shortened storms aidan ##matic autonomy compliance ##quet dudley atp ##osis 1803 motto documentation summary professors 
spectacular christina archdiocese flashing innocence remake ##dell psychic reef scare employ rs sticks meg gus leans ##ude accompany bergen tomas ##iko doom wages pools ##nch ##bes breasts scholarly alison outline brittany breakthrough willis realistic ##cut ##boro competitor ##stan pike picnic icon designing commercials washing villain skiing micro costumes auburn halted executives ##hat logistics cycles vowel applicable barrett exclaimed eurovision eternity ramon ##umi ##lls modifications sweeping disgust ##uck torch aviv ensuring rude dusty sonic donovan outskirts cu pathway ##band ##gun ##lines disciplines acids cadet paired ##40 sketches ##sive marriages ##⁺ folding peers slovak implies admired ##beck 1880s leopold instinct attained weston megan horace ##ination dorsal ingredients evolutionary ##its complications deity lethal brushing levy deserted institutes posthumously delivering telescope coronation motivated rapids luc flicked pays volcano tanner weighed ##nica crowds frankie gifted addressing granddaughter winding ##rna constantine gomez ##front landscapes rudolf anthropology slate werewolf ##lio astronomy circa rouge dreaming sack knelt drowned naomi prolific tracked freezing herb ##dium agony randall twisting wendy deposit touches vein wheeler ##bbled ##bor batted retaining tire presently compare specification daemon nigel ##grave merry recommendation czechoslovakia sandra ng roma ##sts lambert inheritance sheikh winchester cries examining ##yle comeback cuisine nave ##iv ko retrieve tomatoes barker polished defining irene lantern personalities begging tract swore 1809 175 ##gic omaha brotherhood ##rley haiti ##ots exeter ##ete ##zia steele dumb pearson 210 surveyed elisabeth trends ##ef fritz ##rf premium bugs fraction calmly viking ##birds tug inserted unusually ##ield confronted distress crashing brent turks resign ##olo cambodia gabe sauce ##kal evelyn 116 extant clusters quarry teenagers luna ##lers ##ister affiliation drill ##ashi panthers scenic libya anita strengthen inscriptions ##cated lace sued judith riots ##uted mint ##eta preparations midst dub challenger ##vich mock cf displaced wicket breaths enables schmidt analyst ##lum ag highlight automotive axe josef newark sufficiently resembles 50th ##pal flushed mum traits ##ante commodore incomplete warming titular ceremonial ethical 118 celebrating eighteenth cao lima medalist mobility strips snakes ##city miniature zagreb barton escapes umbrella automated doubted differs cooled georgetown dresden cooked fade wyatt rna jacobs carlton abundant stereo boost madras inning ##hia spur ip malayalam begged osaka groan escaping charging dose vista ##aj bud papa communists advocates edged tri ##cent resemble peaking necklace fried montenegro saxony goose glances stuttgart curator recruit grocery sympathetic ##tting ##fort 127 lotus randolph ancestor ##rand succeeding jupiter 1798 macedonian ##heads hiking 1808 handing fischer ##itive garbage node ##pies prone singular papua inclined attractions italia pouring motioned grandma garnered jacksonville corp ego ringing aluminum ##hausen ordering ##foot drawer traders synagogue ##play ##kawa resistant wandering fragile fiona teased var hardcore soaked jubilee decisive exposition mercer poster valencia hale kuwait 1811 ##ises ##wr ##eed tavern gamma 122 johan ##uer airways amino gil ##ury vocational domains torres ##sp generator folklore outcomes ##keeper canberra shooter fl beams confrontation ##lling ##gram feb aligned forestry pipeline jax motorway conception decay ##tos coffin 
##cott stalin 1805 escorted minded ##nam sitcom purchasing twilight veronica additions passive tensions straw 123 frequencies 1804 refugee cultivation ##iate christie clary bulletin crept disposal ##rich ##zong processor crescent ##rol bmw emphasized whale nazis aurora ##eng dwelling hauled sponsors toledo mega ideology theatres tessa cerambycidae saves turtle cone suspects kara rusty yelling greeks mozart shades cocked participant ##tro shire spit freeze necessity ##cos inmates nielsen councillors loaned uncommon omar peasants botanical offspring daniels formations jokes 1794 pioneers sigma licensing ##sus wheelchair polite 1807 liquor pratt trustee ##uta forewings balloon ##zz kilometre camping explicit casually shawn foolish teammates nm hassan carrie judged satisfy vanessa knives selective cnn flowed ##lice eclipse stressed eliza mathematician cease cultivated ##roy commissions browns ##ania destroyers sheridan meadow ##rius minerals ##cial downstream clash gram memoirs ventures baha seymour archie midlands edith fare flynn invite canceled tiles stabbed boulder incorporate amended camden facial mollusk unreleased descriptions yoga grabs 550 raises ramp shiver ##rose coined pioneering tunes qing warwick tops 119 melanie giles ##rous wandered ##inal annexed nov 30th unnamed ##ished organizational airplane normandy stoke whistle blessing violations chased holders shotgun ##ctic outlet reactor ##vik tires tearing shores fortified mascot constituencies nc columnist productive tibet ##rta lineage hooked oct tapes judging cody ##gger hansen kashmir triggered ##eva solved cliffs ##tree resisted anatomy protesters transparent implied ##iga injection mattress excluding ##mbo defenses helpless devotion ##elli growl liberals weber phenomena atoms plug ##iff mortality apprentice howe convincing aaa swimmer barber leone promptly sodium def nowadays arise ##oning gloucester corrected dignity norm erie ##ders elders evacuated sylvia compression ##yar hartford pose backpack reasoning accepts 24th wipe millimetres marcel ##oda dodgers albion 1790 overwhelmed aerospace oaks 1795 showcase acknowledge recovering nolan ashe hurts geology fashioned disappearance farewell swollen shrug marquis wimbledon 124 rue 1792 commemorate reduces experiencing inevitable calcutta intel ##court murderer sticking fisheries imagery bloom 280 brake ##inus gustav hesitation memorable po viral beans accidents tunisia antenna spilled consort treatments aye perimeter ##gard donation hostage migrated banker addiction apex lil trout ##ously conscience ##nova rams sands genome passionate troubles ##lets ##set amid ##ibility ##ret higgins exceed vikings ##vie payne ##zan muscular ##ste defendant sucking ##wal ibrahim fuselage claudia vfl europeans snails interval ##garh preparatory statewide tasked lacrosse viktor ##lation angola ##hra flint implications employs teens patrons stall weekends barriers scrambled nucleus tehran jenna parsons lifelong robots displacement 5000 ##bles precipitation ##gt knuckles clutched 1802 marrying ecology marx accusations declare scars kolkata mat meadows bermuda skeleton finalists vintage crawl coordinate affects subjected orchestral mistaken ##tc mirrors dipped relied 260 arches candle ##nick incorporating wildly fond basilica owl fringe rituals whispering stirred feud tertiary slick goat honorable whereby skip ricardo stripes parachute adjoining submerged synthesizer ##gren intend positively ninety phi beaver partition fellows alexis prohibition carlisle bizarre fraternity ##bre doubts icy cbc 
aquatic sneak sonny combines airports crude supervised spatial merge alfonso ##bic corrupt scan undergo ##ams disabilities colombian comparing dolphins perkins ##lish reprinted unanimous bounced hairs underworld midwest semester bucket paperback miniseries coventry demise ##leigh demonstrations sensor rotating yan ##hler arrange soils ##idge hyderabad labs ##dr brakes grandchildren ##nde negotiated rover ferrari continuation directorate augusta stevenson counterpart gore ##rda nursery rican ave collectively broadly pastoral repertoire asserted discovering nordic styled fiba cunningham harley middlesex survives tumor tempo zack aiming lok urgent ##rade ##nto devils ##ement contractor turin ##wl ##ool bliss repaired simmons moan astronomical cr negotiate lyric 1890s lara bred clad angus pbs ##ience engineered posed ##lk hernandez possessions elbows psychiatric strokes confluence electorate lifts campuses lava alps ##ep ##ution ##date physicist woody ##page ##ographic ##itis juliet reformation sparhawk 320 complement suppressed jewel ##½ floated ##kas continuity sadly ##ische inability melting scanning paula flour judaism safer vague ##lm solving curb ##stown financially gable bees expired miserable cassidy dominion 1789 cupped 145 robbery facto amos warden resume tallest marvin ing pounded usd declaring gasoline ##aux darkened 270 650 sophomore ##mere erection gossip televised risen dial ##eu pillars ##link passages profound ##tina arabian ashton silicon nail ##ead ##lated ##wer ##hardt fleming firearms ducked circuits blows waterloo titans ##lina atom fireplace cheshire financed activation algorithms ##zzi constituent catcher cherokee partnerships sexuality platoon tragic vivian guarded whiskey meditation poetic ##late ##nga ##ake porto listeners dominance kendra mona chandler factions 22nd salisbury attitudes derivative ##ido ##haus intake paced javier illustrator barrels bias cockpit burnett dreamed ensuing ##anda receptors someday hawkins mattered ##lal slavic 1799 jesuit cameroon wasted tai wax lowering victorious freaking outright hancock librarian sensing bald calcium myers tablet announcing barack shipyard pharmaceutical ##uan greenwich flush medley patches wolfgang pt speeches acquiring exams nikolai ##gg hayden kannada ##type reilly ##pt waitress abdomen devastated capped pseudonym pharmacy fulfill paraguay 1796 clicked ##trom archipelago syndicated ##hman lumber orgasm rejection clifford lorraine advent mafia rodney brock ##ght ##used ##elia cassette chamberlain despair mongolia sensors developmental upstream ##eg ##alis spanning 165 trombone basque seeded interred renewable rhys leapt revision molecule ##ages chord vicious nord shivered 23rd arlington debts corpus sunrise bays blackburn centimetres ##uded shuddered gm strangely gripping cartoons isabelle orbital ##ppa seals proving ##lton refusal strengthened bust assisting baghdad batsman portrayal mara pushes spears og ##cock reside nathaniel brennan 1776 confirmation caucus ##worthy markings yemen nobles ku lazy viewer catalan encompasses sawyer ##fall sparked substances patents braves arranger evacuation sergio persuade dover tolerance penguin cum jockey insufficient townships occupying declining plural processed projection puppet flanders introduces liability ##yon gymnastics antwerp taipei hobart candles jeep wes observers 126 chaplain bundle glorious ##hine hazel flung sol excavations dumped stares sh bangalore triangular icelandic intervals expressing turbine ##vers songwriting crafts ##igo jasmine ditch rite ##ways 
entertaining comply sorrow wrestlers basel emirates marian rivera helpful ##some caution downward networking ##atory ##tered darted genocide emergence replies specializing spokesman convenient unlocked fading augustine concentrations resemblance elijah investigator andhra ##uda promotes bean ##rrell fleeing wan simone announcer ##ame ##bby lydia weaver 132 residency modification ##fest stretches ##ast alternatively nat lowe lacks ##ented pam tile concealed inferior abdullah residences tissues vengeance ##ided moisture peculiar groove zip bologna jennings ninja oversaw zombies pumping batch livingston emerald installations 1797 peel nitrogen rama ##fying ##star schooling strands responding werner ##ost lime casa accurately targeting ##rod underway ##uru hemisphere lester ##yard occupies 2d griffith angrily reorganized ##owing courtney deposited ##dd ##30 estadio ##ifies dunn exiled ##ying checks ##combe ##о ##fly successes unexpectedly blu assessed ##flower ##ه observing sacked spiders kn ##tail mu nodes prosperity audrey divisional 155 broncos tangled adjust feeds erosion paolo surf directory snatched humid admiralty screwed gt reddish ##nese modules trench lamps bind leah bucks competes ##nz ##form transcription ##uc isles violently clutching pga cyclist inflation flats ragged unnecessary ##hian stubborn coordinated harriet baba disqualified 330 insect wolfe ##fies reinforcements rocked duel winked embraced bricks ##raj hiatus defeats pending brightly jealousy ##xton ##hm ##uki lena gdp colorful ##dley stein kidney ##shu underwear wanderers ##haw ##icus guardians m³ roared habits ##wise permits gp uranium punished disguise bundesliga elise dundee erotic partisan pi collectors float individually rendering behavioral bucharest ser hare valerie corporal nutrition proportional ##isa immense ##kis pavement ##zie ##eld sutherland crouched 1775 ##lp suzuki trades endurance operas crosby prayed priory rory socially ##urn gujarat ##pu walton cube pasha privilege lennon floods thorne waterfall nipple scouting approve ##lov minorities voter dwight extensions assure ballroom slap dripping privileges rejoined confessed demonstrating patriotic yell investor ##uth pagan slumped squares ##cle ##kins confront bert embarrassment ##aid aston urging sweater starr yuri brains williamson commuter mortar structured selfish exports ##jon cds ##him unfinished ##rre mortgage destinations ##nagar canoe solitary buchanan delays magistrate fk ##pling motivation ##lier ##vier recruiting assess ##mouth malik antique 1791 pius rahman reich tub zhou smashed airs galway xii conditioning honduras discharged dexter ##pf lionel 129 debates lemon tiffany volunteered dom dioxide procession devi sic tremendous advertisements colts transferring verdict hanover decommissioned utter relate pac racism ##top beacon limp similarity terra occurrence ant ##how becky capt updates armament richie pal ##graph halloween mayo ##ssen ##bone cara serena fcc dolls obligations ##dling violated lafayette jakarta exploitation ##ime infamous iconic ##lah ##park kitty moody reginald dread spill crystals olivier modeled bluff equilibrium separating notices ordnance extinction onset cosmic attachment sammy expose privy anchored ##bil abbott admits bending baritone emmanuel policeman vaughan winged climax dresses denny polytechnic mohamed burmese authentic nikki genetics grandparents homestead gaza postponed metacritic una ##sby ##bat unstable dissertation ##rial ##cian curls obscure uncovered bronx praying disappearing ##hoe prehistoric coke turret 
mutations nonprofit pits monaco ##ي ##usion prominently dispatched podium ##mir uci ##uation 133 fortifications birthplace kendall ##lby ##oll preacher rack goodman ##rman persistent ##ott countless jaime recorder lexington persecution jumps renewal wagons ##11 crushing ##holder decorations ##lake abundance wrath laundry £1 garde ##rp jeanne beetles peasant ##sl splitting caste sergei ##rer ##ema scripts ##ively rub satellites ##vor inscribed verlag scrapped gale packages chick potato slogan kathleen arabs ##culture counterparts reminiscent choral ##tead rand retains bushes dane accomplish courtesy closes ##oth slaughter hague krakow lawson tailed elias ginger ##ttes canopy betrayal rebuilding turf ##hof frowning allegiance brigades kicks rebuild polls alias nationalism td rowan audition bowie fortunately recognizes harp dillon horrified ##oro renault ##tics ropes ##α presumed rewarded infrared wiping accelerated illustration ##rid presses practitioners badminton ##iard detained ##tera recognizing relates misery ##sies ##tly reproduction piercing potatoes thornton esther manners hbo ##aan ours bullshit ernie perennial sensitivity illuminated rupert ##jin ##iss ##ear rfc nassau ##dock staggered socialism ##haven appointments nonsense prestige sharma haul ##tical solidarity gps ##ook ##rata igor pedestrian ##uit baxter tenants wires medication unlimited guiding impacts diabetes ##rama sasha pas clive extraction 131 continually constraints ##bilities sonata hunted sixteenth chu planting quote mayer pretended abs spat ##hua ceramic ##cci curtains pigs pitching ##dad latvian sore dayton ##sted ##qi patrols slice playground ##nted shone stool apparatus inadequate mates treason ##ija desires ##liga ##croft somalia laurent mir leonardo oracle grape obliged chevrolet thirteenth stunning enthusiastic ##ede accounted concludes currents basil ##kovic drought ##rica mai ##aire shove posting ##shed pilgrimage humorous packing fry pencil wines smells 144 marilyn aching newest clung bon neighbours sanctioned ##pie mug ##stock drowning ##mma hydraulic ##vil hiring reminder lilly investigators ##ncies sour ##eous compulsory packet ##rion ##graphic ##elle cannes ##inate depressed ##rit heroic importantly theresa ##tled conway saturn marginal rae ##xia corresponds royce pact jasper explosives packaging aluminium ##ttered denotes rhythmic spans assignments hereditary outlined originating sundays lad reissued greeting beatrice ##dic pillar marcos plots handbook alcoholic judiciary avant slides extract masculine blur ##eum ##force homage trembled owens hymn trey omega signaling socks accumulated reacted attic theo lining angie distraction primera talbot ##key 1200 ti creativity billed ##hey deacon eduardo identifies proposition dizzy gunner hogan ##yam ##pping ##hol ja ##chan jensen reconstructed ##berger clearance darius ##nier abe harlem plea dei circled emotionally notation fascist neville exceeded upwards viable ducks ##fo workforce racer limiting shri ##lson possesses 1600 kerr moths devastating laden disturbing locking ##cture gal fearing accreditation flavor aide 1870s mountainous ##baum melt ##ures motel texture servers soda ##mb herd ##nium erect puzzled hum peggy examinations gould testified geoff ren devised sacks ##law denial posters grunted cesar tutor ec gerry offerings byrne falcons combinations ct incoming pardon rocking 26th avengers flared mankind seller uttar loch nadia stroking exposing ##hd fertile ancestral instituted ##has noises prophecy taxation eminent vivid pol ##bol dart indirect 
multimedia notebook upside displaying adrenaline referenced geometric ##iving progression ##ddy blunt announce ##far implementing ##lav aggression liaison cooler cares headache plantations gorge dots impulse thickness ashamed averaging kathy obligation precursor 137 fowler symmetry thee 225 hears ##rai undergoing ads butcher bowler ##lip cigarettes subscription goodness ##ically browne ##hos ##tech kyoto donor ##erty damaging friction drifting expeditions hardened prostitution 152 fauna blankets claw tossing snarled butterflies recruits investigative coated healed 138 communal hai xiii academics boone psychologist restless lahore stephens mba brendan foreigners printer ##pc ached explode 27th deed scratched dared ##pole cardiac 1780 okinawa proto commando compelled oddly electrons ##base replica thanksgiving ##rist sheila deliberate stafford tidal representations hercules ou ##path ##iated kidnapping lenses ##tling deficit samoa mouths consuming computational maze granting smirk razor fixture ideals inviting aiden nominal ##vs issuing julio pitt ramsey docks ##oss exhaust ##owed bavarian draped anterior mating ethiopian explores noticing ##nton discarded convenience hoffman endowment beasts cartridge mormon paternal probe sleeves interfere lump deadline ##rail jenks bulldogs scrap alternating justified reproductive nam seize descending secretariat kirby coupe grouped smash panther sedan tapping ##18 lola cheer germanic unfortunate ##eter unrelated ##fan subordinate ##sdale suzanne advertisement ##ility horsepower ##lda cautiously discourse luigi ##mans ##fields noun prevalent mao schneider everett surround governorate kira ##avia westward ##take misty rails sustainability 134 unused ##rating packs toast unwilling regulate thy suffrage nile awe assam definitions travelers affordable ##rb conferred sells undefeated beneficial torso basal repeating remixes ##pass bahrain cables fang ##itated excavated numbering statutory ##rey deluxe ##lian forested ramirez derbyshire zeus slamming transfers astronomer banana lottery berg histories bamboo ##uchi resurrection posterior bowls vaguely ##thi thou preserving tensed offence ##inas meyrick callum ridden watt langdon tying lowland snorted daring truman ##hale ##girl aura overly filing weighing goa infections philanthropist saunders eponymous ##owski latitude perspectives reviewing mets commandant radial ##kha flashlight reliability koch vowels amazed ada elaine supper ##rth ##encies predator debated soviets cola ##boards ##nah compartment crooked arbitrary fourteenth ##ctive havana majors steelers clips profitable ambush exited packers ##tile nude cracks fungi ##е limb trousers josie shelby tens frederic ##ος definite smoothly constellation insult baton discs lingering ##nco conclusions lent staging becker grandpa shaky ##tron einstein obstacles sk adverse elle economically ##moto mccartney thor dismissal motions readings nostrils treatise ##pace squeezing evidently prolonged 1783 venezuelan je marguerite beirut takeover shareholders ##vent denise digit airplay norse ##bbling imaginary pills hubert blaze vacated eliminating ##ello vine mansfield ##tty retrospective barrow borne clutch bail forensic weaving ##nett ##witz desktop citadel promotions worrying dorset ieee subdivided ##iating manned expeditionary pickup synod chuckle 185 barney ##rz ##ffin functionality karachi litigation meanings uc lick turbo anders ##ffed execute curl oppose ankles typhoon ##د ##ache ##asia linguistics compassion pressures grazing perfection ##iting immunity monopoly 
muddy backgrounds 136 namibia francesca monitors attracting stunt tuition ##ии vegetable ##mates ##quent mgm jen complexes forts ##ond cellar bites seventeenth royals flemish failures mast charities ##cular peruvian capitals macmillan ipswich outward frigate postgraduate folds employing ##ouse concurrently fiery ##tai contingent nightmares monumental nicaragua ##kowski lizard mal fielding gig reject ##pad harding ##ipe coastline ##cin ##nos beethoven humphrey innovations ##tam ##nge norris doris solicitor huang obey 141 ##lc niagara ##tton shelves aug bourbon curry nightclub specifications hilton ##ndo centennial dispersed worm neglected briggs sm font kuala uneasy plc ##nstein ##bound ##aking ##burgh awaiting pronunciation ##bbed ##quest eh optimal zhu raped greens presided brenda worries ##life venetian marxist turnout ##lius refined braced sins grasped sunderland nickel speculated lowell cyrillic communism fundraising resembling colonists mutant freddie usc ##mos gratitude ##run mural ##lous chemist wi reminds 28th steals tess pietro ##ingen promoter ri microphone honoured rai sant ##qui feather ##nson burlington kurdish terrorists deborah sickness ##wed ##eet hazard irritated desperation veil clarity ##rik jewels xv ##gged ##ows ##cup berkshire unfair mysteries orchid winced exhaustion renovations stranded obe infinity ##nies adapt redevelopment thanked registry olga domingo noir tudor ole ##atus commenting behaviors ##ais crisp pauline probable stirling wigan ##bian paralympics panting surpassed ##rew luca barred pony famed ##sters cassandra waiter carolyn exported ##orted andres destructive deeds jonah castles vacancy suv ##glass 1788 orchard yep famine belarusian sprang ##forth skinny ##mis administrators rotterdam zambia zhao boiler discoveries ##ride ##physics lucius disappointing outreach spoon ##frame qualifications unanimously enjoys regency ##iidae stade realism veterinary rodgers dump alain chestnut castile censorship rumble gibbs ##itor communion reggae inactivated logs loads ##houses homosexual ##iano ale informs ##cas phrases plaster linebacker ambrose kaiser fascinated 850 limerick recruitment forge mastered ##nding leinster rooted threaten ##strom borneo ##hes suggestions scholarships propeller documentaries patronage coats constructing invest neurons comet entirety shouts identities annoying unchanged wary ##antly ##ogy neat oversight ##kos phillies replay constance ##kka incarnation humble skies minus ##acy smithsonian ##chel guerrilla jar cadets ##plate surplus audit ##aru cracking joanna louisa pacing ##lights intentionally ##iri diner nwa imprint australians tong unprecedented bunker naive specialists ark nichols railing leaked pedal ##uka shrub longing roofs v8 captains neural tuned ##ntal ##jet emission medina frantic codex definitive sid abolition intensified stocks enrique sustain genoa oxide ##written clues cha ##gers tributaries fragment venom ##rity ##ente ##sca muffled vain sire laos ##ingly ##hana hastily snapping surfaced sentiment motive ##oft contests approximate mesa luckily dinosaur exchanges propelled accord bourne relieve tow masks offended ##ues cynthia ##mmer rains bartender zinc reviewers lois ##sai legged arrogant rafe rosie comprise handicap blockade inlet lagoon copied drilling shelley petals ##inian mandarin obsolete ##inated onward arguably productivity cindy praising seldom busch discusses raleigh shortage ranged stanton encouragement firstly conceded overs temporal ##uke cbe ##bos woo certainty pumps ##pton stalked ##uli lizzie periodic 
thieves weaker ##night gases shoving chooses wc ##chemical prompting weights ##kill robust flanked sticky hu tuberculosis ##eb ##eal christchurch resembled wallet reese inappropriate pictured distract fixing fiddle giggled burger heirs hairy mechanic torque apache obsessed chiefly cheng logging ##tag extracted meaningful numb ##vsky gloucestershire reminding ##bay unite ##lit breeds diminished clown glove 1860s ##ن ##ug archibald focal freelance sliced depiction ##yk organism switches sights stray crawling ##ril lever leningrad interpretations loops anytime reel alicia delighted ##ech inhaled xiv suitcase bernie vega licenses northampton exclusion induction monasteries racecourse homosexuality ##right ##sfield ##rky dimitri michele alternatives ions commentators genuinely objected pork hospitality fencing stephan warships peripheral wit drunken wrinkled quentin spends departing chung numerical spokesperson ##zone johannesburg caliber killers ##udge assumes neatly demographic abigail bloc ##vel mounting ##lain bentley slightest xu recipients ##jk merlin ##writer seniors prisons blinking hindwings flickered kappa ##hel 80s strengthening appealing brewing gypsy mali lashes hulk unpleasant harassment bio treaties predict instrumentation pulp troupe boiling mantle ##ffe ins ##vn dividing handles verbs ##onal coconut senegal 340 thorough gum momentarily ##sto cocaine panicked destined ##turing teatro denying weary captained mans ##hawks ##code wakefield bollywood thankfully ##16 cyril ##wu amendments ##bahn consultation stud reflections kindness 1787 internally ##ovo tex mosaic distribute paddy seeming 143 ##hic piers ##15 ##mura ##verse popularly winger kang sentinel mccoy ##anza covenant ##bag verge fireworks suppress thrilled dominate ##jar swansea ##60 142 reconciliation ##ndi stiffened cue dorian ##uf damascus amor ida foremost ##aga porsche unseen dir ##had ##azi stony lexi melodies ##nko angular integer podcast ants inherent jaws justify persona ##olved josephine ##nr ##ressed customary flashes gala cyrus glaring backyard ariel physiology greenland html stir avon atletico finch methodology ked ##lent mas catholicism townsend branding quincy fits containers 1777 ashore aragon ##19 forearm poisoning ##sd adopting conquer grinding amnesty keller finances evaluate forged lankan instincts ##uto guam bosnian photographed workplace desirable protector ##dog allocation intently encourages willy ##sten bodyguard electro brighter ##ν bihar ##chev lasts opener amphibious sal verde arte ##cope captivity vocabulary yields ##tted agreeing desmond pioneered ##chus strap campaigned railroads ##ович emblem ##dre stormed 501 ##ulous marijuana northumberland ##gn ##nath bowen landmarks beaumont ##qua danube ##bler attorneys th ge flyers critique villains cass mutation acc ##0s colombo mckay motif sampling concluding syndicate ##rell neon stables ds warnings clint mourning wilkinson ##tated merrill leopard evenings exhaled emil sonia ezra discrete stove farrell fifteenth prescribed superhero ##rier worms helm wren ##duction ##hc expo ##rator hq unfamiliar antony prevents acceleration fiercely mari painfully calculations cheaper ign clifton irvine davenport mozambique ##np pierced ##evich wonders ##wig ##cate ##iling crusade ware ##uel enzymes reasonably mls ##coe mater ambition bunny eliot kernel ##fin asphalt headmaster torah aden lush pins waived ##care ##yas joao substrate enforce ##grad ##ules alvarez selections epidemic tempted ##bit bremen translates ensured waterfront 29th forrest manny malone kramer 
reigning cookies simpler absorption 205 engraved ##ffy evaluated 1778 haze 146 comforting crossover ##abe thorn ##rift ##imo ##pop suppression fatigue cutter ##tr 201 wurttemberg ##orf enforced hovering proprietary gb samurai syllable ascent lacey tick lars tractor merchandise rep bouncing defendants ##yre huntington ##ground ##oko standardized ##hor ##hima assassinated nu predecessors rainy liar assurance lyrical ##uga secondly flattened ios parameter undercover ##mity bordeaux punish ridges markers exodus inactive hesitate debbie nyc pledge savoy nagar offset organist ##tium hesse marin converting ##iver diagram propulsion pu validity reverted supportive ##dc ministries clans responds proclamation ##inae ##ø ##rea ein pleading patriot sf birch islanders strauss hates ##dh brandenburg concession rd ##ob 1900s killings textbook antiquity cinematography wharf embarrassing setup creed farmland inequality centred signatures fallon 370 ##ingham ##uts ceylon gazing directive laurie ##tern globally ##uated ##dent allah excavation threads ##cross 148 frantically icc utilize determines respiratory thoughtful receptions ##dicate merging chandra seine 147 builders builds diagnostic dev visibility goddamn analyses dhaka cho proves chancel concurrent curiously canadians pumped restoring 1850s turtles jaguar sinister spinal traction declan vows 1784 glowed capitalism swirling install universidad ##lder ##oat soloist ##genic ##oor coincidence beginnings nissan dip resorts caucasus combustion infectious ##eno pigeon serpent ##itating conclude masked salad jew ##gr surreal toni ##wc harmonica 151 ##gins ##etic ##coat fishermen intending bravery ##wave klaus titan wembley taiwanese ransom 40th incorrect hussein eyelids jp cooke dramas utilities ##etta ##print eisenhower principally granada lana ##rak openings concord ##bl bethany connie morality sega ##mons ##nard earnings ##kara ##cine wii communes ##rel coma composing softened severed grapes ##17 nguyen analyzed warlord hubbard heavenly behave slovenian ##hit ##ony hailed filmmakers trance caldwell skye unrest coward likelihood ##aging bern sci taliban honolulu propose ##wang 1700 browser imagining cobra contributes dukes instinctively conan violinist ##ores accessories gradual ##amp quotes sioux ##dating undertake intercepted sparkling compressed 139 fungus tombs haley imposing rests degradation lincolnshire retailers wetlands tulsa distributor dungeon nun greenhouse convey atlantis aft exits oman dresser lyons ##sti joking eddy judgement omitted digits ##cts ##game juniors ##rae cents stricken une ##ngo wizards weir breton nan technician fibers liking royalty ##cca 154 persia terribly magician ##rable ##unt vance cafeteria booker camille warmer ##static consume cavern gaps compass contemporaries foyer soothing graveyard maj plunged blush ##wear cascade demonstrates ordinance ##nov boyle ##lana rockefeller shaken banjo izzy ##ense breathless vines ##32 ##eman alterations chromosome dwellings feudal mole 153 catalonia relics tenant mandated ##fm fridge hats honesty patented raul heap cruisers accusing enlightenment infants wherein chatham contractors zen affinity hc osborne piston 156 traps maturity ##rana lagos ##zal peering ##nay attendant dealers protocols subset prospects biographical ##cre artery ##zers insignia nuns endured ##eration recommend schwartz serbs berger cromwell crossroads ##ctor enduring clasped grounded ##bine marseille twitched abel choke https catalyst moldova italians ##tist disastrous wee ##oured ##nti wwf nope ##piration ##asa 
expresses thumbs 167 ##nza coca 1781 cheating ##ption skipped sensory heidelberg spies satan dangers semifinal 202 bohemia whitish confusing shipbuilding relies surgeons landings ravi baku moor suffix alejandro ##yana litre upheld ##unk rajasthan ##rek coaster insists posture scenarios etienne favoured appoint transgender elephants poked greenwood defences fulfilled militant somali 1758 chalk potent ##ucci migrants wink assistants nos restriction activism niger ##ario colon shaun ##sat daphne ##erated swam congregations reprise considerations magnet playable xvi ##р overthrow tobias knob chavez coding ##mers propped katrina orient newcomer ##suke temperate ##pool farmhouse interrogation ##vd committing ##vert forthcoming strawberry joaquin macau ponds shocking siberia ##cellular chant contributors ##nant ##ologists sped absorb hail 1782 spared ##hore barbados karate opus originates saul ##xie evergreen leaped ##rock correlation exaggerated weekday unification bump tracing brig afb pathways utilizing ##ners mod mb disturbance kneeling ##stad ##guchi 100th pune ##thy decreasing 168 manipulation miriam academia ecosystem occupational rbi ##lem rift ##14 rotary stacked incorporation awakening generators guerrero racist ##omy cyber derivatives culminated allie annals panzer sainte wikipedia pops zu austro ##vate algerian politely nicholson mornings educate tastes thrill dartmouth ##gating db ##jee regan differing concentrating choreography divinity ##media pledged alexandre routing gregor madeline ##idal apocalypse ##hora gunfire culminating elves fined liang lam programmed tar guessing transparency gabrielle ##gna cancellation flexibility ##lining accession shea stronghold nets specializes ##rgan abused hasan sgt ling exceeding ##₄ admiration supermarket ##ark photographers specialised tilt resonance hmm perfume 380 sami threatens garland botany guarding boiled greet puppy russo supplier wilmington vibrant vijay ##bius paralympic grumbled paige faa licking margins hurricanes ##gong fest grenade ripping ##uz counseling weigh ##sian needles wiltshire edison costly ##not fulton tramway redesigned staffordshire cache gasping watkins sleepy candidacy ##group monkeys timeline throbbing ##bid ##sos berth uzbekistan vanderbilt bothering overturned ballots gem ##iger sunglasses subscribers hooker compelling ang exceptionally saloon stab ##rdi carla terrifying rom ##vision coil ##oids satisfying vendors 31st mackay deities overlooked ambient bahamas felipe olympia whirled botanist advertised tugging ##dden disciples morales unionist rites foley morse motives creepy ##₀ soo ##sz bargain highness frightening turnpike tory reorganization ##cer depict biographer ##walk unopposed manifesto ##gles institut emile accidental kapoor ##dam kilkenny cortex lively ##13 romanesque jain shan cannons ##ood ##ske petrol echoing amalgamated disappears cautious proposes sanctions trenton ##ر flotilla aus contempt tor canary cote theirs ##hun conceptual deleted fascinating paso blazing elf honourable hutchinson ##eiro ##outh ##zin surveyor tee amidst wooded reissue intro ##ono cobb shelters newsletter hanson brace encoding confiscated dem caravan marino scroll melodic cows imam ##adi ##aneous northward searches biodiversity cora 310 roaring ##bers connell theologian halo compose pathetic unmarried dynamo ##oot az calculation toulouse deserves humour nr forgiveness tam undergone martyr pamela myths whore counselor hicks 290 heavens battleship electromagnetic ##bbs stellar establishments presley hopped ##chin temptation 90s 
wills nas ##yuan nhs ##nya seminars ##yev adaptations gong asher lex indicator sikh tobago cites goin ##yte satirical ##gies characterised correspond bubbles lure participates ##vid eruption skate therapeutic 1785 canals wholesale defaulted sac 460 petit ##zzled virgil leak ravens 256 portraying ##yx ghetto creators dams portray vicente ##rington fae namesake bounty ##arium joachim ##ota ##iser aforementioned axle snout depended dismantled reuben 480 ##ibly gallagher ##lau ##pd earnest ##ieu ##iary inflicted objections ##llar asa gritted ##athy jericho ##sea ##was flick underside ceramics undead substituted 195 eastward undoubtedly wheeled chimney ##iche guinness cb ##ager siding ##bell traitor baptiste disguised inauguration 149 tipperary choreographer perched warmed stationary eco ##ike ##ntes bacterial ##aurus flores phosphate ##core attacker invaders alvin intersects a1 indirectly immigrated businessmen cornelius valves narrated pill sober ul nationale monastic applicants scenery ##jack 161 motifs constitutes cpu ##osh jurisdictions sd tuning irritation woven ##uddin fertility gao ##erie antagonist impatient glacial hides boarded denominations interception ##jas cookie nicola ##tee algebraic marquess bahn parole buyers bait turbines paperwork bestowed natasha renee oceans purchases 157 vaccine 215 ##tock fixtures playhouse integrate jai oswald intellectuals ##cky booked nests mortimer ##isi obsession sept ##gler ##sum 440 scrutiny simultaneous squinted ##shin collects oven shankar penned remarkably ##я slips luggage spectral 1786 collaborations louie consolidation ##ailed ##ivating 420 hoover blackpool harness ignition vest tails belmont mongol skinner ##nae visually mage derry ##tism ##unce stevie transitional ##rdy redskins drying prep prospective ##21 annoyance oversee ##loaded fills ##books ##iki announces fda scowled respects prasad mystic tucson ##vale revue springer bankrupt 1772 aristotle salvatore habsburg ##geny dal natal nut pod chewing darts moroccan walkover rosario lenin punjabi ##ße grossed scattering wired invasive hui polynomial corridors wakes gina portrays ##cratic arid retreating erich irwin sniper ##dha linen lindsey maneuver butch shutting socio bounce commemorative postseason jeremiah pines 275 mystical beads bp abbas furnace bidding consulted assaulted empirical rubble enclosure sob weakly cancel polly yielded ##emann curly prediction battered 70s vhs jacqueline render sails barked detailing grayson riga sloane raging ##yah herbs bravo ##athlon alloy giggle imminent suffers assumptions waltz ##itate accomplishments ##ited bathing remixed deception prefix ##emia deepest ##tier ##eis balkan frogs ##rong slab ##pate philosophers peterborough grains imports dickinson rwanda ##atics 1774 dirk lan tablets ##rove clone ##rice caretaker hostilities mclean ##gre regimental treasures norms impose tsar tango diplomacy variously complain 192 recognise arrests 1779 celestial pulitzer ##dus bing libretto ##moor adele splash ##rite expectation lds confronts ##izer spontaneous harmful wedge entrepreneurs buyer ##ope bilingual translate rugged conner circulated uae eaton ##gra ##zzle lingered lockheed vishnu reelection alonso ##oom joints yankee headline cooperate heinz laureate invading ##sford echoes scandinavian ##dham hugging vitamin salute micah hind trader ##sper radioactive ##ndra militants poisoned ratified remark campeonato deprived wander prop ##dong outlook ##tani ##rix ##eye chiang darcy ##oping mandolin spice statesman babylon 182 walled forgetting afro ##cap 158 
iss staunch ##onga astronomers sera sofie emergencies susquehanna ##heard duc mastery vh1 williamsburg bayer buckled craving ##khan ##rdes bloomington ##write alton barbecue ##bians justine ##hri ##ndt delightful smartphone newtown photon retrieval peugeot hissing ##monium ##orough flavors lighted relaunched tainted ##games ##lysis anarchy microscopic hopping adept evade evie ##beau inhibit sinn adjustable hurst intuition wilton cisco 44th lawful lowlands stockings thierry ##dalen ##hila ##nai fates prank tb maison lobbied provocative 1724 4a utopia ##qual carbonate gujarati purcell ##rford curtiss ##mei overgrown arenas mediation swallows ##rnik respectful turnbull ##hedron ##hope alyssa ozone ##ʻi ami gestapo johansson snooker canteen cuff declines empathy stigma ##ags ##iner ##raine taxpayers gui volga ##wright ##copic lifespan overcame tattooed enactment giggles ##ador ##camp barrington bribe obligatory orbiting peng ##enas elusive sucker ##vating cong hardship empowered anticipating estrada cryptic greasy detainees planck sudbury plaid dod marriott kayla ##ears ##vb ##zd mortally ##hein cognition radha 319 liechtenstein meade richly argyle harpsichord liberalism trumpets lauded tyrant salsa tiled lear promoters reused slicing trident ##chuk ##gami ##lka cantor checkpoint ##points gaul leger mammalian ##tov ##aar ##schaft doha frenchman nirvana ##vino delgado headlining ##eron ##iography jug tko 1649 naga intersections ##jia benfica nawab ##suka ashford gulp ##deck ##vill ##rug brentford frazier pleasures dunne potsdam shenzhen dentistry ##tec flanagan ##dorff ##hear chorale dinah prem quezon ##rogated relinquished sutra terri ##pani flaps ##rissa poly ##rnet homme aback ##eki linger womb ##kson ##lewood doorstep orthodoxy threaded westfield ##rval dioceses fridays subsided ##gata loyalists ##biotic ##ettes letterman lunatic prelate tenderly invariably souza thug winslow ##otide furlongs gogh jeopardy ##runa pegasus ##umble humiliated standalone tagged ##roller freshmen klan ##bright attaining initiating transatlantic logged viz ##uance 1723 combatants intervening stephane chieftain despised grazed 317 cdc galveston godzilla macro simulate ##planes parades ##esses 960 ##ductive ##unes equator overdose ##cans ##hosh ##lifting joshi epstein sonora treacherous aquatics manchu responsive ##sation supervisory ##christ ##llins ##ibar ##balance ##uso kimball karlsruhe mab ##emy ignores phonetic reuters spaghetti 820 almighty danzig rumbling tombstone designations lured outset ##felt supermarkets ##wt grupo kei kraft susanna ##blood comprehension genealogy ##aghan ##verted redding ##ythe 1722 bowing ##pore ##roi lest sharpened fulbright valkyrie sikhs ##unds swans bouquet merritt ##tage ##venting commuted redhead clerks leasing cesare dea hazy ##vances fledged greenfield servicemen ##gical armando blackout dt sagged downloadable intra potion pods ##4th ##mism xp attendants gambia stale ##ntine plump asteroids rediscovered buds flea hive ##neas 1737 classifications debuts ##eles olympus scala ##eurs ##gno ##mute hummed sigismund visuals wiggled await pilasters clench sulfate ##ances bellevue enigma trainee snort ##sw clouded denim ##rank ##rder churning hartman lodges riches sima ##missible accountable socrates regulates mueller ##cr 1702 avoids solids himalayas nutrient pup ##jevic squat fades nec ##lates ##pina ##rona ##ου privateer tequila ##gative ##mpton apt hornet immortals ##dou asturias cleansing dario ##rries ##anta etymology servicing zhejiang ##venor ##nx horned erasmus rayon 
relocating £10 ##bags escalated promenade stubble 2010s artisans axial liquids mora sho yoo ##tsky bundles oldies ##nally notification bastion ##ths sparkle ##lved 1728 leash pathogen highs ##hmi immature 880 gonzaga ignatius mansions monterrey sweets bryson ##loe polled regatta brightest pei rosy squid hatfield payroll addict meath cornerback heaviest lodging ##mage capcom rippled ##sily barnet mayhem ymca snuggled rousseau ##cute blanchard 284 fragmented leighton chromosomes risking ##md ##strel ##utter corinne coyotes cynical hiroshi yeomanry ##ractive ebook grading mandela plume agustin magdalene ##rkin bea femme trafford ##coll ##lun ##tance 52nd fourier upton ##mental camilla gust iihf islamabad longevity ##kala feldman netting ##rization endeavour foraging mfa orr ##open greyish contradiction graz ##ruff handicapped marlene tweed oaxaca spp campos miocene pri configured cooks pluto cozy pornographic ##entes 70th fairness glided jonny lynne rounding sired ##emon ##nist remade uncover ##mack complied lei newsweek ##jured ##parts ##enting ##pg 293 finer guerrillas athenian deng disused stepmother accuse gingerly seduction 521 confronting ##walker ##going gora nostalgia sabres virginity wrenched ##minated syndication wielding eyre ##56 ##gnon ##igny behaved taxpayer sweeps ##growth childless gallant ##ywood amplified geraldine scrape ##ffi babylonian fresco ##rdan ##kney ##position 1718 restricting tack fukuoka osborn selector partnering ##dlow 318 gnu kia tak whitley gables ##54 ##mania mri softness immersion ##bots ##evsky 1713 chilling insignificant pcs ##uis elites lina purported supplemental teaming ##americana ##dding ##inton proficient rouen ##nage ##rret niccolo selects ##bread fluffy 1621 gruff knotted mukherjee polgara thrash nicholls secluded smoothing thru corsica loaf whitaker inquiries ##rrier ##kam indochina 289 marlins myles peking ##tea extracts pastry superhuman connacht vogel ##ditional ##het ##udged ##lash gloss quarries refit teaser ##alic ##gaon 20s materialized sling camped pickering tung tracker pursuant ##cide cranes soc ##cini ##typical ##viere anhalt overboard workout chores fares orphaned stains ##logie fenton surpassing joyah triggers ##itte grandmaster ##lass ##lists clapping fraudulent ledger nagasaki ##cor ##nosis ##tsa eucalyptus tun ##icio ##rney ##tara dax heroism ina wrexham onboard unsigned ##dates moshe galley winnie droplets exiles praises watered noodles ##aia fein adi leland multicultural stink bingo comets erskine modernized canned constraint domestically chemotherapy featherweight stifled ##mum darkly irresistible refreshing hasty isolate ##oys kitchener planners ##wehr cages yarn implant toulon elects childbirth yue ##lind ##lone cn rightful sportsman junctions remodeled specifies ##rgh 291 ##oons complimented ##urgent lister ot ##logic bequeathed cheekbones fontana gabby ##dial amadeus corrugated maverick resented triangles ##hered ##usly nazareth tyrol 1675 assent poorer sectional aegean ##cous 296 nylon ghanaian ##egorical ##weig cushions forbid fusiliers obstruction somerville ##scia dime earrings elliptical leyte oder polymers timmy atm midtown piloted settles continual externally mayfield ##uh enrichment henson keane persians 1733 benji braden pep 324 ##efe contenders pepsi valet ##isches 298 ##asse ##earing goofy stroll ##amen authoritarian occurrences adversary ahmedabad tangent toppled dorchester 1672 modernism marxism islamist charlemagne exponential racks unicode brunette mbc pic skirmish ##bund ##lad ##powered ##yst hoisted messina 
shatter ##ctum jedi vantage ##music ##neil clemens mahmoud corrupted authentication lowry nils ##washed omnibus wounding jillian ##itors ##opped serialized narcotics handheld ##arm ##plicity intersecting stimulating ##onis crate fellowships hemingway casinos climatic fordham copeland drip beatty leaflets robber brothel madeira ##hedral sphinx ultrasound ##vana valor forbade leonid villas ##aldo duane marquez ##cytes disadvantaged forearms kawasaki reacts consular lax uncles uphold ##hopper concepcion dorsey lass ##izan arching passageway 1708 researches tia internationals ##graphs ##opers distinguishes javanese divert ##uven plotted ##listic ##rwin ##erik ##tify affirmative signifies validation ##bson kari felicity georgina zulu ##eros ##rained ##rath overcoming ##dot argyll ##rbin 1734 chiba ratification windy earls parapet ##marks hunan pristine astrid punta ##gart brodie ##kota ##oder malaga minerva rouse ##phonic bellowed pagoda portals reclamation ##gur ##odies ##⁄₄ parentheses quoting allergic palette showcases benefactor heartland nonlinear ##tness bladed cheerfully scans ##ety ##hone 1666 girlfriends pedersen hiram sous ##liche ##nator 1683 ##nery ##orio ##umen bobo primaries smiley ##cb unearthed uniformly fis metadata 1635 ind ##oted recoil ##titles ##tura ##ια 406 hilbert jamestown mcmillan tulane seychelles ##frid antics coli fated stucco ##grants 1654 bulky accolades arrays caledonian carnage optimism puebla ##tative ##cave enforcing rotherham seo dunlop aeronautics chimed incline zoning archduke hellenistic ##oses ##sions candi thong ##ople magnate rustic ##rsk projective slant ##offs danes hollis vocalists ##ammed congenital contend gesellschaft ##ocating ##pressive douglass quieter ##cm ##kshi howled salim spontaneously townsville buena southport ##bold kato 1638 faerie stiffly ##vus ##rled 297 flawless realising taboo ##7th bytes straightening 356 jena ##hid ##rmin cartwright berber bertram soloists 411 noses 417 coping fission hardin inca ##cen 1717 mobilized vhf ##raf biscuits curate ##85 ##anial 331 gaunt neighbourhoods 1540 ##abas blanca bypassed sockets behold coincidentally ##bane nara shave splinter terrific ##arion ##erian commonplace juris redwood waistband boxed caitlin fingerprints jennie naturalized ##ired balfour craters jody bungalow hugely quilt glitter pigeons undertaker bulging constrained goo ##sil ##akh assimilation reworked ##person persuasion ##pants felicia ##cliff ##ulent 1732 explodes ##dun ##inium ##zic lyman vulture hog overlook begs northwards ow spoil ##urer fatima favorably accumulate sargent sorority corresponded dispersal kochi toned ##imi ##lita internacional newfound ##agger ##lynn ##rigue booths peanuts ##eborg medicare muriel nur ##uram crates millennia pajamas worsened ##breakers jimi vanuatu yawned ##udeau carousel ##hony hurdle ##ccus ##mounted ##pod rv ##eche airship ambiguity compulsion recapture ##claiming arthritis ##osomal 1667 asserting ngc sniffing dade discontent glendale ported ##amina defamation rammed ##scent fling livingstone ##fleet 875 ##ppy apocalyptic comrade lcd ##lowe cessna eine persecuted subsistence demi hoop reliefs 710 coptic progressing stemmed perpetrators 1665 priestess ##nio dobson ebony rooster itf tortricidae ##bbon ##jian cleanup ##jean ##øy 1721 eighties taxonomic holiness ##hearted ##spar antilles showcasing stabilized ##nb gia mascara michelangelo dawned ##uria ##vinsky extinguished fitz grotesque £100 ##fera ##loid ##mous barges neue throbbed cipher johnnie ##a1 ##mpt outburst ##swick spearheaded 
administrations c1 heartbreak pixels pleasantly ##enay lombardy plush ##nsed bobbie ##hly reapers tremor xiang minogue substantive hitch barak ##wyl kwan ##encia 910 obscene elegance indus surfer bribery conserve ##hyllum ##masters horatio ##fat apes rebound psychotic ##pour iteration ##mium ##vani botanic horribly antiques dispose paxton ##hli ##wg timeless 1704 disregard engraver hounds ##bau ##version looted uno facilitates groans masjid rutland antibody disqualification decatur footballers quake slacks 48th rein scribe stabilize commits exemplary tho ##hort ##chison pantry traversed ##hiti disrepair identifiable vibrated baccalaureate ##nnis csa interviewing ##iensis ##raße greaves wealthiest 343 classed jogged £5 ##58 ##atal illuminating knicks respecting ##uno scrubbed ##iji ##dles kruger moods growls raider silvia chefs kam vr cree percival ##terol gunter counterattack defiant henan ze ##rasia ##riety equivalence submissions ##fra ##thor bautista mechanically ##heater cornice herbal templar ##mering outputs ruining ligand renumbered extravagant mika blockbuster eta insurrection ##ilia darkening ferocious pianos strife kinship ##aer melee ##anor ##iste ##may ##oue decidedly weep ##jad ##missive ##ppel 354 puget unease ##gnant 1629 hammering kassel ob wessex ##lga bromwich egan paranoia utilization ##atable ##idad contradictory provoke ##ols ##ouring ##tangled knesset ##very ##lette plumbing ##sden ##¹ greensboro occult sniff 338 zev beaming gamer haggard mahal ##olt ##pins mendes utmost briefing gunnery ##gut ##pher ##zh ##rok 1679 khalifa sonya ##boot principals urbana wiring ##liffe ##minating ##rrado dahl nyu skepticism np townspeople ithaca lobster somethin ##fur ##arina ##−1 freighter zimmerman biceps contractual ##herton amend hurrying subconscious ##anal 336 meng clermont spawning ##eia ##lub dignitaries impetus snacks spotting twigs ##bilis ##cz ##ouk libertadores nic skylar ##aina ##firm gustave asean ##anum dieter legislatures flirt bromley trolls umar ##bbies ##tyle blah parc bridgeport crank negligence ##nction 46th constantin molded bandages seriousness 00pm siegel carpets compartments upbeat statehood ##dner ##edging marko 730 platt ##hane paving ##iy 1738 abbess impatience limousine nbl ##talk 441 lucille mojo nightfall robbers ##nais karel brisk calves replicate ascribed telescopes ##olf intimidated ##reen ballast specialization ##sit aerodynamic caliphate rainer visionary ##arded epsilon ##aday ##onte aggregation auditory boosted reunification kathmandu loco robyn 402 acknowledges appointing humanoid newell redeveloped restraints ##tained barbarians chopper 1609 italiana ##lez ##lho investigates wrestlemania ##anies ##bib 690 ##falls creaked dragoons gravely minions stupidity volley ##harat ##week musik ##eries ##uously fungal massimo semantics malvern ##ahl ##pee discourage embryo imperialism 1910s profoundly ##ddled jiangsu sparkled stat ##holz sweatshirt tobin ##iction sneered ##cheon ##oit brit causal smyth ##neuve diffuse perrin silvio ##ipes ##recht detonated iqbal selma ##nism ##zumi roasted ##riders tay ##ados ##mament ##mut ##rud 840 completes nipples cfa flavour hirsch ##laus calderon sneakers moravian ##ksha 1622 rq 294 ##imeters bodo ##isance ##pre ##ronia anatomical excerpt ##lke dh kunst ##tablished ##scoe biomass panted unharmed gael housemates montpellier ##59 coa rodents tonic hickory singleton ##taro 451 1719 aldo breaststroke dempsey och rocco ##cuit merton dissemination midsummer serials ##idi haji polynomials ##rdon gs enoch prematurely shutter 
taunton £3 ##grating ##inates archangel harassed ##asco 326 archway dazzling ##ecin 1736 sumo wat ##kovich 1086 honneur ##ently ##nostic ##ttal ##idon 1605 403 1716 blogger rents ##gnan hires ##ikh ##dant howie ##rons handler retracted shocks 1632 arun duluth kepler trumpeter ##lary peeking seasoned trooper ##mara laszlo ##iciencies ##rti heterosexual ##inatory ##ssion indira jogging ##inga ##lism beit dissatisfaction malice ##ately nedra peeling ##rgeon 47th stadiums 475 vertigo ##ains iced restroom ##plify ##tub illustrating pear ##chner ##sibility inorganic rappers receipts watery ##kura lucinda ##oulos reintroduced ##8th ##tched gracefully saxons nutritional wastewater rained favourites bedrock fisted hallways likeness upscale ##lateral 1580 blinds prequel ##pps ##tama deter humiliating restraining tn vents 1659 laundering recess rosary tractors coulter federer ##ifiers ##plin persistence ##quitable geschichte pendulum quakers ##beam bassett pictorial buffet koln ##sitor drills reciprocal shooters ##57 ##cton ##tees converge pip dmitri donnelly yamamoto aqua azores demographics hypnotic spitfire suspend wryly roderick ##rran sebastien ##asurable mavericks ##fles ##200 himalayan prodigy ##iance transvaal demonstrators handcuffs dodged mcnamara sublime 1726 crazed ##efined ##till ivo pondered reconciled shrill sava ##duk bal cad heresy jaipur goran ##nished 341 lux shelly whitehall ##hre israelis peacekeeping ##wled 1703 demetrius ousted ##arians ##zos beale anwar backstroke raged shrinking cremated ##yck benign towing wadi darmstadt landfill parana soothe colleen sidewalks mayfair tumble hepatitis ferrer superstructure ##gingly ##urse ##wee anthropological translators ##mies closeness hooves ##pw mondays ##roll ##vita landscaping ##urized purification sock thorns thwarted jalan tiberius ##taka saline ##rito confidently khyber sculptors ##ij brahms hammersmith inspectors battista fivb fragmentation hackney ##uls arresting exercising antoinette bedfordshire ##zily dyed ##hema 1656 racetrack variability ##tique 1655 austrians deteriorating madman theorists aix lehman weathered 1731 decreed eruptions 1729 flaw quinlan sorbonne flutes nunez 1711 adored downwards fable rasped 1712 moritz mouthful renegade shivers stunts dysfunction restrain translit 327 pancakes ##avio ##cision ##tray 351 vial ##lden bain ##maid ##oxide chihuahua malacca vimes ##rba ##rnier 1664 donnie plaques ##ually 337 bangs floppy huntsville loretta nikolay ##otte eater handgun ubiquitous ##hett eras zodiac 1634 ##omorphic 1820s ##zog cochran ##bula ##lithic warring ##rada dalai excused blazers mcconnell reeling bot este ##abi geese hoax taxon ##bla guitarists ##icon condemning hunts inversion moffat taekwondo ##lvis 1624 stammered ##rest ##rzy sousa fundraiser marylebone navigable uptown cabbage daniela salman shitty whimper ##kian ##utive programmers protections rm ##rmi ##rued forceful ##enes fuss ##tao ##wash brat oppressive reykjavik spartak ticking ##inkles ##kiewicz adolph horst maui protege straighten cpc landau concourse clements resultant ##ando imaginative joo reactivated ##rem ##ffled ##uising consultative ##guide flop kaitlyn mergers parenting somber ##vron supervise vidhan ##imum courtship exemplified harmonies medallist refining ##rrow ##ка amara ##hum 780 goalscorer sited overshadowed rohan displeasure secretive multiplied osman ##orth engravings padre ##kali ##veda miniatures mis ##yala clap pali rook ##cana 1692 57th antennae astro oskar 1628 bulldog crotch hackett yucatan ##sure amplifiers brno ferrara 
migrating ##gree thanking turing ##eza mccann ting andersson onslaught gaines ganga incense standardization ##mation sentai scuba stuffing turquoise waivers alloys ##vitt regaining vaults ##clops ##gizing digger furry memorabilia probing ##iad payton rec deutschland filippo opaque seamen zenith afrikaans ##filtration disciplined inspirational ##merie banco confuse grafton tod ##dgets championed simi anomaly biplane ##ceptive electrode ##para 1697 cleavage crossbow swirl informant ##lars ##osta afi bonfire spec ##oux lakeside slump ##culus ##lais ##qvist ##rrigan 1016 facades borg inwardly cervical xl pointedly 050 stabilization ##odon chests 1699 hacked ctv orthogonal suzy ##lastic gaulle jacobite rearview ##cam ##erted ashby ##drik ##igate ##mise ##zbek affectionately canine disperse latham ##istles ##ivar spielberg ##orin ##idium ezekiel cid ##sg durga middletown ##cina customized frontiers harden ##etano ##zzy 1604 bolsheviks ##66 coloration yoko ##bedo briefs slabs debra liquidation plumage ##oin blossoms dementia subsidy 1611 proctor relational jerseys parochial ter ##ici esa peshawar cavalier loren cpi idiots shamrock 1646 dutton malabar mustache ##endez ##ocytes referencing terminates marche yarmouth ##sop acton mated seton subtly baptised beige extremes jolted kristina telecast ##actic safeguard waldo ##baldi ##bular endeavors sloppy subterranean ##ensburg ##itung delicately pigment tq ##scu 1626 ##ound collisions coveted herds ##personal ##meister ##nberger chopra ##ricting abnormalities defective galician lucie ##dilly alligator likened ##genase burundi clears complexion derelict deafening diablo fingered champaign dogg enlist isotope labeling mrna ##erre brilliance marvelous ##ayo 1652 crawley ether footed dwellers deserts hamish rubs warlock skimmed ##lizer 870 buick embark heraldic irregularities ##ajan kiara ##kulam ##ieg antigen kowalski ##lge oakley visitation ##mbit vt ##suit 1570 murderers ##miento ##rites chimneys ##sling condemn custer exchequer havre ##ghi fluctuations ##rations dfb hendricks vaccines ##tarian nietzsche biking juicy ##duced brooding scrolling selangor ##ragan 352 annum boomed seminole sugarcane ##dna departmental dismissing innsbruck arteries ashok batavia daze kun overtook ##rga ##tlan beheaded gaddafi holm electronically faulty galilee fractures kobayashi ##lized gunmen magma aramaic mala eastenders inference messengers bf ##qu 407 bathrooms ##vere 1658 flashbacks ideally misunderstood ##jali ##weather mendez ##grounds 505 uncanny ##iii 1709 friendships ##nbc sacrament accommodated reiterated logistical pebbles thumped ##escence administering decrees drafts ##flight ##cased ##tula futuristic picket intimidation winthrop ##fahan interfered 339 afar francoise morally uta cochin croft dwarfs ##bruck ##dents ##nami biker ##hner ##meral nano ##isen ##ometric ##pres ##ан brightened meek parcels securely gunners ##jhl ##zko agile hysteria ##lten ##rcus bukit champs chevy cuckoo leith sadler theologians welded ##section 1663 jj plurality xander ##rooms ##formed shredded temps intimately pau tormented ##lok ##stellar 1618 charred ems essen ##mmel alarms spraying ascot blooms twinkle ##abia ##apes internment obsidian ##chaft snoop ##dav ##ooping malibu ##tension quiver ##itia hays mcintosh travers walsall ##ffie 1623 beverley schwarz plunging structurally m3 rosenthal vikram ##tsk 770 ghz ##onda ##tiv chalmers groningen pew reckon unicef ##rvis 55th ##gni 1651 sulawesi avila cai metaphysical screwing turbulence ##mberg augusto samba 56th baffled momentary 
toxin ##urian ##wani aachen condoms dali steppe ##3d ##app ##oed ##year adolescence dauphin electrically inaccessible microscopy nikita ##ega atv ##cel ##enter ##oles ##oteric ##ы accountants punishments wrongly bribes adventurous clinch flinders southland ##hem ##kata gough ##ciency lads soared ##ה undergoes deformation outlawed rubbish ##arus ##mussen ##nidae ##rzburg arcs ##ingdon ##tituted 1695 wheelbase wheeling bombardier campground zebra ##lices ##oj ##bain lullaby ##ecure donetsk wylie grenada ##arding ##ης squinting eireann opposes ##andra maximal runes ##broken ##cuting ##iface ##ror ##rosis additive britney adultery triggering ##drome detrimental aarhus containment jc swapped vichy ##ioms madly ##oric ##rag brant ##ckey ##trix 1560 1612 broughton rustling ##stems ##uder asbestos mentoring ##nivorous finley leaps ##isan apical pry slits substitutes ##dict intuitive fantasia insistent unreasonable ##igen ##vna domed hannover margot ponder ##zziness impromptu jian lc rampage stemming ##eft andrey gerais whichever amnesia appropriated anzac clicks modifying ultimatum cambrian maids verve yellowstone ##mbs conservatoire ##scribe adherence dinners spectra imperfect mysteriously sidekick tatar tuba ##aks ##ifolia distrust ##athan ##zle c2 ronin zac ##pse celaena instrumentalist scents skopje ##mbling comical compensated vidal condor intersect jingle wavelengths ##urrent mcqueen ##izzly carp weasel 422 kanye militias postdoctoral eugen gunslinger ##ɛ faux hospice ##for appalled derivation dwarves ##elis dilapidated ##folk astoria philology ##lwyn ##otho ##saka inducing philanthropy ##bf ##itative geek markedly sql ##yce bessie indices rn ##flict 495 frowns resolving weightlifting tugs cleric contentious 1653 mania rms ##miya ##reate ##ruck ##tucket bien eels marek ##ayton ##cence discreet unofficially ##ife leaks ##bber 1705 332 dung compressor hillsborough pandit shillings distal ##skin 381 ##tat ##you nosed ##nir mangrove undeveloped ##idia textures ##inho ##500 ##rise ae irritating nay amazingly bancroft apologetic compassionate kata symphonies ##lovic airspace ##lch 930 gifford precautions fulfillment sevilla vulgar martinique ##urities looting piccolo tidy ##dermott quadrant armchair incomes mathematicians stampede nilsson ##inking ##scan foo quarterfinal ##ostal shang shouldered squirrels ##owe 344 vinegar ##bner ##rchy ##systems delaying ##trics ars dwyer rhapsody sponsoring ##gration bipolar cinder starters ##olio ##urst 421 signage ##nty aground figurative mons acquaintances duets erroneously soyuz elliptic recreated ##cultural ##quette ##ssed ##tma ##zcz moderator scares ##itaire ##stones ##udence juniper sighting ##just ##nsen britten calabria ry bop cramer forsyth stillness ##л airmen gathers unfit ##umber ##upt taunting ##rip seeker streamlined ##bution holster schumann tread vox ##gano ##onzo strive dil reforming covent newbury predicting ##orro decorate tre ##puted andover ie asahi dept dunkirk gills ##tori buren huskies ##stis ##stov abstracts bets loosen ##opa 1682 yearning ##glio ##sir berman effortlessly enamel napoli persist ##peration ##uez attache elisa b1 invitations ##kic accelerating reindeer boardwalk clutches nelly polka starbucks ##kei adamant huey lough unbroken adventurer embroidery inspecting stanza ##ducted naia taluka ##pone ##roids chases deprivation florian ##jing ##ppet earthly ##lib ##ssee colossal foreigner vet freaks patrice rosewood triassic upstate ##pkins dominates ata chants ks vo ##400 ##bley ##raya ##rmed 555 agra infiltrate ##ailing 
##ilation ##tzer ##uppe ##werk binoculars enthusiast fujian squeak ##avs abolitionist almeida boredom hampstead marsden rations ##ands inflated 334 bonuses rosalie patna ##rco 329 detachments penitentiary 54th flourishing woolf ##dion ##etched papyrus ##lster ##nsor ##toy bobbed dismounted endelle inhuman motorola tbs wince wreath ##ticus hideout inspections sanjay disgrace infused pudding stalks ##urbed arsenic leases ##hyl ##rrard collarbone ##waite ##wil dowry ##bant ##edance genealogical nitrate salamanca scandals thyroid necessitated ##! ##" ### ##$ ##% ##& ##' ##( ##) ##* ##+ ##, ##- ##. ##/ ##: ##; ##< ##= ##> ##? ##@ ##[ ##\ ##] ##^ ##_ ##` ##{ ##| ##} ##~ ##¡ ##¢ ##£ ##¤ ##¥ ##¦ ##§ ##¨ ##© ##ª ##« ##¬ ##® ##± ##´ ##µ ##¶ ##· ##º ##» ##¼ ##¾ ##¿ ##æ ##ð ##÷ ##þ ##đ ##ħ ##ŋ ##œ ##ƒ ##ɐ ##ɑ ##ɒ ##ɔ ##ɕ ##ə ##ɡ ##ɣ ##ɨ ##ɪ ##ɫ ##ɬ ##ɯ ##ɲ ##ɴ ##ɹ ##ɾ ##ʀ ##ʁ ##ʂ ##ʃ ##ʉ ##ʊ ##ʋ ##ʌ ##ʎ ##ʐ ##ʑ ##ʒ ##ʔ ##ʰ ##ʲ ##ʳ ##ʷ ##ʸ ##ʻ ##ʼ ##ʾ ##ʿ ##ˈ ##ˡ ##ˢ ##ˣ ##ˤ ##β ##γ ##δ ##ε ##ζ ##θ ##κ ##λ ##μ ##ξ ##ο ##π ##ρ ##σ ##τ ##υ ##φ ##χ ##ψ ##ω ##б ##г ##д ##ж ##з ##м ##п ##с ##у ##ф ##х ##ц ##ч ##ш ##щ ##ъ ##э ##ю ##ђ ##є ##і ##ј ##љ ##њ ##ћ ##ӏ ##ա ##բ ##գ ##դ ##ե ##թ ##ի ##լ ##կ ##հ ##մ ##յ ##ն ##ո ##պ ##ս ##վ ##տ ##ր ##ւ ##ք ##־ ##א ##ב ##ג ##ד ##ו ##ז ##ח ##ט ##י ##ך ##כ ##ל ##ם ##מ ##ן ##נ ##ס ##ע ##ף ##פ ##ץ ##צ ##ק ##ר ##ש ##ת ##، ##ء ##ب ##ت ##ث ##ج ##ح ##خ ##ذ ##ز ##س ##ش ##ص ##ض ##ط ##ظ ##ع ##غ ##ـ ##ف ##ق ##ك ##و ##ى ##ٹ ##پ ##چ ##ک ##گ ##ں ##ھ ##ہ ##ے ##अ ##आ ##उ ##ए ##क ##ख ##ग ##च ##ज ##ट ##ड ##ण ##त ##थ ##द ##ध ##न ##प ##ब ##भ ##म ##य ##र ##ल ##व ##श ##ष ##स ##ह ##ा ##ि ##ी ##ो ##। ##॥ ##ং ##অ ##আ ##ই ##উ ##এ ##ও ##ক ##খ ##গ ##চ ##ছ ##জ ##ট ##ড ##ণ ##ত ##থ ##দ ##ধ ##ন ##প ##ব ##ভ ##ম ##য ##র ##ল ##শ ##ষ ##স ##হ ##া ##ি ##ী ##ে ##க ##ச ##ட ##த ##ந ##ன ##ப ##ம ##ய ##ர ##ல ##ள ##வ ##ா ##ி ##ு ##ே ##ை ##ನ ##ರ ##ಾ ##ක ##ය ##ර ##ල ##ව ##ා ##ก ##ง ##ต ##ท ##น ##พ ##ม ##ย ##ร ##ล ##ว ##ส ##อ ##า ##เ ##་ ##། ##ག ##ང ##ད ##ན ##པ ##བ ##མ ##འ ##ར ##ལ ##ས ##မ ##ა ##ბ ##გ ##დ ##ე ##ვ ##თ ##ი ##კ ##ლ ##მ ##ნ ##ო ##რ ##ს ##ტ ##უ ##ᄀ ##ᄂ ##ᄃ ##ᄅ ##ᄆ ##ᄇ ##ᄉ ##ᄊ ##ᄋ ##ᄌ ##ᄎ ##ᄏ ##ᄐ ##ᄑ ##ᄒ ##ᅡ ##ᅢ ##ᅥ ##ᅦ ##ᅧ ##ᅩ ##ᅪ ##ᅭ ##ᅮ ##ᅯ ##ᅲ ##ᅳ ##ᅴ ##ᅵ ##ᆨ ##ᆫ ##ᆯ ##ᆷ ##ᆸ ##ᆼ ##ᴬ ##ᴮ ##ᴰ ##ᴵ ##ᴺ ##ᵀ ##ᵃ ##ᵇ ##ᵈ ##ᵉ ##ᵍ ##ᵏ ##ᵐ ##ᵒ ##ᵖ ##ᵗ ##ᵘ ##ᵣ ##ᵤ ##ᵥ ##ᶜ ##ᶠ ##‐ ##‑ ##‒ ##– ##— ##― ##‖ ##‘ ##’ ##‚ ##“ ##” ##„ ##† ##‡ ##• ##… ##‰ ##′ ##″ ##› ##‿ ##⁄ ##⁰ ##ⁱ ##⁴ ##⁵ ##⁶ ##⁷ ##⁸ ##⁹ ##⁻ ##ⁿ ##₅ ##₆ ##₇ ##₈ ##₉ ##₊ ##₍ ##₎ ##ₐ ##ₑ ##ₒ ##ₓ ##ₕ ##ₖ ##ₗ ##ₘ ##ₚ ##ₛ ##ₜ ##₤ ##₩ ##€ ##₱ ##₹ ##ℓ ##№ ##ℝ ##™ ##⅓ ##⅔ ##← ##↑ ##→ ##↓ ##↔ ##↦ ##⇄ ##⇌ ##⇒ ##∂ ##∅ ##∆ ##∇ ##∈ ##∗ ##∘ ##√ ##∞ ##∧ ##∨ ##∩ ##∪ ##≈ ##≡ ##≤ ##≥ ##⊂ ##⊆ ##⊕ ##⊗ ##⋅ ##─ ##│ ##■ ##▪ ##● ##★ ##☆ ##☉ ##♠ ##♣ ##♥ ##♦ ##♯ ##⟨ ##⟩ ##ⱼ ##⺩ ##⺼ ##⽥ ##、 ##。 ##〈 ##〉 ##《 ##》 ##「 ##」 ##『 ##』 ##〜 ##あ ##い ##う ##え ##お ##か ##き ##く ##け ##こ ##さ ##し ##す ##せ ##そ ##た ##ち ##っ ##つ ##て ##と ##な ##に ##ぬ ##ね ##の ##は ##ひ ##ふ ##へ ##ほ ##ま ##み ##む ##め ##も ##や ##ゆ ##よ ##ら ##り ##る ##れ ##ろ ##を ##ん ##ァ ##ア ##ィ ##イ ##ウ ##ェ ##エ ##オ ##カ ##キ ##ク ##ケ ##コ ##サ ##シ ##ス ##セ ##タ ##チ ##ッ ##ツ ##テ ##ト ##ナ ##ニ ##ノ ##ハ ##ヒ ##フ ##ヘ ##ホ ##マ ##ミ ##ム ##メ ##モ ##ャ ##ュ ##ョ ##ラ ##リ ##ル ##レ ##ロ ##ワ ##ン ##・ ##ー ##一 ##三 ##上 ##下 ##不 ##世 ##中 ##主 ##久 ##之 ##也 ##事 ##二 ##五 ##井 ##京 ##人 ##亻 ##仁 ##介 ##代 ##仮 ##伊 ##会 ##佐 ##侍 ##保 ##信 ##健 ##元 ##光 ##八 ##公 ##内 ##出 ##分 ##前 ##劉 ##力 ##加 ##勝 ##北 ##区 ##十 ##千 ##南 ##博 ##原 ##口 ##古 ##史 ##司 ##合 ##吉 ##同 ##名 ##和 ##囗 ##四 ##国 ##國 ##土 ##地 ##坂 ##城 ##堂 ##場 ##士 ##夏 ##外 ##大 ##天 ##太 ##夫 ##奈 ##女 ##子 ##学 ##宀 ##宇 ##安 ##宗 ##定 ##宣 ##宮 ##家 ##宿 ##寺 ##將 ##小 ##尚 ##山 ##岡 ##島 ##崎 ##川 ##州 ##巿 
##帝 ##平 ##年 ##幸 ##广 ##弘 ##張 ##彳 ##後 ##御 ##德 ##心 ##忄 ##志 ##忠 ##愛 ##成 ##我 ##戦 ##戸 ##手 ##扌 ##政 ##文 ##新 ##方 ##日 ##明 ##星 ##春 ##昭 ##智 ##曲 ##書 ##月 ##有 ##朝 ##木 ##本 ##李 ##村 ##東 ##松 ##林 ##森 ##楊 ##樹 ##橋 ##歌 ##止 ##正 ##武 ##比 ##氏 ##民 ##水 ##氵 ##氷 ##永 ##江 ##沢 ##河 ##治 ##法 ##海 ##清 ##漢 ##瀬 ##火 ##版 ##犬 ##王 ##生 ##田 ##男 ##疒 ##発 ##白 ##的 ##皇 ##目 ##相 ##省 ##真 ##石 ##示 ##社 ##神 ##福 ##禾 ##秀 ##秋 ##空 ##立 ##章 ##竹 ##糹 ##美 ##義 ##耳 ##良 ##艹 ##花 ##英 ##華 ##葉 ##藤 ##行 ##街 ##西 ##見 ##訁 ##語 ##谷 ##貝 ##貴 ##車 ##軍 ##辶 ##道 ##郎 ##郡 ##部 ##都 ##里 ##野 ##金 ##鈴 ##镇 ##長 ##門 ##間 ##阝 ##阿 ##陳 ##陽 ##雄 ##青 ##面 ##風 ##食 ##香 ##馬 ##高 ##龍 ##龸 ##fi ##fl ##! ##( ##) ##, ##- ##. ##/ ##: ##? ##~
================================================ FILE: src/examples/tensorflow/huggingface_bert/huggingface_bert.ipynb ================================================
{ "cells": [ { "cell_type": "markdown", "id": "e91cf83b", "metadata": {}, "source": [ "# Running Huggingface DistilBERT with TensorFlow-Neuron" ] }, { "cell_type": "markdown", "id": "71394e1e", "metadata": {}, "source": [ "In this tutorial you will compile and deploy the DistilBERT version of HuggingFace 🤗 Transformers BERT for Inferentia using TensorFlow-Neuron. The full list of HuggingFace's pretrained BERT models can be found in the BERT section on this page https://huggingface.co/transformers/pretrained_models.html. You can also read about HuggingFace's pipeline feature here: https://huggingface.co/transformers/main_classes/pipelines.html\n", "\n", "This Jupyter notebook should be run on an inf1.6xlarge or larger instance. In a real-life scenario, however, compilation should be done on a general-purpose compute instance and only deployment on an inf1 instance, to save costs." ] }, { "cell_type": "markdown", "id": "828ef9bd", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "id": "5becc549", "metadata": {}, "source": [ "To run this tutorial please follow the instructions for [TensorFlow-Neuron Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/tensorflow-neuron.html#setup-tensorflow-neuron) and the [Jupyter Notebook Quickstart](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html) and set your kernel to \"Python (tensorflow-neuron)\".\n", "\n", "Next, install some additional dependencies." ] }, { "cell_type": "code", "execution_count": null, "id": "ee1a3b84", "metadata": {}, "outputs": [], "source": [ "# Suppress tokenizer warnings, making errors easier to detect\n", "%env TOKENIZERS_PARALLELISM=True\n", "!pip install transformers==4.30.2\n", "!pip install ipywidgets" ] }, { "cell_type": "markdown", "id": "c301cfce", "metadata": {}, "source": [ "## Download From Huggingface and Compile for AWS-Neuron" ] }, { "cell_type": "code", "execution_count": null, "id": "92e8050d", "metadata": { "scrolled": true }, "outputs": [], "source": [ "import tensorflow as tf\n", "import tensorflow_neuron as tfn\n", "from transformers import DistilBertTokenizer, TFDistilBertModel\n", "\n", "# Create a wrapper for the DistilBERT model that will accept inputs as a list\n", "# instead of a dictionary.
This will allow the compiled model to be saved\n", "# to disk with the model.save() function.\n", "class DistilBertWrapper(tf.keras.Model):\n", "    def __init__(self, model):\n", "        super().__init__()\n", "        self.model = model\n", "    def __call__(self, example_inputs):\n", "        return self.model({'input_ids' : example_inputs[0], 'attention_mask' : example_inputs[1]})\n", "\n", "\n", "tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')\n", "model = DistilBertWrapper(TFDistilBertModel.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english'))\n", "\n", "batch_size = 16\n", "\n", "# create example inputs with a batch size of 16\n", "text = [\"Paris is the capital of France.\"] * batch_size\n", "encoded_input = tokenizer(text, return_tensors='tf', padding='max_length', max_length=64)\n", "\n", "# turn inputs into a list\n", "example_input = [encoded_input['input_ids'], encoded_input['attention_mask']]\n", "\n", "# compile\n", "model_neuron = tfn.trace(model, example_input)\n", "\n", "print(\"Running on neuron:\", model_neuron(example_input))\n", "\n", "# save the model to disk to save recompilation time for next usage\n", "model_neuron.save('./distilbert-neuron-b16')" ] }, { "cell_type": "markdown", "id": "0f2e159a", "metadata": {}, "source": [ "## Run Basic Inference Benchmarking" ] }, { "cell_type": "code", "execution_count": null, "id": "ccf22e74", "metadata": { "scrolled": true }, "outputs": [], "source": [ "import numpy as np\n", "import concurrent.futures\n", "import time\n", "\n", "reloaded_neuron_model = tf.keras.models.load_model('./distilbert-neuron-b16')\n", "print(\"Reloaded model running on neuron:\", reloaded_neuron_model(example_input))\n", "\n", "num_threads = 4\n", "num_inferences = 1000\n", "\n", "latency_list = []\n", "def inference_with_latency_calculation(example_input):\n", "    global latency_list\n", "    start = time.time()\n", "    result = reloaded_neuron_model(example_input)\n", "    end = time.time()\n", "    latency_list.append((end-start) * 1000)\n", "    return result\n", "\n", "start = time.time()\n", "with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:\n", "    futures = []\n", "    for i in range(num_inferences):\n", "        futures.append(executor.submit(inference_with_latency_calculation, example_input))\n", "    for future in concurrent.futures.as_completed(futures):\n", "        get_result = future.result()\n", "end = time.time()\n", "\n", "total_time = end - start\n", "throughput = (num_inferences * batch_size)/total_time\n", "\n", "print(f\"Throughput was {throughput} samples per second.\")\n", "print(f\"Latency p50 was {np.percentile(latency_list, 50)} ms\")\n", "print(f\"Latency p90 was {np.percentile(latency_list, 90)} ms\")\n", "print(f\"Latency p95 was {np.percentile(latency_list, 95)} ms\")\n", "print(f\"Latency p99 was {np.percentile(latency_list, 99)} ms\")\n", "assert throughput >= 1930.0" ] }, { "cell_type": "code", "execution_count": null, "id": "b31b82fc", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 5 }
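Editor's note on input shapes: tfn.trace specializes the compiled graph for the shapes of the example inputs, so the saved model expects the same batch size (16) and padded sequence length (64) it was traced with. The following is a minimal sketch, not part of the original notebook, of reloading the saved model and running it on new text padded to those shapes; the example sentence is illustrative and the model path mirrors the cells above.

import tensorflow as tf
from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
neuron_model = tf.keras.models.load_model('./distilbert-neuron-b16')

# Pad new inputs to the shapes used at trace time: batch 16, sequence length 64.
text = ["This movie was excellent."] * 16  # illustrative input, repeated to fill the traced batch
encoded = tokenizer(text, return_tensors='tf', padding='max_length', max_length=64)
outputs = neuron_model([encoded['input_ids'], encoded['attention_mask']])
print(outputs)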
================================================ FILE: src/examples/tensorflow/k8s_bert_demo/Dockerfile.tfserving_example ================================================
FROM ubuntu:16.04
RUN apt-get update
RUN apt-get install -y wget apt-transport-https ca-certificates awscli
RUN echo "deb https://apt.repos.neuron.amazonaws.com xenial main" > /etc/apt/sources.list.d/neuron.list
RUN wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add -
RUN apt-get update
RUN apt-get install -y tensorflow-model-server-neuron
================================================ FILE: src/examples/tensorflow/k8s_bert_demo/README.md ================================================

Please view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)**
================================================ FILE: src/examples/tensorflow/k8s_bert_demo/bert_client.py ================================================
import numpy as np
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
import time

if __name__ == '__main__':
    channel = grpc.insecure_channel('localhost:9000')
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'bert_mrpc_hc_gelus_b4_l24_0926_02'
    input_array = np.zeros([1, 128], dtype=np.int32)
    request.inputs['input_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(input_array, shape=input_array.shape))
    request.inputs['input_mask'].CopyFrom(tf.contrib.util.make_tensor_proto(input_array, shape=input_array.shape))
    request.inputs['segment_ids'].CopyFrom(tf.contrib.util.make_tensor_proto(input_array, shape=input_array.shape))
    latencies = []
    for i in range(100):
        start = time.time()
        result = stub.Predict(request)
        latencies.append(time.time() - start)
        print("Inference successful: {}".format(i))
    print("Ran {} inferences successfully. Latency average = {}".format(len(latencies), np.average(latencies)))
================================================ FILE: src/examples/tensorflow/k8s_bert_demo/bert_service.yml ================================================
---
kind: Service
apiVersion: v1
metadata:
  name: inf-k8s-test
  labels:
    app: inf-k8s-test
spec:
  ports:
    - name: http-tf-serving
      port: 8500
      targetPort: 8500
    - name: grpc-tf-serving
      port: 9000
      targetPort: 9000
  selector:
    app: inf-k8s-test
    role: master
  type: ClusterIP
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: inf-k8s-test
  labels:
    app: inf-k8s-test
    role: master
spec:
  replicas: 1 # Number of desired replicas. Increase to desired number.
  selector:
    matchLabels:
      app: inf-k8s-test
      role: master
  template:
    metadata:
      labels:
        app: inf-k8s-test
        role: master
    spec:
      volumes:
        - name: sock
          emptyDir: {}
      containers:
        - name: inf-k8s-test
          image: tf-serving-ctr
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh", "-c"]
          # Pull model from s3, then start tensorflow_model_server_neuron with the model.
          args:
            - "aws s3 sync s3:///bert /tmp/bert && \
              tensorflow_model_server_neuron --port=9000 --rest_api_port=8500 --model_name=bert_mrpc_hc_gelus_b4_l24_0926_02 --model_base_path=/tmp/bert/"
          # Open grpc and rest API ports
          ports:
            - containerPort: 8500
            - containerPort: 9000
          # Informs tensorflow_model_server_neuron of UDS socket location
          env:
            - name: NEURON_RTD_ADDRESS
              value: unix:/sock/neuron.sock
          # Arbitrary resource requirements
          resources:
            limits:
              cpu: 4
              memory: 4Gi
            requests:
              cpu: "1"
              memory: 1Gi
          # Shared volume mount, for UDS socket
          volumeMounts:
            - name: sock
              mountPath: /sock
        # Neuron-rtd container
        - name: neuron-rtd
          image: 790709498068.dkr.ecr.us-east-1.amazonaws.com/neuron-rtd:latest # neuron-rtd image.
          imagePullPolicy: IfNotPresent
          # Neuron-rtd required capabilities
          securityContext:
            capabilities:
              add:
                - SYS_ADMIN
                - IPC_LOCK
          # Shared volume mount, for UDS socket
          volumeMounts:
            - name: sock
              mountPath: /sock
          resources:
            limits:
              hugepages-2Mi: 256Mi # configure to 256 * desired number of Inferentia devices.
              aws.amazon.com/neuron: 1 # desired number of Inferentia devices.
            requests:
              memory: 1024Mi # Desired amount of memory. Should be larger than hugepages-2Mi limit.
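Editor's note: the gRPC client above targets port 9000, but the Service also exposes TensorFlow Serving's REST API on port 8500 (the http-tf-serving port). As a hedged sketch, not part of the original demo, the same model could be queried over REST once the service is reachable locally, for example via `kubectl port-forward service/inf-k8s-test 8500:8500`:

import numpy as np
import requests

# Model name taken from bert_service.yml; localhost:8500 assumes a port-forward is active.
MODEL_NAME = 'bert_mrpc_hc_gelus_b4_l24_0926_02'
url = 'http://localhost:8500/v1/models/{}:predict'.format(MODEL_NAME)

# Same zero-filled [1, 128] inputs as the gRPC client, in TF Serving's JSON "inputs" format.
zeros = np.zeros([1, 128], dtype=np.int32).tolist()
payload = {'inputs': {'input_ids': zeros, 'input_mask': zeros, 'segment_ids': zeros}}
response = requests.post(url, json=payload)
print(response.json())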
================================================ FILE: src/examples/tensorflow/keras_resnet50/LICENSE ================================================ Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: src/examples/tensorflow/keras_resnet50/README.md ================================================

Please view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)**
================================================ FILE: src/examples/tensorflow/keras_resnet50/fp32tofp16.py ================================================
"""
Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
"""
import re
import argparse
import tensorflow as tf
import numpy as np
from google.protobuf import text_format
from tensorflow.core.framework import graph_pb2
from tensorflow.core.framework import node_def_pb2
from tensorflow.python.platform import gfile
from tensorflow.core.framework import attr_value_pb2
from tensorflow.python.framework import tensor_util

def ConvertFP32ToOther(graphdef):
    """Converts an FP32 network by casting all constants (weights) to a lower
    precision floating point type (FP16) and updating the dtypes everywhere."""
    cast_type = "float16"
    sess = tf.Session(graph=tf.import_graph_def(graphdef))
    output_graph_def = graph_pb2.GraphDef()
    dummy_tensor = sess.run(tf.constant([0.1]))
    dummy_tensor_proto = tensor_util.make_tensor_proto(dummy_tensor, dtype=cast_type, shape=dummy_tensor.shape)
    dummy_tensor32 = sess.run(tf.constant([0.1]))
    dummy_tensor_proto32 = tensor_util.make_tensor_proto(dummy_tensor32, dtype=tf.float32, shape=dummy_tensor32.shape)
    dt_float_type_attr = attr_value_pb2.AttrValue(type=dummy_tensor_proto32.dtype)
    dt_half_type_attr = attr_value_pb2.AttrValue(type=dummy_tensor_proto.dtype)
    for node in graphdef.node:
        output_node = node_def_pb2.NodeDef()
        output_node.CopyFrom(node)
        if (node.op == "Const"):
            if (node.attr["dtype"] == dt_float_type_attr):
                a = tensor_util.MakeNdarray(node.attr["value"].tensor)
                a = tf.cast(a, cast_type)
                a = sess.run(a)
                output_node.attr["dtype"].CopyFrom(dt_half_type_attr)
                output_node.attr["value"].CopyFrom(
                    attr_value_pb2.AttrValue(
                        tensor=tensor_util.make_tensor_proto(a, dtype=cast_type, shape=a.shape)))
        else:
            if ("T" in node.attr.keys()):
                if (output_node.attr["T"] == dt_float_type_attr):
                    output_node.attr["T"].CopyFrom(dt_half_type_attr)
            if ("Tparams" in node.attr.keys()):
                if (output_node.attr["Tparams"] == dt_float_type_attr):
                    output_node.attr["Tparams"].CopyFrom(dt_half_type_attr)
            if ("dtype" in node.attr.keys()):
                if (node.attr["dtype"] == dt_float_type_attr):
                    output_node.attr["dtype"].CopyFrom(dt_half_type_attr)
            if ("SrcT" in node.attr.keys()):
                if (node.attr["SrcT"] == dt_float_type_attr):
                    output_node.attr["SrcT"].CopyFrom(dt_half_type_attr)
            if ("DstT" in node.attr.keys()):
                if (node.attr["DstT"] == dt_float_type_attr):
                    output_node.attr["DstT"].CopyFrom(dt_half_type_attr)
        output_graph_def.node.extend([output_node])
    return output_graph_def

def load_graph(model_file):
    graph_def = tf.GraphDef()
    with open(model_file, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--graph", help="graph/model to be executed", required=True)
    parser.add_argument("--out_graph", help="graph/model to be generated", required=True)
    args = parser.parse_args()
    graph_f32 = load_graph(args.graph)
    graph_f16 = ConvertFP32ToOther(graph_f32)
    output_xformed_graph_name = args.out_graph
    with gfile.GFile(output_xformed_graph_name, "wb") as f:
        f.write(graph_f16.SerializeToString())
    #with gfile.GFile(output_xformed_graph_name+"txt", 'w') as f:
    #    f.write(text_format.MessageToString(graph_f16))
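Editor's note: a quick sanity check, not part of the original example, can confirm the conversion worked. It loads the GraphDef written by fp32tofp16.py and lists any Const nodes still carrying float32 weights; the file name 'resnet50_fp16_keras.pb' is an assumed --out_graph value used for illustration.

import tensorflow as tf
from tensorflow.core.framework import types_pb2

# Load the converted GraphDef written by fp32tofp16.py.
graph_def = tf.compat.v1.GraphDef()
with open('resnet50_fp16_keras.pb', 'rb') as f:  # assumed --out_graph path
    graph_def.ParseFromString(f.read())

# Any Const node still typed DT_FLOAT was missed by the cast.
leftover_fp32 = [n.name for n in graph_def.node
                 if n.op == 'Const' and n.attr['dtype'].type == types_pb2.DT_FLOAT]
print('Const nodes still in float32:', leftover_fp32 or 'none')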
================================================ FILE: src/examples/tensorflow/keras_resnet50/full_sweep ================================================
#!/usr/bin/env bash
##########################################################################
# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0
##########################################################################
echo "" > full_sweep.log
echo "" > full_sweep_results.txt
results=()
for b in $(seq 1 5); do
    for i in 1 2 4 8 12 16; do
        python pb2sm_compile.py --batch_size=$b --neuroncore-pipeline-cores=$i | tee -a full_sweep.log;
        results[$b]+=", "`tail -1 full_sweep.log`
    done
done
head="batch"
for i in 1 2 4 8 12 16; do
    head+=", nc${i}"
done
echo $head | tee -a full_sweep_results.txt
for b in $(seq 1 5); do
    echo $b${results[$b]} | tee -a full_sweep_results.txt
done
================================================ FILE: src/examples/tensorflow/keras_resnet50/gen_resnet50_keras.py ================================================
"""
Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
"""
import re
import argparse
import tensorflow as tf
import numpy as np

from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

from google.protobuf import text_format
import tensorflow.python.saved_model

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--fp16", action='store_true', help="use float16 parameters and operations")
    args = parser.parse_args()

    # set Keras global configurations
    tf.keras.backend.set_learning_phase(0)
    tf.keras.backend.set_image_data_format('channels_last')

    if (args.fp16):
        float_type = 'float16'
        float_type2 = 'fp16'
    else:
        float_type = 'float32'
        float_type2 = 'fp32'
    tf.keras.backend.set_floatx(float_type)

    # load pre-trained model using Keras
    model_name = 'resnet50_%s_keras' % float_type2
    model = ResNet50(weights='imagenet')

    # various save files
    frozen_file = model_name + '.pb'
    opt_file = model_name + '_opt.pb'

    # obtain parameters
    model_input = model.input.name.replace(':0', '')
    model_output = model.output.name.replace(':0', '')
    batch, height, width, channels = model.input.shape

    print("model, frozen file, optimized file, input size, input node, output node,")
    print("%s, %s, %s, %dx%dx%d, %s, %s" % (model_name, frozen_file, opt_file, width, height, channels, model_input, model_output))

    # obtain the TF session
    sess = tf.compat.v1.keras.backend.get_session()

    # save checkpoint files for freeze_graph
    ckpt_file = '/tmp/' + model_name + '/' + model_name + '.ckpt'
    graph_file = '/tmp/' + model_name + '/' + model_name + '.pb'
    tf.compat.v1.train.Saver().save(sess, ckpt_file)
    tf.io.write_graph(sess.graph.as_graph_def(), logdir='.', name=graph_file, as_text=False)

    print(model_output)
    with tf.compat.v1.Session(graph=tf.Graph()) as sess:
        saver = tf.compat.v1.train.import_meta_graph(ckpt_file + '.meta')
        saver.restore(sess, ckpt_file)
        output_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(
            sess, tf.compat.v1.get_default_graph().as_graph_def(), [model_output])
        output_graph_def = tf.compat.v1.graph_util.remove_training_nodes(
            output_graph_def, protected_nodes=[model_output])
        with open(frozen_file, 'wb') as f:
            f.write(output_graph_def.SerializeToString())
================================================ FILE: src/examples/tensorflow/keras_resnet50/infer_resnet50_keras.py ================================================
"""
Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
"""
import os
import time
import shutil
import argparse
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications import resnet50

parser = argparse.ArgumentParser()
parser.add_argument("--graph", default="resnet50_fp32_keras.pb", help="Graph to use for inference")
parser.add_argument("--input", default="input_1", help="Input of graph")
parser.add_argument("--output", default="probs/Softmax", help="Output of graph")
args = parser.parse_args()

tf.keras.backend.set_image_data_format('channels_last')

def pb_to_saved_model(pb_path, input_names, output_names, model_dir):
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(open(pb_path, 'rb').read())
    with tf.compat.v1.Session(graph=tf.Graph()) as sess:
        tf.import_graph_def(graph_def, name='')
        inputs = {name: sess.graph.get_tensor_by_name(ts_name) for name, ts_name in input_names.items()}
        outputs = {name: sess.graph.get_tensor_by_name(ts_name) for name, ts_name in output_names.items()}
        tf.saved_model.simple_save(sess, model_dir, inputs, outputs)

SAVED_MODEL_DIR = './rn50_fp16'
shutil.rmtree(SAVED_MODEL_DIR, ignore_errors=True)

input_tname = "{}:0".format(args.input)
output_tname = "{}:0".format(args.output)
pb_to_saved_model(args.graph, {input_tname: input_tname}, {output_tname: output_tname}, SAVED_MODEL_DIR)

# Create input from image
img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))
img_arr = image.img_to_array(img_sgl)
img_arr2 = np.expand_dims(img_arr, axis=0)
img_arr3 = resnet50.preprocess_input(np.repeat(img_arr2, 1, axis=0))

# Load model
predictor_host = tf.contrib.predictor.from_saved_model(SAVED_MODEL_DIR)

# Run inference (feed the input tensor name derived from --input)
model_feed_dict = {input_tname: img_arr3}
infa_rslts = predictor_host(model_feed_dict)
print(resnet50.decode_predictions(infa_rslts[output_tname], top=5)[0])
================================================ FILE: src/examples/tensorflow/keras_resnet50/infer_resnet50_keras_loadtest.py ================================================
"""
Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
SPDX-License-Identifier: MIT-0
"""
import shutil
import tensorflow as tf
import os
import time
from concurrent import futures
import numpy as np
import statistics
import argparse
import requests
import tensorflow.neuron
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications import resnet50
import warnings
import subprocess
import json

tf.keras.backend.set_image_data_format('channels_last')

arg_parser = argparse.ArgumentParser()
arg_parser.add_argument('--batch_size', type=int, default=5, choices=range(1, 6), help='Batch size of model as it was compiled')
arg_parser.add_argument('--neuroncore-pipeline-cores', type=int, default=1, choices=range(1, 17), help='Number of NeuronCores limit for each partitioned graph')
args = arg_parser.parse_args()

neuron_ls_output = subprocess.run(["neuron-ls", "-j"], stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True, encoding="utf-8")
neuron_ls_json = json.loads(neuron_ls_output.stdout)
avail_neuroncores = neuron_ls_json[0]["nc_count"]

USER_BATCH_SIZE = 2 * args.batch_size
NUM_LOOPS_PER_THREAD = 400
COMPILED_MODEL_DIR = "./rn50_fp16_compiled_b" + str(args.batch_size) + "_nc" + str(args.neuroncore_pipeline_cores) + "/1"

# Ensure there's enough buffer capacity to hold in-flight requests in runtime
NUM_INFERS_IN_FLIGHT = args.neuroncore_pipeline_cores + 3
os.environ['NEURON_MAX_NUM_INFERS'] = str(NUM_INFERS_IN_FLIGHT)

num_groups = avail_neuroncores // args.neuroncore_pipeline_cores
group_sizes = [str(args.neuroncore_pipeline_cores)] * num_groups
warnings.warn("NEURONCORE_GROUP_SIZES is being deprecated, if your application is using NEURONCORE_GROUP_SIZES please \
see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/deprecation.html#announcing-end-of-support-for-neuroncore-group-sizes \
for more details.", DeprecationWarning)
os.environ['NEURONCORE_GROUP_SIZES'] = ','.join(group_sizes)

# Create input from image
img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))
img_arr = image.img_to_array(img_sgl, dtype='float16')
img_arr2 = np.expand_dims(img_arr, axis=0)
img_arr3 = np.repeat(img_arr2, USER_BATCH_SIZE, axis=0)

# Load model
NUM_THREADS_PER_PREDICTOR = args.neuroncore_pipeline_cores
pred_list = [tf.contrib.predictor.from_saved_model(COMPILED_MODEL_DIR) for _ in range(num_groups)]
pred_list = pred_list * NUM_THREADS_PER_PREDICTOR
num_threads = len(pred_list)

num_infer_per_thread = []
tot_latency_per_thread = []
thread_active = []
latency_list = []
for i in range(num_threads):
    num_infer_per_thread.append(0)
    tot_latency_per_thread.append(0)
    thread_active.append(0)

def one_thread(pred, model_feed_dict, index):
    global num_infer_per_thread
    thread_active[index] = 1
    for i in range(NUM_LOOPS_PER_THREAD):
        start = time.time()
        result = pred(model_feed_dict)
        delta = time.time() - start
        latency_list.append(delta)
        # skip first warmup run
        if i > 0:
            tot_latency_per_thread[index] += delta
        num_infer_per_thread[index] += USER_BATCH_SIZE
        #print(num_infer_per_thread[index])
    thread_active[index] = 0

def current_throughput():
    global num_infer_per_thread
    global args
    iteration = 0
    num_infer = 0
    last_num_infer = num_infer
    throughput_stats = []
    print("Run with {} NeuronCores".format(avail_neuroncores))
    print("NEURON_MAX_NUM_INFERS (env): " + os.environ.get('NEURON_MAX_NUM_INFERS', ''))
    print("NEURONCORE_GROUP_SIZES (env): " + os.environ.get('NEURONCORE_GROUP_SIZES', ''))
    print("NUM THREADS: ", num_threads)
    print("NUM_LOOPS_PER_THREAD: ", NUM_LOOPS_PER_THREAD)
    print("USER_BATCH_SIZE: ", USER_BATCH_SIZE)
    while num_infer < NUM_LOOPS_PER_THREAD * USER_BATCH_SIZE * num_threads:
        num_infer = 0
        total_thread_cnt = 0
        for i in range(num_threads):
            num_infer = num_infer + num_infer_per_thread[i]
            total_thread_cnt = total_thread_cnt + thread_active[i]
        current_num_infer = num_infer
        throughput = current_num_infer - last_num_infer
        #print('Active threads: {}, current throughput: {} images/sec'.format(total_thread_cnt, throughput))
        # track throughput over time, after warmup
        if iteration > 4 and total_thread_cnt == num_threads:
            throughput_stats.append(throughput)
        last_num_infer = current_num_infer
        iteration += 1
        time.sleep(1.0)
    time.sleep(1.0)
    tot_latency = 0
    for i in range(num_threads):
        tot_latency += tot_latency_per_thread[i]
    # adjust loop count to remove the first warmup run
    print("Throughput values collected:")
    print(throughput_stats)
    print("\nCompiled batch size {:}, user batch size {:}, Throughput stats (images/sec): Avg={:0.0f} Max={:}, Latency stats (msec/user-batch): P50={:0.1f} P90={:0.1f} P95={:0.1f} P99={:0.1f} \n".format(
        args.batch_size, USER_BATCH_SIZE, np.mean(throughput_stats), np.max(throughput_stats),
        (np.percentile(latency_list, 50))*1000.0, (np.percentile(latency_list, 90))*1000.0,
        (np.percentile(latency_list, 95))*1000.0, (np.percentile(latency_list, 99))*1000.0))

print("\n*** Compiled batch size {}, user batch size {}, num NeuronCores {} (input shape: {}, saved model dir: {}) ***\n".format(args.batch_size, USER_BATCH_SIZE, args.neuroncore_pipeline_cores, img_arr3.shape, COMPILED_MODEL_DIR))

# Run inference
model_feed_dict = {'input_1:0': img_arr3}
executor = futures.ThreadPoolExecutor(max_workers=num_threads + 1)
executor.submit(current_throughput)
for i, pred in enumerate(pred_list):
    executor.submit(one_thread, pred, model_feed_dict, i)
================================================ FILE: src/examples/tensorflow/keras_resnet50/keras_resnet50.ipynb ================================================
{ "cells": [ { "cell_type": "markdown", "id": "spectacular-payroll", "metadata": {}, "source": [ "# Tensorflow ResNet 50 Optimization Tutorial" ] }, { "cell_type": "markdown", "id": "equivalent-stack", "metadata": {}, "source": [ "## Note: this tutorial runs on tensorflow-neuron 1.x only" ] }, { "cell_type": "markdown", "id": "alpine-aside", "metadata": {}, "source": [ "## Introduction:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial we provide three main sections:\n", "\n", "* Take a ResNet 50 model and perform optimizations on it\n", "\n", "* Compile the model with different batch sizes and NeuronCore Group sizes (read about NeuronCore Group sizes here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-runtime/nrt-theory-of-operation.html#neuron-core-group)\n", "\n", "* Run inference on our multiple compiled models to see which has the best throughput\n", "\n", "Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](../../../../frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page."
] }, { "cell_type": "markdown", "id": "opened-forty", "metadata": {}, "source": [ "## Install Dependencies" ] }, { "cell_type": "code", "execution_count": null, "id": "meaningful-algebra", "metadata": {}, "outputs": [], "source": [ "!pip install pillow requests # Necessary for loading images\n", "!pip install tensorflow_neuron==1.15.5.2.8.9.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com/\n", "!pip install neuron_cc==1.13.5.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com" ] }, { "cell_type": "markdown", "id": "remarkable-exercise", "metadata": {}, "source": [ "## Compile" ] }, { "cell_type": "markdown", "id": "consecutive-right", "metadata": {}, "source": [ "The following example shows how to compile an FP16 ResNet50 network using various batching parameters to find the optimal solution. On inf1.6xlarge, run through the following steps to get an optimized ResNet 50 model.\n", "First, extract Keras ResNet50 FP32 (resnet50_fp32_keras.pb will be generated):" ] }, { "cell_type": "code", "execution_count": null, "id": "vertical-finland", "metadata": {}, "outputs": [], "source": [ "import re\n", "import argparse\n", "import tensorflow as tf\n", "import numpy as np\n", "\n", "from tensorflow.keras.applications.resnet50 import ResNet50\n", "from tensorflow.keras.preprocessing import image\n", "from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions\n", "\n", "from google.protobuf import text_format\n", "import tensorflow.python.saved_model\n", "\n", "# set Keras global configurations\n", "tf.keras.backend.set_learning_phase(0)\n", "tf.keras.backend.set_image_data_format('channels_last')\n", "\n", "float_type = 'float32'\n", "float_type2 = 'fp32'\n", "tf.keras.backend.set_floatx(float_type)\n", "\n", "# load pre-trained model using Keras\n", "model_name = 'resnet50_%s_keras'%float_type2\n", "model = ResNet50(weights='imagenet')\n", "\n", "# various save files\n", "frozen_file = model_name + '.pb'\n", "opt_file = model_name + '_opt.pb'\n", "\n", "# obtain parameters\n", "model_input = model.input.name.replace(':0', '')\n", "model_output = model.output.name.replace(':0', '')\n", "batch, height, width, channels = model.input.shape\n", "\n", "print (\"model, frozen file, optimized file, input size, input node, output node,\")\n", "print (\"%s, %s, %s, %dx%dx%d, %s, %s\" %(model_name, frozen_file, opt_file, width, height, channels, model_input, model_output) ) \n", "\n", "# obtain the TF session\n", "sess = tf.compat.v1.keras.backend.get_session()\n", "\n", "# save checkpoint files for freeze_graph\n", "ckpt_file = '/tmp/' + model_name + '/' + model_name + '.ckpt'\n", "graph_file = '/tmp/' + model_name + '/' + model_name + '.pb'\n", "tf.compat.v1.train.Saver().save(sess, ckpt_file)\n", "tf.io.write_graph(sess.graph.as_graph_def(), logdir='.', name=graph_file, as_text=False)\n", "\n", "print(model_output)\n", "with tf.compat.v1.Session(graph=tf.Graph()) as sess:\n", " saver = tf.compat.v1.train.import_meta_graph(ckpt_file + '.meta')\n", " saver.restore(sess, ckpt_file)\n", " output_graph_def = tf.compat.v1.graph_util.convert_variables_to_constants(\n", " sess, tf.compat.v1.get_default_graph().as_graph_def(), [model_output])\n", " output_graph_def = tf.compat.v1.graph_util.remove_training_nodes(\n", " output_graph_def, protected_nodes=[model_output])\n", " with open(frozen_file, 'wb') as f:\n", " f.write(output_graph_def.SerializeToString())" ] }, { "cell_type": "markdown", "id": "romance-cyprus", "metadata": {}, "source": [ "Optimize the extracted Keras 
ResNet50 FP32 graph for inference before casting (resnet50_fp32_keras_opt.pb will be generated) with the following transformations to the graph:\n", "\n", "* Remove Identity and CheckNumerics nodes\n", "* Fold FusedBatchNorm constants into previous Conv2D weights\n", "* Fold other constants\n", "* Strip unused nodes\n", "* Sort by execution order" ] }, { "cell_type": "code", "execution_count": null, "id": "higher-grant", "metadata": {}, "outputs": [], "source": [ "import copy\n", "import string\n", "\n", "from google.protobuf import text_format\n", "from tensorflow.core.framework import node_def_pb2\n", "from tensorflow.core.framework import attr_value_pb2\n", "from tensorflow.python.framework import tensor_util\n", "from tensorflow.tools.graph_transforms import TransformGraph\n", "\n", "def clear_input(node):\n", " for i in range(len(node.input)):\n", " node.input.pop()\n", "\n", "def replace_name(node, name):\n", " node.name = name\n", " \n", "def replace_input(node, input_name, new_name):\n", " # node.input.replace(input_name, new_name)\n", " temp = []\n", " for i in node.input:\n", " temp.extend([new_name if i == input_name else i])\n", " clear_input(node)\n", " for i in temp:\n", " node.input.extend([i])\n", "\n", "def swap_names(node1, node2):\n", " temp = node2.name\n", " node2.name = node1.name\n", " node1.name = temp\n", "\n", "def get_const_node(const_node_name, const_by_name):\n", " name = re.sub(\"/read$\", \"\", const_node_name)\n", " return const_by_name[name]\n", "\n", "def get_const_ndarray(const_node_name, const_by_name):\n", " name = re.sub(\"/read$\", \"\", const_node_name)\n", " node = const_by_name[name]\n", " return tf.make_ndarray(node.attr.get(\"value\").tensor)\n", "\n", "def adjust_bias_values(bias_node, fbn_node, const_by_name):\n", " bias_val = get_const_ndarray(bias_node.input[1], const_by_name) \n", " gamma_val = get_const_ndarray(fbn_node.input[1], const_by_name) \n", " mean_val = get_const_ndarray(fbn_node.input[3], const_by_name) \n", " variance_val = get_const_ndarray(fbn_node.input[4], const_by_name) \n", " new_bias = bias_val * gamma_val / np.sqrt(variance_val)\n", " new_tensor = tensor_util.make_tensor_proto(new_bias, new_bias.dtype, new_bias.shape)\n", " bias_const_node = get_const_node(bias_node.input[1], const_by_name)\n", " bias_const_node.attr[\"value\"].CopyFrom(attr_value_pb2.AttrValue(tensor=new_tensor))\n", "\n", "def MoveBiasAddAfterFusedBatchNorm(graphdef):\n", " \"\"\"fold_batch_norm function of TransformGraph is unable to fold Keras ResNet50\n", " because of BiasAdd between Conv2D and FusedBatchNorm (BiasAdd is not needed\n", " if FusedBatchNorm is used, but it exists in Keras ResNet50). 
Here, we \n", " move BiasAdd to after FusedBatchNorm, and adjust bias value by gamma/sqrt(variance).\n", " \"\"\"\n", " sess = tf.compat.v1.Session(graph=tf.import_graph_def(graphdef))\n", " output_graph_def = tf.compat.v1.GraphDef()\n", " node_by_name = {}\n", " const_by_name = {}\n", " for node in graphdef.node:\n", " # Hack: use FusedBatchNormV2 so fold_batch_norm can recognize\n", " if node.op == \"FusedBatchNormV3\":\n", " node.op = \"FusedBatchNorm\"\n", " del(node.attr[\"U\"])\n", " #import pdb; pdb.set_trace()\n", " copied_node = node_def_pb2.NodeDef()\n", " copied_node.CopyFrom(node)\n", " node_by_name[node.name] = copied_node\n", " skip_add_node = False\n", " # Switch Mul/BiasAdd in Keras RN50 so fold_batch_norm transform would work\n", " if node.op == \"Const\":\n", " const_by_name[node.name] = copied_node \n", " elif node.op.startswith(\"FusedBatchNorm\"):\n", " inputs = node.input\n", " for i in inputs:\n", " input_node = node_by_name[i]\n", " if input_node.op == \"BiasAdd\":\n", " output_graph_def.node.remove(input_node)\n", " input_node_input0 = input_node.input[0]\n", " # Adjust bias values (multiply by scale/sqrt(variance))\n", " adjust_bias_values(input_node, node, const_by_name)\n", " # Hack: swap names to avoid changing input of activation\n", " swap_names(copied_node, input_node)\n", " # Fix inputs for these two ops\n", " replace_input(copied_node, i, input_node_input0)\n", " replace_input(input_node, input_node_input0, copied_node.name)\n", " # Fix order in node list\n", " output_graph_def.node.extend([copied_node])\n", " output_graph_def.node.extend([input_node])\n", " skip_add_node = True\n", " # Add maybe-modified nodes if not already done\n", " if not skip_add_node:\n", " output_graph_def.node.extend([copied_node])\n", " return output_graph_def\n", "\n", "def FoldFusedBatchNorm(graph_def):\n", " \"\"\"Optimize training graph for inference:\n", " - Remove Identity and CheckNumerics nodes\n", " - Fold FusedBatchNorm constants into previous Conv2D weights\n", " - Fold other constants\n", " - Strip unused nodes\n", " - Sort by execution order\n", " \"\"\"\n", " transformed_graph_def = TransformGraph (\n", " graph_def,\n", " ['input_1'],\n", " ['probs/Softmax'],\n", " [\n", " 'add_default_attributes',\n", " 'remove_nodes(op=Identity, op=CheckNumerics)',\n", " 'fold_constants(ignore_errors=true)',\n", " 'fold_batch_norms',\n", " 'fold_old_batch_norms',\n", " 'strip_unused_nodes',\n", " 'sort_by_execution_order',\n", " ])\n", " return transformed_graph_def\n", "\n", "def load_graph(model_file):\n", " graph_def = tf.compat.v1.GraphDef()\n", "\n", " with open(model_file, \"rb\") as f:\n", " graph_def.ParseFromString(f.read())\n", " return graph_def\n", "\n", "\n", "graph_orig = load_graph('resnet50_fp32_keras.pb')\n", "graph_mod = MoveBiasAddAfterFusedBatchNorm(graph_orig)\n", "graph_mod2 = FoldFusedBatchNorm(graph_mod)\n", "with tf.io.gfile.GFile('resnet50_fp32_keras_opt.pb', \"wb\") as f:\n", " f.write(graph_mod2.SerializeToString())" ] }, { "cell_type": "markdown", "id": "corresponding-acquisition", "metadata": {}, "source": [ "Convert full graph to FP16 (resnet50_fp16_keras_opt.pb will be generated.\n", "This will take about a minute." 
] }, { "cell_type": "code", "execution_count": null, "id": "detected-training", "metadata": {}, "outputs": [], "source": [ "from tensorflow.core.framework import graph_pb2\n", "from tensorflow.python.platform import gfile\n", "\n", "def ConvertFP32ToOther(graphdef):\n", " \"\"\"Converts an FP32 network by casting all constants (weights) to a lower\n", " precision floating point type (FP16) and updating the dtypes\n", " everywhere.\"\"\"\n", " cast_type = \"float16\"\n", " sess = tf.Session(graph=tf.import_graph_def(graphdef))\n", " output_graph_def = graph_pb2.GraphDef()\n", " dummy_tensor = sess.run(tf.constant([0.1]))\n", " dummy_tensor_proto = tensor_util.make_tensor_proto(dummy_tensor, \\\n", " dtype=cast_type, shape=dummy_tensor.shape)\n", " dummy_tensor32 = sess.run(tf.constant([0.1]))\n", " dummy_tensor_proto32 = tensor_util.make_tensor_proto(dummy_tensor, \\\n", " dtype=tf.float32, shape=dummy_tensor.shape)\n", " dt_float_type_attr = attr_value_pb2.AttrValue(type=dummy_tensor_proto32.dtype)\n", " dt_half_type_attr = attr_value_pb2.AttrValue(type=dummy_tensor_proto.dtype)\n", " for node in graphdef.node:\n", " output_node = node_def_pb2.NodeDef()\n", " output_node.CopyFrom(node)\n", " if (node.op == \"Const\"):\n", " if (node.attr[\"dtype\"] == dt_float_type_attr):\n", " a = tensor_util.MakeNdarray(node.attr[\"value\"].tensor)\n", " a = tf.cast(a, cast_type)\n", " a = sess.run(a)\n", " output_node.attr[\"dtype\"].CopyFrom(dt_half_type_attr)\n", " output_node.attr[\"value\"].CopyFrom(\n", " attr_value_pb2.AttrValue(\n", " tensor=tensor_util.make_tensor_proto(a,\\\n", " dtype=cast_type, shape=a.shape)))\n", " else:\n", " if (\"T\" in node.attr.keys()):\n", " if (output_node.attr[\"T\"] == dt_float_type_attr):\n", " output_node.attr[\"T\"].CopyFrom(dt_half_type_attr)\n", " if (\"Tparams\" in node.attr.keys()):\n", " if (output_node.attr[\"Tparams\"] == dt_float_type_attr):\n", " output_node.attr[\"Tparams\"].CopyFrom(dt_half_type_attr)\n", " if (\"dtype\" in node.attr.keys()):\n", " if (node.attr[\"dtype\"] == dt_float_type_attr):\n", " output_node.attr[\"dtype\"].CopyFrom(dt_half_type_attr)\n", " if (\"SrcT\" in node.attr.keys()):\n", " if (node.attr[\"SrcT\"] == dt_float_type_attr):\n", " output_node.attr[\"SrcT\"].CopyFrom(dt_half_type_attr)\n", " if (\"DstT\" in node.attr.keys()):\n", " if (node.attr[\"DstT\"] == dt_float_type_attr):\n", " output_node.attr[\"DstT\"].CopyFrom(dt_half_type_attr)\n", " output_graph_def.node.extend([output_node])\n", " return output_graph_def\n", "\n", "def load_graph(model_file):\n", " graph_def = tf.GraphDef()\n", "\n", " with open(model_file, \"rb\") as f:\n", " graph_def.ParseFromString(f.read())\n", "\n", " return graph_def\n", "\n", "graph_f32 = load_graph('resnet50_fp32_keras_opt.pb')\n", "graph_f16 = ConvertFP32ToOther(graph_f32)\n", "output_xformed_graph_name = 'resnet50_fp16_keras_opt.pb'\n", "with gfile.GFile(output_xformed_graph_name, \"wb\") as f:\n", " f.write(graph_f16.SerializeToString())\n" ] }, { "cell_type": "markdown", "id": "correct-travel", "metadata": {}, "source": [ "Run the compilation script to sweep through various batch sizes up to 5 and several NeuronCore Group sizes up to 16. The script calls the compilation script pb2sm_compile.py which tries to perform compilation. Some error messages are expected due to known issues (see Known Issues section in the tutorial). If you run all the configurations it will take about 45 minutes." 
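, "\n", "Before kicking off the sweep, you can optionally confirm that the FP16 cast above actually rewrote the weights. A minimal check (assuming the previous cell wrote resnet50_fp16_keras_opt.pb):\n", "\n", "```python\n", "import tensorflow as tf\n", "\n", "gd = tf.GraphDef()\n", "with open('resnet50_fp16_keras_opt.pb', 'rb') as f:\n", "    gd.ParseFromString(f.read())\n", "fp16 = tf.float16.as_datatype_enum\n", "n_const = sum(1 for n in gd.node if n.op == 'Const')\n", "n_fp16 = sum(1 for n in gd.node if n.op == 'Const' and n.attr['dtype'].type == fp16)\n", "print('{} of {} Const nodes are float16'.format(n_fp16, n_const))\n", "```\n"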
] }, { "cell_type": "code", "execution_count": null, "id": "shared-ratio", "metadata": {}, "outputs": [], "source": [ "%%bash\n", "#!/usr/bin/env bash\n", "\n", "echo \"\" > full_sweep.log\n", "echo \"\" > full_sweep_results.txt\n", "\n", "results=()\n", "for b in $(seq 1 5); do \n", " for i in 1 2 4 8 12 16; do \n", " python pb2sm_compile.py --batch_size=$b --neuroncore-pipeline-cores=$i | tee -a full_sweep.log;\n", " results[$b]+=\", \"`tail -1 full_sweep.log`\n", " done\n", "done\n", "\n", "head=\"batch\"\n", "for i in 1 2 4 8 12 16; do\n", " head+=\", nc${i}\"\n", "done \n", "echo $head | tee -a full_sweep_results.txt\n", "for b in $(seq 1 5); do \n", " echo $b${results[$b]} | tee -a full_sweep_results.txt\n", "done" ] }, { "cell_type": "markdown", "id": "attached-austin", "metadata": {}, "source": [ "You should see some output like this:\n", "```\n", "INFO: Compilation finished in 95 seconds with 99.5% operations placed on Inferentia\n", "\n", "1\n", "\n", "*** Batch size 1, num NeuronCores 2 (input shape: (1, 224, 224, 3), saved model dir: rn50_fp16_compiled_b1_nc2) ***\n", "\n", "INFO: Compilation finished in 95 seconds with 99.5% operations placed on Inferentia\n", "\n", "1\n", "\n", "*** Batch size 1, num NeuronCores 4 (input shape: (1, 224, 224, 3), saved model dir: rn50_fp16_compiled_b1_nc4) ***\n", "\n", "INFO: Compilation finished in 95 seconds with 99.5% operations placed on Inferentia\n", "\n", "1\n", "\n", "... (outputs removed)\n", "\n", "*** Batch size 5, num NeuronCores 16 (input shape: (5, 224, 224, 3), saved model dir: rn50_fp16_compiled_b5_nc16) ***\n", "\n", "ERROR: Compilation finished in 120 seconds with less than 50% operations placed on Inferentia (0.0%)\n", "\n", "INFO: Retry compilation without static weights\n", "\n", "ERROR: Retry compilation finished in 137 seconds with less than 50% operations placed on Inferentia (0.0%)\n", "\n", "0\n", "```\n", "\n", "The file full_sweep_results.txt shows a summary of the sweep results with the Neuron 1/27/20 release (0 means compilation unsuccessful and 0 ops mapped to Inferentia, 1 means most ops mapped to Inferentia and non-static weights, 2 means most ops mapped to Inferentia and using static weights):\n", "\n", "```\n", "batch, nc1, nc2, nc4, nc8, nc12, nc16\n", "1, 1, 1, 1, 2, 2, 2\n", "2, 1, 1, 0, 1, 2, 2\n", "3, 1, 1, 1, 1, 1, 1\n", "4, 1, 1, 0, 1, 1, 1\n", "5, 1, 1, 0, 0, 0, 0\n", "```\n" ] }, { "cell_type": "markdown", "id": "surprised-abortion", "metadata": {}, "source": [ "## Inference" ] }, { "cell_type": "markdown", "id": "departmental-surprise", "metadata": {}, "source": [ "Run inference over different batch sizes and NeuronCore groups to obtain throughput and latency results for ResNet50. To apply dynamic batching, the user batch size is set to 2x the compiled batch size (see infer_resnet50_keras_loadtest.py), in order to keep the input queue full and to amortize framework-to-Neuron overhead.\n", "\n", "Note: The results are based on the Neuron v1.12.2 (Mar 4th 2021) release. 
These will continue to improve as we increase Neuron performance.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "requested-inspiration", "metadata": {}, "outputs": [], "source": [ "!cd ~/aws-neuron-sdk/src/examples/tensorflow/keras_resnet50/\n", "!echo \"\" > batch.log\n", "!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=1 | tee -a batch.log; done\n", "!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=2 | tee -a batch.log; done\n", "!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=4 | tee -a batch.log; done\n", "!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=8 | tee -a batch.log; done\n", "!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=12 | tee -a batch.log; done\n", "!for i in $(seq 1 5); do python infer_resnet50_keras_loadtest.py --batch_size=$i --neuroncore-pipeline-cores=16 | tee -a batch.log; done" ] }, { "cell_type": "markdown", "id": "split-genesis", "metadata": {}, "source": [ "The file batch.log now contains the results for each batch size. We can look at the throughput values to get an idea of which models are performing well. The output should look something like this:\n", "\n", "The best model configuration for throughput (if you run on an inf1.6xlarge as suggested in the tutorial) is batch size 5 with a NeuronCore group size of 2. Increasing batch size usually helps to increase throughput (up to a certain extent)." ] }, { "cell_type": "markdown", "id": "filled-township", "metadata": {}, "source": [ "```\n", "*** Compiled batch size 5, user batch size 10, num NeuronCores 2 (input shape: (10, 224, 224, 3), saved model dir: ./rn50_fp16_compiled_b5_nc2/1) ***\n", "\n", "Instance type inf1.6xlarge with 16 NeuronCores\n", "NEURON_MAX_NUM_INFERS (env): 5\n", "NEURONCORE_GROUP_SIZES (env): 2,2,2,2,2,2,2,2\n", "NUM THREADS: 16\n", "NUM_LOOPS_PER_THREAD: 400\n", "USER_BATCH_SIZE: 10\n", "Throughput values collected:\n", "[10680, 10700, 10660]\n", "\n", "(rest of outputs removed)\n", "```" ] }, { "cell_type": "markdown", "id": "189c4f0e-1a4e-4067-921f-95449c45dedd", "metadata": {}, "source": [ "## Known Issues\n", "\n", "### Unable to compile with batch and num NeuronCores combination\n", "\n", "For some combinations of batch size and number of NeuronCores, you may\n", "see an internal compiler error as below. Please see the sweep results\n", "above for the Neuron 1/27/20 release. Furthermore, auto-casting to\n", "bfloat16 from an FP32 network with a batch size larger than 1 results\n", "in the same error.\n",
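"\n", "When a configuration fails this way, the sweep script pb2sm_compile.py falls back to recompiling without static weights. The pattern it uses looks roughly like this (a sketch; argument plumbing simplified):\n", "\n", "```python\n", "import tensorflow.neuron as tfn\n", "\n", "rslts = tfn.saved_model.compile(saved_model_dir, compiled_dir,\n", "                                model_feed_dict={'input_1:0': img_arr},\n", "                                dynamic_batch_size=True, compiler_args=compiler_args)\n", "if rslts['OnNeuronRatio'] * 100 < 50 and '--static-weights' in compiler_args:\n", "    compiler_args.remove('--static-weights')  # retry without static weights\n", "    rslts = tfn.saved_model.compile(saved_model_dir, compiled_dir,\n", "                                    model_feed_dict={'input_1:0': img_arr},\n", "                                    dynamic_batch_size=True, compiler_args=compiler_args)\n", "```\n",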
"\n", "\n", "```bash\n", "\n", "INFO:tensorflow:fusing subgraph neuron_op_a73aed4b95ca5d5b with neuron-cc; log file is at /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.neuron-cc.log\n", " WARNING:tensorflow:Failed to fuse subgraph neuron_op_a73aed4b95ca5d5b with '/home/ubuntu/test_venv/bin/neuron-cc compile /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.neff --io-config \"{\\\"inputs\\\": {\\\"input_10/_0:0\\\": [[6, 224, 224, 3], \\\"float16\\\"]}, \\\"outputs\\\": [\\\"probs/Softmax:0\\\"]}\" --batching_en --rematerialization_en --sb_size 120 --spill_dis --enable-replication True'\n", " WARNING:tensorflow:neuron-cc error message:\n", " WARNING:tensorflow:01/23/2020 01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: ***************************************************************\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: An Internal Compiler Error has occurred\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: ***************************************************************\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Please contact Customer Support and provide the following details.\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Error message: Non-zero exit status (134) for command: /home/ubuntu/test_venv/lib/python3.6/site-packages/neuroncc/starfish/bin/list_sch --hhir hh-tr-external-move.json --verbose 0 --sb_size 120 --arith_intensity_target 2300 --sb_watermark_low 0.250000 --sb_watermark_high 0.750000 --sb_size_tol 1 --alloc simple1 --alloc_opt --depth_diff 0.100000 --verbose_start_cycle 0 --tt_dist --mm_meet_cnt 1 --load_speed_factor 0.300000 --schir sch_tmp.json --spill_depth_limit 5 --spill_dis --true_dep --mm_order --batching_en --rematerialization_en\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Error class: CompilerInternalError\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Error location: job.Scheduler.3\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Command line: /home/ubuntu/test_venv/bin/neuron-cc compile /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ubuntu/keras_fp16_benchmarking_db/compiler_workdir/neuron_op_a73aed4b95ca5d5b/graph_def.neff --io-config '{\"inputs\": {\"input_10/_0:0\": [[6, 224, 224, 3], \"float16\"]}, \"outputs\": [\"probs/Softmax:0\"]}' --batching_en --rematerialization_en --sb_size 120 --spill_dis --enable-replication True\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Internal details:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: File \"neuroncc/driver/Job.py\", line 207, in neuroncc.driver.Job.runSingleInputFn\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: File \"neuroncc/driver/jobs/Scheduler.py\", line 58, in neuroncc.driver.jobs.Scheduler.Scheduler.runSingleInput\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: File \"neuroncc/driver/Job.py\", line 145, in neuroncc.driver.Job.Job.shellCommand\n", " 01/23/2020 
01:15:40 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:40 AM ERROR [neuron-cc]: Version information:\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: Neuron Compiler version 1.0.6632.0+6001610955\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]:\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: HWM version 1.0.839.0-6001300654\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: NEFF version 0.6\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: TVM version 1.0.1589.0+6001610955\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: NumPy version 1.16.5\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: MXNet not available\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]: TF version 1.15.0\n", " 01/23/2020 01:15:41 AM ERROR [neuron-cc]:\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "gentle-census", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.9" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: src/examples/tensorflow/keras_resnet50/optimize_for_inference.py ================================================ """ Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0 """ import re import copy import argparse import tensorflow as tf import numpy as np import string from google.protobuf import text_format from tensorflow.core.framework import node_def_pb2 from tensorflow.core.framework import attr_value_pb2 from tensorflow.python.framework import tensor_util from tensorflow.tools.graph_transforms import TransformGraph def clear_input(node): for i in range(len(node.input)): node.input.pop() def replace_name(node, name): node.name = name def replace_input(node, input_name, new_name): # node.input.replace(input_name, new_name) temp = [] for i in node.input: temp.extend([new_name if i == input_name else i]) clear_input(node) for i in temp: node.input.extend([i]) def swap_names(node1, node2): temp = node2.name node2.name = node1.name node1.name = temp def get_const_node(const_node_name, const_by_name): name = re.sub("/read$", "", const_node_name) return const_by_name[name] def get_const_ndarray(const_node_name, const_by_name): name = re.sub("/read$", "", const_node_name) node = const_by_name[name] return tf.make_ndarray(node.attr.get("value").tensor) def adjust_bias_values(bias_node, fbn_node, const_by_name): bias_val = get_const_ndarray(bias_node.input[1], const_by_name) gamma_val = get_const_ndarray(fbn_node.input[1], const_by_name) mean_val = get_const_ndarray(fbn_node.input[3], const_by_name) variance_val = get_const_ndarray(fbn_node.input[4], const_by_name) new_bias = bias_val * gamma_val / np.sqrt(variance_val) new_tensor = tensor_util.make_tensor_proto(new_bias, new_bias.dtype, new_bias.shape) bias_const_node = get_const_node(bias_node.input[1], const_by_name) bias_const_node.attr["value"].CopyFrom(attr_value_pb2.AttrValue(tensor=new_tensor)) def MoveBiasAddAfterFusedBatchNorm(graphdef): """fold_batch_norm function of TransformGraph is unable to fold Keras ResNet50 because of BiasAdd between Conv2D and FusedBatchNorm (BiasAdd is not needed if 
FusedBatchNorm is used, but it exists in Keras ResNet50). Here, we move BiasAdd to after FusedBatchNorm, and adjust bias value by gamma/sqrt(variance). """ sess = tf.compat.v1.Session(graph=tf.import_graph_def(graphdef)) output_graph_def = tf.compat.v1.GraphDef() node_by_name = {} const_by_name = {} for node in graphdef.node: # Hack: use FusedBatchNormV2 so fold_batch_norm can recognize if node.op == "FusedBatchNormV3": node.op = "FusedBatchNorm" del(node.attr["U"]) #import pdb; pdb.set_trace() copied_node = node_def_pb2.NodeDef() copied_node.CopyFrom(node) node_by_name[node.name] = copied_node skip_add_node = False # Switch Mul/BiasAdd in Keras RN50 so fold_batch_norm transform would work if node.op == "Const": const_by_name[node.name] = copied_node elif node.op.startswith("FusedBatchNorm"): inputs = node.input for i in inputs: input_node = node_by_name[i] if input_node.op == "BiasAdd": output_graph_def.node.remove(input_node) input_node_input0 = input_node.input[0] # Adjust bias values (multiply by scale/sqrt(variance)) adjust_bias_values(input_node, node, const_by_name) # Hack: swap names to avoid changing input of activation swap_names(copied_node, input_node) # Fix inputs for these two ops replace_input(copied_node, i, input_node_input0) replace_input(input_node, input_node_input0, copied_node.name) # Fix order in node list output_graph_def.node.extend([copied_node]) output_graph_def.node.extend([input_node]) skip_add_node = True # Add maybe-modified nodes if not already done if not skip_add_node: output_graph_def.node.extend([copied_node]) return output_graph_def def FoldFusedBatchNorm(graph_def): """Optimize training graph for inference: - Remove Identity and CheckNumerics nodes - Fold FusedBatchNorm constants into previous Conv2D weights - Fold other constants - Strip unused nodes - Sort by execution order """ transformed_graph_def = TransformGraph ( graph_def, ['input_1'], ['probs/Softmax'], [ 'add_default_attributes', 'remove_nodes(op=Identity, op=CheckNumerics)', 'fold_constants(ignore_errors=true)', 'fold_batch_norms', 'fold_old_batch_norms', 'strip_unused_nodes', 'sort_by_execution_order', ]) return transformed_graph_def def load_graph(model_file): graph_def = tf.compat.v1.GraphDef() with open(model_file, "rb") as f: graph_def.ParseFromString(f.read()) return graph_def if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("--graph", help="graph/model to be executed", required=True) parser.add_argument("--out_graph", help="graph/model to be generated", required=True) args = parser.parse_args() graph_orig = load_graph(args.graph) graph_mod = MoveBiasAddAfterFusedBatchNorm(graph_orig) graph_mod2 = FoldFusedBatchNorm(graph_mod) with tf.io.gfile.GFile(args.out_graph, "wb") as f: f.write(graph_mod2.SerializeToString()) #with tf.io.gfile.GFile(args.out_graph + "txt", 'w') as f: # f.write(text_format.MessageToString(graph_mod2)) ================================================ FILE: src/examples/tensorflow/keras_resnet50/pb2sm_compile.py ================================================ """ Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
SPDX-License-Identifier: MIT-0 """ import time import shutil import numpy as np import argparse import tensorflow as tf from tensorflow.keras.preprocessing import image from tensorflow.keras.applications import resnet50 import tensorflow.neuron as tfn tf.keras.backend.set_image_data_format('channels_last') arg_parser = argparse.ArgumentParser() arg_parser.add_argument('--batch_size', type=int, default=5, choices=range(1, 6), help='Input data batch size for compilation of model') arg_parser.add_argument('--neuroncore-pipeline-cores', type=int, default=1, choices=range(1, 17), help='Number of NeuronCores limit for each partitioned graph') arg_parser.add_argument('--debug_args', type=str, default="", help='Optional Compiler debug args') arg_parser.add_argument('--workdir', type=str, default="compiler_workdir", help='Compiler work directory') args = arg_parser.parse_args() def pb_to_saved_model(pb_path, input_names, output_names, model_dir): graph_def = tf.GraphDef() graph_def.ParseFromString(open(pb_path, 'rb').read()) with tf.Session(graph=tf.Graph()) as sess: tf.import_graph_def(graph_def, name='') inputs = {name: sess.graph.get_tensor_by_name(ts_name) for name, ts_name in input_names.items()} outputs = {name: sess.graph.get_tensor_by_name(ts_name) for name, ts_name in output_names.items()} tf.saved_model.simple_save(sess, model_dir, inputs, outputs) saved_model_dir = "rn50_fp16" shutil.rmtree(saved_model_dir, ignore_errors=True) pb_to_saved_model("resnet50_fp16_keras_opt.pb", {"input_1:0": "input_1:0"}, {"probs/Softmax:0" : "probs/Softmax:0"}, saved_model_dir) batch_size = args.batch_size img_arr = np.zeros([batch_size, 224, 224, 3], dtype='float16') compiled_saved_model_dir = saved_model_dir + "_compiled_b" + str(batch_size) + "_nc" + str(args.neuroncore_pipeline_cores) shutil.rmtree(compiled_saved_model_dir + "/1", ignore_errors=True) print("\n*** Batch size {}, num NeuronCores {} (input shape: {}, saved model dir: {}) ***\n".format(batch_size, args.neuroncore_pipeline_cores, img_arr.shape, compiled_saved_model_dir)) compiler_args = ['--neuroncore-pipeline-cores', str(args.neuroncore_pipeline_cores)] if args.debug_args: compiler_args.extend(args.debug_args.split(" ")) static_weights = False if args.neuroncore_pipeline_cores >= 8: static_weights = True shutil.rmtree(args.workdir, ignore_errors=True) start = time.time() rslts = tfn.saved_model.compile(saved_model_dir, compiled_saved_model_dir + "/1", model_feed_dict={'input_1:0' : img_arr}, compiler_workdir=args.workdir, dynamic_batch_size=True, compiler_args = compiler_args) delta = time.time() - start perc_on_inf = rslts['OnNeuronRatio'] * 100 compile_success = False if perc_on_inf < 50: print("\nERROR: Compilation finished in {:.0f} seconds with less than 50% operations placed on Inferentia ({:.1f}%)\n".format(delta, perc_on_inf)) if '--static-weights' in compiler_args: print("INFO: Retry compilation without static weights") compiler_args.remove('--static-weights') static_weights = False shutil.rmtree(compiled_saved_model_dir + "/1", ignore_errors=True) shutil.rmtree('compiler_workdir2', ignore_errors=True) start = time.time() rslts = tfn.saved_model.compile(saved_model_dir, compiled_saved_model_dir + "/1", model_feed_dict={'input_1:0' : img_arr}, compiler_workdir='compiler_workdir2', dynamic_batch_size=True, compiler_args = compiler_args) delta = time.time() - start perc_on_inf = rslts['OnNeuronRatio'] * 100 if perc_on_inf < 50: print("\nERROR: Retry compilation finished in {:.0f} seconds with less than 50% operations placed on 
Inferentia ({:.1f}%)\n".format(delta, perc_on_inf)) else: print("\nINFO: Retry compilation finished in {:.0f} seconds with {:.1f}% operations placed on Inferentia\n".format(delta, perc_on_inf)) compile_success = True else: print("\nINFO: Compilation finished in {:.0f} seconds with {:.1f}% operations placed on Inferentia\n".format(delta, perc_on_inf)) compile_success = True # Prepare SavedModel for uploading to Inf1 instance completion_code = 0 if compile_success: shutil.make_archive('./' + compiled_saved_model_dir, 'zip', './', compiled_saved_model_dir) completion_code = 1 + int(static_weights) print(completion_code) exit(int(not compile_success)) ================================================ FILE: src/examples/tensorflow/keras_resnet50/run_all ================================================ #!/usr/bin/env bash ########################################################################## # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. # SPDX-License-Identifier: MIT-0 ########################################################################## pip install pillow # Extract Keras ResNet50 FP32 and check inference python gen_resnet50_keras.py python infer_resnet50_keras.py --graph resnet50_fp32_keras.pb # Optimize fp32 graph for inference before casting python optimize_for_inference.py --graph resnet50_fp32_keras.pb --out_graph resnet50_fp32_keras_opt.pb python infer_resnet50_keras.py --graph resnet50_fp32_keras_opt.pb # Cast full graph to FP16 python fp32tofp16.py --graph resnet50_fp32_keras_opt.pb --out_graph resnet50_fp16_keras_opt.pb python infer_resnet50_keras.py --graph resnet50_fp16_keras_opt.pb # Compile python pb2sm_compile.py # Infer python infer_resnet50_keras_loadtest.py ================================================ FILE: src/examples/tensorflow/openpose_demo/openpose.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "id": "caff04ba", "metadata": {}, "source": [ "# Running OpenPose on Inferentia\n" ] }, { "cell_type": "markdown", "id": "09b2919a", "metadata": {}, "source": [ "## Note: this tutorial runs on tensorflow-neuron 1.x only" ] }, { "cell_type": "markdown", "id": "4dcf9bb1", "metadata": {}, "source": [ "## Introduction:\n", "\n", "In this tutorial we will compile and deploy an OpenPose model on Inferentia. This Jupyter notebook should run on an inf1.6xlarge instance for compilation and inference. It is the inference part of this tutorial that requires an inf1.6xlarge, not the compilation itself. For simplicity we will run this tutorial on a single instance, but in a real-life scenario the compilation can be done on a c5.4xlarge compute instance and the deployment on the inf1 instance family.\n", "\n", "In this tutorial we provide two main sections:\n", "1. Compile the OpenPose model on inf1.6xlarge.\n", "2. Infer the same compiled model on inf1.6xlarge.\n", "\n", "Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](../../../../frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page.\n" ] }, { "cell_type": "markdown", "id": "04ae0838", "metadata": {}, "source": [ "## Acknowledgement:\n", "\n", "Many thanks to https://github.com/ildoonet for providing the pretrained model as well as the image preprocessing/pose estimation infrastructure." 
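, "\n", "Before compiling, it can also help to confirm that the downloaded graph_opt.pb parses and to locate its input and output ops. A minimal sketch (run it after the download cell below; the op names shown are the ones this tutorial expects):\n", "\n", "```python\n", "import tensorflow as tf\n", "\n", "gd = tf.GraphDef()\n", "with open('graph_opt.pb', 'rb') as f:\n", "    gd.ParseFromString(f.read())\n", "print([n.name for n in gd.node if n.op == 'Placeholder'])  # expect ['image']\n", "print(gd.node[-1].name)  # usually the last node: 'Openpose/concat_stage7'\n", "```\n"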
] }, { "cell_type": "markdown", "id": "d0d6d08e", "metadata": {}, "source": [ "## Download tensorflow pose net frozen graph." ] }, { "cell_type": "code", "execution_count": null, "id": "1926d4e3", "metadata": { "scrolled": false }, "outputs": [], "source": [ "!wget -c --tries=2 $( wget -q -O - http://www.mediafire.com/file/qlzzr20mpocnpa3/graph_opt.pb | grep -o 'http*://download[^\"]*' | tail -n 1 ) -O graph_opt.pb\n", "\n", "!pip install tensorflow_neuron==1.15.5.2.8.9.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com/\n", "!pip install neuron_cc==1.13.5.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com" ] }, { "cell_type": "markdown", "id": "83eb578b", "metadata": {}, "source": [ "## Compile\n", "Compile the pose net frozen graph into AWS Neuron compatible form. Network input image resolution is adjustable with argument --net_resolution (e. g., --net_resolution=656x368). The compiled model can accept arbitrary batch size input at runtime." ] }, { "cell_type": "code", "execution_count": null, "id": "362f322e", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "Usage: python convert_graph_opt.py /path/to/graph_opt.pb /path/to/graph_opt_neuron.pb\n", "\"\"\"\n", "#import argparse\n", "import numpy as np\n", "import tensorflow as tf\n", "from tensorflow.core.framework.tensor_shape_pb2 import TensorShapeProto\n", "import tensorflow.neuron as tfn\n", "\n", "\n", "def compile():\n", " #parser = argparse.ArgumentParser()\n", " #parser.add_argument('input_pb_path', help='Input serialized GraphDef protobuf')\n", " #parser.add_argument('output_pb_path', help='Ouput serialized GraphDef protobuf')\n", " #parser.add_argument('--net_resolution', default='656x368', help='Network resolution in WxH format, e. g., --net_resolution=656x368')\n", " #parser.add_argument('--debug_verify', action='store_true')\n", " #args = parser.parse_args()\n", " \n", " input_pb_path = './graph_opt.pb'\n", " net_resolution = '656x368'\n", " output_pb_path = './graph_opt_neuron_' + net_resolution + '.pb'\n", " \n", " debug_verify = 'store_true'\n", " dim_w, dim_h = net_resolution.split('x')\n", " dim_w = int(dim_w)\n", " dim_h = int(dim_h)\n", " graph_def = tf.GraphDef()\n", " with open(input_pb_path, 'rb') as f:\n", " graph_def.ParseFromString(f.read())\n", "\n", " if debug_verify:\n", " np.random.seed(0)\n", " feed_dict = {'image:0': np.random.rand(1, dim_h, dim_w, 3)}\n", " output_name = 'Openpose/concat_stage7:0'\n", " with tf.Session(graph=tf.Graph()) as sess:\n", " tf.import_graph_def(graph_def, name='')\n", " result_reference = sess.run(output_name, feed_dict)\n", "\n", " preprocessing_ops = {'preprocess_divide', 'preprocess_divide/y', 'preprocess_subtract', 'preprocess_subtract/y'}\n", " graph_def = nhwc_to_nchw(graph_def, preprocessing_ops)\n", " graph_def = inline_float32_to_float16(graph_def, preprocessing_ops)\n", " with tf.Session(graph=tf.Graph()) as sess:\n", " tf.import_graph_def(graph_def, name='')\n", " no_fuse_ops = preprocessing_ops.union({'Openpose/concat_stage7'})\n", " infer_graph = tfn.graph_util.inference_graph_from_session(\n", " sess, shape_feed_dict={'image:0': [1, dim_h, dim_w, 3]}, output_tensors=['Openpose/concat_stage7:0'],\n", " no_fuse_ops=no_fuse_ops, dynamic_batch_size=True,\n", " )\n", " with open(output_pb_path, 'wb') as f:\n", " f.write(infer_graph.as_graph_def().SerializeToString())\n", "\n", " if debug_verify:\n", " with tf.Session(graph=infer_graph) as sess:\n", " result_compiled = sess.run(output_name, feed_dict)\n", " 
np.testing.assert_allclose(result_compiled, result_reference, rtol=1e-2, atol=1e-3)\n", "\n", "\n", "def inline_float32_to_float16(graph_def, preprocessing_ops):\n", " float32_enum = tf.float32.as_datatype_enum\n", " float16_enum = tf.float16.as_datatype_enum\n", " graph = tf.Graph()\n", " with graph.as_default():\n", " tf.import_graph_def(graph_def, name='')\n", " graph_def = graph.as_graph_def()\n", " for node in graph_def.node:\n", " if node.name in preprocessing_ops or node.op == 'Placeholder':\n", " cast_input_node_name = node.name\n", " continue\n", " if node.op == 'Const':\n", " if node.attr['dtype'].type == float32_enum:\n", " node.attr['dtype'].type = float16_enum\n", " tensor_def = node.attr['value'].tensor\n", " tensor_def.dtype = float16_enum\n", " if tensor_def.tensor_content:\n", " const_np = np.frombuffer(tensor_def.tensor_content, dtype=np.float32).astype(np.float16)\n", " tensor_def.tensor_content = const_np.tobytes()\n", " elif len(tensor_def.float_val):\n", " const_np = np.array(tensor_def.float_val).astype(np.float16).view(np.uint16)\n", " tensor_def.float_val[:] = []\n", " tensor_def.half_val[:] = list(const_np)\n", " else:\n", " raise NotImplementedError\n", " elif 'T' in node.attr and node.attr['T'].type == float32_enum:\n", " node.attr['T'].type = float16_enum\n", " for node in graph_def.node:\n", " if node.name == cast_input_node_name:\n", " node.name = '{}_PreCastFloat32ToFlot16'.format(node.name)\n", " input_node = node\n", " break\n", " cast_input_node = _gen_cast_node_def(cast_input_node_name, tf.float16, input_node)\n", "\n", " output_node = graph_def.node[-1]\n", " cast_output_node_name = output_node.name\n", " output_node.name = '{}_PreCastFloat16ToFlot32'.format(output_node.name)\n", " cast_output_node = _gen_cast_node_def(cast_output_node_name, tf.float32, output_node)\n", "\n", " preprocessing_ops.add(input_node.name)\n", " new_graph_def = tf.GraphDef()\n", " new_graph_def.node.extend(graph_def.node)\n", " new_graph_def.node.append(cast_input_node)\n", " new_graph_def.node.append(cast_output_node)\n", " graph = tf.Graph()\n", " with graph.as_default():\n", " tf.import_graph_def(new_graph_def, name='')\n", " return graph.as_graph_def()\n", "\n", "\n", "def nhwc_to_nchw(graph_def, preprocessing_ops):\n", " graph = tf.Graph()\n", " with graph.as_default():\n", " tf.import_graph_def(graph_def, name='')\n", " graph_def = graph.as_graph_def()\n", " node_name_to_node = {node.name: node for node in graph_def.node}\n", " for node in graph_def.node:\n", " if node.name in preprocessing_ops or node.op == 'Placeholder':\n", " transpose_input_node_name = node.name\n", " continue\n", " if node.op == 'Conv2D':\n", " node.attr['data_format'].s = b'NCHW'\n", " strides = node.attr['strides'].list.i\n", " strides[:] = [strides[0], strides[3], strides[1], strides[2]]\n", " elif node.op == 'BiasAdd':\n", " if node.name != 'probs/BiasAdd':\n", " node.attr['data_format'].s = b'NCHW'\n", " elif node.op == 'MaxPool':\n", " node.attr['data_format'].s = b'NCHW'\n", " ksize = node.attr['ksize'].list.i\n", " ksize[:] = [ksize[0], ksize[3], ksize[1], ksize[2]]\n", " strides = node.attr['strides'].list.i\n", " strides[:] = [strides[0], strides[3], strides[1], strides[2]]\n", " elif node.op in {'Concat', 'ConcatV2'}:\n", " node_axes = node_name_to_node[node.input[-1]]\n", " node_axes.attr['value'].tensor.int_val[:] = [1]\n", " for node in graph_def.node:\n", " if node.name == transpose_input_node_name:\n", " node.name = '{}_PreTransposeNHWC2NCHW'.format(node.name)\n", " input_node = 
node\n", " break\n", " transpose_input_node, transpose_input_perm_node = _gen_transpose_def(transpose_input_node_name, [0, 3, 1, 2], input_node)\n", "\n", " output_node = graph_def.node[-1]\n", " transpose_output_node_name = output_node.name\n", " output_node.name = '{}_PreTransposeNCHW2NHWC'.format(output_node.name)\n", " transpose_output_node, transpose_output_perm_node = _gen_transpose_def(transpose_output_node_name, [0, 2, 3, 1], output_node)\n", "\n", " preprocessing_ops.add(input_node.name)\n", " preprocessing_ops.add(transpose_input_perm_node.name)\n", " new_graph_def = tf.GraphDef()\n", " new_graph_def.node.extend(graph_def.node)\n", " new_graph_def.node.append(transpose_input_perm_node)\n", " new_graph_def.node.append(transpose_input_node)\n", " new_graph_def.node.append(transpose_output_perm_node)\n", " new_graph_def.node.append(transpose_output_node)\n", " graph = tf.Graph()\n", " with graph.as_default():\n", " tf.import_graph_def(new_graph_def, name='')\n", " return graph.as_graph_def()\n", "\n", "\n", "def _gen_cast_node_def(name, target_dtype, input_node):\n", " cast_node = tf.NodeDef(name=name, op='Cast')\n", " cast_node.input.append(input_node.name)\n", " cast_node.attr['DstT'].type = target_dtype.as_datatype_enum\n", " cast_node.attr['SrcT'].type = input_node.attr['T'].type\n", " cast_node.attr['Truncate'].b = False\n", " return cast_node\n", "\n", "\n", "def _gen_transpose_def(name, perm, input_node):\n", " perm_node = tf.NodeDef(name='{}/perm'.format(name), op='Const')\n", " perm_node.attr['dtype'].type = tf.int32.as_datatype_enum\n", " tensor_def = perm_node.attr['value'].tensor\n", " tensor_def.dtype = tf.int32.as_datatype_enum\n", " tensor_def.tensor_shape.dim.append(TensorShapeProto.Dim(size=4))\n", " tensor_def.tensor_content = np.array(perm, dtype=np.int32).tobytes()\n", " transpose_node = tf.NodeDef(name=name, op='Transpose')\n", " transpose_node.input.append(input_node.name)\n", " transpose_node.input.append(perm_node.name)\n", " transpose_node.attr['T'].type = input_node.attr['T'].type\n", " transpose_node.attr['Tperm'].type = tf.int32.as_datatype_enum\n", " return transpose_node, perm_node\n" ] }, { "cell_type": "code", "execution_count": null, "id": "88c41e01", "metadata": { "scrolled": true }, "outputs": [], "source": [ "compile()\n", "\n", "# Sample output will look like below:\n", "# WARNING:tensorflow:From :47: inference_graph_from_session (from tensorflow_neuron.python.graph_util) is deprecated and will be removed in a future version.\n", "# Instructions for updating:\n", "# Please refer to AWS documentation on Neuron integrated TensorFlow 2.0.\n", "# INFO:tensorflow:Froze 0 variables.\n", "# INFO:tensorflow:Converted 0 variables to const ops.\n", "# INFO:tensorflow:fusing subgraph {subgraph neuron_op_ed41d2deb8c54255 with input tensors [\"\"], output tensors [\"\"]} with neuron-cc\n", "# INFO:tensorflow:Number of operations in TensorFlow session: 474\n", "# INFO:tensorflow:Number of operations after tf.neuron optimizations: 474\n", "# INFO:tensorflow:Number of operations placed on Neuron runtime: 465" ] }, { "cell_type": "markdown", "id": "5a9af0c7", "metadata": {}, "source": [ "## Deploy\n", "Using same instance to deploy the model.\n", "In case of different deployment instance, launch a deployment inf1 instance and copy the AWS Neuron optimized tensorflow frozen graph graph_opt_neuron_656x368.pb to the deployment inf1 instance. 
The smallest instance type inf1.xlarge is sufficient for this demo.\n", "\n", "Your graph_opt_neuron_656x368.pb can now be plugged into https://github.com/ildoonet seamlessly if you have tensorflow-neuron installed. When it is used at runtime, please ensure that the image resolution is the same as the compile-time image resolution, i.e., 656x368.\n", "\n", "Measure performance on the compiled frozen graph using dummy inputs.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "0481d049", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "Copyright (C) 2020, Amazon.com. All Rights Reserved\n", "\"\"\"\n", "import os\n", "import atexit\n", "import time\n", "import math\n", "import json\n", "from collections import OrderedDict, Counter\n", "from contextlib import contextmanager, ContextDecorator\n", "from functools import wraps\n", "from tensorflow.python.client import session\n", "from tensorflow.python.platform import tf_logging as logging\n", "\n", "\n", "class measure_performance(ContextDecorator):\n", " \"\"\"Convenient tool for performance measurements.\n", " Can be applied to tensorflow session.run, tf-serving unary gRPC calls, or a given custom function.\n", " Usage:\n", " To generate performance report for the entire Python or gRPC-client process, insert\n", " the following function call before running inferences:\n", " `tfn.measure_performance()`\n", " Then latency/throughput report will be generated when the process terminates.\n", " Alternatively, it is possible to use `tfn.measure_performance` programmatically\n", " as a context manager. Performance measurement will be done for all inferences\n", " happening under this context. Report will be displayed as INFO level log when exiting\n", " the context. It is also possible to obtain a JSON format report in Python.\n", " For example:\n", " ```\n", " with tfn.measure_performance() as perf:\n", " ... 
(run some inferences) ...\n", " report_json = perf.report()\n", " report_full_json = perf.report(verbosity=1)\n", " ```\n", " \"\"\"\n", "\n", " def __init__(self, func=None, window_size=1):\n", " self.perf_tracker = PerformanceTracker(window_size)\n", " atexit.register(self.perf_tracker.report)\n", " self._original_run = session.Session.run\n", " self._original_grpc_call = None\n", " if callable(func):\n", " self.perf_tracker.register_func(self._track_performance(func))\n", " else:\n", " session.Session.run = self._track_performance(session.Session.run)\n", " try:\n", " import grpc\n", " from tensorflow_serving.apis import prediction_service_pb2_grpc\n", " dummy_stub = prediction_service_pb2_grpc.PredictionServiceStub(grpc.insecure_channel(''))\n", " self._grpc_callable_type = type(dummy_stub.Predict)\n", " self._original_grpc_call = self._grpc_callable_type.__call__\n", " except ImportError:\n", " pass\n", " if callable(self._original_grpc_call):\n", " self._grpc_callable_type.__call__ = self._track_performance(\n", " grpc._channel._UnaryUnaryMultiCallable.__call__\n", " )\n", "\n", " def __enter__(self):\n", " return self.perf_tracker\n", "\n", " def __exit__(self, *exc):\n", " atexit.unregister(self.perf_tracker.report)\n", " self.perf_tracker.report()\n", " session.Session.run = self._original_run\n", " if self._original_grpc_call is not None:\n", " self._grpc_callable_type.__call__ = self._original_grpc_call\n", " return False\n", "\n", " def _track_performance(self, func):\n", " @wraps(func)\n", " def wrapper(*args, **kwargs):\n", " start = time.time()\n", " result = func(*args, **kwargs)\n", " end = time.time()\n", " self.perf_tracker.add_timestamps(start, end)\n", " return result\n", " return wrapper\n", "\n", "\n", "class PerformanceTracker(ContextDecorator):\n", "\n", " description = (\n", " \"Latency unit: second. Throughput unit: number of batched inferences per second. \"\n", " \"Reported throughput is a lower bound of the actual throughput as inferences \"\n", " \"spanning across window boundaries are not counted towards any of the windows. \"\n", " \"'Quiet' periods (i. 
e., window buckets where the inference function is not called) \"\n", " \"are not counted towards the reported average throughput.\"\n", " )\n", "\n", " def __init__(self, window_size):\n", " self.window_size = window_size\n", " self.timestamps_list = []\n", " self._func = None\n", "\n", " def __call__(self, *args, **kwargs):\n", " return self._func(*args, **kwargs)\n", "\n", " def register_func(self, func):\n", " self._func = func\n", "\n", " def add_timestamps(self, start, end):\n", " self.timestamps_list.append([start, end])\n", "\n", " def report(self, verbosity=0):\n", " if self.timestamps_list:\n", " latency_list = [end - start for start, end in self.timestamps_list]\n", " latency_json = {\n", " 'p50': percentile(latency_list, 50),\n", " 'p90': percentile(latency_list, 90),\n", " 'p99': percentile(latency_list, 99),\n", " 'p100': percentile(latency_list, 100),\n", " }\n", " bucketed_timestamps = [self._get_bucket(start, end) for start, end in self.timestamps_list]\n", " counted_buckets = Counter(item for item in bucketed_timestamps if item is not None)\n", " bucket_throughputs = [(key, value / self.window_size) for key, value in sorted(counted_buckets.items())]\n", " busy_throughputs = list(OrderedDict((key, value) for key, value in bucket_throughputs).values())\n", " throughput_json = {\n", " 'peak': max(busy_throughputs),\n", " 'median': percentile(busy_throughputs, 50),\n", " 'average': sum(busy_throughputs) / len(busy_throughputs),\n", " }\n", " if verbosity > 0:\n", " throughput_json['trend'] = busy_throughputs\n", " report_json = {\n", " 'pid': os.getpid(),\n", " 'throughput': throughput_json,\n", " 'latency': latency_json,\n", " 'description': PerformanceTracker.description,\n", " }\n", " with _logging_show_info():\n", " logging.info('performance report:\\n{}'.format(json.dumps(report_json, indent=4)))\n", " return report_json\n", "\n", " def _get_bucket(self, start, end):\n", " bucketed_start = math.floor(start / self.window_size) * self.window_size\n", " bucketed_end = math.ceil(end / self.window_size) * self.window_size\n", " if bucketed_end - bucketed_start == self.window_size:\n", " return bucketed_start\n", " else:\n", " return None\n", "\n", "\n", "def percentile(number_list, percent):\n", " pos_float = len(number_list) * percent / 100\n", " max_pos = len(number_list) - 1\n", " pos_floor = min(math.floor(pos_float), max_pos)\n", " pos_ceil = min(math.ceil(pos_float), max_pos)\n", " number_list = sorted(number_list)\n", " return number_list[pos_ceil] if pos_float - pos_floor > 0.5 else number_list[pos_floor]\n", "\n", "\n", "@contextmanager\n", "def _logging_show_info():\n", " try:\n", " verbosity = logging.get_verbosity()\n", " logging.set_verbosity(logging.INFO)\n", " yield\n", " finally:\n", " logging.set_verbosity(verbosity)" ] }, { "cell_type": "code", "execution_count": null, "id": "960c6aa9", "metadata": {}, "outputs": [], "source": [ "\"\"\"\n", "Below are the inputs for compiled frozen graph \n", "\n", "pb_path is a /path/graph_opt_neuron_656x368.pb\n", "num_thread = 8 ( Number of threads that work on each tensorflow session ) \n", "batch_size =1 ( batch_size )\n", "net_resolution ,default=656x368\n", "num_inferences = 200\n", "\"\"\"\n", "import os\n", "from concurrent import futures\n", "import numpy as np\n", "import tensorflow as tf\n", "import tensorflow.neuron as tfn\n", "\n", "def run_with_dummy(sess, dummy_feed_dict, num_inferences):\n", " for _ in range(num_inferences):\n", " sess.run('Openpose/concat_stage7:0', dummy_feed_dict)\n", " \n", "def 
main():\n", " NUM_NEURON_CORES = 16\n", " pb_path = './graph_opt_neuron_656x368.pb'\n", " num_thread = 8\n", " batch_size = 1\n", " net_resolution = '656x368'\n", " num_inferences = 200\n", " dim_w, dim_h = net_resolution.split('x')\n", " dim_w = int(dim_w)\n", " dim_h = int(dim_h)\n", " graph_def = tf.GraphDef()\n", " with open(pb_path, 'rb') as f:\n", " graph_def.ParseFromString(f.read())\n", " \n", " graph_def = tfn.graph_util.tag_multicore(graph_def, NUM_NEURON_CORES)\n", " \n", " with tfn.measure_performance() as perf:\n", " with tf.Session(graph=tf.Graph()) as sess:\n", " tf.import_graph_def(graph_def, name='')\n", " input_name = 'image:0'\n", " input_shape = sess.graph.get_tensor_by_name(input_name).shape.as_list()\n", " input_shape[0] = batch_size\n", " input_shape[1] = dim_h\n", " input_shape[2] = dim_w\n", " dummy_feed_dict = {input_name: np.zeros(input_shape).astype(np.float32)}\n", " with futures.ThreadPoolExecutor(max_workers=num_thread) as executor:\n", " fut_list = [executor.submit(run_with_dummy, sess, dummy_feed_dict, num_inferences) for _ in range(num_thread)]\n", " res_list = [fut.result() for fut in fut_list] \n", "\n", "main()\n", "\n", "# Sample output will look like below:\n", "# INFO:tensorflow:performance report:\n", "# {\n", "# \"pid\": 17713,\n", "# \"throughput\": {\n", "# \"peak\": 66.0,\n", "# \"median\": 64.0,\n", "# \"average\": 61.56521739130435\n", "# },\n", "# \"latency\": {\n", "# \"p50\": 0.1106414794921875,\n", "# \"p90\": 0.11212301254272461,\n", "# \"p99\": 0.11337876319885254,\n", "# \"p100\": 7.08282732963562\n", "# },\n", "# \"description\": \"Latency unit: second. Throughput unit: number of batched inferences per second. Reported throughput is a lower bound of the actual throughput as inferences spanning across window boundaries are not counted towards any of the windows. 'Quiet' periods (i. e., window buckets where the inference function is not called) are not counted towards the reported average throughput.\"\n", "# }" ] }, { "cell_type": "raw", "id": "4f15e776", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.9" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: src/examples/tensorflow/ssd300_demo/README.md ================================================

Please view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)** ================================================ FILE: src/examples/tensorflow/ssd300_demo/ssd300_detection.py ================================================ import argparse import json import pkg_resources from distutils.version import LooseVersion import numpy as np from PIL import Image import matplotlib.pyplot as plt import matplotlib.patches as patches import tensorflow as tf import tensorflow.neuron as tfn def main(): parser = argparse.ArgumentParser() parser.add_argument('--image', required=True, help='Path to image that is to be detected. Support jpeg and png format.') parser.add_argument('--image_with_detections', required=True, help='Path to save image after detection (with bounding boxes drawn). Png format.') parser.add_argument('--saved_model', required=True, help='TensorFlow SSD300 SavedModel') parser.add_argument('--score_threshold', type=float, default=0.15, help='Minimum required score for drawing a bounding box') parser.add_argument('--instances_val2017_json', default=None, help='Json file that contains labeling information') parser.add_argument('--save_results', default=None) parser.add_argument('--disable_version_check', action='store_true') args = parser.parse_args() if not args.disable_version_check: tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version) if tfn_version < LooseVersion('1.15.0.1.0.1333.0'): raise RuntimeError( 'tensorflow-neuron version {} is too low for this demo. Please upgrade ' 'by "pip install -U tensorflow-neuron --extra-index-url=https://pip.repos.neuron.amazonaws.com"'.format(tfn_version)) with open(args.image, 'rb') as f: img_jpg_bytes = f.read() model_feed_dict = {'batch_image': [img_jpg_bytes]} predictor = tf.contrib.predictor.from_saved_model(args.saved_model) results = predictor(model_feed_dict) if args.save_results is not None: np.savez(args.save_results, **results) boxes_np = results['boxes'] scores_np = results['scores'] classes_np = results['classes'] if args.instances_val2017_json is not None: with open(args.instances_val2017_json) as f: annotate_json = json.load(f) label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])} plt.switch_backend('agg') fig, ax = plt.subplots(1) ax.imshow(Image.open(args.image).convert('RGB')) wanted = scores_np[0] > args.score_threshold for xywh, label_no_bg in zip(boxes_np[0][wanted], classes_np[0][wanted]): rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none') ax.add_patch(rect) rx, ry = rect.get_xy() rx = rx + rect.get_width() / 2.0 if args.instances_val2017_json is not None: ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10, ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', alpha=0.5)) plt.savefig(args.image_with_detections) plt.close(fig) if __name__ == '__main__': main() ================================================ FILE: src/examples/tensorflow/ssd300_demo/ssd300_evaluation.py ================================================ import argparse import os import json import glob from concurrent import futures import time import pkg_resources from distutils.version import LooseVersion import numpy as np import tensorflow as tf import tensorflow.neuron as tfn from pycocotools.cocoeval import COCOeval from DeepLearningExamples.PyTorch.Detection.SSD.src.coco import COCO from 
DeepLearningExamples.PyTorch.Detection.SSD.src.utils import dboxes300_coco from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import SSDTransformer from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import COCODetection def get_val_dataset(val_annotate, val_coco_root): dboxes = dboxes300_coco() val_trans = SSDTransformer(dboxes, (300, 300), val=True) val_coco = COCODetection(val_coco_root, val_annotate, val_trans) return val_coco def main(): parser = argparse.ArgumentParser() parser.add_argument('--saved_model', required=True, help='TensorFlow SSD300 SavedModel') parser.add_argument('--val2017', required=True, help='Path to COCO 2017 validation dataset') parser.add_argument('--instances_val2017_json', required=True, help='Json file that contains labeling information') parser.add_argument('--num_sessions', type=int, default=1, help='Number of tensorflow sessions') parser.add_argument('--num_threads', type=int, default=4, help='Number of threads') parser.add_argument('--throughput_interval', type=int, default=10, help='Interval for counting throughput') parser.add_argument('--save_results', default=None) parser.add_argument('--disable_version_check', action='store_true') args = parser.parse_args() if not args.disable_version_check: tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version) if tfn_version < LooseVersion('1.15.0.1.0.1333.0'): raise RuntimeError( 'tensorflow-neuron version {} is too low for this demo. Please upgrade ' 'by "pip install -U tensorflow-neuron --extra-index-url=https://pip.repos.neuron.amazonaws.com"'.format(tfn_version)) predictor_list = [tf.contrib.predictor.from_saved_model(args.saved_model) for _ in range(args.num_sessions)] val_dataset = get_val_dataset(args.instances_val2017_json, args.val2017) inv_map = {v: k for k, v in val_dataset.label_map.items()} model_feed_dict_list = [] for img_id in val_dataset.img_keys: img_path = os.path.join(args.val2017, val_dataset.images[img_id][0]) with open(img_path, 'rb') as f: img_jpg_bytes = f.read() model_feed_dict_list.append({'batch_image': [img_jpg_bytes]}) latency_list = [] throughput_list = [] def predict(pred, model_feed_dict): start = time.time() result = pred(model_feed_dict) latency_list.append(time.time() - start) return result def performance(): last_num_infer = len(latency_list) while len(latency_list) < len(model_feed_dict_list): current_num_infer = len(latency_list) throughput = (current_num_infer - last_num_infer) / args.throughput_interval throughput_list.append(throughput) p50 = 0.0 p90 = 0.0 if latency_list: p50 = np.percentile(latency_list, 50) p90 = np.percentile(latency_list, 90) print('pid {}: current throughput {}, latency p50={:.3f} p90={:.3f}'.format(os.getpid(), throughput, p50, p90)) last_num_infer = current_num_infer time.sleep(args.throughput_interval) executor = futures.ThreadPoolExecutor(max_workers=(args.num_sessions*args.num_threads)+1) performance_future = executor.submit(performance) eval_futures = [] for idx, model_feed_dict in enumerate(model_feed_dict_list): eval_fut = executor.submit(predict, predictor_list[idx%len(predictor_list)], model_feed_dict) eval_futures.append(eval_fut) waited_results = [] for idx, eval_fut in enumerate(eval_futures): if idx % 100 == 0: print('evaluating image {}/{}'.format(idx, len(eval_futures))) waited_results.append(eval_fut.result()) eval_results = [] for idx, (img_id, results) in enumerate(zip(val_dataset.img_keys, waited_results)): boxes = results['boxes'] for box, label, prob in zip(results['boxes'][0], 
results['classes'][0], results['scores'][0]): res = [img_id, box[0], box[1], box[2], box[3], prob, inv_map[label+1]] # +1 to account for background eval_results.append(res) performance_future.result() coco_gt = COCO(annotation_file=args.instances_val2017_json) coco_dt = coco_gt.loadRes(np.array(eval_results).astype(np.float32)) coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox') coco_eval.evaluate() coco_eval.accumulate() coco_eval.summarize() if args.save_results is not None: np.save(args.save_results, coco_eval.stats) if __name__ == '__main__': main() ================================================ FILE: src/examples/tensorflow/ssd300_demo/ssd300_evaluation_client.py ================================================ import argparse import os import json import glob from concurrent import futures import time import subprocess from distutils.version import LooseVersion import numpy as np import tensorflow as tf import grpc from tensorflow_serving.apis import predict_pb2 from tensorflow_serving.apis import prediction_service_pb2_grpc from pycocotools.cocoeval import COCOeval from DeepLearningExamples.PyTorch.Detection.SSD.src.coco import COCO from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import dboxes300_coco from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import SSDTransformer from DeepLearningExamples.PyTorch.Detection.SSD.src.utils import COCODetection def get_val_dataset(val_annotate, val_coco_root): dboxes = dboxes300_coco() val_trans = SSDTransformer(dboxes, (300, 300), val=True) val_coco = COCODetection(val_coco_root, val_annotate, val_trans) return val_coco def main(): parser = argparse.ArgumentParser() parser.add_argument('--server_address', default='localhost:8500', help='tensorflow-model-server-neuron grpc address') parser.add_argument('--model_name', default='default', help='Serving model name') parser.add_argument('--val2017', required=True, help='Path to COCO 2017 validation dataset') parser.add_argument('--instances_val2017_json', required=True, help='Json file that contains labeling information') parser.add_argument('--num_threads', type=int, default=4, help='Number of threads') parser.add_argument('--throughput_interval', type=int, default=10, help='Interval for counting throughput') parser.add_argument('--save_results', default=None) args = parser.parse_args() channel = grpc.insecure_channel(args.server_address) stub = prediction_service_pb2_grpc.PredictionServiceStub(channel) val_dataset = get_val_dataset(args.instances_val2017_json, args.val2017) inv_map = {v: k for k, v in val_dataset.label_map.items()} request_list = [] for img_id in val_dataset.img_keys: img_path = os.path.join(args.val2017, val_dataset.images[img_id][0]) with open(img_path, 'rb') as f: img_jpg_bytes = f.read() data = np.array([img_jpg_bytes], dtype=object) data = tf.contrib.util.make_tensor_proto(data, shape=data.shape) request = predict_pb2.PredictRequest() request.model_spec.name = args.model_name request.inputs['batch_image'].CopyFrom(data) request_list.append(request) latency_list = [] throughput_list = [] def predict(request): start = time.time() result = stub.Predict(request).outputs latency_list.append(time.time() - start) return result def performance(): last_num_infer = len(latency_list) while len(latency_list) < len(request_list): current_num_infer = len(latency_list) throughput = (current_num_infer - last_num_infer) / args.throughput_interval throughput_list.append(throughput) p50 = 0.0 p90 = 0.0 if latency_list: p50 = np.percentile(latency_list, 50) p90 = 
np.percentile(latency_list, 90) print('pid {}: current throughput {}, latency p50={:.3f} p90={:.3f}'.format(os.getpid(), throughput, p50, p90)) last_num_infer = current_num_infer time.sleep(args.throughput_interval) executor = futures.ThreadPoolExecutor(max_workers=args.num_threads+1) performance_future = executor.submit(performance) eval_futures = [] for idx, request in enumerate(request_list): eval_fut = executor.submit(predict, request) eval_futures.append(eval_fut) waited_results = [] for idx, eval_fut in enumerate(eval_futures): if idx % 100 == 0: print('evaluating image {}/{}'.format(idx, len(eval_futures))) waited_results.append(eval_fut.result()) eval_results = [] for idx, (img_id, results) in enumerate(zip(val_dataset.img_keys, waited_results)): results = {key: tf.make_ndarray(value) for key, value in results.items()} boxes = results['boxes'] for box, label, prob in zip(results['boxes'][0], results['classes'][0], results['scores'][0]): res = [img_id, box[0], box[1], box[2], box[3], prob, inv_map[label+1]] # +1 to account for background eval_results.append(res) performance_future.result() coco_gt = COCO(annotation_file=args.instances_val2017_json) coco_dt = coco_gt.loadRes(np.array(eval_results).astype(np.float32)) coco_eval = COCOeval(coco_gt, coco_dt, iouType='bbox') coco_eval.evaluate() coco_eval.accumulate() coco_eval.summarize() if args.save_results is not None: np.save(args.save_results, coco_eval.stats) if __name__ == '__main__': main() ================================================ FILE: src/examples/tensorflow/ssd300_demo/ssd300_model.py ================================================ import sys import os import argparse import time import itertools from functools import partial from collections import Counter import json import shutil import pkg_resources from distutils.version import LooseVersion import numpy as np import tensorflow as tf from tensorflow.core.framework import attr_value_pb2 import tensorflow.neuron as tfn import torch def decode_jpeg_resize(input_tensor, image_size): # decode jpeg tensor = tf.image.decode_png(input_tensor, channels=3) # resize decoded_shape = tf.shape(tensor) tensor = tf.cast(tensor, tf.float32) decoded_shape_hw = decoded_shape[0:2] decoded_shape_hw_float32 = tf.cast(decoded_shape_hw, tf.float32) tensor = tf.image.resize(tensor, image_size) # normalize tensor -= np.array([0.485, 0.456, 0.406]).astype(np.float32) * 255.0 return tensor, decoded_shape_hw_float32[::-1] def preprocessor(input_tensor, image_size): with tf.name_scope('Preprocessor'): tensor, bbox_scale_hw = tf.map_fn( partial(decode_jpeg_resize, image_size=image_size), input_tensor, dtype=(tf.float32, tf.float32), back_prop=False, parallel_iterations=16) return tensor, bbox_scale_hw def tf_Conv2d(input_tensor, module, first_conv=False): np_dtype = input_tensor.dtype.as_numpy_dtype kernel_np = module.weight.detach().numpy().transpose([2, 3, 1, 0]) if first_conv: kernel_np /= (np.array([0.229, 0.224, 0.225]).astype(np.float32) * 255.0)[:, np.newaxis] kernel = tf.constant(kernel_np.astype(np_dtype)) if any(module.padding): pad_h, pad_w = module.padding padding = [[0, 0], [pad_h, pad_h], [pad_w, pad_w], [0, 0]] input_tensor = tf.pad(input_tensor, padding) stride_h, stride_w = module.stride tensor = tf.nn.conv2d(input_tensor, kernel, strides=[1, stride_h, stride_w, 1], padding='VALID') if module.bias is not None: bias = tf.constant(module.bias.detach().numpy().astype(np_dtype)) tensor = tf.nn.bias_add(tensor, bias) return tensor def tf_BatchNorm2d(input_tensor, module): def 
_norm_np(ts): return ts.astype(input_tensor.dtype.as_numpy_dtype) mean = _norm_np(module.running_mean.detach().numpy()) offset = _norm_np(module.bias.detach().numpy()) inv_std = np.sqrt(module.running_var.detach().numpy() + module.eps) scale_inv_std = _norm_np(module.weight.detach().numpy() / inv_std) return scale_inv_std * (input_tensor - mean) + offset def tf_MaxPool2d(input_tensor, module): pad = module.padding tensor = tf.pad(input_tensor, [[0, 0], [pad, pad], [pad, pad], [0, 0]]) return tf.nn.max_pool2d(tensor, ksize=module.kernel_size, strides=module.stride, padding='VALID') def tf_Bottleneck(input_tensor, module): tensor = tf_Conv2d(input_tensor, module.conv1) tensor = tf_BatchNorm2d(tensor, module.bn1) tensor = tf.nn.relu(tensor) tensor = tf_Conv2d(tensor, module.conv2) tensor = tf_BatchNorm2d(tensor, module.bn2) tensor = tf.nn.relu(tensor) tensor = tf_Conv2d(tensor, module.conv3) tensor = tf_BatchNorm2d(tensor, module.bn3) if module.downsample is not None: input_tensor = tf_Conv2d(input_tensor, module.downsample[0]) input_tensor = tf_BatchNorm2d(input_tensor, module.downsample[1]) return tf.nn.relu(input_tensor + tensor) def tf_SequentialBottleneck(tensor, seq, resnet): with tf.name_scope('{}.Sequential'.format(seq)): for idx, module in enumerate(resnet[seq]): with tf.name_scope('{}.BasicBlock'.format(idx)): tensor = tf_Bottleneck(tensor, module) return tensor def tf_bbox_view(detection_feed, modules, ndim): results = [] for idx, (tensor, mod) in enumerate(zip(detection_feed, modules)): with tf.name_scope('branch{}'.format(idx)): tensor = tf_Conv2d(tensor, mod) tensor = tf.transpose(tensor, [0, 3, 1, 2]) tensor = tf.cast(tensor, tf.float32) shape = tensor.shape.as_list() batch_size = -1 if shape[0] is None else shape[0] new_shape = [batch_size, ndim, np.prod(shape[1:]) // ndim] results.append(tf.reshape(tensor, new_shape)) tensor = tf.concat(results, axis=-1) return tensor def tf_feature_extractor(input_tensor, resnet): with tf.name_scope('FeatureExtractor'): with tf.name_scope('0.Conv2d'): tensor = tf_Conv2d(input_tensor, resnet[0], first_conv=True) with tf.name_scope('1.BatchNorm2d'): tensor = tf_BatchNorm2d(tensor, resnet[1]) with tf.name_scope('2.ReLU'): tensor = tf.nn.relu(tensor) with tf.name_scope('3.MaxPool2d'): tensor = tf_MaxPool2d(tensor, resnet[3]) tensor = tf_SequentialBottleneck(tensor, 4, resnet) tensor = tf_SequentialBottleneck(tensor, 5, resnet) tensor = tf_SequentialBottleneck(tensor, 6, resnet) tensor = tf.cast(tensor, tf.float16) return tensor def tf_box_predictor(tensor, ssd300_torch): with tf.name_scope('BoxPredictor'): detection_feed = [tensor] for idx, block in enumerate(ssd300_torch.additional_blocks): with tf.name_scope('{}.Sequential'.format(idx)): tensor = tf_Conv2d(tensor, block[0]) tensor = tf_BatchNorm2d(tensor, block[1]) tensor = tf.nn.relu(tensor) tensor = tf_Conv2d(tensor, block[3]) tensor = tf_BatchNorm2d(tensor, block[4]) tensor = tf.nn.relu(tensor) detection_feed.append(tensor) with tf.name_scope('Boxes'): loc = tf_bbox_view(detection_feed, ssd300_torch.loc, ndim=4) with tf.name_scope('Probabilities'): conf = tf_bbox_view(detection_feed, ssd300_torch.conf, ndim=ssd300_torch.label_num) return loc, conf @tfn.fuse(batch_size=1, dynamic_batch_size=True) def tf_ssd300(input_tensor, ssd300_torch): with tf.name_scope('SSD300'): tensor = tf_feature_extractor(input_tensor, ssd300_torch.feature_extractor.feature_extractor) loc, conf = tf_box_predictor(tensor, ssd300_torch) return loc, conf def scale_back_batch(bboxes_in, scores_in, scale_xy, scale_wh, 
dboxes_xywh): """ Do scale and transform from xywh to ltrb suppose input Nx4xnum_bbox Nxlabel_numxnum_bbox """ with tf.name_scope('ScaleBackBatch'): bboxes_in = tf.transpose(bboxes_in, [0, 2, 1]) scores_in = tf.transpose(scores_in, [0, 2, 1]) bboxes_xy = bboxes_in[:, :, :2] bboxes_wh = bboxes_in[:, :, 2:] bboxes_xy *= scale_xy bboxes_wh *= scale_wh bboxes_xy = bboxes_xy * dboxes_xywh[:, :, 2:] + dboxes_xywh[:, :, :2] bboxes_wh = tf.exp(bboxes_wh) * dboxes_xywh[:, :, 2:] bboxes_wh_half = 0.5 * bboxes_wh bboxes_lt = bboxes_xy - bboxes_wh_half bboxes_rb = bboxes_xy + bboxes_wh_half bboxes_in = tf.concat([bboxes_lt, bboxes_rb], axis=-1) return bboxes_in, tf.nn.softmax(scores_in, axis=-1) def select_nms_outputs(input_tensors): boxes_xywh, scores, classes, valid_detections = input_tensors return boxes_xywh[:valid_detections], scores[:valid_detections], classes[:valid_detections] def postprocessor(ploc_ts, plabel_ts, bbox_scale_hw_ts, scale_xy, scale_wh, dboxes_xywh): with tf.name_scope('Postprocessor'): ploc_ts = tf.cast(ploc_ts, tf.float32) plabel_ts = tf.cast(plabel_ts, tf.float32) bboxes_ts, probs_ts = scale_back_batch(ploc_ts, plabel_ts, scale_xy, scale_wh, dboxes_xywh) bboxes_ts = bboxes_ts[:, :, tf.newaxis, :] probs_ts = probs_ts[:, :, 1:] nms_outputs = tf.image.combined_non_max_suppression( bboxes_ts, probs_ts, max_output_size_per_class=200, max_total_size=200, iou_threshold=0.5, score_threshold=0.05, pad_per_class=False, clip_boxes=False, name='CombinedNonMaxSuppression', ) nmsed_boxes_x0y0x1y1, nmsed_scores, nmsed_classes, valid_detections = nms_outputs nmsed_boxes_x0y0 = nmsed_boxes_x0y0x1y1[..., :2] nmsed_boxes_x1y1 = nmsed_boxes_x0y0x1y1[..., 2:] bbox_scale_hw_ts = bbox_scale_hw_ts[:, tf.newaxis, :] nmsed_boxes_xy = nmsed_boxes_x0y0 * bbox_scale_hw_ts nmsed_boxes_wh = (nmsed_boxes_x1y1 - nmsed_boxes_x0y0) * bbox_scale_hw_ts nmsed_boxes_xywh = tf.concat([nmsed_boxes_xy, nmsed_boxes_wh], axis=-1) nmsed_boxes_xywh, nmsed_scores, nmsed_classes = tf.map_fn( select_nms_outputs, (nmsed_boxes_xywh, nmsed_scores, nmsed_classes, valid_detections), dtype=(tf.float32, tf.float32, tf.float32), back_prop=False, parallel_iterations=16) return nmsed_boxes_xywh, nmsed_scores, nmsed_classes class DefaultBoxes(object): def __init__(self, fig_size, feat_size, steps, scales, aspect_ratios, scale_xy=0.1, scale_wh=0.2): self.feat_size = feat_size self.fig_size = fig_size self.scale_xy_ = scale_xy self.scale_wh_ = scale_wh # According to https://github.com/weiliu89/caffe # Calculation method slightly different from paper self.steps = steps self.scales = scales fk = fig_size/np.array(steps) self.aspect_ratios = aspect_ratios self.default_boxes = [] # size of feature and number of feature for idx, sfeat in enumerate(self.feat_size): sk1 = scales[idx]/fig_size sk2 = scales[idx+1]/fig_size sk3 = np.sqrt(sk1*sk2) all_sizes = [(sk1, sk1), (sk3, sk3)] for alpha in aspect_ratios[idx]: w, h = sk1*np.sqrt(alpha), sk1/np.sqrt(alpha) all_sizes.append((w, h)) all_sizes.append((h, w)) for w, h in all_sizes: for i, j in itertools.product(range(sfeat), repeat=2): cx, cy = (j+0.5)/fk[idx], (i+0.5)/fk[idx] self.default_boxes.append((cx, cy, w, h)) self.dboxes = np.array(self.default_boxes) self.dboxes = self.dboxes.clip(min=0, max=1) # For IoU calculation self.dboxes_ltrb = self.dboxes.copy() self.dboxes_ltrb[:, 0] = self.dboxes[:, 0] - 0.5 * self.dboxes[:, 2] self.dboxes_ltrb[:, 1] = self.dboxes[:, 1] - 0.5 * self.dboxes[:, 3] self.dboxes_ltrb[:, 2] = self.dboxes[:, 0] + 0.5 * self.dboxes[:, 2] self.dboxes_ltrb[:, 3] = 
self.dboxes[:, 1] + 0.5 * self.dboxes[:, 3] @property def scale_xy(self): return self.scale_xy_ @property def scale_wh(self): return self.scale_wh_ def __call__(self, order="ltrb"): if order == "ltrb": return self.dboxes_ltrb if order == "xywh": return self.dboxes def dboxes300_coco(): figsize = 300 feat_size = [38, 19, 10, 5, 3, 1] steps = [8, 16, 32, 64, 100, 300] # use the scales here: https://github.com/amdegroot/ssd.pytorch/blob/master/data/config.py scales = [21, 45, 99, 153, 207, 261, 315] aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]] dboxes = DefaultBoxes(figsize, feat_size, steps, scales, aspect_ratios) return dboxes def main(): parser = argparse.ArgumentParser() parser.add_argument('--torch_checkpoint', required=True, help='Path to PyTorch SSD300 model checkpoint') parser.add_argument('--output_saved_model', required=True, help='Output TensorFlow SavedModel that runs on Inferentia') parser.add_argument('--disable_version_check', action='store_true') args = parser.parse_args() if os.path.exists(args.output_saved_model): raise OSError('SavedModel dir {} already exists'.format(args.output_saved_model)) if not args.disable_version_check: neuroncc_version = LooseVersion(pkg_resources.get_distribution('neuron-cc').version) if neuroncc_version < LooseVersion('1.0.18000'): raise RuntimeError( 'neuron-cc version {} is too low for this demo. Please upgrade ' 'by "pip install -U neuron-cc --extra-index-url=https://pip.repos.neuron.amazonaws.com"'.format(neuroncc_version)) tfn_version = LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version) if tfn_version < LooseVersion('1.15.3.1.0.1900.0'): raise RuntimeError( 'tensorflow-neuron version {} is too low for this demo. Please upgrade ' 'by "pip install -U tensorflow-neuron --extra-index-url=https://pip.repos.neuron.amazonaws.com"'.format(tfn_version)) sys.path.append(os.getcwd()) from DeepLearningExamples.PyTorch.Detection.SSD.src import model as torch_ssd300_model ssd300_torch = torch_ssd300_model.SSD300() ckpt = torch.load(args.torch_checkpoint, map_location=torch.device('cpu')) ssd300_torch.load_state_dict(ckpt['model']) ssd300_torch.eval() input_tensor = tf.placeholder(tf.string, [None]) image_tensor, bbox_scale_hw_tensor = preprocessor(input_tensor, [300, 300]) dboxes = dboxes300_coco() dboxes_xywh = dboxes(order="xywh")[np.newaxis, ...] ploc_tensor, plabel_tensor = tf_ssd300(image_tensor, ssd300_torch) boxes_tensor, scores_tensor, classes_tensor = postprocessor( ploc_tensor, plabel_tensor, bbox_scale_hw_tensor, dboxes.scale_xy, dboxes.scale_wh, dboxes_xywh) outputs = { 'boxes': boxes_tensor, 'scores': scores_tensor, 'classes': classes_tensor, } sess = tf.Session() try: sess.run(outputs) except: pass for op in sess.graph.get_operations(): if op.type == 'NeuronOp': if not op.get_attr('executable'): raise AttributeError( 'Neuron executable (neff) is empty. 
Please check neuron-cc is installed and working properly ' '("pip install neuron-cc --force --extra-index-url=https://pip.repos.neuron.amazonaws.com" ' 'to force reinstall neuron-cc).') model_config = op.node_def.attr['model_config'].list if model_config.i: model_config.i[0] = 1 else: model_config.i.extend([1, 1, 1, 10]) op._set_attr('model_config', attr_value_pb2.AttrValue(list=model_config)) tf.saved_model.simple_save(sess, args.output_saved_model, {'batch_image': input_tensor}, outputs) if __name__ == '__main__': main()

================================================
FILE: src/examples/tensorflow/tensorflow-neuronx/tfneuronx-roberta-base-tutorial.ipynb
================================================

{ "cells": [ { "cell_type": "markdown", "id": "e91cf83b", "metadata": {}, "source": [ "# Running Huggingface Roberta-Base with TensorFlow-NeuronX" ] },
{ "cell_type": "markdown", "id": "71394e1e", "metadata": {}, "source": [ "This tutorial demonstrates how to compile the Huggingface roberta-base model and run inference on a trn1.2xlarge instance with \n", "```tensorflow-neuronx```. To compile larger models like roberta-large, please consider using an inf2 instance." ] },
{ "cell_type": "markdown", "id": "828ef9bd", "metadata": {}, "source": [ "## Setup" ] },
{ "cell_type": "markdown", "id": "5becc549", "metadata": {}, "source": [ "To run this tutorial please follow the instructions for [TensorFlow-NeuronX Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install.html) and the [Jupyter Notebook Quickstart](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html) and set your kernel to \"Python (tensorflow-neuronx)\".\n", "\n", "Next, install some additional dependencies." ] },
{ "cell_type": "code", "execution_count": null, "id": "ee1a3b84", "metadata": {}, "outputs": [], "source": [ "%env TOKENIZERS_PARALLELISM=True #Suppresses tokenizer warnings, making errors easier to detect\n", "!pip install transformers" ] },
{ "cell_type": "markdown", "id": "c301cfce", "metadata": {}, "source": [ "## Download From Huggingface and Compile for AWS-Neuron" ] },
{ "cell_type": "code", "execution_count": null, "id": "92e8050d", "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf\n", "import tensorflow_neuronx as tfnx\n", "from transformers import RobertaTokenizer, TFRobertaModel\n", "\n", "# Create a wrapper for the roberta model that will accept inputs as a list\n", "# instead of a dictionary. 
This will allow the compiled model to be saved\n", "# to disk with the model.save() function.\n", "class RobertaWrapper(tf.keras.Model):\n", "    def __init__(self, model):\n", "        super().__init__()\n", "        self.model = model\n", "    def __call__(self, example_inputs):\n", "        return self.model({'input_ids' : example_inputs[0], 'attention_mask' : example_inputs[1]})\n", "\n", "\n", "tokenizer = RobertaTokenizer.from_pretrained('roberta-base')\n", "model = RobertaWrapper(TFRobertaModel.from_pretrained('roberta-base'))\n", "\n", "batch_size = 16\n", "\n", "# create example inputs with a batch size of 16\n", "text = [\"Paris is the <mask> of France.\"] * batch_size\n", "encoded_input = tokenizer(text, return_tensors='tf', padding='max_length', max_length=64)\n", "\n", "# turn inputs into a list\n", "example_input = [encoded_input['input_ids'], encoded_input['attention_mask']]\n", "\n", "# compile\n", "model_neuron = tfnx.trace(model, example_input)\n", "\n", "print(\"Running on neuron:\", model_neuron(example_input))\n", "\n", "# save the model to disk to save recompilation time for next usage\n", "model_neuron.save('./roberta-neuron-b16')" ] },
{ "cell_type": "markdown", "id": "0f2e159a", "metadata": {}, "source": [ "## Run Basic Inference Benchmarking" ] },
{ "cell_type": "code", "execution_count": null, "id": "ccf22e74", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import concurrent.futures\n", "import time\n", "\n", "reloaded_neuron_model = tf.keras.models.load_model('./roberta-neuron-b16')\n", "print(\"Reloaded model running on neuron:\", reloaded_neuron_model(example_input))\n", "\n", "num_threads = 4\n", "num_inferences = 1000\n", "\n", "latency_list = []\n", "def inference_with_latency_calculation(example_input):\n", "    global latency_list\n", "    start = time.time()\n", "    result = reloaded_neuron_model(example_input)\n", "    end = time.time()\n", "    latency_list.append((end-start) * 1000)\n", "    return result\n", "\n", "start = time.time()\n", "with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:\n", "    futures = []\n", "    for i in range(num_inferences):\n", "        futures.append(executor.submit(inference_with_latency_calculation, example_input))\n", "    for future in concurrent.futures.as_completed(futures):\n", "        get_result = future.result()\n", "end = time.time()\n", "\n", "total_time = end - start\n", "\n", "print(f\"Throughput was {(num_inferences * batch_size)/total_time} samples per second.\")\n", "print(f\"Latency p50 was {np.percentile(latency_list, 50)} ms\")\n", "print(f\"Latency p90 was {np.percentile(latency_list, 90)} ms\")\n", "print(f\"Latency p99 was {np.percentile(latency_list, 99)} ms\")" ] }
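,
{ "cell_type": "markdown", "id": "b7d4c9aa", "metadata": {}, "source": [ "A Neuron-compiled graph executes with the fixed input shapes it was traced with, so any new batch must be tokenized to the same `(16, 64)` shape used above. The cell below is a minimal illustrative sketch of this (the sample sentence is arbitrary); it reuses the `tokenizer`, `batch_size`, and `reloaded_neuron_model` defined in the earlier cells." ] },
{ "cell_type": "code", "execution_count": null, "id": "b7d4c9ab", "metadata": {}, "outputs": [], "source": [ "# Pad a new batch of text to the traced shape (batch_size=16, seq_len=64).\n", "# Inputs with a different shape would not match the compiled graph.\n", "new_text = [\"The quick brown fox jumps over the lazy dog.\"] * batch_size\n", "new_encoded = tokenizer(new_text, return_tensors='tf', padding='max_length', max_length=64)\n", "new_input = [new_encoded['input_ids'], new_encoded['attention_mask']]\n", "print(\"New batch on neuron:\", reloaded_neuron_model(new_input))" ] }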
Introduction:" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Rb5rSpcZvYbX" }, "source": [ "In this tutorial we will compile and deploy ResNet50 model for Inferentia.\n", "In this tutorial we provide two main sections:\n", "1. Compile the ResNet50 model.\n", "2. Infer the same compiled model.\n", "\n", "Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](../../../../frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page.\n", "\n", "Instructions of how to setup Neuron Tensorflow environment and run the tutorial as a Jupyter notebook are available in the [Tensorflow Quick Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuron/tutorials/tensorflow-tutorial-setup.html#tensorflow-tutorial-setup)\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "!pip install tensorflow_neuron==1.15.5.2.8.9.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com/\n", "!pip install neuron_cc==1.13.5.0 --extra-index-url=https://pip.repos.neuron.amazonaws.com" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "E8FhiMivhcYB" }, "source": [ "## Compile for Neuron\n", "\n", "A trained model must be compiled to Inferentia target before it can be deployed on Inferentia instances. In this step we compile the Keras ResNet50 model and export it as a SavedModel which is an interchange format for TensorFlow models.\n", "At the end of compilation, the compiled SavedModel is saved in resnet50_neuron local directory:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import time\n", "import shutil\n", "import tensorflow as tf\n", "import tensorflow.neuron as tfn\n", "import tensorflow.compat.v1.keras as keras\n", "from tensorflow.keras.applications.resnet50 import ResNet50\n", "from tensorflow.keras.applications.resnet50 import preprocess_input\n", "\n", "# Create a workspace\n", "WORKSPACE = './ws_resnet50'\n", "os.makedirs(WORKSPACE, exist_ok=True)\n", "\n", "# Prepare export directory (old one removed)\n", "model_dir = os.path.join(WORKSPACE, 'resnet50')\n", "compiled_model_dir = os.path.join(WORKSPACE, 'resnet50_neuron')\n", "shutil.rmtree(model_dir, ignore_errors=True)\n", "shutil.rmtree(compiled_model_dir, ignore_errors=True)\n", "\n", "# Instantiate Keras ResNet50 model\n", "keras.backend.set_learning_phase(0)\n", "keras.backend.set_image_data_format('channels_last')\n", "\n", "model = ResNet50(weights='imagenet')\n", "\n", "# Export SavedModel\n", "tf.saved_model.simple_save(\n", " session = keras.backend.get_session(),\n", " export_dir = model_dir,\n", " inputs = {'input': model.inputs[0]},\n", " outputs = {'output': model.outputs[0]})\n", "\n", "# Compile using Neuron\n", "tfn.saved_model.compile(model_dir, compiled_model_dir)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "I52jQOyO8vAn" }, "source": [ "## Deploy on Inferentia\n", "\n", "Using same instance to deploy the model.\n", "In case of different deployment instance, launch a deployment inf1 instance and copy compiled model to the 
,
{ "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "I52jQOyO8vAn" }, "source": [ "## Deploy on Inferentia\n", "\n", "This example deploys the model on the same instance that compiled it.\n", "In case of a different deployment instance, launch a deployment inf1 instance and copy the compiled model to that instance.\n", "\n", "Download the example image, and install the pillow module for inference on the deployment instance:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg\n", "!pip install pillow # Necessary for loading images" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### After downloading the example image, run the inference." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import time\n", "import numpy as np\n", "import tensorflow as tf\n", "from tensorflow.keras.preprocessing import image\n", "from tensorflow.keras.applications import resnet50\n", "\n", "tf.keras.backend.set_image_data_format('channels_last')\n", "\n", "# Create input from image\n", "img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))\n", "img_arr = image.img_to_array(img_sgl)\n", "img_arr2 = np.expand_dims(img_arr, axis=0)\n", "img_arr3 = resnet50.preprocess_input(img_arr2)\n", "\n", "# Load model\n", "COMPILED_MODEL_DIR = './ws_resnet50/resnet50_neuron/'\n", "predictor_inferentia = tf.contrib.predictor.from_saved_model(COMPILED_MODEL_DIR)\n", "\n", "# Run inference\n", "model_feed_dict = {'input': img_arr3}\n", "infa_rslts = predictor_inferentia(model_feed_dict)\n", "\n", "# Display results\n", "print(resnet50.decode_predictions(infa_rslts[\"output\"], top=5)[0])\n", "\n", "# Sample output will look like below:\n", "#[('n02123045', 'tabby', 0.68817204), ('n02127052', 'lynx', 0.12701613), ('n02123159', 'tiger_cat', 0.08736559), ('n02124075', 'Egyptian_cat', 0.063844085), ('n02128757', 'snow_leopard', 0.009240591)]" ] } ], "metadata": { "colab": { "default_view": {}, "name": "Untitled", "provenance": [], "version": "0.3.2", "views": {} }, "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 1 }

================================================
FILE: src/examples/tensorflow/tensorflow_serving_tutorial.rst
================================================

.. _tensorflow-serving-neuronrt-visible-cores:

Using NEURON_RT_VISIBLE_CORES with TensorFlow Serving
=====================================================

TensorFlow Serving allows customers to scale up inference workloads across a network. TensorFlow Neuron Serving uses the same API as normal TensorFlow Serving with two differences: (a) the saved model must be compiled for Inferentia and (b) the entry point is a different binary named ``tensorflow_model_server_neuron``. Follow the steps below to install the package using apt-get or yum. This will be pre-installed in a future release.

Install TensorFlow Model Server and Serving API
-----------------------------------------------

Follow the steps in the :ref:`install-neuron-tensorflow`. Then ensure you install using either apt-get or yum. If using TF 1.x, install the appropriate version (see above):

.. code:: bash

   sudo apt-get install tensorflow-model-server-neuron

or

.. code:: bash

   sudo dnf install tensorflow-model-server-neuron
Also, you will need the TensorFlow Serving API (use --no-deps to prevent installation of regular tensorflow). Depending on the version of TensorFlow you wish to use:

For Tensorflow 1.x:

.. code:: bash

   pip install --no-deps tensorflow_serving_api==1.15

For Tensorflow 2.x:

.. code:: bash

   pip install --no-deps tensorflow_serving_api

For the example image preprocessing using Keras preprocessing, the Python Imaging Library Pillow is required:

.. code:: bash

   pip install pillow

To work around h5py issue https://github.com/aws/aws-neuron-sdk/issues/220:

.. code:: bash

   pip install "h5py<3.0.0"

Export and Compile Saved Model
------------------------------

The following example shows graph construction followed by the addition of a Neuron compilation step before exporting to a saved model.

For Tensorflow 1.x:

.. code:: python

   import tensorflow as tf
   import tensorflow.neuron

   tf.keras.backend.set_learning_phase(0)
   tf.keras.backend.set_image_data_format('channels_last')

   model = tf.keras.applications.ResNet50(weights='imagenet')
   sess = tf.keras.backend.get_session()
   inputs = {'input': model.inputs[0]}
   outputs = {'output': model.outputs[0]}

   # save the model using tf.saved_model.simple_save
   modeldir = "./resnet50/1"
   tf.saved_model.simple_save(sess, modeldir, inputs, outputs)

   # compile the model for Inferentia
   neuron_modeldir = "./resnet50_inf1/1"
   tf.neuron.saved_model.compile(modeldir, neuron_modeldir, batch_size=1)

For Tensorflow 2.x:

.. code:: python

   import tensorflow as tf
   import tensorflow.neuron as tfn
   import numpy as np

   tf.keras.backend.set_learning_phase(0)
   tf.keras.backend.set_image_data_format('channels_last')

   image_sizes = [224, 224]
   model = tf.keras.applications.ResNet50(weights='imagenet')
   example_inputs = tf.random.uniform([1, *image_sizes, 3], dtype=tf.float32)

   # compile the model for Inferentia
   model_neuron = tfn.trace(model, example_inputs)

   # run the traced model once to define the forward pass and allow for saving
   model_neuron(example_inputs)

   tf.keras.models.save_model(model_neuron, './resnet50_inf1/1')

Serving Saved Model
-------------------

You can now serve the saved model with the ``tensorflow_model_server_neuron`` binary. To utilize multiple NeuronCores, it is recommended to launch multiple TensorFlow model servers, each bound to its own NeuronCore and listening on its own gRPC port:

.. code:: bash

   # important to set this environment variable before launching model servers
   export NEURON_RT_VISIBLE_CORES=0
   tensorflow_model_server_neuron --model_name=resnet50_inf1 \
       --model_base_path=$(pwd)/resnet50_inf1/ --port=8500

   # then to run another server on a different NeuronCore, open another
   # window and run this, except this time set NEURON_RT_VISIBLE_CORES=1
   # and use a different port; you can keep doing this up to the number
   # of NeuronCores on your machine
   export NEURON_RT_VISIBLE_CORES=1
   tensorflow_model_server_neuron --model_name=resnet50_inf1 \
       --model_base_path=$(pwd)/resnet50_inf1/ --port=8501

The compiled model is staged in Inferentia DRAM by the server to prepare for inference.
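With several single-NeuronCore servers running, simple client-side load balancing is enough to keep all cores busy. The following is a minimal sketch (assuming two servers on ports 8500 and 8501, as launched above); it round-robins gRPC requests across one stub per server:

.. code:: python

   import itertools

   import grpc
   from tensorflow_serving.apis import prediction_service_pb2_grpc

   # One stub per model server; each server owns one NeuronCore.
   ports = [8500, 8501]
   stubs = [
       prediction_service_pb2_grpc.PredictionServiceStub(
           grpc.insecure_channel('localhost:{}'.format(port)))
       for port in ports
   ]
   round_robin = itertools.cycle(stubs)

   def predict(request):
       """Send a PredictRequest to the next server in round-robin order."""
       return next(round_robin).Predict(request)

Any ``PredictRequest``, such as the ones constructed in the client examples below, can then be submitted through ``predict``.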
Generate inference requests to the model server
-----------------------------------------------

Now run inferences via gRPC as shown in the following sample client code:

For Tensorflow 1.x:

.. code:: python

   import numpy as np
   import grpc
   import tensorflow as tf
   from tensorflow.keras.preprocessing import image
   from tensorflow.keras.applications.resnet50 import preprocess_input
   from tensorflow.keras.applications.resnet50 import decode_predictions
   from tensorflow_serving.apis import predict_pb2
   from tensorflow_serving.apis import prediction_service_pb2_grpc

   if __name__ == '__main__':
       channel = grpc.insecure_channel('localhost:8500')
       stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
       img_file = tf.keras.utils.get_file(
           "./kitten_small.jpg",
           "https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg")
       img = image.load_img(img_file, target_size=(224, 224))
       img_array = preprocess_input(image.img_to_array(img)[None, ...])
       request = predict_pb2.PredictRequest()
       request.model_spec.name = 'resnet50_inf1'
       request.inputs['input'].CopyFrom(
           tf.contrib.util.make_tensor_proto(img_array, shape=img_array.shape))
       result = stub.Predict(request)
       prediction = tf.make_ndarray(result.outputs['output'])
       print(decode_predictions(prediction))

For Tensorflow 2.x:

.. code:: python

   import numpy as np
   import grpc
   import tensorflow as tf
   from tensorflow.keras.preprocessing import image
   from tensorflow.keras.applications.resnet50 import preprocess_input
   from tensorflow.keras.applications.resnet50 import decode_predictions
   from tensorflow_serving.apis import predict_pb2
   from tensorflow_serving.apis import prediction_service_pb2_grpc

   tf.keras.backend.set_image_data_format('channels_last')

   if __name__ == '__main__':
       channel = grpc.insecure_channel('localhost:8500')
       stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
       img_file = tf.keras.utils.get_file(
           "./kitten_small.jpg",
           "https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg")
       img = image.load_img(img_file, target_size=(224, 224))
       img_array = preprocess_input(image.img_to_array(img)[None, ...])
       request = predict_pb2.PredictRequest()
       request.model_spec.name = 'resnet50_inf1'
       request.inputs['input_1'].CopyFrom(
           tf.make_tensor_proto(img_array, shape=img_array.shape))
       result = stub.Predict(request)
       prediction = tf.make_ndarray(result.outputs['output_1'])
       print(decode_predictions(prediction))

================================================
FILE: src/examples/tensorflow/yolo_v3_demo/yolo_v3.ipynb
================================================

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# [Broken] Evaluate YOLO v3 on Inferentia\n", "## Note: this tutorial runs on tensorflow-neuron 1.x only" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "This tutorial walks through compiling and evaluating the YOLO v3 model on Inferentia using the AWS Neuron SDK.\n", "\n", "\n", "In this tutorial we provide three main sections:\n", "\n", "1. Download Dataset and Generate Pretrained SavedModel\n", "\n", "2. Compile the YOLO v3 model.\n", "\n", "3. Deploy the compiled model.\n", "\n", "Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](../../../../frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). 
You can select the Kernel from the “Kernel -> Change Kernel” option on the top of this Jupyter notebook page.\n", "\n", "Instructions of how to setup Neuron Tensorflow environment and run the tutorial as a Jupyter notebook are available in the Tutorial main page [Tensorflow-YOLO_v3 Tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuron/tutorials/yolo_v3_demo/yolo_v3_demo.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This demo requires the following pip packages:\n", "\n", "`pillow matplotlib pycocotools`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%pip install tensorflow_neuron==1.15.5.2.8.9.0 neuron_cc==1.13.5.0 requests pillow matplotlib pycocotools==2.0.1 numpy==1.18.2 torch~=1.5.0 --force \\\n", " --extra-index-url=https://pip.repos.neuron.amazonaws.com" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Download Dataset and Generate Pretrained SavedModel\n", "### Download COCO 2017 validation dataset\n", "\n", "We start by downloading the COCO validation dataset, which we will use to validate our model. The COCO 2017 dataset is widely used for object-detection, segmentation and image captioning." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "!curl -LO http://images.cocodataset.org/zips/val2017.zip\n", "!curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip\n", "!unzip -q val2017.zip\n", "!unzip annotations_trainval2017.zip" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Generate YOLO v3 tensorflow SavedModel (pretrained on COCO 2017 dataset)\n", "\n", "Script yolo_v3_coco_saved_model.py will generate a tensorflow SavedModel using pretrained weights from https://github.com/YunYang1994/tensorflow-yolov3/releases/download/v1.0/yolov3_coco.tar.gz." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%run yolo_v3_coco_saved_model.py ./yolo_v3_coco_saved_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tensorflow SavedModel can be loaded as a tensorflow predictor. When a JPEG format image is provided as input, the output result of the tensorflow predictor contains information for drawing bounding boxes and classification results." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import json\n", "import tensorflow as tf\n", "from PIL import Image\n", "import matplotlib.pyplot as plt\n", "import matplotlib.patches as patches\n", "\n", "# launch predictor and run inference on an arbitrary image in the validation dataset\n", "yolo_pred_cpu = tf.contrib.predictor.from_saved_model('./yolo_v3_coco_saved_model')\n", "image_path = './val2017/000000581781.jpg'\n", "with open(image_path, 'rb') as f:\n", " feeds = {'image': [f.read()]}\n", "results = yolo_pred_cpu(feeds)\n", "\n", "# load annotations to decode classification result\n", "with open('./annotations/instances_val2017.json') as f:\n", " annotate_json = json.load(f)\n", "label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])}\n", "\n", "# draw picture and bounding boxes\n", "fig, ax = plt.subplots(figsize=(10, 10))\n", "ax.imshow(Image.open(image_path).convert('RGB'))\n", "wanted = results['scores'][0] > 0.1\n", "for xyxy, label_no_bg in zip(results['boxes'][0][wanted], results['classes'][0][wanted]):\n", " xywh = xyxy[0], xyxy[1], xyxy[2] - xyxy[0], xyxy[3] - xyxy[1]\n", " rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none')\n", " ax.add_patch(rect)\n", " rx, ry = rect.get_xy()\n", " rx = rx + rect.get_width() / 2.0\n", " ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10,\n", " ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', alpha=0.5))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Compile the Pretrained SavedModel for Neuron\n", "\n", "We make use of the Python compilation API `tfn.saved_model.compile` that is available in `tensorflow-neuron<2`. For the purpose of reducing Neuron runtime overhead, it is necessary to make use of arguments `no_fuse_ops` and `minimum_segment_size`.\n", "Compiled model is saved in ./yolo_v3_coco_saved_model_neuron." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import shutil\n", "import tensorflow as tf\n", "import tensorflow.neuron as tfn\n", "\n", "\n", "def no_fuse_condition(op):\n", " return op.name.startswith('Preprocessor') or op.name.startswith('Postprocessor')\n", "\n", "with tf.Session(graph=tf.Graph()) as sess:\n", " tf.saved_model.loader.load(sess, ['serve'], './yolo_v3_coco_saved_model')\n", " no_fuse_ops = [op.name for op in sess.graph.get_operations() if no_fuse_condition(op)]\n", "shutil.rmtree('./yolo_v3_coco_saved_model_neuron', ignore_errors=True)\n", "result = tfn.saved_model.compile(\n", " './yolo_v3_coco_saved_model', './yolo_v3_coco_saved_model_neuron',\n", " # to enforce trivial compilable subgraphs to run on CPU\n", " no_fuse_ops=no_fuse_ops,\n", " minimum_segment_size=100,\n", " batch_size=2,\n", " dynamic_batch_size=True,\n", ")\n", "print(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deploy the model on Inferentia\n", "## Part 3:Evaluate Model Quality after Compilation\n", "\n", "### Define evaluation functions\n", "We first define some handy helper functions for running evaluation on the COCO 2017 dataset." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import os\n", "import json\n", "import time\n", "import numpy as np\n", "import tensorflow as tf\n", "from pycocotools.coco import COCO\n", "from pycocotools.cocoeval import COCOeval\n", "\n", "\n", "def cocoapi_eval(jsonfile,\n", " style,\n", " coco_gt=None,\n", " anno_file=None,\n", " max_dets=(100, 300, 1000)):\n", " \"\"\"\n", " Args:\n", " jsonfile: Evaluation json file, eg: bbox.json, mask.json.\n", " style: COCOeval style, can be `bbox` , `segm` and `proposal`.\n", " coco_gt: Whether to load COCOAPI through anno_file,\n", " eg: coco_gt = COCO(anno_file)\n", " anno_file: COCO annotations file.\n", " max_dets: COCO evaluation maxDets.\n", " \"\"\"\n", " assert coco_gt is not None or anno_file is not None\n", "\n", " if coco_gt is None:\n", " coco_gt = COCO(anno_file)\n", " print(\"Start evaluate...\")\n", " coco_dt = coco_gt.loadRes(jsonfile)\n", " if style == 'proposal':\n", " coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')\n", " coco_eval.params.useCats = 0\n", " coco_eval.params.maxDets = list(max_dets)\n", " else:\n", " coco_eval = COCOeval(coco_gt, coco_dt, style)\n", " coco_eval.evaluate()\n", " coco_eval.accumulate()\n", " coco_eval.summarize()\n", " return coco_eval.stats\n", "\n", "\n", "def bbox_eval(anno_file, bbox_list):\n", " coco_gt = COCO(anno_file)\n", "\n", " outfile = 'bbox_detections.json'\n", " print('Generating json file...')\n", " with open(outfile, 'w') as f:\n", " json.dump(bbox_list, f)\n", "\n", " map_stats = cocoapi_eval(outfile, 'bbox', coco_gt=coco_gt)\n", " return map_stats\n", "\n", "\n", "def get_image_as_bytes(images, eval_pre_path):\n", " batch_im_id_list = []\n", " batch_im_name_list = []\n", " batch_img_bytes_list = []\n", " n = len(images)\n", " batch_im_id = []\n", " batch_im_name = []\n", " batch_img_bytes = []\n", " for i, im in enumerate(images):\n", " im_id = im['id']\n", " file_name = im['file_name']\n", " if i % eval_batch_size == 0 and i != 0:\n", " batch_im_id_list.append(batch_im_id)\n", " batch_im_name_list.append(batch_im_name)\n", " batch_img_bytes_list.append(batch_img_bytes)\n", " batch_im_id = []\n", " batch_im_name = []\n", " batch_img_bytes = []\n", " batch_im_id.append(im_id)\n", " batch_im_name.append(file_name)\n", "\n", " with open(os.path.join(eval_pre_path, file_name), 'rb') as f:\n", " batch_img_bytes.append(f.read())\n", " return batch_im_id_list, batch_im_name_list, batch_img_bytes_list\n", "\n", "\n", "def analyze_bbox(results, batch_im_id, _clsid2catid):\n", " bbox_list = []\n", " k = 0\n", " for boxes, scores, classes in zip(results['boxes'], results['scores'], results['classes']):\n", " if boxes is not None:\n", " im_id = batch_im_id[k]\n", " n = len(boxes)\n", " for p in range(n):\n", " clsid = classes[p]\n", " score = scores[p]\n", " xmin, ymin, xmax, ymax = boxes[p]\n", " catid = (_clsid2catid[int(clsid)])\n", " w = xmax - xmin + 1\n", " h = ymax - ymin + 1\n", "\n", " bbox = [xmin, ymin, w, h]\n", " # Round to the nearest 10th to avoid huge file sizes, as COCO suggests\n", " bbox = [round(float(x) * 10) / 10 for x in bbox]\n", " bbox_res = {\n", " 'image_id': im_id,\n", " 'category_id': catid,\n", " 'bbox': bbox,\n", " 'score': float(score),\n", " }\n", " bbox_list.append(bbox_res)\n", " k += 1\n", " return bbox_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the actual evaluation loop. 
To fully utilize all four cores on one Inferentia, the optimal setup is to run multi-threaded inference using a `ThreadPoolExecutor`. The following cell is a multi-threaded adaptation of the evaluation routine at https://github.com/miemie2013/Keras-YOLOv4/blob/910c4c6f7265f5828fceed0f784496a0b46516bf/tools/cocotools.py#L97." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from concurrent import futures\n", "\n", "def evaluate(yolo_predictor, images, eval_pre_path, anno_file, eval_batch_size, _clsid2catid):\n", " batch_im_id_list, batch_im_name_list, batch_img_bytes_list = get_image_as_bytes(images, eval_pre_path)\n", "\n", " # warm up\n", " yolo_predictor({'image': np.array(batch_img_bytes_list[0], dtype=object)})\n", "\n", " with futures.ThreadPoolExecutor(4) as exe:\n", " fut_im_list = []\n", " fut_list = []\n", " start_time = time.time()\n", " for batch_im_id, batch_im_name, batch_img_bytes in zip(batch_im_id_list, batch_im_name_list, batch_img_bytes_list):\n", " if len(batch_img_bytes) != eval_batch_size:\n", " continue\n", " fut = exe.submit(yolo_predictor, {'image': np.array(batch_img_bytes, dtype=object)})\n", " fut_im_list.append((batch_im_id, batch_im_name))\n", " fut_list.append(fut)\n", " bbox_list = []\n", " count = 0\n", " for (batch_im_id, batch_im_name), fut in zip(fut_im_list, fut_list):\n", " results = fut.result()\n", " bbox_list.extend(analyze_bbox(results, batch_im_id, _clsid2catid))\n", " for _ in batch_im_id:\n", " count += 1\n", " if count % 100 == 0:\n", " print('Test iter {}'.format(count))\n", " print('==================== Performance Measurement ====================')\n", " print('Finished inference on {} images in {} seconds'.format(len(images), time.time() - start_time))\n", " print('=================================================================')\n", " # start evaluation\n", " box_ap_stats = bbox_eval(anno_file, bbox_list)\n", " return box_ap_stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluate mean average precision (mAP) score\n", "Here is the code to calculate mAP scores of the YOLO v3 model. The expected mAP score is around 0.328 if we use the pretrained weights." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "yolo_pred = tf.contrib.predictor.from_saved_model('./yolo_v3_coco_saved_model_neuron')\n", "\n", "val_coco_root = './val2017'\n", "val_annotate = './annotations/instances_val2017.json'\n", "clsid2catid = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16,\n", " 15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31,\n", " 27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43,\n", " 39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56,\n", " 51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72,\n", " 63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85,\n", " 75: 86, 76: 87, 77: 88, 78: 89, 79: 90}\n", "eval_batch_size = 8\n", "with open(val_annotate, 'r', encoding='utf-8') as f2:\n", " for line in f2:\n", " line = line.strip()\n", " dataset = json.loads(line)\n", " images = dataset['images']\n", "box_ap = evaluate(yolo_pred, images, val_coco_root, val_annotate, eval_batch_size, clsid2catid)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.9 64-bit", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.9" }, "vscode": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: src/examples/tensorflow/yolo_v3_demo/yolo_v3_coco_saved_model.py ================================================ import argparse import os import urllib.request import tempfile import shutil from functools import partial import numpy as np import tensorflow as tf STRIDES = [8, 16, 32] ANCHORS = np.array([1.25,1.625, 2.0,3.75, 4.125,2.875, 1.875,3.8125, 3.875,2.8125, 3.6875,7.4375, 3.625,2.8125, 4.875,6.1875, 11.65625,10.1875]).astype(np.float32).reshape([3, 3, 2]) ANCHOR_PER_SCALE = 3 BOX_SCORE_THRESH = 0.3 UPSAMPLE_METHOD = "resize" NUM_CLASSES = 80 class YOLOV3(object): """Implement tensoflow yolov3 here""" def __init__(self, input_data, input_size, trainable): self.trainable = trainable self.num_class = NUM_CLASSES self.strides = STRIDES self.anchors = ANCHORS self.anchor_per_scale = ANCHOR_PER_SCALE self.box_score_thresh = BOX_SCORE_THRESH self.upsample_method = UPSAMPLE_METHOD input_data, decoded_shape = preprocessor(input_data, [input_size, input_size]) self.conv_lbbox, self.conv_mbbox, self.conv_sbbox = self.__build_nework(input_data) def decode_boxes(bboxes_and_decoded_shape): conv_lbbox, conv_mbbox, conv_sbbox, decoded_shape = bboxes_and_decoded_shape conv_lbbox = tf.cast(conv_lbbox, tf.float32) conv_mbbox = tf.cast(conv_mbbox, tf.float32) conv_sbbox = tf.cast(conv_sbbox, tf.float32) conv_lbbox = conv_lbbox[tf.newaxis, ...] conv_mbbox = conv_mbbox[tf.newaxis, ...] conv_sbbox = conv_sbbox[tf.newaxis, ...] decoded_shape = decoded_shape[tf.newaxis, ...] 
class YOLOV3(object):
    """Implement TensorFlow YOLO v3 here"""

    def __init__(self, input_data, input_size, trainable):
        self.trainable = trainable
        self.num_class = NUM_CLASSES
        self.strides = STRIDES
        self.anchors = ANCHORS
        self.anchor_per_scale = ANCHOR_PER_SCALE
        self.box_score_thresh = BOX_SCORE_THRESH
        self.upsample_method = UPSAMPLE_METHOD
        input_data, decoded_shape = preprocessor(input_data, [input_size, input_size])
        self.conv_lbbox, self.conv_mbbox, self.conv_sbbox = self.__build_nework(input_data)

        def decode_boxes(bboxes_and_decoded_shape):
            conv_lbbox, conv_mbbox, conv_sbbox, decoded_shape = bboxes_and_decoded_shape
            conv_lbbox = tf.cast(conv_lbbox, tf.float32)
            conv_mbbox = tf.cast(conv_mbbox, tf.float32)
            conv_sbbox = tf.cast(conv_sbbox, tf.float32)
            conv_lbbox = conv_lbbox[tf.newaxis, ...]
            conv_mbbox = conv_mbbox[tf.newaxis, ...]
            conv_sbbox = conv_sbbox[tf.newaxis, ...]
            decoded_shape = decoded_shape[tf.newaxis, ...]
            with tf.variable_scope('pred_sbbox'):
                pred_sbbox_coors, pred_sbbox_class_scores = self.decode(conv_sbbox, self.anchors[0], self.strides[0], decoded_shape, input_size)
            with tf.variable_scope('pred_mbbox'):
                pred_mbbox_coors, pred_mbbox_class_scores = self.decode(conv_mbbox, self.anchors[1], self.strides[1], decoded_shape, input_size)
            with tf.variable_scope('pred_lbbox'):
                pred_lbbox_coors, pred_lbbox_class_scores = self.decode(conv_lbbox, self.anchors[2], self.strides[2], decoded_shape, input_size)
            with tf.variable_scope('pred_bbox_filter'):
                pred_bbox_coors = tf.concat([pred_sbbox_coors, pred_mbbox_coors, pred_lbbox_coors], axis=1)
                pred_bbox_class_scores = tf.concat([pred_sbbox_class_scores, pred_mbbox_class_scores, pred_lbbox_class_scores], axis=1)
                nms_top_k = 100
                nms_thresh = 0.45
                coors, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
                    pred_bbox_coors,
                    pred_bbox_class_scores,
                    max_output_size_per_class=nms_top_k,
                    max_total_size=nms_top_k,
                    iou_threshold=nms_thresh,
                    score_threshold=self.box_score_thresh,
                    pad_per_class=False,
                    clip_boxes=False,
                    name='CombinedNonMaxSuppression',
                )
                scores = scores[..., tf.newaxis]
                classes = classes[..., tf.newaxis]
            return coors[0], scores[0], classes[0]

        with tf.name_scope('Postprocessor'):
            coors, scores, classes = tf.map_fn(
                decode_boxes, [self.conv_lbbox, self.conv_mbbox, self.conv_sbbox, decoded_shape],
                dtype=(tf.float32, tf.float32, tf.float32), back_prop=False, parallel_iterations=16)
        with tf.variable_scope('pred_bbox'):
            self.pred_bbox_boxes = tf.identity(coors, name='boxes')
            self.pred_bbox_scores = tf.identity(scores[..., 0], name='scores')
            self.pred_bbox_classes = tf.identity(classes[..., 0], name='classes')

    def __build_nework(self, input_data):
        route_1, route_2, input_data = darknet53(input_data, self.trainable)
        input_data = convolutional(input_data, (1, 1, 1024, 512), self.trainable, 'conv52')
        input_data = convolutional(input_data, (3, 3, 512, 1024), self.trainable, 'conv53')
        input_data = convolutional(input_data, (1, 1, 1024, 512), self.trainable, 'conv54')
        input_data = convolutional(input_data, (3, 3, 512, 1024), self.trainable, 'conv55')
        input_data = convolutional(input_data, (1, 1, 1024, 512), self.trainable, 'conv56')
        conv_lobj_branch = convolutional(input_data, (3, 3, 512, 1024), self.trainable, name='conv_lobj_branch')
        conv_lbbox = convolutional(conv_lobj_branch, (1, 1, 1024, 3*(self.num_class + 5)),
                                   trainable=self.trainable, name='conv_lbbox', activate=False, bn=False)
        input_data = convolutional(input_data, (1, 1, 512, 256), self.trainable, 'conv57')
        input_data = upsample(input_data, name='upsample0', method=self.upsample_method)
        with tf.variable_scope('route_1'):
            input_data = tf.concat([input_data, route_2], axis=-1)
        input_data = convolutional(input_data, (1, 1, 768, 256), self.trainable, 'conv58')
        input_data = convolutional(input_data, (3, 3, 256, 512), self.trainable, 'conv59')
        input_data = convolutional(input_data, (1, 1, 512, 256), self.trainable, 'conv60')
        input_data = convolutional(input_data, (3, 3, 256, 512), self.trainable, 'conv61')
        input_data = convolutional(input_data, (1, 1, 512, 256), self.trainable, 'conv62')
        conv_mobj_branch = convolutional(input_data, (3, 3, 256, 512), self.trainable, name='conv_mobj_branch')
        conv_mbbox = convolutional(conv_mobj_branch, (1, 1, 512, 3*(self.num_class + 5)),
                                   trainable=self.trainable, name='conv_mbbox', activate=False, bn=False)
        input_data = convolutional(input_data, (1, 1, 256, 128), self.trainable, 'conv63')
        input_data = upsample(input_data, name='upsample1', method=self.upsample_method)
        with tf.variable_scope('route_2'):
            input_data = tf.concat([input_data, route_1], axis=-1)
        input_data = convolutional(input_data, (1, 1, 384, 128), self.trainable, 'conv64')
        input_data = convolutional(input_data, (3, 3, 128, 256), self.trainable, 'conv65')
        input_data = convolutional(input_data, (1, 1, 256, 128), self.trainable, 'conv66')
        input_data = convolutional(input_data, (3, 3, 128, 256), self.trainable, 'conv67')
        input_data = convolutional(input_data, (1, 1, 256, 128), self.trainable, 'conv68')
        conv_sobj_branch = convolutional(input_data, (3, 3, 128, 256), self.trainable, name='conv_sobj_branch')
        conv_sbbox = convolutional(conv_sobj_branch, (1, 1, 256, 3*(self.num_class + 5)),
                                   trainable=self.trainable, name='conv_sbbox', activate=False, bn=False)
        return conv_lbbox, conv_mbbox, conv_sbbox

    def decode(self, conv_output, anchors, stride, decoded_shape, input_size):
        """
        return tensor of shape [batch_size, output_size, output_size, anchor_per_scale, 5 + num_classes]
        contains (x, y, w, h, score, probability)
        """
        conv_output = tf.cast(conv_output, tf.float32)
        conv_shape = tf.shape(conv_output)
        batch_size = conv_shape[0]
        output_size = conv_shape[1]
        anchor_per_scale = len(anchors)
        conv_output = tf.reshape(conv_output, (batch_size, output_size, output_size, anchor_per_scale, 5 + self.num_class))
        conv_raw_dxdy = conv_output[:, :, :, :, 0:2]
        conv_raw_dwdh = conv_output[:, :, :, :, 2:4]
        conv_raw_conf = conv_output[:, :, :, :, 4:5]
        conv_raw_prob = conv_output[:, :, :, :, 5:]
        y = tf.tile(tf.range(output_size, dtype=tf.int32)[:, tf.newaxis], [1, output_size])
        x = tf.tile(tf.range(output_size, dtype=tf.int32)[tf.newaxis, :], [output_size, 1])
        xy_grid = tf.concat([x[:, :, tf.newaxis], y[:, :, tf.newaxis]], axis=-1)
        xy_grid = tf.tile(xy_grid[tf.newaxis, :, :, tf.newaxis, :], [batch_size, 1, 1, anchor_per_scale, 1])
        xy_grid = tf.cast(xy_grid, tf.float32)
        pred_xy = (tf.sigmoid(conv_raw_dxdy) + xy_grid) * stride
        pred_wh = (tf.exp(conv_raw_dwdh) * anchors) * stride
        pred_xywh = tf.concat([pred_xy, pred_wh], axis=-1)
        pred_conf = tf.sigmoid(conv_raw_conf)
        pred_prob = tf.sigmoid(conv_raw_prob)
        pred_xywh = tf.reshape(pred_xywh, (-1, output_size*output_size*3, pred_xywh.shape[-1]))
        pred_conf = tf.reshape(pred_conf, (-1, output_size*output_size*3))
        pred_prob = tf.reshape(pred_prob, (-1, output_size*output_size*3, pred_prob.shape[-1]))
        return tf_postprocess_boxes(pred_xywh, pred_conf, pred_prob, decoded_shape, input_size, self.box_score_thresh)


def darknet53(input_data, trainable):
    with tf.variable_scope('darknet'):
        input_data = convolutional(input_data, filters_shape=(3, 3, 3, 32), trainable=trainable, name='conv0')
        input_data = convolutional(input_data, filters_shape=(3, 3, 32, 64), trainable=trainable, name='conv1', downsample=True)
        for i in range(1):
            input_data = residual_block(input_data, 64, 32, 64, trainable=trainable, name='residual%d' % (i + 0))
        input_data = convolutional(input_data, filters_shape=(3, 3, 64, 128), trainable=trainable, name='conv4', downsample=True)
        for i in range(2):
            input_data = residual_block(input_data, 128, 64, 128, trainable=trainable, name='residual%d' % (i + 1))
        input_data = convolutional(input_data, filters_shape=(3, 3, 128, 256), trainable=trainable, name='conv9', downsample=True)
        for i in range(8):
            input_data = residual_block(input_data, 256, 128, 256, trainable=trainable, name='residual%d' % (i + 3))
        route_1 = input_data
        input_data = convolutional(input_data, filters_shape=(3, 3, 256, 512), trainable=trainable, name='conv26', downsample=True)
        for i in range(8):
            input_data = residual_block(input_data, 512, 256, 512, trainable=trainable, name='residual%d' % (i + 11))
        route_2 = input_data
        input_data = convolutional(input_data, filters_shape=(3, 3, 512, 1024), trainable=trainable, name='conv43', downsample=True)
        for i in range(4):
            input_data = residual_block(input_data, 1024, 512, 1024, trainable=trainable, name='residual%d' % (i + 19))
        return route_1, route_2, input_data
def convolutional(input_data, filters_shape, trainable, name, downsample=False, activate=True, bn=True):
    with tf.variable_scope(name):
        if downsample:
            pad_h, pad_w = (filters_shape[0] - 2) // 2 + 1, (filters_shape[1] - 2) // 2 + 1
            paddings = tf.constant([[0, 0], [pad_h, pad_h], [pad_w, pad_w], [0, 0]])
            input_data = tf.pad(input_data, paddings, 'CONSTANT')
            strides = (1, 2, 2, 1)
            padding = 'VALID'
        else:
            strides = (1, 1, 1, 1)
            padding = "SAME"
        weight = tf.get_variable(name='weight', dtype=tf.float32, trainable=True,
                                 shape=filters_shape, initializer=tf.random_normal_initializer(stddev=0.01))
        weight = tf.cast(weight, tf.float16)
        conv = tf.nn.conv2d(input=input_data, filter=weight, strides=strides, padding=padding)
        if bn:
            conv = tf.layers.batch_normalization(conv, beta_initializer=tf.zeros_initializer(),
                                                 gamma_initializer=tf.ones_initializer(),
                                                 moving_mean_initializer=tf.zeros_initializer(),
                                                 moving_variance_initializer=tf.ones_initializer(),
                                                 training=trainable, fused=False)
        else:
            bias = tf.get_variable(name='bias', shape=filters_shape[-1], trainable=True,
                                   dtype=tf.float32, initializer=tf.constant_initializer(0.0))
            bias = tf.cast(bias, tf.float16)
            conv = tf.nn.bias_add(conv, bias)
        if activate:
            conv = tf.nn.leaky_relu(conv, alpha=0.1)
    return conv


def residual_block(input_data, input_channel, filter_num1, filter_num2, trainable, name):
    short_cut = input_data
    with tf.variable_scope(name):
        input_data = convolutional(input_data, filters_shape=(1, 1, input_channel, filter_num1),
                                   trainable=trainable, name='conv1')
        input_data = convolutional(input_data, filters_shape=(3, 3, filter_num1, filter_num2),
                                   trainable=trainable, name='conv2')
        residual_output = input_data + short_cut
    return residual_output


def upsample(input_data, name, method="deconv"):
    assert method in ["resize", "deconv"]
    if method == "resize":
        with tf.variable_scope(name):
            input_shape = tf.shape(input_data)
            output = tf.image.resize_nearest_neighbor(input_data, (input_shape[1] * 2, input_shape[2] * 2))
    if method == "deconv":
        # replace resize_nearest_neighbor with conv2d_transpose to support TensorRT optimization
        num_filter = input_data.shape.as_list()[-1]
        output = tf.layers.conv2d_transpose(input_data, num_filter, kernel_size=2, padding='same',
                                            strides=(2, 2), kernel_initializer=tf.random_normal_initializer())
    return output


def decode_jpeg_resize(input_tensor, image_size):
    tensor = tf.image.decode_png(input_tensor, channels=3)
    shape = tf.shape(tensor)
    tensor = tf.cast(tensor, tf.float32)
    tensor = tf.image.resize_image_with_pad(tensor, image_size[0], image_size[1])
    tensor /= 255.0
    return tf.cast(tensor, tf.float16), shape


def preprocessor(input_tensor, image_size):
    with tf.name_scope('Preprocessor'):
        batch_tensor, batch_shape = tf.map_fn(
            partial(decode_jpeg_resize, image_size=image_size), input_tensor,
            dtype=(tf.float16, tf.int32), back_prop=False, parallel_iterations=16)
    return batch_tensor, batch_shape


def tf_postprocess_boxes(pred_xywh, pred_conf, pred_prob, org_img_shape, input_size, score_threshold):
    batch_size = tf.shape(pred_xywh)[0]

    # (1) (x, y, w, h) --> (xmin, ymin, xmax, ymax)
    pred_coor = tf.concat([pred_xywh[:, :, :2] - pred_xywh[:, :, 2:] * 0.5,
                           pred_xywh[:, :, :2] + pred_xywh[:, :, 2:] * 0.5], axis=-1)

    # (2) (xmin, ymin, xmax, ymax) -> (xmin_org, ymin_org, xmax_org, ymax_org)
    org_wh = org_img_shape[:, tf.newaxis, 1::-1]
    org_whwh = tf.concat([org_wh, org_wh], axis=-1)
    org_whwh = tf.cast(org_whwh, tf.float32)
    input_size = np.float32(input_size)
    resize_ratio = input_size / tf.reduce_max(org_whwh, axis=-1)
    dwhwh = (input_size - resize_ratio * org_whwh) / 2
    pred_coor = (pred_coor - dwhwh) / resize_ratio

    # (5) discard some boxes with low scores
    scores = pred_conf * tf.reduce_max(pred_prob, axis=-1)
    score_mask = scores > score_threshold
    coors = pred_coor[score_mask]
    pred_conf = pred_conf[score_mask]
    pred_conf = tf.reshape(pred_conf, [batch_size, -1, 1])
    pred_prob = pred_prob[score_mask]
    pred_prob = tf.reshape(pred_prob, [batch_size, -1, pred_prob.shape[-1]])
    class_scores = pred_conf * pred_prob
    coors = tf.reshape(coors, [batch_size, -1, 1, coors.shape[-1]])
    class_scores = tf.reshape(class_scores, [batch_size, -1, class_scores.shape[-1]])
    return coors, class_scores
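# Worked example (added; illustrative only): tf_postprocess_boxes inverts the
# letterbox resize performed in decode_jpeg_resize. For a 640x480 image with
# input_size = 416, resize_ratio = 416 / 640 = 0.65; the scaled height is
# 0.65 * 480 = 312, so dwhwh pads (416 - 312) / 2 = 52 pixels on top and
# bottom (0 on the sides). Subtracting dwhwh and dividing by resize_ratio maps
# the predicted boxes back to original-image pixel coordinates.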
def convert_weights(org_weights_path, cur_weights_path, input_size):
    org_weights_mess = []
    with tf.Session(graph=tf.Graph()) as sess:
        load = tf.train.import_meta_graph(org_weights_path + '.meta')
        load.restore(sess, org_weights_path)
        for var in tf.global_variables():
            var_name = var.op.name
            var_name_mess = str(var_name).split('/')
            var_shape = var.shape
            org_weights_mess.append([var_name, var_shape])
            print("=> " + str(var_name).ljust(50), var_shape)
    print()
    cur_weights_mess = []
    with tf.Session(graph=tf.Graph()) as sess:
        with tf.name_scope('input'):
            input_data = tf.placeholder(dtype=tf.string, shape=(None,), name='input_data')
            training = tf.placeholder(dtype=tf.bool, name='trainable')
        model = YOLOV3(input_data, input_size, training)
        for var in tf.global_variables():
            var_name = var.op.name
            var_name_mess = str(var_name).split('/')
            var_shape = var.shape
            print(var_name_mess[0])
            cur_weights_mess.append([var_name, var_shape])
            print("=> " + str(var_name).ljust(50), var_shape)
        org_weights_num = len(org_weights_mess)
        cur_weights_num = len(cur_weights_mess)
        if cur_weights_num != org_weights_num:
            raise RuntimeError
        print('=> Number of weights that will be renamed:\t%d' % cur_weights_num)
        cur_to_org_dict = {}
        for index in range(org_weights_num):
            org_name, org_shape = org_weights_mess[index]
            cur_name, cur_shape = cur_weights_mess[index]
            if cur_shape != org_shape:
                print(org_weights_mess[index])
                print(cur_weights_mess[index])
                raise RuntimeError
            cur_to_org_dict[cur_name] = org_name
            print("=> " + str(cur_name).ljust(50) + ' : ' + org_name)
        with tf.name_scope('load_save'):
            name_to_var_dict = {var.op.name: var for var in tf.global_variables()}
            restore_dict = {cur_to_org_dict[cur_name]: name_to_var_dict[cur_name] for cur_name in cur_to_org_dict}
            load = tf.train.Saver(restore_dict)
            save = tf.train.Saver(tf.global_variables())
            for var in tf.global_variables():
                print("=> " + var.op.name)
        sess.run(tf.global_variables_initializer())
        print('=> Restoring weights from:\t %s' % org_weights_path)
        load.restore(sess, org_weights_path)
        save.save(sess, cur_weights_path)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('model_dir')
    args = parser.parse_args()
    if os.path.exists(args.model_dir):
        raise OSError('Directory {} already exists; please specify a different path for the tensorflow SavedModel'.format(args.model_dir))
    with tempfile.TemporaryDirectory() as workdir:
        ckpt_file = os.path.join(workdir, './yolov3_coco_demo.ckpt')
        input_size = 416
        if not os.path.isfile(ckpt_file + '.meta'):
            yolov3_coco_tar_gz = os.path.join(workdir, './yolov3_coco.tar.gz')
            url = 'https://github.com/YunYang1994/tensorflow-yolov3/releases/download/v1.0/yolov3_coco.tar.gz'
            print('Downloading from {}'.format(url))
            urllib.request.urlretrieve(url, yolov3_coco_tar_gz)
            shutil.unpack_archive(yolov3_coco_tar_gz, extract_dir=workdir)
            convert_weights(os.path.join(workdir, './yolov3_coco.ckpt'), ckpt_file, input_size)
        input_tensor_name = 'input/input_data:0'
        output_names = ['boxes', 'scores', 'classes']
        output_tensor_names = ['pred_bbox/boxes:0', 'pred_bbox/scores:0', 'pred_bbox/classes:0']
        with tf.Session(graph=tf.Graph()) as sess:
            with tf.name_scope('input'):
                input_data = tf.placeholder(dtype=tf.string, shape=[None], name='input_data')
            model = YOLOV3(input_data, input_size, trainable=False)
            print(model.conv_sbbox, model.conv_mbbox, model.conv_lbbox)
            saver = tf.train.Saver()
            saver.restore(sess, ckpt_file)
            input_tensor = sess.graph.get_tensor_by_name(input_tensor_name)
            inputs = {'image': input_tensor}
            outputs = {name: sess.graph.get_tensor_by_name(tensor_name)
                       for name, tensor_name in zip(output_names, output_tensor_names)}
            tf.saved_model.simple_save(sess, args.model_dir, inputs, outputs)
    print('tensorflow YOLO v3 SavedModel generated at {}'.format(args.model_dir))


if __name__ == '__main__':
    main()
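# Usage sketch (added; paths are illustrative assumptions that mirror the
# accompanying evaluate.ipynb): after generating the SavedModel with
# `python3 yolo_v3_coco_saved_model.py ./yolo_v3_coco_saved_model`, it could be
# compiled for Inferentia with tensorflow-neuron 1.x roughly as follows:
#
#     import tensorflow.neuron as tfn
#     result = tfn.saved_model.compile(
#         './yolo_v3_coco_saved_model',         # SavedModel produced by this script
#         './yolo_v3_coco_saved_model_neuron',  # directory loaded by evaluate.ipynb
#         batch_size=1,
#         dynamic_batch_size=True,
#     )
#     print(result)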
================================================
FILE: src/examples/tensorflow/yolo_v4_demo/README.md
================================================

Please view our documentation at **[https://awsdocs-neuron.readthedocs-hosted.com/](https://awsdocs-neuron.readthedocs-hosted.com/)**

================================================
FILE: src/examples/tensorflow/yolo_v4_demo/evaluate.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluate YOLO v4 on Inferentia\n", "## Note: this tutorial runs on tensorflow-neuron 1.x only" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "This tutorial walks through compiling and evaluating the YOLO v4 model on Inferentia using the AWS Neuron SDK 09/2020 release. We recommend running this tutorial on an EC2 `inf1.2xlarge` instance, which contains one Inferentia and 8 vCPU cores, as well as 16 GB of memory. Verify that this Jupyter notebook is running the Python kernel environment that was set up according to the [Tensorflow Installation Guide](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/tensorflow/tensorflow-neuron/setup/tensorflow-install.html#install-neuron-tensorflow). You can select the kernel from the “Kernel -> Change Kernel” option at the top of this Jupyter notebook page." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This demo requires the following pip packages:\n", "\n", "`neuron-cc tensorflow-neuron<2 requests pillow matplotlib pycocotools torch`\n", "\n", "and the debian/rpm package `aws-neuron-runtime`.\n", "\n", "On DLAMI, `aws-neuron-runtime` is already pre-installed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install tensorflow_neuron==1.15.5.2.8.9.0 neuron_cc==1.13.5.0 requests pillow matplotlib pycocotools==2.0.1 numpy==1.18.2 torch~=1.5.0 --force \\\n", "  --extra-index-url=https://pip.repos.neuron.amazonaws.com" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 1: Download Dataset and Generate Pretrained SavedModel\n", "### Download COCO 2017 validation dataset\n", "We start by downloading the COCO validation dataset, which we will use to validate our model. The COCO 2017 dataset is widely used for object detection, segmentation, and image captioning." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!curl -LO http://images.cocodataset.org/zips/val2017.zip\n", "!curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip\n", "!unzip -q val2017.zip\n", "!unzip annotations_trainval2017.zip" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!ls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check required package versions\n", "Here are the minimum required versions of AWS Neuron packages. We run a check."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pkg_resources\n", "from distutils.version import LooseVersion\n", "\n", "assert LooseVersion(pkg_resources.get_distribution('neuron-cc').version) > LooseVersion('1.0.20000')\n", "assert LooseVersion(pkg_resources.get_distribution('tensorflow-neuron').version) > LooseVersion('1.15.3.1.0.2000')\n", "print('passed package version checks')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate YOLO v4 tensorflow SavedModel (pretrained on COCO 2017 dataset)\n", "Script `yolo_v4_coco_saved_model.py` will generate a tensorflow SavedModel using pretrained weights from https://github.com/Tianxiaomo/pytorch-YOLOv4." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python3 yolo_v4_coco_saved_model.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tensorflow SavedModel can be loaded as a tensorflow predictor. When a JPEG format image is provided as input, the output result of the tensorflow predictor contains information for drawing bounding boxes and classification results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import tensorflow as tf\n", "from PIL import Image\n", "import matplotlib.pyplot as plt\n", "import matplotlib.patches as patches\n", "\n", "# launch predictor and run inference on an arbitrary image in the validation dataset\n", "yolo_pred_cpu = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model')\n", "image_path = './val2017/000000581781.jpg'\n", "with open(image_path, 'rb') as f:\n", "    feeds = {'image': [f.read()]}\n", "results = yolo_pred_cpu(feeds)\n", "\n", "# load annotations to decode classification result\n", "with open('./annotations/instances_val2017.json') as f:\n", "    annotate_json = json.load(f)\n", "label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])}\n", "\n", "# draw picture and bounding boxes\n", "fig, ax = plt.subplots(figsize=(10, 10))\n", "ax.imshow(Image.open(image_path).convert('RGB'))\n", "wanted = results['scores'][0] > 0.1\n", "for xyxy, label_no_bg in zip(results['boxes'][0][wanted], results['classes'][0][wanted]):\n", "    xywh = xyxy[0], xyxy[1], xyxy[2] - xyxy[0], xyxy[3] - xyxy[1]\n", "    rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none')\n", "    ax.add_patch(rect)\n", "    rx, ry = rect.get_xy()\n", "    rx = rx + rect.get_width() / 2.0\n", "    ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10,\n", "                ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', alpha=0.5))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 2: Compile the Pretrained SavedModel for Inferentia\n", "We make use of the Python compilation API `tfn.saved_model.compile` that is available in `tensorflow-neuron<2`. To reduce Neuron runtime overhead, it is necessary to use the arguments `no_fuse_ops` and `minimum_segment_size`."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import shutil\n", "import tensorflow as tf\n", "import tensorflow.neuron as tfn\n", "\n", "\n", "def no_fuse_condition(op):\n", " return any(op.name.startswith(pat) for pat in ['reshape', 'lambda_1/Cast', 'lambda_2/Cast', 'lambda_3/Cast'])\n", "\n", "with tf.Session(graph=tf.Graph()) as sess:\n", " tf.saved_model.loader.load(sess, ['serve'], './yolo_v4_coco_saved_model')\n", " no_fuse_ops = [op.name for op in sess.graph.get_operations() if no_fuse_condition(op)]\n", "shutil.rmtree('./yolo_v4_coco_saved_model_neuron', ignore_errors=True)\n", "result = tfn.saved_model.compile(\n", " './yolo_v4_coco_saved_model', './yolo_v4_coco_saved_model_neuron',\n", " # we partition the graph before casting from float16 to float32, to help reduce the output tensor size by 1/2\n", " no_fuse_ops=no_fuse_ops,\n", " # to enforce trivial compilable subgraphs to run on CPU\n", " minimum_segment_size=100,\n", " batch_size=1,\n", " dynamic_batch_size=True,\n", ")\n", "print(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part 3: Evaluate Model Quality after Compilation\n", "### Define evaluation functions\n", "We first define some handy helper functions for running evaluation on the COCO 2017 dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "import time\n", "import numpy as np\n", "import tensorflow as tf\n", "from pycocotools.coco import COCO\n", "from pycocotools.cocoeval import COCOeval\n", "\n", "\n", "def cocoapi_eval(jsonfile,\n", " style,\n", " coco_gt=None,\n", " anno_file=None,\n", " max_dets=(100, 300, 1000)):\n", " \"\"\"\n", " Args:\n", " jsonfile: Evaluation json file, eg: bbox.json, mask.json.\n", " style: COCOeval style, can be `bbox` , `segm` and `proposal`.\n", " coco_gt: Whether to load COCOAPI through anno_file,\n", " eg: coco_gt = COCO(anno_file)\n", " anno_file: COCO annotations file.\n", " max_dets: COCO evaluation maxDets.\n", " \"\"\"\n", " assert coco_gt is not None or anno_file is not None\n", "\n", " if coco_gt is None:\n", " coco_gt = COCO(anno_file)\n", " print(\"Start evaluate...\")\n", " coco_dt = coco_gt.loadRes(jsonfile)\n", " if style == 'proposal':\n", " coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')\n", " coco_eval.params.useCats = 0\n", " coco_eval.params.maxDets = list(max_dets)\n", " else:\n", " coco_eval = COCOeval(coco_gt, coco_dt, style)\n", " coco_eval.evaluate()\n", " coco_eval.accumulate()\n", " coco_eval.summarize()\n", " return coco_eval.stats\n", "\n", "\n", "def bbox_eval(anno_file, bbox_list):\n", " coco_gt = COCO(anno_file)\n", "\n", " outfile = 'bbox_detections.json'\n", " print('Generating json file...')\n", " with open(outfile, 'w') as f:\n", " json.dump(bbox_list, f)\n", "\n", " map_stats = cocoapi_eval(outfile, 'bbox', coco_gt=coco_gt)\n", " return map_stats\n", "\n", "\n", "def get_image_as_bytes(images, eval_pre_path):\n", " batch_im_id_list = []\n", " batch_im_name_list = []\n", " batch_img_bytes_list = []\n", " n = len(images)\n", " batch_im_id = []\n", " batch_im_name = []\n", " batch_img_bytes = []\n", " for i, im in enumerate(images):\n", " im_id = im['id']\n", " file_name = im['file_name']\n", " if i % eval_batch_size == 0 and i != 0:\n", " batch_im_id_list.append(batch_im_id)\n", " batch_im_name_list.append(batch_im_name)\n", " batch_img_bytes_list.append(batch_img_bytes)\n", " batch_im_id = []\n", " batch_im_name = []\n", " batch_img_bytes = 
[]\n", " batch_im_id.append(im_id)\n", " batch_im_name.append(file_name)\n", "\n", " with open(os.path.join(eval_pre_path, file_name), 'rb') as f:\n", " batch_img_bytes.append(f.read())\n", " return batch_im_id_list, batch_im_name_list, batch_img_bytes_list\n", "\n", "\n", "def analyze_bbox(results, batch_im_id, _clsid2catid):\n", " bbox_list = []\n", " k = 0\n", " for boxes, scores, classes in zip(results['boxes'], results['scores'], results['classes']):\n", " if boxes is not None:\n", " im_id = batch_im_id[k]\n", " n = len(boxes)\n", " for p in range(n):\n", " clsid = classes[p]\n", " score = scores[p]\n", " xmin, ymin, xmax, ymax = boxes[p]\n", " catid = (_clsid2catid[int(clsid)])\n", " w = xmax - xmin + 1\n", " h = ymax - ymin + 1\n", "\n", " bbox = [xmin, ymin, w, h]\n", " # Round to the nearest 10th to avoid huge file sizes, as COCO suggests\n", " bbox = [round(float(x) * 10) / 10 for x in bbox]\n", " bbox_res = {\n", " 'image_id': im_id,\n", " 'category_id': catid,\n", " 'bbox': bbox,\n", " 'score': float(score),\n", " }\n", " bbox_list.append(bbox_res)\n", " k += 1\n", " return bbox_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the actual evaluation loop. To fully utilize all four cores on one Inferentia, the optimal setup is to run multi-threaded inference using a `ThreadPoolExecutor`. The following cell is a multi-threaded adaptation of the evaluation routine at https://github.com/miemie2013/Keras-YOLOv4/blob/910c4c6f7265f5828fceed0f784496a0b46516bf/tools/cocotools.py#L97." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from concurrent import futures\n", "\n", "NUM_THREADS = 4\n", "\n", "def evaluate(yolo_predictor, images, eval_pre_path, anno_file, eval_batch_size, _clsid2catid):\n", " batch_im_id_list, batch_im_name_list, batch_img_bytes_list = get_image_as_bytes(images, eval_pre_path)\n", "\n", " # warm up\n", " yolo_predictor({'image': np.array(batch_img_bytes_list[0], dtype=object)})\n", " \n", " def yolo_predictor_timer(yolo_pred, image):\n", " begin = time.time()\n", " result = yolo_pred(image)\n", " delta = time.time() - begin\n", " return result, delta\n", "\n", " latency = []\n", " with futures.ThreadPoolExecutor(NUM_THREADS) as exe:\n", " fut_im_list = []\n", " fut_list = []\n", "\n", " start_time = time.time()\n", " for batch_im_id, batch_im_name, batch_img_bytes in zip(batch_im_id_list, batch_im_name_list, batch_img_bytes_list):\n", " if len(batch_img_bytes) != eval_batch_size:\n", " continue\n", " fut = exe.submit(yolo_predictor_timer, yolo_predictor, {'image': np.array(batch_img_bytes, dtype=object)})\n", " fut_im_list.append((batch_im_id, batch_im_name))\n", " fut_list.append(fut)\n", " bbox_list = []\n", " sum_time = 0.0\n", " count = 0\n", " for (batch_im_id, batch_im_name), fut in zip(fut_im_list, fut_list):\n", " results, times = fut.result()\n", " # Adjust latency since we are in batch\n", " latency.append(times / eval_batch_size)\n", " sum_time += times\n", " bbox_list.extend(analyze_bbox(results, batch_im_id, _clsid2catid))\n", " for _ in batch_im_id:\n", " count += 1\n", " if count % 1000 == 0:\n", " print('Test iter {}'.format(count))\n", "\n", " throughput = len(images) / (sum_time / NUM_THREADS)\n", "\n", " \n", " print('Average Images Per Second:', throughput)\n", " print(\"Latency P50: {:.1f} ms\".format(np.percentile(latency, 50)*1000.0))\n", " print(\"Latency P90: {:.1f} ms\".format(np.percentile(latency, 90)*1000.0))\n", " print(\"Latency P95: {:.1f} 
ms\".format(np.percentile(latency, 95)*1000.0))\n", " print(\"Latency P99: {:.1f} ms\".format(np.percentile(latency, 99)*1000.0))\n", "\n", " # start evaluation\n", " box_ap_stats = bbox_eval(anno_file, bbox_list)\n", " return box_ap_stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluate mean average precision (mAP) score\n", "Here is the code to calculate mAP scores of the YOLO v4 model. The expected mAP score is around 0.487 if we use the pretrained weights." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yolo_pred = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model_neuron')\n", "\n", "val_coco_root = './val2017'\n", "val_annotate = './annotations/instances_val2017.json'\n", "clsid2catid = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16,\n", " 15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31,\n", " 27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43,\n", " 39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56,\n", " 51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72,\n", " 63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85,\n", " 75: 86, 76: 87, 77: 88, 78: 89, 79: 90}\n", "eval_batch_size = 8\n", "with open(val_annotate, 'r', encoding='utf-8') as f2:\n", " for line in f2:\n", " line = line.strip()\n", " dataset = json.loads(line)\n", " images = dataset['images']\n", "box_ap = evaluate(yolo_pred, images, val_coco_root, val_annotate, eval_batch_size, clsid2catid)" ] } ], "metadata": { "kernelspec": { "display_name": "Environment (conda_aws_neuron_tensorflow_p36)", "language": "python", "name": "conda_aws_neuron_tensorflow_p36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.13" } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: src/examples/tensorflow/yolo_v4_demo/yolo_v4_coco_saved_model.py ================================================ import os import io from functools import partial import requests import numpy as np import torch import tensorflow as tf from tensorflow import keras from tensorflow.keras import layers def rename_weights(checkpoint): name_mapping = { 'down1.conv1.conv.0.weight': 'models.0.conv1.weight', 'down1.conv1.conv.1.weight': 'models.0.bn1.weight', 'down1.conv1.conv.1.bias': 'models.0.bn1.bias', 'down1.conv1.conv.1.running_mean': 'models.0.bn1.running_mean', 'down1.conv1.conv.1.running_var': 'models.0.bn1.running_var', 'down1.conv1.conv.1.num_batches_tracked': 'models.0.bn1.num_batches_tracked', 'down1.conv2.conv.0.weight': 'models.1.conv2.weight', 'down1.conv2.conv.1.weight': 'models.1.bn2.weight', 'down1.conv2.conv.1.bias': 'models.1.bn2.bias', 'down1.conv2.conv.1.running_mean': 'models.1.bn2.running_mean', 'down1.conv2.conv.1.running_var': 'models.1.bn2.running_var', 'down1.conv2.conv.1.num_batches_tracked': 'models.1.bn2.num_batches_tracked', 'down1.conv3.conv.0.weight': 'models.2.conv3.weight', 'down1.conv3.conv.1.weight': 'models.2.bn3.weight', 'down1.conv3.conv.1.bias': 'models.2.bn3.bias', 'down1.conv3.conv.1.running_mean': 'models.2.bn3.running_mean', 
'down1.conv3.conv.1.running_var': 'models.2.bn3.running_var', 'down1.conv3.conv.1.num_batches_tracked': 'models.2.bn3.num_batches_tracked', 'down1.conv4.conv.0.weight': 'models.4.conv4.weight', 'down1.conv4.conv.1.weight': 'models.4.bn4.weight', 'down1.conv4.conv.1.bias': 'models.4.bn4.bias', 'down1.conv4.conv.1.running_mean': 'models.4.bn4.running_mean', 'down1.conv4.conv.1.running_var': 'models.4.bn4.running_var', 'down1.conv4.conv.1.num_batches_tracked': 'models.4.bn4.num_batches_tracked', 'down1.conv5.conv.0.weight': 'models.5.conv5.weight', 'down1.conv5.conv.1.weight': 'models.5.bn5.weight', 'down1.conv5.conv.1.bias': 'models.5.bn5.bias', 'down1.conv5.conv.1.running_mean': 'models.5.bn5.running_mean', 'down1.conv5.conv.1.running_var': 'models.5.bn5.running_var', 'down1.conv5.conv.1.num_batches_tracked': 'models.5.bn5.num_batches_tracked', 'down1.conv6.conv.0.weight': 'models.6.conv6.weight', 'down1.conv6.conv.1.weight': 'models.6.bn6.weight', 'down1.conv6.conv.1.bias': 'models.6.bn6.bias', 'down1.conv6.conv.1.running_mean': 'models.6.bn6.running_mean', 'down1.conv6.conv.1.running_var': 'models.6.bn6.running_var', 'down1.conv6.conv.1.num_batches_tracked': 'models.6.bn6.num_batches_tracked', 'down1.conv7.conv.0.weight': 'models.8.conv7.weight', 'down1.conv7.conv.1.weight': 'models.8.bn7.weight', 'down1.conv7.conv.1.bias': 'models.8.bn7.bias', 'down1.conv7.conv.1.running_mean': 'models.8.bn7.running_mean', 'down1.conv7.conv.1.running_var': 'models.8.bn7.running_var', 'down1.conv7.conv.1.num_batches_tracked': 'models.8.bn7.num_batches_tracked', 'down1.conv8.conv.0.weight': 'models.10.conv8.weight', 'down1.conv8.conv.1.weight': 'models.10.bn8.weight', 'down1.conv8.conv.1.bias': 'models.10.bn8.bias', 'down1.conv8.conv.1.running_mean': 'models.10.bn8.running_mean', 'down1.conv8.conv.1.running_var': 'models.10.bn8.running_var', 'down1.conv8.conv.1.num_batches_tracked': 'models.10.bn8.num_batches_tracked', 'down2.conv1.conv.0.weight': 'models.11.conv9.weight', 'down2.conv1.conv.1.weight': 'models.11.bn9.weight', 'down2.conv1.conv.1.bias': 'models.11.bn9.bias', 'down2.conv1.conv.1.running_mean': 'models.11.bn9.running_mean', 'down2.conv1.conv.1.running_var': 'models.11.bn9.running_var', 'down2.conv1.conv.1.num_batches_tracked': 'models.11.bn9.num_batches_tracked', 'down2.conv2.conv.0.weight': 'models.12.conv10.weight', 'down2.conv2.conv.1.weight': 'models.12.bn10.weight', 'down2.conv2.conv.1.bias': 'models.12.bn10.bias', 'down2.conv2.conv.1.running_mean': 'models.12.bn10.running_mean', 'down2.conv2.conv.1.running_var': 'models.12.bn10.running_var', 'down2.conv2.conv.1.num_batches_tracked': 'models.12.bn10.num_batches_tracked', 'down2.conv3.conv.0.weight': 'models.14.conv11.weight', 'down2.conv3.conv.1.weight': 'models.14.bn11.weight', 'down2.conv3.conv.1.bias': 'models.14.bn11.bias', 'down2.conv3.conv.1.running_mean': 'models.14.bn11.running_mean', 'down2.conv3.conv.1.running_var': 'models.14.bn11.running_var', 'down2.conv3.conv.1.num_batches_tracked': 'models.14.bn11.num_batches_tracked', 'down2.resblock.module_list.0.0.conv.0.weight': 'models.15.conv12.weight', 'down2.resblock.module_list.0.0.conv.1.weight': 'models.15.bn12.weight', 'down2.resblock.module_list.0.0.conv.1.bias': 'models.15.bn12.bias', 'down2.resblock.module_list.0.0.conv.1.running_mean': 'models.15.bn12.running_mean', 'down2.resblock.module_list.0.0.conv.1.running_var': 'models.15.bn12.running_var', 'down2.resblock.module_list.0.0.conv.1.num_batches_tracked': 'models.15.bn12.num_batches_tracked', 
'down2.resblock.module_list.0.1.conv.0.weight': 'models.16.conv13.weight', 'down2.resblock.module_list.0.1.conv.1.weight': 'models.16.bn13.weight', 'down2.resblock.module_list.0.1.conv.1.bias': 'models.16.bn13.bias', 'down2.resblock.module_list.0.1.conv.1.running_mean': 'models.16.bn13.running_mean', 'down2.resblock.module_list.0.1.conv.1.running_var': 'models.16.bn13.running_var', 'down2.resblock.module_list.0.1.conv.1.num_batches_tracked': 'models.16.bn13.num_batches_tracked', 'down2.resblock.module_list.1.0.conv.0.weight': 'models.18.conv14.weight', 'down2.resblock.module_list.1.0.conv.1.weight': 'models.18.bn14.weight', 'down2.resblock.module_list.1.0.conv.1.bias': 'models.18.bn14.bias', 'down2.resblock.module_list.1.0.conv.1.running_mean': 'models.18.bn14.running_mean', 'down2.resblock.module_list.1.0.conv.1.running_var': 'models.18.bn14.running_var', 'down2.resblock.module_list.1.0.conv.1.num_batches_tracked': 'models.18.bn14.num_batches_tracked', 'down2.resblock.module_list.1.1.conv.0.weight': 'models.19.conv15.weight', 'down2.resblock.module_list.1.1.conv.1.weight': 'models.19.bn15.weight', 'down2.resblock.module_list.1.1.conv.1.bias': 'models.19.bn15.bias', 'down2.resblock.module_list.1.1.conv.1.running_mean': 'models.19.bn15.running_mean', 'down2.resblock.module_list.1.1.conv.1.running_var': 'models.19.bn15.running_var', 'down2.resblock.module_list.1.1.conv.1.num_batches_tracked': 'models.19.bn15.num_batches_tracked', 'down2.conv4.conv.0.weight': 'models.21.conv16.weight', 'down2.conv4.conv.1.weight': 'models.21.bn16.weight', 'down2.conv4.conv.1.bias': 'models.21.bn16.bias', 'down2.conv4.conv.1.running_mean': 'models.21.bn16.running_mean', 'down2.conv4.conv.1.running_var': 'models.21.bn16.running_var', 'down2.conv4.conv.1.num_batches_tracked': 'models.21.bn16.num_batches_tracked', 'down2.conv5.conv.0.weight': 'models.23.conv17.weight', 'down2.conv5.conv.1.weight': 'models.23.bn17.weight', 'down2.conv5.conv.1.bias': 'models.23.bn17.bias', 'down2.conv5.conv.1.running_mean': 'models.23.bn17.running_mean', 'down2.conv5.conv.1.running_var': 'models.23.bn17.running_var', 'down2.conv5.conv.1.num_batches_tracked': 'models.23.bn17.num_batches_tracked', 'down3.conv1.conv.0.weight': 'models.24.conv18.weight', 'down3.conv1.conv.1.weight': 'models.24.bn18.weight', 'down3.conv1.conv.1.bias': 'models.24.bn18.bias', 'down3.conv1.conv.1.running_mean': 'models.24.bn18.running_mean', 'down3.conv1.conv.1.running_var': 'models.24.bn18.running_var', 'down3.conv1.conv.1.num_batches_tracked': 'models.24.bn18.num_batches_tracked', 'down3.conv2.conv.0.weight': 'models.25.conv19.weight', 'down3.conv2.conv.1.weight': 'models.25.bn19.weight', 'down3.conv2.conv.1.bias': 'models.25.bn19.bias', 'down3.conv2.conv.1.running_mean': 'models.25.bn19.running_mean', 'down3.conv2.conv.1.running_var': 'models.25.bn19.running_var', 'down3.conv2.conv.1.num_batches_tracked': 'models.25.bn19.num_batches_tracked', 'down3.conv3.conv.0.weight': 'models.27.conv20.weight', 'down3.conv3.conv.1.weight': 'models.27.bn20.weight', 'down3.conv3.conv.1.bias': 'models.27.bn20.bias', 'down3.conv3.conv.1.running_mean': 'models.27.bn20.running_mean', 'down3.conv3.conv.1.running_var': 'models.27.bn20.running_var', 'down3.conv3.conv.1.num_batches_tracked': 'models.27.bn20.num_batches_tracked', 'down3.resblock.module_list.0.0.conv.0.weight': 'models.28.conv21.weight', 'down3.resblock.module_list.0.0.conv.1.weight': 'models.28.bn21.weight', 'down3.resblock.module_list.0.0.conv.1.bias': 'models.28.bn21.bias', 
'down3.resblock.module_list.0.0.conv.1.running_mean': 'models.28.bn21.running_mean', 'down3.resblock.module_list.0.0.conv.1.running_var': 'models.28.bn21.running_var', 'down3.resblock.module_list.0.0.conv.1.num_batches_tracked': 'models.28.bn21.num_batches_tracked', 'down3.resblock.module_list.0.1.conv.0.weight': 'models.29.conv22.weight', 'down3.resblock.module_list.0.1.conv.1.weight': 'models.29.bn22.weight', 'down3.resblock.module_list.0.1.conv.1.bias': 'models.29.bn22.bias', 'down3.resblock.module_list.0.1.conv.1.running_mean': 'models.29.bn22.running_mean', 'down3.resblock.module_list.0.1.conv.1.running_var': 'models.29.bn22.running_var', 'down3.resblock.module_list.0.1.conv.1.num_batches_tracked': 'models.29.bn22.num_batches_tracked', 'down3.resblock.module_list.1.0.conv.0.weight': 'models.31.conv23.weight', 'down3.resblock.module_list.1.0.conv.1.weight': 'models.31.bn23.weight', 'down3.resblock.module_list.1.0.conv.1.bias': 'models.31.bn23.bias', 'down3.resblock.module_list.1.0.conv.1.running_mean': 'models.31.bn23.running_mean', 'down3.resblock.module_list.1.0.conv.1.running_var': 'models.31.bn23.running_var', 'down3.resblock.module_list.1.0.conv.1.num_batches_tracked': 'models.31.bn23.num_batches_tracked', 'down3.resblock.module_list.1.1.conv.0.weight': 'models.32.conv24.weight', 'down3.resblock.module_list.1.1.conv.1.weight': 'models.32.bn24.weight', 'down3.resblock.module_list.1.1.conv.1.bias': 'models.32.bn24.bias', 'down3.resblock.module_list.1.1.conv.1.running_mean': 'models.32.bn24.running_mean', 'down3.resblock.module_list.1.1.conv.1.running_var': 'models.32.bn24.running_var', 'down3.resblock.module_list.1.1.conv.1.num_batches_tracked': 'models.32.bn24.num_batches_tracked', 'down3.resblock.module_list.2.0.conv.0.weight': 'models.34.conv25.weight', 'down3.resblock.module_list.2.0.conv.1.weight': 'models.34.bn25.weight', 'down3.resblock.module_list.2.0.conv.1.bias': 'models.34.bn25.bias', 'down3.resblock.module_list.2.0.conv.1.running_mean': 'models.34.bn25.running_mean', 'down3.resblock.module_list.2.0.conv.1.running_var': 'models.34.bn25.running_var', 'down3.resblock.module_list.2.0.conv.1.num_batches_tracked': 'models.34.bn25.num_batches_tracked', 'down3.resblock.module_list.2.1.conv.0.weight': 'models.35.conv26.weight', 'down3.resblock.module_list.2.1.conv.1.weight': 'models.35.bn26.weight', 'down3.resblock.module_list.2.1.conv.1.bias': 'models.35.bn26.bias', 'down3.resblock.module_list.2.1.conv.1.running_mean': 'models.35.bn26.running_mean', 'down3.resblock.module_list.2.1.conv.1.running_var': 'models.35.bn26.running_var', 'down3.resblock.module_list.2.1.conv.1.num_batches_tracked': 'models.35.bn26.num_batches_tracked', 'down3.resblock.module_list.3.0.conv.0.weight': 'models.37.conv27.weight', 'down3.resblock.module_list.3.0.conv.1.weight': 'models.37.bn27.weight', 'down3.resblock.module_list.3.0.conv.1.bias': 'models.37.bn27.bias', 'down3.resblock.module_list.3.0.conv.1.running_mean': 'models.37.bn27.running_mean', 'down3.resblock.module_list.3.0.conv.1.running_var': 'models.37.bn27.running_var', 'down3.resblock.module_list.3.0.conv.1.num_batches_tracked': 'models.37.bn27.num_batches_tracked', 'down3.resblock.module_list.3.1.conv.0.weight': 'models.38.conv28.weight', 'down3.resblock.module_list.3.1.conv.1.weight': 'models.38.bn28.weight', 'down3.resblock.module_list.3.1.conv.1.bias': 'models.38.bn28.bias', 'down3.resblock.module_list.3.1.conv.1.running_mean': 'models.38.bn28.running_mean', 'down3.resblock.module_list.3.1.conv.1.running_var': 
'models.38.bn28.running_var', 'down3.resblock.module_list.3.1.conv.1.num_batches_tracked': 'models.38.bn28.num_batches_tracked', 'down3.resblock.module_list.4.0.conv.0.weight': 'models.40.conv29.weight', 'down3.resblock.module_list.4.0.conv.1.weight': 'models.40.bn29.weight', 'down3.resblock.module_list.4.0.conv.1.bias': 'models.40.bn29.bias', 'down3.resblock.module_list.4.0.conv.1.running_mean': 'models.40.bn29.running_mean', 'down3.resblock.module_list.4.0.conv.1.running_var': 'models.40.bn29.running_var', 'down3.resblock.module_list.4.0.conv.1.num_batches_tracked': 'models.40.bn29.num_batches_tracked', 'down3.resblock.module_list.4.1.conv.0.weight': 'models.41.conv30.weight', 'down3.resblock.module_list.4.1.conv.1.weight': 'models.41.bn30.weight', 'down3.resblock.module_list.4.1.conv.1.bias': 'models.41.bn30.bias', 'down3.resblock.module_list.4.1.conv.1.running_mean': 'models.41.bn30.running_mean', 'down3.resblock.module_list.4.1.conv.1.running_var': 'models.41.bn30.running_var', 'down3.resblock.module_list.4.1.conv.1.num_batches_tracked': 'models.41.bn30.num_batches_tracked', 'down3.resblock.module_list.5.0.conv.0.weight': 'models.43.conv31.weight', 'down3.resblock.module_list.5.0.conv.1.weight': 'models.43.bn31.weight', 'down3.resblock.module_list.5.0.conv.1.bias': 'models.43.bn31.bias', 'down3.resblock.module_list.5.0.conv.1.running_mean': 'models.43.bn31.running_mean', 'down3.resblock.module_list.5.0.conv.1.running_var': 'models.43.bn31.running_var', 'down3.resblock.module_list.5.0.conv.1.num_batches_tracked': 'models.43.bn31.num_batches_tracked', 'down3.resblock.module_list.5.1.conv.0.weight': 'models.44.conv32.weight', 'down3.resblock.module_list.5.1.conv.1.weight': 'models.44.bn32.weight', 'down3.resblock.module_list.5.1.conv.1.bias': 'models.44.bn32.bias', 'down3.resblock.module_list.5.1.conv.1.running_mean': 'models.44.bn32.running_mean', 'down3.resblock.module_list.5.1.conv.1.running_var': 'models.44.bn32.running_var', 'down3.resblock.module_list.5.1.conv.1.num_batches_tracked': 'models.44.bn32.num_batches_tracked', 'down3.resblock.module_list.6.0.conv.0.weight': 'models.46.conv33.weight', 'down3.resblock.module_list.6.0.conv.1.weight': 'models.46.bn33.weight', 'down3.resblock.module_list.6.0.conv.1.bias': 'models.46.bn33.bias', 'down3.resblock.module_list.6.0.conv.1.running_mean': 'models.46.bn33.running_mean', 'down3.resblock.module_list.6.0.conv.1.running_var': 'models.46.bn33.running_var', 'down3.resblock.module_list.6.0.conv.1.num_batches_tracked': 'models.46.bn33.num_batches_tracked', 'down3.resblock.module_list.6.1.conv.0.weight': 'models.47.conv34.weight', 'down3.resblock.module_list.6.1.conv.1.weight': 'models.47.bn34.weight', 'down3.resblock.module_list.6.1.conv.1.bias': 'models.47.bn34.bias', 'down3.resblock.module_list.6.1.conv.1.running_mean': 'models.47.bn34.running_mean', 'down3.resblock.module_list.6.1.conv.1.running_var': 'models.47.bn34.running_var', 'down3.resblock.module_list.6.1.conv.1.num_batches_tracked': 'models.47.bn34.num_batches_tracked', 'down3.resblock.module_list.7.0.conv.0.weight': 'models.49.conv35.weight', 'down3.resblock.module_list.7.0.conv.1.weight': 'models.49.bn35.weight', 'down3.resblock.module_list.7.0.conv.1.bias': 'models.49.bn35.bias', 'down3.resblock.module_list.7.0.conv.1.running_mean': 'models.49.bn35.running_mean', 'down3.resblock.module_list.7.0.conv.1.running_var': 'models.49.bn35.running_var', 'down3.resblock.module_list.7.0.conv.1.num_batches_tracked': 'models.49.bn35.num_batches_tracked', 
'down3.resblock.module_list.7.1.conv.0.weight': 'models.50.conv36.weight', 'down3.resblock.module_list.7.1.conv.1.weight': 'models.50.bn36.weight', 'down3.resblock.module_list.7.1.conv.1.bias': 'models.50.bn36.bias', 'down3.resblock.module_list.7.1.conv.1.running_mean': 'models.50.bn36.running_mean', 'down3.resblock.module_list.7.1.conv.1.running_var': 'models.50.bn36.running_var', 'down3.resblock.module_list.7.1.conv.1.num_batches_tracked': 'models.50.bn36.num_batches_tracked', 'down3.conv4.conv.0.weight': 'models.52.conv37.weight', 'down3.conv4.conv.1.weight': 'models.52.bn37.weight', 'down3.conv4.conv.1.bias': 'models.52.bn37.bias', 'down3.conv4.conv.1.running_mean': 'models.52.bn37.running_mean', 'down3.conv4.conv.1.running_var': 'models.52.bn37.running_var', 'down3.conv4.conv.1.num_batches_tracked': 'models.52.bn37.num_batches_tracked', 'down3.conv5.conv.0.weight': 'models.54.conv38.weight', 'down3.conv5.conv.1.weight': 'models.54.bn38.weight', 'down3.conv5.conv.1.bias': 'models.54.bn38.bias', 'down3.conv5.conv.1.running_mean': 'models.54.bn38.running_mean', 'down3.conv5.conv.1.running_var': 'models.54.bn38.running_var', 'down3.conv5.conv.1.num_batches_tracked': 'models.54.bn38.num_batches_tracked', 'down4.conv1.conv.0.weight': 'models.55.conv39.weight', 'down4.conv1.conv.1.weight': 'models.55.bn39.weight', 'down4.conv1.conv.1.bias': 'models.55.bn39.bias', 'down4.conv1.conv.1.running_mean': 'models.55.bn39.running_mean', 'down4.conv1.conv.1.running_var': 'models.55.bn39.running_var', 'down4.conv1.conv.1.num_batches_tracked': 'models.55.bn39.num_batches_tracked', 'down4.conv2.conv.0.weight': 'models.56.conv40.weight', 'down4.conv2.conv.1.weight': 'models.56.bn40.weight', 'down4.conv2.conv.1.bias': 'models.56.bn40.bias', 'down4.conv2.conv.1.running_mean': 'models.56.bn40.running_mean', 'down4.conv2.conv.1.running_var': 'models.56.bn40.running_var', 'down4.conv2.conv.1.num_batches_tracked': 'models.56.bn40.num_batches_tracked', 'down4.conv3.conv.0.weight': 'models.58.conv41.weight', 'down4.conv3.conv.1.weight': 'models.58.bn41.weight', 'down4.conv3.conv.1.bias': 'models.58.bn41.bias', 'down4.conv3.conv.1.running_mean': 'models.58.bn41.running_mean', 'down4.conv3.conv.1.running_var': 'models.58.bn41.running_var', 'down4.conv3.conv.1.num_batches_tracked': 'models.58.bn41.num_batches_tracked', 'down4.resblock.module_list.0.0.conv.0.weight': 'models.59.conv42.weight', 'down4.resblock.module_list.0.0.conv.1.weight': 'models.59.bn42.weight', 'down4.resblock.module_list.0.0.conv.1.bias': 'models.59.bn42.bias', 'down4.resblock.module_list.0.0.conv.1.running_mean': 'models.59.bn42.running_mean', 'down4.resblock.module_list.0.0.conv.1.running_var': 'models.59.bn42.running_var', 'down4.resblock.module_list.0.0.conv.1.num_batches_tracked': 'models.59.bn42.num_batches_tracked', 'down4.resblock.module_list.0.1.conv.0.weight': 'models.60.conv43.weight', 'down4.resblock.module_list.0.1.conv.1.weight': 'models.60.bn43.weight', 'down4.resblock.module_list.0.1.conv.1.bias': 'models.60.bn43.bias', 'down4.resblock.module_list.0.1.conv.1.running_mean': 'models.60.bn43.running_mean', 'down4.resblock.module_list.0.1.conv.1.running_var': 'models.60.bn43.running_var', 'down4.resblock.module_list.0.1.conv.1.num_batches_tracked': 'models.60.bn43.num_batches_tracked', 'down4.resblock.module_list.1.0.conv.0.weight': 'models.62.conv44.weight', 'down4.resblock.module_list.1.0.conv.1.weight': 'models.62.bn44.weight', 'down4.resblock.module_list.1.0.conv.1.bias': 'models.62.bn44.bias', 
'down4.resblock.module_list.1.0.conv.1.running_mean': 'models.62.bn44.running_mean', 'down4.resblock.module_list.1.0.conv.1.running_var': 'models.62.bn44.running_var', 'down4.resblock.module_list.1.0.conv.1.num_batches_tracked': 'models.62.bn44.num_batches_tracked', 'down4.resblock.module_list.1.1.conv.0.weight': 'models.63.conv45.weight', 'down4.resblock.module_list.1.1.conv.1.weight': 'models.63.bn45.weight', 'down4.resblock.module_list.1.1.conv.1.bias': 'models.63.bn45.bias', 'down4.resblock.module_list.1.1.conv.1.running_mean': 'models.63.bn45.running_mean', 'down4.resblock.module_list.1.1.conv.1.running_var': 'models.63.bn45.running_var', 'down4.resblock.module_list.1.1.conv.1.num_batches_tracked': 'models.63.bn45.num_batches_tracked', 'down4.resblock.module_list.2.0.conv.0.weight': 'models.65.conv46.weight', 'down4.resblock.module_list.2.0.conv.1.weight': 'models.65.bn46.weight', 'down4.resblock.module_list.2.0.conv.1.bias': 'models.65.bn46.bias', 'down4.resblock.module_list.2.0.conv.1.running_mean': 'models.65.bn46.running_mean', 'down4.resblock.module_list.2.0.conv.1.running_var': 'models.65.bn46.running_var', 'down4.resblock.module_list.2.0.conv.1.num_batches_tracked': 'models.65.bn46.num_batches_tracked', 'down4.resblock.module_list.2.1.conv.0.weight': 'models.66.conv47.weight', 'down4.resblock.module_list.2.1.conv.1.weight': 'models.66.bn47.weight', 'down4.resblock.module_list.2.1.conv.1.bias': 'models.66.bn47.bias', 'down4.resblock.module_list.2.1.conv.1.running_mean': 'models.66.bn47.running_mean', 'down4.resblock.module_list.2.1.conv.1.running_var': 'models.66.bn47.running_var', 'down4.resblock.module_list.2.1.conv.1.num_batches_tracked': 'models.66.bn47.num_batches_tracked', 'down4.resblock.module_list.3.0.conv.0.weight': 'models.68.conv48.weight', 'down4.resblock.module_list.3.0.conv.1.weight': 'models.68.bn48.weight', 'down4.resblock.module_list.3.0.conv.1.bias': 'models.68.bn48.bias', 'down4.resblock.module_list.3.0.conv.1.running_mean': 'models.68.bn48.running_mean', 'down4.resblock.module_list.3.0.conv.1.running_var': 'models.68.bn48.running_var', 'down4.resblock.module_list.3.0.conv.1.num_batches_tracked': 'models.68.bn48.num_batches_tracked', 'down4.resblock.module_list.3.1.conv.0.weight': 'models.69.conv49.weight', 'down4.resblock.module_list.3.1.conv.1.weight': 'models.69.bn49.weight', 'down4.resblock.module_list.3.1.conv.1.bias': 'models.69.bn49.bias', 'down4.resblock.module_list.3.1.conv.1.running_mean': 'models.69.bn49.running_mean', 'down4.resblock.module_list.3.1.conv.1.running_var': 'models.69.bn49.running_var', 'down4.resblock.module_list.3.1.conv.1.num_batches_tracked': 'models.69.bn49.num_batches_tracked', 'down4.resblock.module_list.4.0.conv.0.weight': 'models.71.conv50.weight', 'down4.resblock.module_list.4.0.conv.1.weight': 'models.71.bn50.weight', 'down4.resblock.module_list.4.0.conv.1.bias': 'models.71.bn50.bias', 'down4.resblock.module_list.4.0.conv.1.running_mean': 'models.71.bn50.running_mean', 'down4.resblock.module_list.4.0.conv.1.running_var': 'models.71.bn50.running_var', 'down4.resblock.module_list.4.0.conv.1.num_batches_tracked': 'models.71.bn50.num_batches_tracked', 'down4.resblock.module_list.4.1.conv.0.weight': 'models.72.conv51.weight', 'down4.resblock.module_list.4.1.conv.1.weight': 'models.72.bn51.weight', 'down4.resblock.module_list.4.1.conv.1.bias': 'models.72.bn51.bias', 'down4.resblock.module_list.4.1.conv.1.running_mean': 'models.72.bn51.running_mean', 'down4.resblock.module_list.4.1.conv.1.running_var': 
'models.72.bn51.running_var', 'down4.resblock.module_list.4.1.conv.1.num_batches_tracked': 'models.72.bn51.num_batches_tracked', 'down4.resblock.module_list.5.0.conv.0.weight': 'models.74.conv52.weight', 'down4.resblock.module_list.5.0.conv.1.weight': 'models.74.bn52.weight', 'down4.resblock.module_list.5.0.conv.1.bias': 'models.74.bn52.bias', 'down4.resblock.module_list.5.0.conv.1.running_mean': 'models.74.bn52.running_mean', 'down4.resblock.module_list.5.0.conv.1.running_var': 'models.74.bn52.running_var', 'down4.resblock.module_list.5.0.conv.1.num_batches_tracked': 'models.74.bn52.num_batches_tracked', 'down4.resblock.module_list.5.1.conv.0.weight': 'models.75.conv53.weight', 'down4.resblock.module_list.5.1.conv.1.weight': 'models.75.bn53.weight', 'down4.resblock.module_list.5.1.conv.1.bias': 'models.75.bn53.bias', 'down4.resblock.module_list.5.1.conv.1.running_mean': 'models.75.bn53.running_mean', 'down4.resblock.module_list.5.1.conv.1.running_var': 'models.75.bn53.running_var', 'down4.resblock.module_list.5.1.conv.1.num_batches_tracked': 'models.75.bn53.num_batches_tracked', 'down4.resblock.module_list.6.0.conv.0.weight': 'models.77.conv54.weight', 'down4.resblock.module_list.6.0.conv.1.weight': 'models.77.bn54.weight', 'down4.resblock.module_list.6.0.conv.1.bias': 'models.77.bn54.bias', 'down4.resblock.module_list.6.0.conv.1.running_mean': 'models.77.bn54.running_mean', 'down4.resblock.module_list.6.0.conv.1.running_var': 'models.77.bn54.running_var', 'down4.resblock.module_list.6.0.conv.1.num_batches_tracked': 'models.77.bn54.num_batches_tracked', 'down4.resblock.module_list.6.1.conv.0.weight': 'models.78.conv55.weight', 'down4.resblock.module_list.6.1.conv.1.weight': 'models.78.bn55.weight', 'down4.resblock.module_list.6.1.conv.1.bias': 'models.78.bn55.bias', 'down4.resblock.module_list.6.1.conv.1.running_mean': 'models.78.bn55.running_mean', 'down4.resblock.module_list.6.1.conv.1.running_var': 'models.78.bn55.running_var', 'down4.resblock.module_list.6.1.conv.1.num_batches_tracked': 'models.78.bn55.num_batches_tracked', 'down4.resblock.module_list.7.0.conv.0.weight': 'models.80.conv56.weight', 'down4.resblock.module_list.7.0.conv.1.weight': 'models.80.bn56.weight', 'down4.resblock.module_list.7.0.conv.1.bias': 'models.80.bn56.bias', 'down4.resblock.module_list.7.0.conv.1.running_mean': 'models.80.bn56.running_mean', 'down4.resblock.module_list.7.0.conv.1.running_var': 'models.80.bn56.running_var', 'down4.resblock.module_list.7.0.conv.1.num_batches_tracked': 'models.80.bn56.num_batches_tracked', 'down4.resblock.module_list.7.1.conv.0.weight': 'models.81.conv57.weight', 'down4.resblock.module_list.7.1.conv.1.weight': 'models.81.bn57.weight', 'down4.resblock.module_list.7.1.conv.1.bias': 'models.81.bn57.bias', 'down4.resblock.module_list.7.1.conv.1.running_mean': 'models.81.bn57.running_mean', 'down4.resblock.module_list.7.1.conv.1.running_var': 'models.81.bn57.running_var', 'down4.resblock.module_list.7.1.conv.1.num_batches_tracked': 'models.81.bn57.num_batches_tracked', 'down4.conv4.conv.0.weight': 'models.83.conv58.weight', 'down4.conv4.conv.1.weight': 'models.83.bn58.weight', 'down4.conv4.conv.1.bias': 'models.83.bn58.bias', 'down4.conv4.conv.1.running_mean': 'models.83.bn58.running_mean', 'down4.conv4.conv.1.running_var': 'models.83.bn58.running_var', 'down4.conv4.conv.1.num_batches_tracked': 'models.83.bn58.num_batches_tracked', 'down4.conv5.conv.0.weight': 'models.85.conv59.weight', 'down4.conv5.conv.1.weight': 'models.85.bn59.weight', 'down4.conv5.conv.1.bias': 
'models.85.bn59.bias', 'down4.conv5.conv.1.running_mean': 'models.85.bn59.running_mean', 'down4.conv5.conv.1.running_var': 'models.85.bn59.running_var', 'down4.conv5.conv.1.num_batches_tracked': 'models.85.bn59.num_batches_tracked', 'down5.conv1.conv.0.weight': 'models.86.conv60.weight', 'down5.conv1.conv.1.weight': 'models.86.bn60.weight', 'down5.conv1.conv.1.bias': 'models.86.bn60.bias', 'down5.conv1.conv.1.running_mean': 'models.86.bn60.running_mean', 'down5.conv1.conv.1.running_var': 'models.86.bn60.running_var', 'down5.conv1.conv.1.num_batches_tracked': 'models.86.bn60.num_batches_tracked', 'down5.conv2.conv.0.weight': 'models.87.conv61.weight', 'down5.conv2.conv.1.weight': 'models.87.bn61.weight', 'down5.conv2.conv.1.bias': 'models.87.bn61.bias', 'down5.conv2.conv.1.running_mean': 'models.87.bn61.running_mean', 'down5.conv2.conv.1.running_var': 'models.87.bn61.running_var', 'down5.conv2.conv.1.num_batches_tracked': 'models.87.bn61.num_batches_tracked', 'down5.conv3.conv.0.weight': 'models.89.conv62.weight', 'down5.conv3.conv.1.weight': 'models.89.bn62.weight', 'down5.conv3.conv.1.bias': 'models.89.bn62.bias', 'down5.conv3.conv.1.running_mean': 'models.89.bn62.running_mean', 'down5.conv3.conv.1.running_var': 'models.89.bn62.running_var', 'down5.conv3.conv.1.num_batches_tracked': 'models.89.bn62.num_batches_tracked', 'down5.resblock.module_list.0.0.conv.0.weight': 'models.90.conv63.weight', 'down5.resblock.module_list.0.0.conv.1.weight': 'models.90.bn63.weight', 'down5.resblock.module_list.0.0.conv.1.bias': 'models.90.bn63.bias', 'down5.resblock.module_list.0.0.conv.1.running_mean': 'models.90.bn63.running_mean', 'down5.resblock.module_list.0.0.conv.1.running_var': 'models.90.bn63.running_var', 'down5.resblock.module_list.0.0.conv.1.num_batches_tracked': 'models.90.bn63.num_batches_tracked', 'down5.resblock.module_list.0.1.conv.0.weight': 'models.91.conv64.weight', 'down5.resblock.module_list.0.1.conv.1.weight': 'models.91.bn64.weight', 'down5.resblock.module_list.0.1.conv.1.bias': 'models.91.bn64.bias', 'down5.resblock.module_list.0.1.conv.1.running_mean': 'models.91.bn64.running_mean', 'down5.resblock.module_list.0.1.conv.1.running_var': 'models.91.bn64.running_var', 'down5.resblock.module_list.0.1.conv.1.num_batches_tracked': 'models.91.bn64.num_batches_tracked', 'down5.resblock.module_list.1.0.conv.0.weight': 'models.93.conv65.weight', 'down5.resblock.module_list.1.0.conv.1.weight': 'models.93.bn65.weight', 'down5.resblock.module_list.1.0.conv.1.bias': 'models.93.bn65.bias', 'down5.resblock.module_list.1.0.conv.1.running_mean': 'models.93.bn65.running_mean', 'down5.resblock.module_list.1.0.conv.1.running_var': 'models.93.bn65.running_var', 'down5.resblock.module_list.1.0.conv.1.num_batches_tracked': 'models.93.bn65.num_batches_tracked', 'down5.resblock.module_list.1.1.conv.0.weight': 'models.94.conv66.weight', 'down5.resblock.module_list.1.1.conv.1.weight': 'models.94.bn66.weight', 'down5.resblock.module_list.1.1.conv.1.bias': 'models.94.bn66.bias', 'down5.resblock.module_list.1.1.conv.1.running_mean': 'models.94.bn66.running_mean', 'down5.resblock.module_list.1.1.conv.1.running_var': 'models.94.bn66.running_var', 'down5.resblock.module_list.1.1.conv.1.num_batches_tracked': 'models.94.bn66.num_batches_tracked', 'down5.resblock.module_list.2.0.conv.0.weight': 'models.96.conv67.weight', 'down5.resblock.module_list.2.0.conv.1.weight': 'models.96.bn67.weight', 'down5.resblock.module_list.2.0.conv.1.bias': 'models.96.bn67.bias', 'down5.resblock.module_list.2.0.conv.1.running_mean': 
'models.96.bn67.running_mean', 'down5.resblock.module_list.2.0.conv.1.running_var': 'models.96.bn67.running_var', 'down5.resblock.module_list.2.0.conv.1.num_batches_tracked': 'models.96.bn67.num_batches_tracked', 'down5.resblock.module_list.2.1.conv.0.weight': 'models.97.conv68.weight', 'down5.resblock.module_list.2.1.conv.1.weight': 'models.97.bn68.weight', 'down5.resblock.module_list.2.1.conv.1.bias': 'models.97.bn68.bias', 'down5.resblock.module_list.2.1.conv.1.running_mean': 'models.97.bn68.running_mean', 'down5.resblock.module_list.2.1.conv.1.running_var': 'models.97.bn68.running_var', 'down5.resblock.module_list.2.1.conv.1.num_batches_tracked': 'models.97.bn68.num_batches_tracked', 'down5.resblock.module_list.3.0.conv.0.weight': 'models.99.conv69.weight', 'down5.resblock.module_list.3.0.conv.1.weight': 'models.99.bn69.weight', 'down5.resblock.module_list.3.0.conv.1.bias': 'models.99.bn69.bias', 'down5.resblock.module_list.3.0.conv.1.running_mean': 'models.99.bn69.running_mean', 'down5.resblock.module_list.3.0.conv.1.running_var': 'models.99.bn69.running_var', 'down5.resblock.module_list.3.0.conv.1.num_batches_tracked': 'models.99.bn69.num_batches_tracked', 'down5.resblock.module_list.3.1.conv.0.weight': 'models.100.conv70.weight', 'down5.resblock.module_list.3.1.conv.1.weight': 'models.100.bn70.weight', 'down5.resblock.module_list.3.1.conv.1.bias': 'models.100.bn70.bias', 'down5.resblock.module_list.3.1.conv.1.running_mean': 'models.100.bn70.running_mean', 'down5.resblock.module_list.3.1.conv.1.running_var': 'models.100.bn70.running_var', 'down5.resblock.module_list.3.1.conv.1.num_batches_tracked': 'models.100.bn70.num_batches_tracked', 'down5.conv4.conv.0.weight': 'models.102.conv71.weight', 'down5.conv4.conv.1.weight': 'models.102.bn71.weight', 'down5.conv4.conv.1.bias': 'models.102.bn71.bias', 'down5.conv4.conv.1.running_mean': 'models.102.bn71.running_mean', 'down5.conv4.conv.1.running_var': 'models.102.bn71.running_var', 'down5.conv4.conv.1.num_batches_tracked': 'models.102.bn71.num_batches_tracked', 'down5.conv5.conv.0.weight': 'models.104.conv72.weight', 'down5.conv5.conv.1.weight': 'models.104.bn72.weight', 'down5.conv5.conv.1.bias': 'models.104.bn72.bias', 'down5.conv5.conv.1.running_mean': 'models.104.bn72.running_mean', 'down5.conv5.conv.1.running_var': 'models.104.bn72.running_var', 'down5.conv5.conv.1.num_batches_tracked': 'models.104.bn72.num_batches_tracked', 'neek.conv1.conv.0.weight': 'models.105.conv73.weight', 'neek.conv1.conv.1.weight': 'models.105.bn73.weight', 'neek.conv1.conv.1.bias': 'models.105.bn73.bias', 'neek.conv1.conv.1.running_mean': 'models.105.bn73.running_mean', 'neek.conv1.conv.1.running_var': 'models.105.bn73.running_var', 'neek.conv1.conv.1.num_batches_tracked': 'models.105.bn73.num_batches_tracked', 'neek.conv2.conv.0.weight': 'models.106.conv74.weight', 'neek.conv2.conv.1.weight': 'models.106.bn74.weight', 'neek.conv2.conv.1.bias': 'models.106.bn74.bias', 'neek.conv2.conv.1.running_mean': 'models.106.bn74.running_mean', 'neek.conv2.conv.1.running_var': 'models.106.bn74.running_var', 'neek.conv2.conv.1.num_batches_tracked': 'models.106.bn74.num_batches_tracked', 'neek.conv3.conv.0.weight': 'models.107.conv75.weight', 'neek.conv3.conv.1.weight': 'models.107.bn75.weight', 'neek.conv3.conv.1.bias': 'models.107.bn75.bias', 'neek.conv3.conv.1.running_mean': 'models.107.bn75.running_mean', 'neek.conv3.conv.1.running_var': 'models.107.bn75.running_var', 'neek.conv3.conv.1.num_batches_tracked': 'models.107.bn75.num_batches_tracked', 
'neek.conv4.conv.0.weight': 'models.114.conv76.weight', 'neek.conv4.conv.1.weight': 'models.114.bn76.weight', 'neek.conv4.conv.1.bias': 'models.114.bn76.bias', 'neek.conv4.conv.1.running_mean': 'models.114.bn76.running_mean', 'neek.conv4.conv.1.running_var': 'models.114.bn76.running_var', 'neek.conv4.conv.1.num_batches_tracked': 'models.114.bn76.num_batches_tracked', 'neek.conv5.conv.0.weight': 'models.115.conv77.weight', 'neek.conv5.conv.1.weight': 'models.115.bn77.weight', 'neek.conv5.conv.1.bias': 'models.115.bn77.bias', 'neek.conv5.conv.1.running_mean': 'models.115.bn77.running_mean', 'neek.conv5.conv.1.running_var': 'models.115.bn77.running_var', 'neek.conv5.conv.1.num_batches_tracked': 'models.115.bn77.num_batches_tracked', 'neek.conv6.conv.0.weight': 'models.116.conv78.weight', 'neek.conv6.conv.1.weight': 'models.116.bn78.weight', 'neek.conv6.conv.1.bias': 'models.116.bn78.bias', 'neek.conv6.conv.1.running_mean': 'models.116.bn78.running_mean', 'neek.conv6.conv.1.running_var': 'models.116.bn78.running_var', 'neek.conv6.conv.1.num_batches_tracked': 'models.116.bn78.num_batches_tracked', 'neek.conv7.conv.0.weight': 'models.117.conv79.weight', 'neek.conv7.conv.1.weight': 'models.117.bn79.weight', 'neek.conv7.conv.1.bias': 'models.117.bn79.bias', 'neek.conv7.conv.1.running_mean': 'models.117.bn79.running_mean', 'neek.conv7.conv.1.running_var': 'models.117.bn79.running_var', 'neek.conv7.conv.1.num_batches_tracked': 'models.117.bn79.num_batches_tracked', 'neek.conv8.conv.0.weight': 'models.120.conv80.weight', 'neek.conv8.conv.1.weight': 'models.120.bn80.weight', 'neek.conv8.conv.1.bias': 'models.120.bn80.bias', 'neek.conv8.conv.1.running_mean': 'models.120.bn80.running_mean', 'neek.conv8.conv.1.running_var': 'models.120.bn80.running_var', 'neek.conv8.conv.1.num_batches_tracked': 'models.120.bn80.num_batches_tracked', 'neek.conv9.conv.0.weight': 'models.122.conv81.weight', 'neek.conv9.conv.1.weight': 'models.122.bn81.weight', 'neek.conv9.conv.1.bias': 'models.122.bn81.bias', 'neek.conv9.conv.1.running_mean': 'models.122.bn81.running_mean', 'neek.conv9.conv.1.running_var': 'models.122.bn81.running_var', 'neek.conv9.conv.1.num_batches_tracked': 'models.122.bn81.num_batches_tracked', 'neek.conv10.conv.0.weight': 'models.123.conv82.weight', 'neek.conv10.conv.1.weight': 'models.123.bn82.weight', 'neek.conv10.conv.1.bias': 'models.123.bn82.bias', 'neek.conv10.conv.1.running_mean': 'models.123.bn82.running_mean', 'neek.conv10.conv.1.running_var': 'models.123.bn82.running_var', 'neek.conv10.conv.1.num_batches_tracked': 'models.123.bn82.num_batches_tracked', 'neek.conv11.conv.0.weight': 'models.124.conv83.weight', 'neek.conv11.conv.1.weight': 'models.124.bn83.weight', 'neek.conv11.conv.1.bias': 'models.124.bn83.bias', 'neek.conv11.conv.1.running_mean': 'models.124.bn83.running_mean', 'neek.conv11.conv.1.running_var': 'models.124.bn83.running_var', 'neek.conv11.conv.1.num_batches_tracked': 'models.124.bn83.num_batches_tracked', 'neek.conv12.conv.0.weight': 'models.125.conv84.weight', 'neek.conv12.conv.1.weight': 'models.125.bn84.weight', 'neek.conv12.conv.1.bias': 'models.125.bn84.bias', 'neek.conv12.conv.1.running_mean': 'models.125.bn84.running_mean', 'neek.conv12.conv.1.running_var': 'models.125.bn84.running_var', 'neek.conv12.conv.1.num_batches_tracked': 'models.125.bn84.num_batches_tracked', 'neek.conv13.conv.0.weight': 'models.126.conv85.weight', 'neek.conv13.conv.1.weight': 'models.126.bn85.weight', 'neek.conv13.conv.1.bias': 'models.126.bn85.bias', 'neek.conv13.conv.1.running_mean': 
'models.126.bn85.running_mean', 'neek.conv13.conv.1.running_var': 'models.126.bn85.running_var', 'neek.conv13.conv.1.num_batches_tracked': 'models.126.bn85.num_batches_tracked', 'neek.conv14.conv.0.weight': 'models.127.conv86.weight', 'neek.conv14.conv.1.weight': 'models.127.bn86.weight', 'neek.conv14.conv.1.bias': 'models.127.bn86.bias', 'neek.conv14.conv.1.running_mean': 'models.127.bn86.running_mean', 'neek.conv14.conv.1.running_var': 'models.127.bn86.running_var', 'neek.conv14.conv.1.num_batches_tracked': 'models.127.bn86.num_batches_tracked', 'neek.conv15.conv.0.weight': 'models.130.conv87.weight', 'neek.conv15.conv.1.weight': 'models.130.bn87.weight', 'neek.conv15.conv.1.bias': 'models.130.bn87.bias', 'neek.conv15.conv.1.running_mean': 'models.130.bn87.running_mean', 'neek.conv15.conv.1.running_var': 'models.130.bn87.running_var', 'neek.conv15.conv.1.num_batches_tracked': 'models.130.bn87.num_batches_tracked', 'neek.conv16.conv.0.weight': 'models.132.conv88.weight', 'neek.conv16.conv.1.weight': 'models.132.bn88.weight', 'neek.conv16.conv.1.bias': 'models.132.bn88.bias', 'neek.conv16.conv.1.running_mean': 'models.132.bn88.running_mean', 'neek.conv16.conv.1.running_var': 'models.132.bn88.running_var', 'neek.conv16.conv.1.num_batches_tracked': 'models.132.bn88.num_batches_tracked', 'neek.conv17.conv.0.weight': 'models.133.conv89.weight', 'neek.conv17.conv.1.weight': 'models.133.bn89.weight', 'neek.conv17.conv.1.bias': 'models.133.bn89.bias', 'neek.conv17.conv.1.running_mean': 'models.133.bn89.running_mean', 'neek.conv17.conv.1.running_var': 'models.133.bn89.running_var', 'neek.conv17.conv.1.num_batches_tracked': 'models.133.bn89.num_batches_tracked', 'neek.conv18.conv.0.weight': 'models.134.conv90.weight', 'neek.conv18.conv.1.weight': 'models.134.bn90.weight', 'neek.conv18.conv.1.bias': 'models.134.bn90.bias', 'neek.conv18.conv.1.running_mean': 'models.134.bn90.running_mean', 'neek.conv18.conv.1.running_var': 'models.134.bn90.running_var', 'neek.conv18.conv.1.num_batches_tracked': 'models.134.bn90.num_batches_tracked', 'neek.conv19.conv.0.weight': 'models.135.conv91.weight', 'neek.conv19.conv.1.weight': 'models.135.bn91.weight', 'neek.conv19.conv.1.bias': 'models.135.bn91.bias', 'neek.conv19.conv.1.running_mean': 'models.135.bn91.running_mean', 'neek.conv19.conv.1.running_var': 'models.135.bn91.running_var', 'neek.conv19.conv.1.num_batches_tracked': 'models.135.bn91.num_batches_tracked', 'neek.conv20.conv.0.weight': 'models.136.conv92.weight', 'neek.conv20.conv.1.weight': 'models.136.bn92.weight', 'neek.conv20.conv.1.bias': 'models.136.bn92.bias', 'neek.conv20.conv.1.running_mean': 'models.136.bn92.running_mean', 'neek.conv20.conv.1.running_var': 'models.136.bn92.running_var', 'neek.conv20.conv.1.num_batches_tracked': 'models.136.bn92.num_batches_tracked', 'head.conv1.conv.0.weight': 'models.137.conv93.weight', 'head.conv1.conv.1.weight': 'models.137.bn93.weight', 'head.conv1.conv.1.bias': 'models.137.bn93.bias', 'head.conv1.conv.1.running_mean': 'models.137.bn93.running_mean', 'head.conv1.conv.1.running_var': 'models.137.bn93.running_var', 'head.conv1.conv.1.num_batches_tracked': 'models.137.bn93.num_batches_tracked', 'head.conv2.conv.0.weight': 'models.138.conv94.weight', 'head.conv2.conv.0.bias': 'models.138.conv94.bias', 'head.conv3.conv.0.weight': 'models.141.conv95.weight', 'head.conv3.conv.1.weight': 'models.141.bn95.weight', 'head.conv3.conv.1.bias': 'models.141.bn95.bias', 'head.conv3.conv.1.running_mean': 'models.141.bn95.running_mean', 'head.conv3.conv.1.running_var': 
'models.141.bn95.running_var', 'head.conv3.conv.1.num_batches_tracked': 'models.141.bn95.num_batches_tracked', 'head.conv4.conv.0.weight': 'models.143.conv96.weight', 'head.conv4.conv.1.weight': 'models.143.bn96.weight', 'head.conv4.conv.1.bias': 'models.143.bn96.bias', 'head.conv4.conv.1.running_mean': 'models.143.bn96.running_mean', 'head.conv4.conv.1.running_var': 'models.143.bn96.running_var', 'head.conv4.conv.1.num_batches_tracked': 'models.143.bn96.num_batches_tracked', 'head.conv5.conv.0.weight': 'models.144.conv97.weight', 'head.conv5.conv.1.weight': 'models.144.bn97.weight', 'head.conv5.conv.1.bias': 'models.144.bn97.bias', 'head.conv5.conv.1.running_mean': 'models.144.bn97.running_mean', 'head.conv5.conv.1.running_var': 'models.144.bn97.running_var', 'head.conv5.conv.1.num_batches_tracked': 'models.144.bn97.num_batches_tracked', 'head.conv6.conv.0.weight': 'models.145.conv98.weight', 'head.conv6.conv.1.weight': 'models.145.bn98.weight', 'head.conv6.conv.1.bias': 'models.145.bn98.bias', 'head.conv6.conv.1.running_mean': 'models.145.bn98.running_mean', 'head.conv6.conv.1.running_var': 'models.145.bn98.running_var', 'head.conv6.conv.1.num_batches_tracked': 'models.145.bn98.num_batches_tracked', 'head.conv7.conv.0.weight': 'models.146.conv99.weight', 'head.conv7.conv.1.weight': 'models.146.bn99.weight', 'head.conv7.conv.1.bias': 'models.146.bn99.bias', 'head.conv7.conv.1.running_mean': 'models.146.bn99.running_mean', 'head.conv7.conv.1.running_var': 'models.146.bn99.running_var', 'head.conv7.conv.1.num_batches_tracked': 'models.146.bn99.num_batches_tracked', 'head.conv8.conv.0.weight': 'models.147.conv100.weight', 'head.conv8.conv.1.weight': 'models.147.bn100.weight', 'head.conv8.conv.1.bias': 'models.147.bn100.bias', 'head.conv8.conv.1.running_mean': 'models.147.bn100.running_mean', 'head.conv8.conv.1.running_var': 'models.147.bn100.running_var', 'head.conv8.conv.1.num_batches_tracked': 'models.147.bn100.num_batches_tracked', 'head.conv9.conv.0.weight': 'models.148.conv101.weight', 'head.conv9.conv.1.weight': 'models.148.bn101.weight', 'head.conv9.conv.1.bias': 'models.148.bn101.bias', 'head.conv9.conv.1.running_mean': 'models.148.bn101.running_mean', 'head.conv9.conv.1.running_var': 'models.148.bn101.running_var', 'head.conv9.conv.1.num_batches_tracked': 'models.148.bn101.num_batches_tracked', 'head.conv10.conv.0.weight': 'models.149.conv102.weight', 'head.conv10.conv.0.bias': 'models.149.conv102.bias', 'head.conv11.conv.0.weight': 'models.152.conv103.weight', 'head.conv11.conv.1.weight': 'models.152.bn103.weight', 'head.conv11.conv.1.bias': 'models.152.bn103.bias', 'head.conv11.conv.1.running_mean': 'models.152.bn103.running_mean', 'head.conv11.conv.1.running_var': 'models.152.bn103.running_var', 'head.conv11.conv.1.num_batches_tracked': 'models.152.bn103.num_batches_tracked', 'head.conv12.conv.0.weight': 'models.154.conv104.weight', 'head.conv12.conv.1.weight': 'models.154.bn104.weight', 'head.conv12.conv.1.bias': 'models.154.bn104.bias', 'head.conv12.conv.1.running_mean': 'models.154.bn104.running_mean', 'head.conv12.conv.1.running_var': 'models.154.bn104.running_var', 'head.conv12.conv.1.num_batches_tracked': 'models.154.bn104.num_batches_tracked', 'head.conv13.conv.0.weight': 'models.155.conv105.weight', 'head.conv13.conv.1.weight': 'models.155.bn105.weight', 'head.conv13.conv.1.bias': 'models.155.bn105.bias', 'head.conv13.conv.1.running_mean': 'models.155.bn105.running_mean', 'head.conv13.conv.1.running_var': 'models.155.bn105.running_var', 
'head.conv13.conv.1.num_batches_tracked': 'models.155.bn105.num_batches_tracked', 'head.conv14.conv.0.weight': 'models.156.conv106.weight', 'head.conv14.conv.1.weight': 'models.156.bn106.weight', 'head.conv14.conv.1.bias': 'models.156.bn106.bias', 'head.conv14.conv.1.running_mean': 'models.156.bn106.running_mean', 'head.conv14.conv.1.running_var': 'models.156.bn106.running_var', 'head.conv14.conv.1.num_batches_tracked': 'models.156.bn106.num_batches_tracked', 'head.conv15.conv.0.weight': 'models.157.conv107.weight', 'head.conv15.conv.1.weight': 'models.157.bn107.weight', 'head.conv15.conv.1.bias': 'models.157.bn107.bias', 'head.conv15.conv.1.running_mean': 'models.157.bn107.running_mean', 'head.conv15.conv.1.running_var': 'models.157.bn107.running_var', 'head.conv15.conv.1.num_batches_tracked': 'models.157.bn107.num_batches_tracked', 'head.conv16.conv.0.weight': 'models.158.conv108.weight', 'head.conv16.conv.1.weight': 'models.158.bn108.weight', 'head.conv16.conv.1.bias': 'models.158.bn108.bias', 'head.conv16.conv.1.running_mean': 'models.158.bn108.running_mean', 'head.conv16.conv.1.running_var': 'models.158.bn108.running_var', 'head.conv16.conv.1.num_batches_tracked': 'models.158.bn108.num_batches_tracked', 'head.conv17.conv.0.weight': 'models.159.conv109.weight', 'head.conv17.conv.1.weight': 'models.159.bn109.weight', 'head.conv17.conv.1.bias': 'models.159.bn109.bias', 'head.conv17.conv.1.running_mean': 'models.159.bn109.running_mean', 'head.conv17.conv.1.running_var': 'models.159.bn109.running_var', 'head.conv17.conv.1.num_batches_tracked': 'models.159.bn109.num_batches_tracked', 'head.conv18.conv.0.weight': 'models.160.conv110.weight', 'head.conv18.conv.0.bias': 'models.160.conv110.bias',
    }
    pth_weights = torch.load(checkpoint)
    pt_weights = type(pth_weights)()
    for name, new_name in name_mapping.items():
        pt_weights[new_name] = pth_weights[name]
    return pt_weights


def convert_pt_checkpoint_to_keras_h5(state_dict):
    print('============================================================')

    def copy1(conv, bn, idx):
        keyword1 = 'conv%d.weight' % idx
        keyword2 = 'bn%d.weight' % idx
        keyword3 = 'bn%d.bias' % idx
        keyword4 = 'bn%d.running_mean' % idx
        keyword5 = 'bn%d.running_var' % idx
        for key in state_dict:
            value = state_dict[key].numpy()
            if keyword1 in key:
                w = value
            elif keyword2 in key:
                y = value
            elif keyword3 in key:
                b = value
            elif keyword4 in key:
                m = value
            elif keyword5 in key:
                v = value
        w = w.transpose(2, 3, 1, 0)
        conv.set_weights([w])
        bn.set_weights([y, b, m, v])

    def copy2(conv, idx):
        keyword1 = 'conv%d.weight' % idx
        keyword2 = 'conv%d.bias' % idx
        for key in state_dict:
            value = state_dict[key].numpy()
            if keyword1 in key:
                w = value
            elif keyword2 in key:
                b = value
        w = w.transpose(2, 3, 1, 0)
        conv.set_weights([w, b])

    num_classes = 80
    num_anchors = 3
    with tf.Session(graph=tf.Graph()):
        inputs = layers.Input(shape=[], dtype='string')
        model_body = YOLOv4(inputs, num_classes, num_anchors)
        model_body.summary()
        layer_name_to_idx = {layer.name: idx for idx, layer in enumerate(model_body.layers)}
        print('\nCopying...')
        i1 = layer_name_to_idx['conv2d']
        i2 = layer_name_to_idx['batch_normalization']
        copy1(model_body.layers[i1], model_body.layers[i2], 1)
        for i in range(2, 94, 1):
            i1 = layer_name_to_idx['conv2d_%d' % (i - 1)]
            i2 = layer_name_to_idx['batch_normalization_%d' % (i - 1)]
            copy1(model_body.layers[i1], model_body.layers[i2], i)
        for i in range(95, 102, 1):
            i1 = layer_name_to_idx['conv2d_%d' % (i - 1)]
            i2 = layer_name_to_idx['batch_normalization_%d' % (i - 2)]
            copy1(model_body.layers[i1], model_body.layers[i2], i)
        for i in range(103, 110, 1):
            i1 = layer_name_to_idx['conv2d_%d' % (i - 1)]
            i2 = layer_name_to_idx['batch_normalization_%d' % (i - 3)]
            copy1(model_body.layers[i1], model_body.layers[i2], i)
        i1 = layer_name_to_idx['conv2d_93']
        copy2(model_body.layers[i1], 94)
        i1 = layer_name_to_idx['conv2d_101']
        copy2(model_body.layers[i1], 102)
        i1 = layer_name_to_idx['conv2d_109']
        copy2(model_body.layers[i1], 110)
        weights = model_body.get_weights()
        print('\nDone.')
        return weights


class Mish(layers.Layer):
    def __init__(self):
        super(Mish, self).__init__()

    def compute_output_shape(self, input_shape):
        return input_shape

    def call(self, x):
        return x * tf.tanh(tf.math.softplus(x))


def conv2d_unit(x, filters, kernels, strides=1, padding='valid', bn=1, act='mish'):
    use_bias = (bn != 1)
    x = layers.Conv2D(filters, kernels,
                      padding=padding,
                      strides=strides,
                      use_bias=use_bias,
                      activation='linear',
                      kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.01))(x)
    if bn:
        x = layers.BatchNormalization(fused=False)(x)
    if act == 'leaky':
        x = keras.layers.LeakyReLU(alpha=0.1)(x)
    elif act == 'mish':
        x = Mish()(x)
    return x


def residual_block(inputs, filters_1, filters_2):
    x = conv2d_unit(inputs, filters_1, 1, strides=1, padding='valid')
    x = conv2d_unit(x, filters_2, 3, strides=1, padding='same')
    x = layers.add([inputs, x])
    return x


def stack_residual_block(inputs, filters_1, filters_2, n):
    x = residual_block(inputs, filters_1, filters_2)
    for i in range(n - 1):
        x = residual_block(x, filters_1, filters_2)
    return x


def spp(x):
    x_1 = x
    x_2 = layers.MaxPooling2D(pool_size=5, strides=1, padding='same')(x)
    x_3 = layers.MaxPooling2D(pool_size=9, strides=1, padding='same')(x)
    x_4 = layers.MaxPooling2D(pool_size=13, strides=1, padding='same')(x)
    out = layers.Concatenate()([x_4, x_3, x_2, x_1])
    return out
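
# Illustrative only: the three builders above compose in the fixed pattern that
# the backbone below repeats at every scale (a sketch; `feat` stands for any
# NHWC feature map already in the graph):
#
#   feat = conv2d_unit(feat, 64, 3, strides=1, padding='same')  # Conv + BN + Mish
#   feat = stack_residual_block(feat, 32, 64, n=1)              # residual stage
#   feat = spp(feat)                                            # 5/9/13 max-pool pyramid, 4x channels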

def YOLOv4(inputs, num_classes, num_anchors, input_shape=(608, 608),
           initial_filters=32, fast=False, anchors=None,
           conf_thresh=0.05, nms_thresh=0.45, keep_top_k=100, nms_top_k=100):
    i32 = initial_filters
    i64 = i32 * 2
    i128 = i32 * 4
    i256 = i32 * 8
    i512 = i32 * 16
    i1024 = i32 * 32
    x, image_shape = layers.Lambda(lambda t: preprocessor(t, input_shape))(inputs)

    # cspdarknet53
    x = conv2d_unit(x, i32, 3, strides=1, padding='same')

    # ============================= s2 =============================
    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(x)
    x = conv2d_unit(x, i64, 3, strides=2)
    s2 = conv2d_unit(x, i64, 1, strides=1)
    x = conv2d_unit(x, i64, 1, strides=1)
    x = stack_residual_block(x, i32, i64, n=1)
    x = conv2d_unit(x, i64, 1, strides=1)
    x = layers.Concatenate()([x, s2])
    s2 = conv2d_unit(x, i64, 1, strides=1)

    # ============================= s4 =============================
    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(s2)
    x = conv2d_unit(x, i128, 3, strides=2)
    s4 = conv2d_unit(x, i64, 1, strides=1)
    x = conv2d_unit(x, i64, 1, strides=1)
    x = stack_residual_block(x, i64, i64, n=2)
    x = conv2d_unit(x, i64, 1, strides=1)
    x = layers.Concatenate()([x, s4])
    s4 = conv2d_unit(x, i128, 1, strides=1)

    # ============================= s8 =============================
    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(s4)
    x = conv2d_unit(x, i256, 3, strides=2)
    s8 = conv2d_unit(x, i128, 1, strides=1)
    x = conv2d_unit(x, i128, 1, strides=1)
    x = stack_residual_block(x, i128, i128, n=8)
    x = conv2d_unit(x, i128, 1, strides=1)
    x = layers.Concatenate()([x, s8])
    s8 = conv2d_unit(x, i256, 1, strides=1)

    # ============================= s16 =============================
    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(s8)
    x = conv2d_unit(x, i512, 3, strides=2)
    s16 = conv2d_unit(x, i256, 1, strides=1)
    x = conv2d_unit(x, i256, 1, strides=1)
    x = stack_residual_block(x, i256, i256, n=8)
    x = conv2d_unit(x, i256, 1, strides=1)
    x = layers.Concatenate()([x, s16])
    s16 = conv2d_unit(x, i512, 1, strides=1)

    # ============================= s32 =============================
    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(s16)
    x = conv2d_unit(x, i1024, 3, strides=2)
    s32 = conv2d_unit(x, i512, 1, strides=1)
    x = conv2d_unit(x, i512, 1, strides=1)
    x = stack_residual_block(x, i512, i512, n=4)
    x = conv2d_unit(x, i512, 1, strides=1)
    x = layers.Concatenate()([x, s32])
    s32 = conv2d_unit(x, i1024, 1, strides=1)

    # fpn
    x = conv2d_unit(s32, i512, 1, strides=1, act='leaky')
    x = conv2d_unit(x, i1024, 3, strides=1, padding='same', act='leaky')
    x = conv2d_unit(x, i512, 1, strides=1, act='leaky')
    x = spp(x)
    x = conv2d_unit(x, i512, 1, strides=1, act='leaky')
    x = conv2d_unit(x, i1024, 3, strides=1, padding='same', act='leaky')
    fpn_s32 = conv2d_unit(x, i512, 1, strides=1, act='leaky')

    # pan01
    x = conv2d_unit(fpn_s32, i256, 1, strides=1, act='leaky')
    x = layers.UpSampling2D(2)(x)
    s16 = conv2d_unit(s16, i256, 1, strides=1, act='leaky')
    x = layers.Concatenate()([s16, x])
    x = conv2d_unit(x, i256, 1, strides=1, act='leaky')
    x = conv2d_unit(x, i512, 3, strides=1, padding='same', act='leaky')
    x = conv2d_unit(x, i256, 1, strides=1, act='leaky')
    x = conv2d_unit(x, i512, 3, strides=1, padding='same', act='leaky')
    fpn_s16 = conv2d_unit(x, i256, 1, strides=1, act='leaky')

    # pan02
    x = conv2d_unit(fpn_s16, i128, 1, strides=1, act='leaky')
    x = layers.UpSampling2D(2)(x)
    s8 = conv2d_unit(s8, i128, 1, strides=1, act='leaky')
    x = layers.Concatenate()([s8, x])
    x = conv2d_unit(x, i128, 1, strides=1, act='leaky')
    x = conv2d_unit(x, i256, 3, strides=1, padding='same', act='leaky')
    x = conv2d_unit(x, i128, 1, strides=1, act='leaky')
    x = conv2d_unit(x, i256, 3, strides=1, padding='same', act='leaky')
    x = conv2d_unit(x, i128, 1, strides=1, act='leaky')

    # output_s, doesn't need concat()
    output_s = conv2d_unit(x, i256, 3, strides=1, padding='same', act='leaky')
    output_s = conv2d_unit(output_s, num_anchors * (num_classes + 5), 1, strides=1, bn=0, act=None)

    # output_m, need concat()
    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(x)
    x = conv2d_unit(x, i256, 3, strides=2, act='leaky')
    x = layers.Concatenate()([x, fpn_s16])
    x = conv2d_unit(x, i256, 1, strides=1, act='leaky')
    x = conv2d_unit(x, i512, 3, strides=1, padding='same', act='leaky')
    x = conv2d_unit(x, i256, 1, strides=1, act='leaky')
    x = conv2d_unit(x, i512, 3, strides=1, padding='same', act='leaky')
    x = conv2d_unit(x, i256, 1, strides=1, act='leaky')
    output_m = conv2d_unit(x, i512, 3, strides=1, padding='same', act='leaky')
    output_m = conv2d_unit(output_m, num_anchors * (num_classes + 5), 1, strides=1, bn=0, act=None)

    # output_l, need concat()
    x = layers.ZeroPadding2D(padding=((1, 0), (1, 0)))(x)
    x = conv2d_unit(x, i512, 3, strides=2, act='leaky')
    x = layers.Concatenate()([x, fpn_s32])
    x = conv2d_unit(x, i512, 1, strides=1, act='leaky')
    x = conv2d_unit(x, i1024, 3, strides=1, padding='same', act='leaky')
    x = conv2d_unit(x, i512, 1, strides=1, act='leaky')
    x = conv2d_unit(x, i1024, 3, strides=1, padding='same', act='leaky')
    x = conv2d_unit(x, i512, 1, strides=1, act='leaky')
    output_l = conv2d_unit(x, i1024, 3, strides=1, padding='same', act='leaky')
    output_l = conv2d_unit(output_l, num_anchors * (num_classes + 5), 1, strides=1, bn=0, act=None)

    def cast_float32(tensor):
        return tf.cast(tensor, tf.float32)

    output_l = layers.Lambda(cast_float32)(output_l)
    output_m = layers.Lambda(cast_float32)(output_m)
    output_s = layers.Lambda(cast_float32)(output_s)

    # originally reshape in multi_thread_post
    output_lr = layers.Reshape((1, input_shape[0] // 32, input_shape[1] // 32, 3, 5 + num_classes))(output_l)
    output_mr = layers.Reshape((1, input_shape[0] // 16, input_shape[1] // 16, 3, 5 + num_classes))(output_m)
    output_sr = layers.Reshape((1, input_shape[0] // 8, input_shape[1] // 8, 3, 5 + num_classes))(output_s)

    # originally _yolo_out
    masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    anchors = [[12, 16], [19, 36], [40, 28], [36, 75], [76, 55], [72, 146],
               [142, 110], [192, 243], [459, 401]]

    def batch_process_feats(out, anchors, mask):
        grid_h, grid_w, num_boxes = map(int, out.shape[2:5])
        anchors = [anchors[i] for i in mask]
        anchors_tensor = np.array(anchors).reshape(1, 1, len(anchors), 2)
        # Reshape to batch, height, width, num_anchors, box_params.
        box_xy = tf.sigmoid(out[..., :2])
        box_wh = tf.exp(out[..., 2:4])
        box_wh = box_wh * anchors_tensor
        box_confidence = tf.sigmoid(out[..., 4])
        box_confidence = tf.expand_dims(box_confidence, axis=-1)
        box_class_probs = tf.sigmoid(out[..., 5:])
        col = np.tile(np.arange(0, grid_w), grid_w).reshape(-1, grid_w)
        row = np.tile(np.arange(0, grid_h).reshape(-1, 1), grid_h)
        col = col.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
        row = row.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
        grid = np.concatenate((col, row), axis=-1).astype(np.float32)
        box_xy += grid
        box_xy /= (grid_w, grid_h)
        box_wh /= input_shape
        box_xy -= (box_wh / 2.)  # normalized xywh
        boxes = tf.concat((box_xy, box_xy + box_wh), axis=-1)
        box_scores = box_confidence * box_class_probs
        num_boxes = np.prod(boxes.shape[1:-1])
        boxes = tf.reshape(boxes, [-1, num_boxes, boxes.shape[-1]])
        box_scores = tf.reshape(box_scores, [-1, num_boxes, box_scores.shape[-1]])
        return boxes, box_scores
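
    # The decode above is the standard YOLO box transform: for cell offset
    # (cx, cy), anchor (pw, ph), and raw outputs (tx, ty, tw, th),
    #
    #   bx = (sigmoid(tx) + cx) / grid_w     by = (sigmoid(ty) + cy) / grid_h
    #   bw = pw * exp(tw) / input_w          bh = ph * exp(th) / input_h
    #
    # and boxes are stored as corners (x_min, y_min, x_max, y_max) in normalized
    # image coordinates, hence the -box_wh / 2 shift before the corner concat.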

    def filter_boxes(outputs):
        boxes_l, boxes_m, boxes_s, box_scores_l, box_scores_m, box_scores_s, image_shape = outputs
        boxes_l, box_scores_l = filter_boxes_one_size(boxes_l, box_scores_l)
        boxes_m, box_scores_m = filter_boxes_one_size(boxes_m, box_scores_m)
        boxes_s, box_scores_s = filter_boxes_one_size(boxes_s, box_scores_s)
        boxes = tf.concat([boxes_l, boxes_m, boxes_s], axis=0)
        box_scores = tf.concat([box_scores_l, box_scores_m, box_scores_s], axis=0)
        image_shape_wh = image_shape[1::-1]
        image_shape_whwh = tf.concat([image_shape_wh, image_shape_wh], axis=-1)
        image_shape_whwh = tf.cast(image_shape_whwh, tf.float32)
        boxes *= image_shape_whwh
        boxes = tf.expand_dims(boxes, 0)
        box_scores = tf.expand_dims(box_scores, 0)
        boxes = tf.expand_dims(boxes, 2)
        nms_boxes, nms_scores, nms_classes, valid_detections = tf.image.combined_non_max_suppression(
            boxes,
            box_scores,
            max_output_size_per_class=nms_top_k,
            max_total_size=nms_top_k,
            iou_threshold=nms_thresh,
            score_threshold=conf_thresh,
            pad_per_class=False,
            clip_boxes=False,
            name='CombinedNonMaxSuppression',
        )
        return nms_boxes[0], nms_scores[0], nms_classes[0]

    def filter_boxes_one_size(boxes, box_scores):
        box_class_scores = tf.reduce_max(box_scores, axis=-1)
        keep = box_class_scores > conf_thresh
        boxes = boxes[keep]
        box_scores = box_scores[keep]
        return boxes, box_scores

    def batch_yolo_out(outputs):
        with tf.name_scope('yolo_out'):
            b_output_lr, b_output_mr, b_output_sr, b_image_shape = outputs
            with tf.name_scope('process_feats'):
                b_boxes_l, b_box_scores_l = batch_process_feats(b_output_lr, anchors, masks[0])
            with tf.name_scope('process_feats'):
                b_boxes_m, b_box_scores_m = batch_process_feats(b_output_mr, anchors, masks[1])
            with tf.name_scope('process_feats'):
                b_boxes_s, b_box_scores_s = batch_process_feats(b_output_sr, anchors, masks[2])
            with tf.name_scope('filter_boxes'):
                b_nms_boxes, b_nms_scores, b_nms_classes = tf.map_fn(
                    filter_boxes,
                    [b_boxes_l, b_boxes_m, b_boxes_s,
                     b_box_scores_l, b_box_scores_m, b_box_scores_s, b_image_shape],
                    dtype=(tf.float32, tf.float32, tf.float32),
                    back_prop=False, parallel_iterations=16)
        return b_nms_boxes, b_nms_scores, b_nms_classes

    boxes_scores_classes = layers.Lambda(batch_yolo_out)([output_lr, output_mr, output_sr, image_shape])
    model_body = keras.models.Model(inputs=inputs, outputs=boxes_scores_classes)
    return model_body
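
# The exported graph takes encoded image bytes (dtype tf.string) as input and
# performs decode + resize on-graph via the two helpers below. A feed for one
# image could look like this sketch (file name hypothetical; `yolo` and `sess`
# follow main() further down):
#
#   with open('test.jpg', 'rb') as f:
#       image_bytes = f.read()
#   boxes, scores, classes = sess.run(yolo.outputs, {yolo.inputs[0]: [image_bytes]})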

def decode_jpeg_resize(input_tensor, image_size):
    tensor = tf.image.decode_png(input_tensor, channels=3)
    shape = tf.shape(tensor)
    tensor = tf.cast(tensor, tf.float32)
    tensor = tf.image.resize(tensor, image_size)
    tensor /= 255.0
    return tf.cast(tensor, tf.float16), shape


def preprocessor(input_tensor, image_size):
    with tf.name_scope('Preprocessor'):
        tensor = tf.map_fn(
            partial(decode_jpeg_resize, image_size=image_size), input_tensor,
            dtype=(tf.float16, tf.int32),
            back_prop=False, parallel_iterations=16)
    return tensor


def main():
    os.system('aws s3 cp s3://neuron-s3/training_checkpoints/pytorch/yolov4/yolov4.pth . --no-sign-request')
    torch_weights = rename_weights('./yolov4.pth')
    keras_weights = convert_pt_checkpoint_to_keras_h5(torch_weights)
    keras.backend.set_learning_phase(0)
    num_anchors = 3
    num_classes = 80
    input_shape = (608, 608)
    conf_thresh = 0.001
    nms_thresh = 0.45
    inputs = layers.Input(shape=[], dtype='string')
    yolo = YOLOv4(inputs, num_classes, num_anchors, input_shape,
                  conf_thresh=conf_thresh, nms_thresh=nms_thresh)
    yolo.set_weights(keras_weights)
    sess = keras.backend.get_session()
    inputs = {'image': yolo.inputs[0]}
    output_names = ['boxes', 'scores', 'classes']
    outputs = {name: ts for name, ts in zip(output_names, yolo.outputs)}
    tf.saved_model.simple_save(sess, './yolo_v4_coco_saved_model', inputs, outputs)


if __name__ == '__main__':
    main()
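
# Once exported, the SavedModel can be reloaded for inference in a fresh TF1
# session; a minimal sketch (the 'serve' tag is what simple_save writes):
#
#   with tf.Session(graph=tf.Graph()) as sess:
#       tf.saved_model.loader.load(sess, ['serve'], './yolo_v4_coco_saved_model')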

================================================
FILE: src/helperscripts/installationScripts/python_instructions.txt
================================================
# AL2 Driver and Tools
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools
# U20 Driver and Tools
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools
# AL2 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami
# U20 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# AL2 Pytorch Neuronx Upgrade(1.13)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami
# U20 Pytorch Neuronx Upgrade(1.13)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# AL2 Pytorch Neuronx Upgrade(1.12)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.12.0 --neuron-version=2.6.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami
# U20 Pytorch Neuronx Upgrade(1.12)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.12.0 --neuron-version=2.6.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# AL2 Pytorch Neuronx Upgrade(1.11)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --neuron-version=2.4.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami
# U20 Pytorch Neuronx Upgrade(1.11)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.11.0 --neuron-version=2.4.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# AL2 tensorflow Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework
# U20 tensorflow Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework
# AL2 tensorflow Neuronx upgrade
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework
# U20 tensorflow Neuronx upgrade
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework
# AL2 EFA Installation
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=efa --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami
# U20 EFA Installation
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=efa --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# AL2 PyTorch DLAMI
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework
# U20 PyTorch DLAMI
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework
# AL2 tensorflow Neuronx upgrade(2.10)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework
# U20 tensorflow Neuronx upgrade(2.10)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework
# AL2 tensorflow Neuronx upgrade(2.9)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework
# U20 tensorflow Neuronx upgrade(2.9)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework
# AL2 tensorflow Neuronx upgrade(2.8)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework
# U20 tensorflow Neuronx upgrade(2.8)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework
# AL2 tensorflow Neuronx Install(2.10)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework
# U20 tensorflow Neuronx Install(2.10)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework
# AL2 tensorflow Neuronx Install(2.8)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.8 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework
# U20 tensorflow Neuronx Install(2.8)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.8 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework
# AL2 tensorflow Neuronx Install(2.7)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.7 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami --category=compiler_framework
# U20 tensorflow Neuronx Install(2.7)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.7 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami --category=compiler_framework
# AL2 Tensorflow DLAMI
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=tensorflow --framework-version=2.10 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework
# U20 Tensorflow DLAMI
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=tensorflow --framework-version=2.10 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework
# AL2 PyTorch Neuron DLAMI
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=inf1 --ami=dlami-framework
# U20 PyTorch Neuron DLAMI
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=all --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=inf1 --ami=dlami-framework
# U22 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# U22 Tensorflow Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# U22 Pytorch Neuron Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami
# U22 Tensorflow Neuron Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=non-dlami
# AL2 Pytorch Neuronx DLAMI Upgrade(1.13)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework
# U20 Pytorch Neuronx DLAMI Upgrade(1.13)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework
# AL2 tensorflow Neuronx upgrade DLAMI(2.10)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework --category=compiler_framework
# AL2 tensorflow Neuronx upgrade DLAMI(2.9)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework --category=compiler_framework
# AL2 tensorflow Neuronx upgrade DLAMI(2.8)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework --category=compiler_framework
# U20 tensorflow Neuronx upgrade DLAMI(2.10)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework --category=compiler_framework
# U20 tensorflow Neuronx upgrade DLAMI(2.9)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.9.3 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework --category=compiler_framework
# U20 tensorflow Neuronx upgrade(2.8)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=tensorflow --framework-version=2.8.4 --neuron-version=2.10.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework --category=compiler_framework
# U20 Pytorch Neuronx 2.0 Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# AL2 Pytorch Neuronx 2.0 Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami
# U22 Pytorch Neuronx 2.0 Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# AL2 Pytorch Neuronx Upgrade(2.0)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=non-dlami
# U20 Pytorch Neuronx Upgrade(2.0)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# U22 Pytorch Neuronx Upgrade(2.0)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# AL2 Pytorch Neuronx DLAMI Upgrade(2.0)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2 --instance=trn1 --ami=dlami-framework
# U20 Pytorch Neuronx DLAMI Upgrade(2.0)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework
# AL2023 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# AL2023 tensorflow Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami --category=compiler_framework
# AL2023 Pytorch Neuronx 2.0 Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# AL2023 tensorflow Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=tensorflow --framework-version=2.10.1.1.0.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami --category=compiler_framework
# U20 Pytorch Neuronx 2.1 Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# AL2023 Pytorch Neuronx 2.1 Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# U22 2.5 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# AL2023 Pytorch Neuronx Upgrade(2.1)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# U20 Pytorch Neuronx Upgrade(2.1)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# U22 2.5.1 Pytorch Neuronx Upgrade
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.5.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# AL2023 Pytorch Neuronx DLAMI Upgrade(2.1)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=dlami-framework
# U20 Pytorch Neuronx DLAMI Upgrade(2.1)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=dlami-framework
# U22 Neuron DLAMI - Torch-Neuronx-1.13.1
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=dlami-neuron
# U22 Neuron DLAMI - Torch-Neuronx-2.1.1
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=dlami-neuron
# U22 Neuron DLAMI - Tensorflow-Neuronx-2.10.1
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=dlami-neuron
# U22 Neuron DLAMI - Transformers-Neuronx
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=transformers-neuronx --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=dlami-neuron
# U22 Neuron DLAMI - Torch-Neuron-1.13.1
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=dlami-neuron
# U22 Neuron DLAMI - Tensorflow-Neuron-2.10.1
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=tensorflow --framework-version=2.10.1 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=inf1 --ami=dlami-neuron
# Rocky Linux 9 Driver and Tools
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=rockylinux9 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools
# AL2023 Driver and Tools
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=1.13.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools
# U22 2.1 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# AL2023 2.1 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# U20 2.1 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# U20 Pytorch Neuronx Upgrade(2.1)
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.1.2 --file=src/helperscripts/n2-manifest.json --os=ubuntu20 --instance=trn1 --ami=non-dlami
# AL2023 2.5.1 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.5.1 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# AL2023 Driver and Tools
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools
# U22 Driver and Tools
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools
# AL2023 EFA Installation
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=efa --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# U22 EFA Installation
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=efa --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# U22 2.6.0 Pytorch Neuronx Upgrade
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.6.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# AL2023 2.6.0 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.6.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# U22 2.6.0 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.6.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# AL2023 2.7.0 Pytorch Neuronx Upgrade
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.7.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# U22 2.7.0 Pytorch Neuronx Upgrade
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.7.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# AL2023 2.7.0 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.7.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# U22 2.7.0 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.7.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# AL2023 2.8.0 Pytorch Neuronx Upgrade
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# U22 2.8.0 Pytorch Neuronx Upgrade
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# AL2023 Latest Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=amazonlinux2023 --instance=trn1 --ami=non-dlami
# U22 2.8.0 Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# U22 2.9.0 Pytorch Neuronx Upgrade
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# U22 Latest Pytorch Neuronx Install
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu22 --instance=trn1 --ami=non-dlami
# U24 EFA Installation
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=efa --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami
# U24 2.9.0 Pytorch Neuronx Upgrade
.. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami
program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami # U24 Driver and Tools .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --framework=pytorch --framework-version=2.9.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami --category=driver_runtime_tools # U24 2.8.0 Pytorch Neuronx Upgrade .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=update --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami # U24 2.8.0 Pytorch Neuronx Install .. program-output:: python3 src/helperscripts/n2-helper.py --install-type=install --category=compiler_framework --framework=pytorch --framework-version=2.8.0 --file=src/helperscripts/n2-manifest.json --os=ubuntu24 --instance=trn1 --ami=non-dlami ================================================ FILE: src/helperscripts/n2-helper.py ================================================ import json import argparse from packaging.version import Version, parse import pandas as pd from pandas import json_normalize class manifest: def __init__(self, manifest_file): self.manifest_file = manifest_file self.df_packages = pd.DataFrame() def parse_manifest(self): with open(self.manifest_file, 'r') as f: manifest = json.load(f) # repos self.df_repos = json_normalize(manifest['repos_n2']) # latest release self.df_latest_release = json_normalize(manifest['latest_release']) # os properties self.df_os_properties = json_normalize(manifest['os_properties']) # ami properties self.df_ami_properties = json_normalize(manifest['ami_properties']) # dlami properties self.df_dlami_properties = json_normalize(manifest['dlami_properties']) # major version properties self.df_major_version_properties = json_normalize(manifest['major_version_properties']) # package properties self.df_package_properties = json_normalize(manifest['package_properties']) # neuron releases for release in manifest['neuron_releases']: df_release = json_normalize(release['packages']) df_release['neuron_version'] = release['neuron_version'] self.df_packages = pd.concat([self.df_packages, df_release]) # merge release packages self.df_release_packages = self.df_packages.merge(self.df_package_properties, how='left', on='name') self.df_release_packages['supported_instances'] = self.df_release_packages['supported_instances'].tolist() def merge_release_packages(self): self.df_release_packages = self.df_packages.merge(self.df_package_properties, how='left', on='name') def extract_major_minor_version(self, version): return str(version.major) + '.' 
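
    # Minimal usage sketch (illustrative only; the manifest path mirrors the
    # program-output directives above and assumes the repo root as the working
    # directory):
    #
    #   m = manifest('src/helperscripts/n2-manifest.json')
    #   m.parse_manifest()
    #   # m.df_release_packages now holds one row per (package, neuron_version)
    #   # pair, with the package_properties columns joined on 'name'.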

    def extract_major_minor_version(self, version):
        return str(version.major) + '.' + str(version.minor)

    def get_pip_packages_supporting_python_versions(self, args):
        '''
        Get supported python versions by packages (compiler and framework)
        e.g., {"3.6","3.7","3.8"}
        '''
        if args.neuron_version is None:
            neuron_version = self.get_latest_neuron_version_per_instance(args.instance)
        else:
            neuron_version = args.neuron_version

        df_instance = self.df_release_packages[
            (self.df_release_packages['supported_instances'].map(lambda x: args.instance in x)) &
            (self.df_release_packages['neuron_version'] == neuron_version)]

        # Compiler supporting Python versions
        compiler_python_versions = \
            df_instance.loc[df_instance['component'] == 'Compiler']['supported_python_versions'].values[0]

        # Specific framework version supporting Python versions
        df_framework = df_instance.loc[df_instance['category'] == args.framework].copy()
        df_framework['version'] = df_framework['version'].map(lambda x: Version(x))
        df_framework['major_minor_version'] = df_framework['version'].map(lambda x: str(x.major) + '.' + str(x.minor))
        framework_python_versions = df_framework.loc[
            df_framework['major_minor_version'] == self.extract_major_minor_version(Version(args.framework_version))][
            'supported_python_versions'].values[0]

        return list(set(compiler_python_versions) & set(framework_python_versions))

    def get_major_version(self, package_name, instance):
        return self.df_major_version_properties.loc[(self.df_major_version_properties['name'] == package_name)][
            instance].values[0]

    def generate_script(self, args):
        '''
        It generates:
        (1) str_preamble
        (2) str_driver
        (3) str_runtime
        (4) str_tools
        (5) str_python
        (6) str_compiler
        (7) str_framework
        '''
        str_preamble = ''

        # Install and enable EPEL (required only for rocky linux 9 currently)
        str_preamble += self.install_and_enable_epel(args)

        # Configure Neuron repository
        str_preamble += self.config_neuron_repository(args)

        # Update OS packages
        str_preamble += self.update_os_packages(args)

        # Install OS headers
        str_preamble += self.install_os_headers(args)

        # Install git
        str_preamble += self.install_git(args)

        # Install Neuron driver
        str_driver = self.install_neuron_driver(args)

        # Install Neuron runtime
        str_runtime = self.install_neuron_runtime(args)

        # Install EFA driver
        str_efa = self.install_efa_driver(args)

        # Install Neuron Tools
        str_tools = self.install_neuron_system_tools(args)

        # Add PATH
        if args.mode != 'compile' or args.ami != 'dlami-framework':
            str_tools += '\n# Add PATH\n'
            str_tools += 'export PATH=/opt/aws/neuron/bin:$PATH\n'

        # Install Python virtual environment
        str_python = self.set_python_venv(args)

        # Activate Python venv
        str_python += self.activate_python_venv(args)

        # Install Jupyter notebook
        str_python += self.jupyter_notebook(args)

        # Set pip repository
        str_python += self.set_pip_repository()

        # Install wget, awscli
        str_python += self.install_aux(args)

        # Install extra dependencies
        str_deps = self.install_extra_dependencies(args)

        # Install Neuron compiler
        str_compiler = self.install_neuron_compiler(args)

        # Install Neuron framework
        str_framework = self.install_neuron_framework(args)

        # Install Neuron compiler and framework
        str_compiler_framework = self.install_neuron_compiler_and_framework(args)
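
        # The snippet strings assembled above are combined below according to
        # --ami and --category: for example, --category=compiler_framework
        # returns only the dependency, venv, and compiler/framework snippets,
        # while --category=all also prepends the repository, driver, runtime,
        # and tools setup (plus the EFA installer on trn1).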
        if args.ami == 'dlami-framework':
            # dlami instructions
            str_dlami = self.install_dlami(args)
            return str_dlami
        elif args.ami == 'dlami-neuron':
            str_dlami = self.install_neuron_dlami(args)
            return str_dlami
        elif args.category == 'all':
            if args.instance == 'trn1':
                str_runtime += str_efa
            return str_preamble + str_driver + str_runtime + str_tools + str_deps + str_python + str_compiler_framework
        elif args.category == 'driver_runtime_tools':
            return str_preamble + str_driver + str_runtime + str_tools
        elif args.category == 'compiler_framework':
            return str_deps + str_python + str_compiler_framework
        elif args.category == 'driver':
            return str_preamble + str_driver
        elif args.category == 'runtime':
            return str_runtime
        elif args.category == 'tools':
            return str_tools
        elif args.category == 'compiler':
            if args.instance != 'inf1':
                return str_python + str_compiler
            else:
                return str_python
        elif args.category == 'framework':
            return str_framework
        elif args.category == 'efa':
            return str_efa

    def install_dlami(self, args):
        latest_release_for_instance = \
            self.df_latest_release.loc[self.df_latest_release['instance'] == args.instance]['version'].values[0]
        latest_release_for_dlami = self.df_dlami_properties[
            (self.df_dlami_properties['framework'] == args.framework) &
            (self.df_dlami_properties['supported_instances'].map(lambda x: args.instance in x))][
            'neuron_released_version'].values[0]

        if latest_release_for_instance == latest_release_for_dlami:
            return self.activate_python_venv(args)
        else:
            args.install_type = 'update'
            str_dlami = self.activate_python_venv(args)
            str_dlami += self.jupyter_notebook(args)
            str_dlami += self.set_pip_repository()
            str_dlami += self.install_neuron_compiler_and_framework(args)
            return str_dlami

    def install_neuron_dlami(self, args):
        str_dlami = ""
        if (args.instance == 'trn1' or args.instance == 'inf2') and args.category == "transformers-neuronx":
            str_dlami = '\n# Activate Python venv for Transformers-NeuronX \n'
            str_dlami += "source /opt/aws_neuronx_venv_transformers_neuronx/bin/activate"
        elif (args.instance == 'trn1' or args.instance == 'inf2') and args.framework == "pytorch" and args.framework_version == "1.13.1":
            str_dlami = '\n# Activate Python venv for Pytorch 1.13 \n'
            str_dlami += "source /opt/aws_neuronx_venv_pytorch_1_13/bin/activate"
        elif (args.instance == 'trn1' or args.instance == 'inf2') and args.framework == "pytorch" and args.framework_version == "2.1":
            str_dlami = '\n# Activate Python venv for Pytorch 2.1 \n'
            str_dlami += "source /opt/aws_neuronx_venv_pytorch_2_1/bin/activate"
        elif (args.instance == 'trn1' or args.instance == 'inf2') and args.framework == "tensorflow" and args.framework_version == "2.10.1":
            str_dlami = '\n# Activate Python venv for Tensorflow 2.10 \n'
            str_dlami += "source /opt/aws_neuronx_venv_tensorflow_2_10/bin/activate"
        elif args.instance == 'inf1' and args.framework == "tensorflow" and args.framework_version == "2.10.1":
            str_dlami = '\n# Activate Python venv for Tensorflow 2.10 \n'
            str_dlami += "source /opt/aws_neuron_venv_tensorflow_2_10_inf1/bin/activate"
        elif args.instance == 'inf1' and args.framework == "pytorch" and args.framework_version == "1.13.1":
            str_dlami = '\n# Activate Python venv for Pytorch 1.13 \n'
            str_dlami += "source /opt/aws_neuron_venv_pytorch_1_13_inf1/bin/activate"
        return str_dlami
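
    # The /opt/aws_neuronx_venv_* and /opt/aws_neuron_venv_*_inf1 paths above
    # are the virtual environments that the Neuron DLAMIs ship preinstalled,
    # which is why install_neuron_dlami() only emits an activation line rather
    # than any package installation commands.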

    def jupyter_notebook(self, args):
        os_default_python_version = \
            self.df_os_properties.loc[self.df_os_properties['os'] == args.os]['default_python_version'].values[0]
        packages_supporting_python_versions = self.get_pip_packages_supporting_python_versions(args)
        if os_default_python_version in packages_supporting_python_versions:
            target_python_version = os_default_python_version
        else:
            target_python_version = max(packages_supporting_python_versions)

        framework_name = self.get_package_names(category=args.framework, instance=args.instance)[0]

        str_jupiter = '\n# Install Jupyter notebook kernel\n'
        str_jupiter += 'pip install ipykernel ' + '\n'
        str_jupiter += 'python' + target_python_version + ' -m ipykernel install --user --name '
        str_jupiter += 'aws_neuron_venv_' + args.framework
        if args.instance == 'inf1':
            str_jupiter += '_inf1'
        str_jupiter += ' --display-name "Python (' + framework_name + ')"' + '\n'
        str_jupiter += 'pip install jupyter notebook' + '\n'
        str_jupiter += 'pip install environment_kernels' + '\n'
        return str_jupiter

    def install_and_enable_epel(self, args):
        str = ''
        if args.mode != 'compile':
            if args.install_type == 'install':
                if args.os == 'rockylinux9':
                    str += '\n# Install and enable EPEL\n'
                    str += 'sudo dnf config-manager --set-enabled crb\n'
                    str += 'sudo dnf install epel-release -y\n'
        return str

    def config_neuron_repository(self, args):
        """
        Reads OS type from the arguments and generates scripts for configuration of Neuron repository
        """
        str = ''
        if args.mode != 'compile':  # Neuron repository is needed when mode is 'develop' or 'deploy'
            if args.install_type == 'install':
                str += '\n# Configure Linux for Neuron repository updates' + '\n'
                if args.os == 'ubuntu18' or args.os == 'ubuntu20' or args.os == 'ubuntu22' or args.os == 'ubuntu24':
                    str += '. /etc/os-release' + '\n'
                    str += 'sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <


================================================
FILE: src/helperscripts/n2-manifest.json
================================================
", "package_categories": ["driver","runtime","tools","compiler"]}
  ],
  "dlami_properties": [
    {"framework":"pytorch", "dlami": "1.13", "neuron_released_version": "2.17.0", "supported_instances":["trn1","inf2","inf1"]},
    {"framework":"tensorflow", "dlami": "2.10", "neuron_released_version": "2.17.0", "supported_instances":["trn1","inf2"]}
  ],
  "major_version_properties": [
    {"name":"neuronx-cc","inf1":"","trn1":"2","inf2":"2","trn2":"2","trn3":"2"},
    {"name":"aws-neuronx-k8-plugin","inf1":"2","trn1":"2","inf2":"2","trn2":"2","trn3":"2"},
    {"name":"aws-neuronx-k8-scheduler","inf1":"2","trn1":"2","inf2":"2","trn2":"2","trn3":"2"},
    {"name":"aws-neuronx-oci-hooks","inf1":"2","trn1":"2","inf2":"2","trn2":"2","trn3":"2"},
    {"name":"tensorflow-neuronx","inf1":"","trn1":"1","inf2":"1"},
    {"name":"torch-neuronx","inf1":"","trn1":"1","inf2":"1","trn2":"2","trn3":"2"},
    {"name":"aws-neuronx-dkms","inf1":"2.21","trn1":"2","inf2":"2","trn2":"2","trn3":"2"},
    {"name":"aws-neuronx-collectives","trn1":"2","inf2":"2","trn2":"2","trn3":"2"},
    {"name":"aws-neuronx-runtime-lib","inf1":"","trn1":"2","inf2":"2","trn2":"2","trn3":"2"},
    {"name":"aws-neuronx-tools","inf1":"2","trn1":"2","inf2":"2","trn2":"2","trn3":"2"},
    {"name":"tensorflow-model-server-neuronx","inf1":"2","trn1":"2","inf2":"2"},
    {"name":"neuronperf","inf1":"2","trn1":"2","inf2":"2"},
    {"name":"tensorboard-plugin-neuronx","inf1":"2","trn1":"2","inf2":"2","trn2":"2"},
    {"name":"nki","trn1":"2","inf2":"2","trn2":"2","trn3":"2"}
  ],
  "package_properties": [
    {"name":"aws-neuronx-runtime-discovery", "component":"General","category":"general","package_type":"pip","use_cases":["inference"],"pin_major":"false"},
    {"name":"aws_neuron_sdk_release_version", "component":"Github","category":"github","package_type":"pip","use_cases":["inference"],"pin_major":"false"},
    {"name":"libneuronxla","component":"Framework","category":"general","package_type":"pip","use_cases":["inference"],"pin_major":"false"},
    {"name":"neuron-cc","component":"Compiler","category":"compiler","package_type":"pip","use_cases":["inference"],"pin_major":"false"},
    {"name":"neuronx-cc","component":"Compiler","category":"compiler","package_type":"pip","use_cases":["inference","training"],"pin_major":"true"},
{"name":"neuronx-cc-stubs","component":"Compiler","category":"compiler","package_type":"pip","use_cases":["inference","training"],"pin_major":"true"}, {"name":"aws-neuronx-k8-plugin","component":"Kubernetes Plugin","category":"container","package_type":"os","use_cases":["inference","training"],"pin_major":"true"}, {"name":"aws-neuronx-k8-scheduler","component":"Kubernetes Scheduler","category":"container","package_type":"os","use_cases":["inference","training"],"pin_major":"true"}, {"name":"aws-neuronx-oci-hooks","component":"OCI Hooks","category":"container","package_type":"os","use_cases":["inference","training"],"pin_major":"true"}, {"name":"mxnet-neuron","component":"MXNet","category":"mxnet","package_type":"pip","use_cases":["inference"],"pin_major":"false"}, {"name":"tensorflow-neuron","component":"TensorFlow","category":"tensorflow","package_type":"pip","use_cases":["inference"],"pin_major":"false"}, {"name":"tensorflow","component":"TensorFlow","category":"tensorflow","package_type":"pip","use_cases":["inference"],"pin_major":"false"}, {"name":"tensorflow-neuronx","component":"TensorFlow","category":"tensorflow","package_type":"pip","use_cases":["inference","training"],"pin_major":"true"}, {"name":"torch-neuron","component":"PyTorch","category":"pytorch","package_type":"pip","use_cases":["inference"],"pin_major":"false"}, {"name":"torch-neuronx","component":"PyTorch","category":"pytorch","package_type":"pip","use_cases":["inference","training"],"pin_major":"true"}, {"name":"transformers-neuronx","component":"Transformers Neuron","category":"transformers-neuronx","package_type":"pip","use_cases":["inference","training"],"pin_major":"true"}, {"name":"mxnet_neuron","component":"MXNet","category":"mxnet","package_type":"pip","use_cases":["inference"],"pin_major":"false"}, {"name":"mx_neuron","component":"MXNet","category":"mxnet","package_type":"pip","use_cases":["inference"],"pin_major":"false"}, {"name":"aws-neuronx-dkms","component":"Driver","category":"driver","package_type":"os","use_cases":["inference","training"],"pin_major":"true"}, {"name":"aws-neuronx-collectives","component":"Collective Communication Library","category":"runtime","package_type":"os","use_cases":["training"],"pin_major":"true"}, {"name":"efa-installer","component":"EFA","category":"efa","package_type":"na","use_cases":["training"],"pin_major":"false"}, {"name":"aws-neuronx-runtime-lib","component":"Runtime Library","category":"runtime","package_type":"os","use_cases":["inference","training"],"pin_major":"true"}, {"name":"aws-neuron-tools","component":"System Tools","category":"system-tools","package_type":"os","use_cases":["inference"],"pin_major":"true"}, {"name":"aws-neuronx-tools","component":"System Tools","category":"system-tools","package_type":"os","use_cases":["inference","training"],"pin_major":"true"}, {"name":"tensorflow-model-server-neuron","component":"TensorFlow Model Server","category":"model-server","package_type":"os","use_cases":["inference","training"],"pin_major":"true"}, {"name":"tensorflow-model-server-neuronx","component":"TensorFlow Model Server","category":"model-server","package_type":"os","use_cases":["inference","training"],"pin_major":"true"}, {"name":"neuronperf","component":"Perf Tools","category":"helper-tools","package_type":"os","use_cases":["inference","training"],"pin_major":"true"}, {"name":"tensorboard-plugin-neuron","component":"TensorBoard","category":"profiling-tools","package_type":"os","use_cases":["inference"],"pin_major":"true"}, 
{"name":"tensorboard-plugin-neuronx","component":"TensorBoard","category":"profiling-tools","package_type":"os","use_cases":["inference","training"],"pin_major":"true"}, {"name":"libnrt.so","component":"Runtime Library","category":"libnrt","package_type":"os","use_cases":["inference"],"pin_major":"false"}, {"name":"torch_xla","component":"PyTorch","category":"helper-lib","package_type":"pip","use_cases":["inference","training"],"pin_major":"false"}, {"name":"aws-neuronx-gpsimd-tools","component":"CustomOps Tools","category":"na","package_type":"os","use_cases":["inference","training"],"pin_major":"false"}, {"name":"aws-neuronx-gpsimd-customop-lib","component":"CustomOps","category":"na","package_type":"os","use_cases":["inference","training"],"pin_major":"false"}, {"name":"aws-neuronx-oci-hook","component":"OCI","category":"na","package_type":"os","use_cases":["inference","training"],"pin_major":"false"}, {"name":"dmlc_nnvm","component":"Compiler","category":"na","package_type":"os","use_cases":["inference"],"pin_major":"false"}, {"name":"neuronx_hwm","component":"Compiler","category":"na","package_type":"os","use_cases":["inference"],"pin_major":"false"}, {"name":"dmlc_topi","component":"Compiler","category":"na","package_type":"os","use_cases":["inference"],"pin_major":"false"}, {"name":"dmlc_tvm","component":"Compiler","category":"na","package_type":"os","use_cases":["inference"],"pin_major":"false"}, {"name":"inferentia_hwm","component":"Compiler","category":"na","package_type":"os","use_cases":["inference","training"],"pin_major":"false"}, {"name":"neuronx_distributed","component":"Neuron Distributed","category":"na","package_type":"os","use_cases":["inference","training"],"pin_major":"false"}, {"name":"neuronx_distributed_training","component":"Neuron Distributed Training","category":"na","package_type":"os","use_cases":["inference","training"],"pin_major":"false"}, {"name":"neuronx_distributed_inference","component":"Neuron Distributed Inference","category":"na","package_type":"os","use_cases":["inference"],"pin_major":"false"}, {"name":"jax_neuronx","component":"Jax","category":"jax","package_type":"pip","use_cases":["inference"],"pin_major":"true"}, {"name":"nki","component":"NKI","category":"nki","package_type":"pip","use_cases":["inference","training"],"pin_major":"true"} ], "neuron_releases": [ {"neuron_version":"2.29.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.31.24.0","supported_instances":["trn1","inf2","trn2","trn3"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.27.4.0","supported_instances":["trn1","inf2","trn2","trn3"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.21.2.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"14.09.x","supported_instances":["trn1","inf2","trn2","trn3"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.29.147.0","supported_instances":["inf1","trn1","inf2","trn2","trn3"] ,"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.29.147.0","supported_instances":["inf1","trn1","inf2","trn2","trn3"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.15.13.0","supported_instances":["inf1","trn1","inf2","trn2","trn3"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.31.24.0","supported_instances":["trn1","inf2","trn2","trn3"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-tools","version":"2.29.18.0","supported_instances":["inf1","trn1","inf2","trn2","trn3"],"supported_python_versions":[]}, {"name":"jax_neuronx","version":"0.7.0.1.0.8181","supported_instances":["trn1","inf2","trn2","trn3"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"libneuronxla","version":"2.2.16408.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.24.5133.0","supported_instances":["trn1","inf2","trn2","trn3"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx-cc-stubs","version":"2.24.5133.0","supported_instances":["trn1","inf2","trn2","trn3"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed","version":"0.18.27753","supported_instances":["trn1","inf2","trn2","trn3"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed_inference","version":"0.9.17334","supported_instances":["inf2","trn2","trn1","trn3"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"nki","version":"0.3.0","supported_instances":["trn1","inf2","trn2","trn3"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.918.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf1"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf1"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf1"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuronx","version":"2.9.0.2.13.24727","supported_instances":["trn1","inf2","trn2","trn3"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"efa-installer","version":"1.47","supported_instances":["trn1","trn2","trn3"],"supported_python_versions":[]} ]}, {"neuron_version":"2.28.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.30.59.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.26.10.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.20.7.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.20.1.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.29.71.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.29.71.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-oci-hook","version":"2.14.102.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.30.51.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.28.23.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"jax_neuronx","version":"0.7.0.1.0.7584","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"libneuronxla","version":"2.2.15515.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.23.6484.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx-cc-stubs","version":"2.23.6484.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed","version":"0.17.26814","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed_training","version":"1.7.0","supported_instances":["trn1","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed_inference","version":"0.8.16251","supported_instances":["inf2","trn2","trn1"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"nki","version":"0.2.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.918.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuronx","version":"2.9.0.2.12.22436","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"torch-neuronx","version":"2.7.0.2.12.22436","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11"]}, {"name":"torch-neuronx","version":"2.8.0.2.12.22436","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, 
{"name":"efa-installer","version":"1.47","supported_instances":["trn1","trn2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.28.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.30.59.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.26.5.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.20.4.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.20.1.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.29.71.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.29.71.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.14.102.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.30.51.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.28.23.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"jax_neuronx","version":"0.7.0.1.0.7584","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"libneuronxla","version":"2.2.15515.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.23.6484.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx-cc-stubs","version":"2.23.6484.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed","version":"0.17.26814","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed_training","version":"1.7.0","supported_instances":["trn1","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed_inference","version":"0.8.16251","supported_instances":["inf2","trn2","trn1"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"nki","version":"0.2.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.918.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, 
{"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuronx","version":"2.9.0.2.12.22436","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"torch-neuronx","version":"2.7.0.2.12.22436","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11"]}, {"name":"torch-neuronx","version":"2.8.0.2.12.22436","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"efa-installer","version":"1.47","supported_instances":["trn1","trn2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.27.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.29.41.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.25.4.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.19.2.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.19.1.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.29.16.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.29.16.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.13.52.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.29.40.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.27.33.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"jax_neuronx","version":"0.7.0.1.0.7377","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"libneuronxla","version":"2.2.14584.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.22.12471.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx-cc-stubs","version":"2.22.12471.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, 
{"name":"neuronx_distributed","version":"0.16.25997","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed_training","version":"1.7.0","supported_instances":["trn1","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed_inference","version":"0.7.15063","supported_instances":["inf2","trn2","trn1"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"nki","version":"0.1.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.918.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuronx","version":"2.9.0.2.11.19912","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"torch-neuronx","version":"2.7.0.2.11.19912","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11"]}, {"name":"torch-neuronx","version":"2.8.0.2.11.19912","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.27.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.29.41.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.25.4.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.19.2.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.19.1.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.29.16.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.29.16.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.13.52.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.29.40.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.27.33.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"jax_neuronx","version":"0.7.0.1.0.7377","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, 
{"name":"libneuronxla","version":"2.2.14584.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.22.12471.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx-cc-stubs","version":"2.22.12471.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed","version":"0.16.25997","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed_training","version":"1.7.0","supported_instances":["trn1","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"neuronx_distributed_inference","version":"0.7.14366","supported_instances":["inf2","trn2","trn1"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"nki","version":"0.1.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.918.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuronx","version":"2.9.0.2.11.19912","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"torch-neuronx","version":"2.7.0.2.11.19912","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11"]}, {"name":"torch-neuronx","version":"2.8.0.2.11.19912","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11","3.12"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.26.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.28.27.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.24.7.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.18.0.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-gpsimd-tools","version":"0.18.0.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.28.4.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.28.4.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.12.36.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.28.23.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.26.14.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"jax_neuronx","version":"0.6.2.1.0.6446","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.2.12677.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.21.33363.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.21.33363.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.15.22404","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.6.0","supported_instances":["trn1","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_inference","version":"0.6.10598","supported_instances":["inf2","trn2","trn1"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.837.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, 
{"name":"torch-neuronx","version":"2.6.0.2.10.16998","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.7.0.2.10.16998","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.8.0.2.10.16998","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.13.1315","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.26.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.28.27.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.24.7.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.18.0.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.18.0.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.28.4.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.28.4.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.12.36.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.28.23.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.26.14.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"jax_neuronx","version":"0.6.2.1.0.6446","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.2.12677.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.21.18209.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.21.18209.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.15.22404","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.6.0","supported_instances":["trn1","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, 
{"name":"neuronx_distributed_inference","version":"0.6.10598","supported_instances":["inf2","trn2","trn1"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.837.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuronx","version":"2.6.0.2.10.13553","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.7.0.2.10.13553","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.8.0.2.10.13553","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.13.1315","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.25.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.27.34.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.27.34.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.23.9.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.17.1.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.17.0.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.27.7.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.27.7.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.11.42.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.27.23.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.25.145.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"jax_neuronx","version":"0.6.1.1.0.3499","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, 
{"name":"libneuronxla","version":"2.2.8201.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.20.9961.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.20.9961.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.14.18461","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.5.0","supported_instances":["trn1","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_inference","version":"0.5.9230","supported_instances":["inf2","trn2","trn1"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.813.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuronx","version":"2.6.0.2.9.9357","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.7.0.2.9.9357","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.13.1216","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.24.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.26.43.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.26.43.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.22.2.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.16.2.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-gpsimd-tools","version":"0.16.1.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.26.7.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.26.7.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.10.56.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.26.42.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.24.54.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"jax_neuronx","version":"0.6.0.1.0.1296","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.2.4410.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.19.8089.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.19.8089.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.13.14393","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.4.1","supported_instances":["trn1","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_inference","version":"0.4.7422","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.760.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, 
{"name":"torch-neuronx","version":"2.5.1.2.8.6734","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.6.0.2.8.6734","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.7.0.2.8.6734","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch_xla","version":"2.1.6","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.13.985","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.24.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.26.43.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.26.43.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.22.2.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.16.2.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.16.1.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.26.7.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.26.7.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.10.56.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.26.42.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.24.54.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"jax_neuronx","version":"0.6.0.1.0.1296","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.2.4410.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.19.8089.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.19.8089.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10"]}, 
{"name":"neuronx_distributed","version":"0.13.14393","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.4.0","supported_instances":["trn1","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_inference","version":"0.4.7422","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.760.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuronx","version":"2.5.1.2.8.6734","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.6.0.2.8.6734","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.7.0.2.8.6734","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch_xla","version":"2.1.6","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.13.985","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.23.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.25.65.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.21.37.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.15.12.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.15.1.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.25.24.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.25.24.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.9.88.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.25.57.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-tools","version":"2.23.9.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.6.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"jax_neuronx","version":"0.5.3.1.0.719","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.2.3493.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.18.121.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.18.121.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.12.12111","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.3.0","supported_instances":["trn1"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_inference","version":"0.3.5591","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.0.670.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuronx","version":"2.5.1.2.7.5413","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.6.0.2.7.5413","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch_xla","version":"2.1.6","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, 
{"name":"transformers-neuronx","version":"0.13.798","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.22.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.24.59.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.24.59.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.20.28.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.14.12.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.14.6.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.24.23.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.24.23.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.7.5.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.24.53.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.24.53.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.22.61.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.6.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"jax_neuronx","version":"0.1.3","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.2.1630.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"libneuronxla","version":"0.5.3396","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"neuronx-cc","version":"2.17.194.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.17.194.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10"]}, 
{"name":"neuronx_distributed","version":"0.11.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.2.0","supported_instances":["trn1"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"neuronx_distributed_inference","version":"0.2.0","supported_instances":["inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.117.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.9","3.10"]}, {"name":"torch-neuronx","version":"2.5.1.2.6.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"torch_xla","version":"2.1.6","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.13.470","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.21.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.23.135.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.12.35.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.19.64.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.13.16.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.13.2.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.23.45.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.23.45.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-oci-hook","version":"2.6.36.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.23.112.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.20.204.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"jax_neuronx","version":"0.1.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.1.714.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"libneuronxla","version":"0.5.3396","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.16.372.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.16.372.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.10.1","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.1.1","supported_instances":["trn1"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx_distributed_inference","version":"0.1.1","supported_instances":["inf2","trn2","trn1"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.52.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"tensorflow-neuron","version":"2.8.4.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.17.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.1.2.2.4.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.5.1.2.4.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch_xla","version":"1.13.1+torchneurong","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch_xla","version":"2.1.6","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.13.380","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.21.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.23.133.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.12.35.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.19.64.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.13.16.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.13.2.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.23.30.0","supported_instances":["inf1","trn1","inf2","trn2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.23.30.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.6.36.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, 
{"name":"aws-neuronx-runtime-lib","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.23.110.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.20.204.0","supported_instances":["inf1","trn1","inf2","trn2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"jax_neuronx","version":"0.1.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.1.681.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"libneuronxla","version":"0.5.3388","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.16.345.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.16.345.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.10.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.1.0","supported_instances":["trn1"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx_distributed_inference","version":"0.1.0","supported_instances":["inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.52.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.17.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.1.2.2.4.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.5.1.2.4.0","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch_xla","version":"1.13.1+torchneurong","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch_xla","version":"2.1.6","supported_instances":["trn1","inf2","trn2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.13.322","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.20.2", "packages": [ {"name":"aws-neuronx-collectives","version":"2.22.33.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.12.35.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.18.20.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.12.2.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.12.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.22.20.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.22.20.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.5.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.22.19.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.19.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, 
{"name":"dmlc_nnvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"jax_neuronx","version":"0.1.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.0.5347.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"libneuronxla","version":"0.5.3278","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.15.143.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.15.143.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.9.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.0.1","supported_instances":["trn1"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.63.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"torch-neuron","version":"1.11.0.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.16.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch-neuron","version":"1.9.1.2.11.13.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.1.2.2.3.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch_xla","version":"1.13.1+torchneurong","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch_xla","version":"2.1.5","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.12.313","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.20.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.22.26.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.12.35.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.18.12.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.12.2.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.12.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.22.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.22.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.5.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.22.14.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.19.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"jax_neuronx","version":"0.1.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.0.4986.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"libneuronxla","version":"0.5.2978","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.15.141.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.15.141.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.9.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.0.0","supported_instances":["trn1"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.63.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.12.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"torch-neuronx","version":"1.13.1.1.16.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.1.2.2.3.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch_xla","version":"1.13.1+torchneurong","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch_xla","version":"2.1.4","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.12.313","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.20.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.22.26.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.12.35.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.18.12.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.12.2.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.12.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.22.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.22.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.5.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"aws-neuronx-runtime-lib","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.22.14.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.19.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"jax_neuronx","version":"0.1.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.9"]}, {"name":"libneuronxla","version":"2.0.4115.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.2978","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.24.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.15.128.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx-cc-stubs","version":"2.15.128.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.9.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"neuronx_distributed_training","version":"1.0.0","supported_instances":["trn1"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.63.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.12.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.12.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.12.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.12.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.12.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.12.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.16.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch-neuronx","version":"2.1.2.2.3.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"torch_xla","version":"1.13.1+torchneurong","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, 
{"name":"torch_xla","version":"2.1.4","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"transformers-neuronx","version":"0.12.313","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10","3.11"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.19.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.21.46.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.17.17.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.11.4.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.11.3.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.21.14.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.21.14.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.4.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.21.41.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.18.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"2.0.2335","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.1795","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.23.5.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.14.227.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.8.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.63.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, 
{"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.11.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.11.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.11.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.11.4.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.11.4.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.11.4.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.11.4.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.10.12.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.10.12.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.10.12.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.10.12.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.10.12.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.15.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.1.2.2.2.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneuronf","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.1.3","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.11.351","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.19.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.21.46.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.17.17.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.11.4.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.11.3.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-k8-plugin","version":"2.21.14.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.21.14.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.4.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.21.41.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.18.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"2.0.2335","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.1795","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.147.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.23.5.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.93.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.14.213.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.8.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.63.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.11.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.11.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.11.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.11.4.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.11.4.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.11.4.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.11.4.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.10.12.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.10.12.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.10.12.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.10.12.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.10.12.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.15.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.1.2.2.2.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneuronf","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.1.3","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.11.351","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.18.2", "packages": [ {"name":"aws-neuronx-collectives","version":"2.20.22.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.16.7.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.9.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.9.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.20.13.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.20.13.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.3.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.20.22.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.17.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"dmlc_tvm","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"2.0.965","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.971","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.50.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.22.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.55.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.13.72.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"torch-neuron","version":"1.13.1.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.14.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.1.2.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneurone","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.1.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.10.0.360","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.18.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.20.22.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.16.7.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.9.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.9.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.20.13.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.20.13.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.3.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.20.22.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.17.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"2.0.965","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.971","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.50.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.22.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"neuronperf","version":"1.8.55.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.13.68.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.14.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.1.2.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneurone","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.1.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"transformers-neuronx","version":"0.10.0.360","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.18.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.20.22.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.16.7.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.9.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.9.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.20.13.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.20.13.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.3.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.20.22.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.17.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.17.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"2.0.965","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.971","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.50.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.22.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.55.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.13.66.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, 
{"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.19.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.19.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.74.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.14.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.1.2.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneurone","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.1.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.10.0.21","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.17.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.20.11.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.15.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.9.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-gpsimd-tools","version":"0.9.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.19.16.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.19.16.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.45.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.20.11.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.17.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.18.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.18.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.18.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.16.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"2.0.755","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.809","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.40.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.21.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.15.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.12.68.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.6.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_hwm","version":"2.12.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"tensorflow-neuron","version":"2.7.4.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.13.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.0.0.2.0.1b0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.1.1.2.0.1b0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneurond","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.1.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.9.474","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.16.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.19.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.15.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.9.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.9.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.19.16.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.19.16.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.45.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.19.5.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.16.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.18.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.18.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.18.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.16.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"2.0.498","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.669","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.40.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.21.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.15.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.12.68.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.6.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_hwm","version":"2.12.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.13.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.0.0.2.0.1b0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.1.1.2.0.0b0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneurond","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.1.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.9.474","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.16.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.19.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.15.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.9.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.9.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.19.16.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.19.16.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.45.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.19.5.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.16.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.18.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"dmlc_topi","version":"1.18.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.18.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.16.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"2.0.498","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.669","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.40.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.21.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.15.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.12.54.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.6.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_hwm","version":"2.12.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.6.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.8.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.8.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"torch-neuron","version":"1.11.0.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.17.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.13.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.0.0.2.0.1b0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.1.1.2.0.0b0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneurond","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.1.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.9.474","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.15.2", "packages": [ {"name":"aws-neuronx-collectives","version":"2.18.19.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.14.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.8.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.8.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.18.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.18.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.27.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.18.15.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.15.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.18.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.18.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.18.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"1.0.680","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"libneuronxla","version":"0.5.570","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.25.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.20.3.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.11.0.35","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.5.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_hwm","version":"2.11.0.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.43.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"torch-neuronx","version":"1.13.1.1.12.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.0.0.2.0.1b0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneuronc","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.0.0+torchneuron0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.8.268","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.18.15","supported_instances":["inf1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.15.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.18.19.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.14.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.8.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.8.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.18.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.18.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.27.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.18.15.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.15.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.18.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.18.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.18.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"1.0.680","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.570","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.25.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.20.3.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"neuronx-cc","version":"2.11.0.34","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.5.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_hwm","version":"2.11.0.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.43.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.12.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.0.0.2.0.1b0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneuronc","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.0.0+torchneuron0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, 
{"name":"transformers-neuronx","version":"0.8.268","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.18.15","supported_instances":["inf1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.15.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.18.18.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.14.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.8.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.8.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-plugin","version":"2.18.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.18.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.27.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"aws-neuronx-runtime-lib","version":"2.18.14.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.15.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.18.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_topi","version":"1.18.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"dmlc_tvm","version":"1.18.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"inferentia_hwm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"1.0.663","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"libneuronxla","version":"0.5.538","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mx_neuron","version":"1.8.0.2.4.25.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.20.3.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronperf","version":"1.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx-cc","version":"2.11.0.34","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_distributed","version":"0.5.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"neuronx_hwm","version":"2.11.0.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.43.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, 
{"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.2.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.10.2.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.6.0","supported_instances":["inf1"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"1.13.1.1.12.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch-neuronx","version":"2.0.0.2.0.0b0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneuronc","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"torch_xla","version":"2.0.0+torchneuron0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.8.268","supported_instances":["trn1","inf2"],"supported_python_versions":["3.8","3.9","3.10"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.18.14","supported_instances":["inf1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.14.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.17.9.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-dkms","version":"2.13.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.7.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.7.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.17.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.14.6.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.476","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.10.0.5","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneuronb","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.10.0.35","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"aws-neuronx-k8-plugin","version":"2.17.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.17.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.22.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, 
{"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.39.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.39.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.4.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.1.1.11.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.7.84","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.8.7.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.17.2.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_topi","version":"1.17.2.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_tvm","version":"1.17.2.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"inferentia_hwm","version":"1.15.2.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"neuronx_distributed","version":"0.4.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.14.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.17.9.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.13.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.7.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.7.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.17.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.14.6.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.476","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.10.0.5","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, 
{"name":"torch_xla","version":"1.13.1+torchneuronb","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.10.0.34","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.19.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"aws-neuronx-k8-plugin","version":"2.17.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.17.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.22.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.39.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, 
{"name":"mxnet_neuron","version":"1.5.1.1.10.39.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.4.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.1.1.11.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.7.84","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.8.7.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.17.2.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_topi","version":"1.17.2.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_tvm","version":"1.17.2.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"inferentia_hwm","version":"1.15.2.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"neuronx_distributed","version":"0.4.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.13.2", "packages": [ {"name":"aws-neuronx-collectives","version":"2.16.16.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.12.18.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.6.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.6.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.16.14.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.13.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.440","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.9.0.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneurona","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.9.0.40","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.18.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"aws-neuronx-k8-plugin","version":"2.16.18.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.16.18.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.25.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, 
{"name":"tensorflow-neuron","version":"2.7.4.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.39.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.39.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.4.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.1.1.10.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.6.106","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.8.7.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, 
{"name":"dmlc_topi","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_tvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"inferentia_hwm","version":"1.15.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"neuronx_distributed","version":"0.3.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.13.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.16.8.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.12.11.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.6.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.6.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.16.8.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.13.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.425","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.9.0.2","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneurona","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.9.0.40","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.18.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"aws-neuronx-k8-plugin","version":"2.16.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.16.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.21.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, 
{"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.39.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.39.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.4.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.1.1.10.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.6.106","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.8.7.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_topi","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_tvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"inferentia_hwm","version":"1.15.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"neuronx_distributed","version":"0.3.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.13.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.16.8.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.12.11.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.6.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.6.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.16.8.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.13.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.425","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.9.0.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneurona","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.9.0.16","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.18.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"aws-neuronx-k8-plugin","version":"2.16.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.16.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.21.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"2.10.1.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.10.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf."],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.39.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, 
{"name":"torch-neuron","version":"1.11.0.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.9.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.39.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.4.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.1.1.10.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.6.106","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.8.7.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_topi","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_tvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"inferentia_hwm","version":"1.15.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"neuronx_distributed","version":"0.3.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.12.2", "packages": [ {"name":"aws-neuronx-collectives","version":"2.15.16.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.11.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.5.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.5.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.15.14.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.413","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.8.0.3","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneuron8","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.8.0.25","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.17.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, 
{"name":"aws-neuronx-k8-plugin","version":"2.15.6.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.15.6.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.16.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.10.1.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.39.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.39.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.4.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, 
{"name":"torch-neuronx","version":"1.13.1.1.9.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.5.58","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.8.7.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_topi","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_tvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"inferentia_hwm","version":"1.14.4.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"neuronx_distributed","version":"0.2.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.12.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.15.16.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.11.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.5.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.5.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.15.14.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.413","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.8.0.3","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneuron8","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.8.0.25","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.17.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"aws-neuronx-k8-plugin","version":"2.15.6.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.15.6.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.16.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.10.1.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, 
{"name":"tensorflow-neuron","version":"2.8.4.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.39.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.39.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.4.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.1.1.9.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.5.58","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.8.7.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_topi","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, 
{"name":"dmlc_tvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"inferentia_hwm","version":"1.14.4.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"neuronx_distributed","version":"0.2.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.12.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.15.13.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.11.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.5.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.5.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.15.11.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.12.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.391","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.8.0.3","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7,","3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneuron8","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.8.0.25","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.17.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"aws-neuronx-k8-plugin","version":"2.15.6.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.15.6.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.16.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.10.1.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.9.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, 
{"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.9.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.39.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.39.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.4.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.1.1.9.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.5.58","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.8.7.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_topi","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_tvm","version":"1.16.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"inferentia_hwm","version":"1.14.4.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"neuronx_distributed","version":"0.2.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.11.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.14.9.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.10.11.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.4.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.4.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.14.8.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.11.10.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.326","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.7.0.3","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1+torchneuron7","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.7.0.40","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.16.2.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"aws-neuronx-k8-plugin","version":"2.14.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.14.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.10.1.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.8.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.8.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.8.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.8.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.8.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.8.9.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.37.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, 
{"name":"torch-neuron","version":"1.10.2.2.7.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.7.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.7.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.7.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.7.10.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.39.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.4.9.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.1.1.8.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.4.60","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.8.6.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.16.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_topi","version":"1.16.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_tvm","version":"1.16.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"inferentia_hwm","version":"1.14.2.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"islpy","version":"2021.1+aws2021.x.169.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_distributed","version":"0.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.10.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.13.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.9.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop-lib","version":"0.3.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.3.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.13.6.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.10.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.207","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.6.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch_xla","version":"1.13.1","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, 
{"name":"neuronx-cc","version":"2.6.0.19","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"neuron-cc","version":"1.15.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"aws-neuronx-k8-plugin","version":"2.13.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.13.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hook","version":"2.2.0.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.8.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.10.1.2.8.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.8.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.8.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.8.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.7.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.8.4.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-neuronx","version":"2.9.3.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.8.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.8.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.8.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.8.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.8.1.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.26.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.7.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.11.0.2.7.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.12.1.2.7.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.13.1.2.7.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"torch-neuron","version":"1.9.1.2.7.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, 
{"name":"mxnet_neuron","version":"1.5.1.1.10.39.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.4.1.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.1.1.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9","3.10"]}, {"name":"transformers-neuronx","version":"0.3.32","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.8.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.23.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_nnvm","version":"1.15.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_topi","version":"1.15.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"dmlc_tvm","version":"1.15.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"inferentia_hwm","version":"1.14.1","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"islpy","version":"2021.1","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]} ]}, {"neuron_version":"2.9.1", "packages": [ {"name":"aws-neuronx-collectives","version":"2.12.35.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.8.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop","version":"0.2.3.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.12.23.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.9.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.205","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.5.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch_xla","version":"1.13.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.5.0.28","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8"]}, {"name":"neuron-cc","version":"1.14.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"aws-neuronx-k8-plugin","version":"2.12.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.12.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hooks","version":"2.1.97.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.7.4.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.10.1.2.7.4.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, 
{"name":"tensorflow-neuron","version":"2.7.4.2.7.4.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.7.4.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.7.4.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.7.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.7.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.7.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.7.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.7.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.25.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.6.6.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.11.0.2.6.6.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.12.1.2.6.6.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.13.1.2.6.6.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.9.1.2.6.6.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.37.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.2.127.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.0.1.6.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.7.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.16.0","supported_instances":["inf1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.9.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.12.27.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.8.4.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop","version":"0.2.3.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.2.1.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.12.16.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-tools","version":"2.9.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.173","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.5.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch_xla","version":"1.13.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.5.0.28","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"neuron-cc","version":"1.14.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"aws-neuronx-k8-plugin","version":"2.12.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.12.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hooks","version":"2.1.97.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.7.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.10.1.2.7.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.7.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.7.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.7.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.2.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.7.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.7.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.7.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.7.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.7.3.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.25.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.10.2.2.6.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.11.0.2.6.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.12.1.2.6.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.13.1.2.6.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.9.1.2.6.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.37.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, 
{"name":"mx_neuron","version":"1.8.0.2.2.127.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.0.1.6.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.7.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.12.16.0","supported_instances":["inf1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.8.0", "packages": [ {"name":"aws-neuronx-collectives","version":"2.11.47.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.7.33.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-customop","version":"0.1.23.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-gpsimd-tools","version":"0.1.7.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-discovery","version":"2.9","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.11.43.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.8.2.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"libneuronxla","version":"0.5.144","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx_hwm","version":"2.4.0.1","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch_xla","version":"1.13.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"neuronx-cc","version":"2.4.0.21","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8"]}, {"name":"neuron-cc","version":"1.13.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"aws-neuronx-k8-plugin","version":"2.1.12.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.1.12.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hooks","version":"2.1.81.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.6.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.7.4.2.6.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.4.2.6.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.9.3.2.6.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.10.1.2.6.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.10.1.1.0.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.6.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, 
{"name":"tensorflow-model-server-neuronx","version":"2.7.4.2.6.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.4.2.6.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.9.3.2.6.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.10.1.2.6.5.0","supported_instances":["inf1","trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.19.0","supported_instances":["trn1","inf2"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuron","version":"2.4.6.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.11.0.2.5.8.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.12.1.2.5.8.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.10.2.2.5.8.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuron","version":"1.9.1.2.5.8.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.11.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mx_neuron","version":"1.8.0.2.2.43.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-neuronx","version":"1.13.0.1.5.0","supported_instances":["trn1","inf2"],"supported_python_versions":["3.7","3.8","3.9"]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.6.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.10.30.0","supported_instances":["inf1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.7.0", "packages": [ {"name":"neuronx-cc","version":"2.4.0.21","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"aws-neuronx-k8-plugin","version":"2.1.12.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.1.12.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hooks","version":"2.1.60.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-neuronx","version":"2.8.2.1.2.0","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.5.4.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.6.3.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.0.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.0.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.15.0","supported_instances":["trn1"],"supported_python_versions":[]}, 
{"name":"neuronx-gpsimd-customop","version":"0.1.23.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronx-gpsimd-tools","version":"0.1.7.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"torch-neuronx","version":"1.13.0.1.4.0","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-xla","version":"1.13.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.7.15.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.11.47.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.11.43.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.7.2.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.6.0", "packages": [ {"name":"neuronx-cc","version":"2.3.0.4","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"aws-neuronx-k8-plugin","version":"2.1.12.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.1.12.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hooks","version":"2.1.14.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-neuronx","version":"2.8.2.1.2.0","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.5.4.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.6.3.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.0.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.0.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuronx","version":"2.5.3.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"torch-neuronx","version":"1.12.0.1.4.0","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"torch-xla","version":"1.12.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-dkms","version":"2.6.33.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.10.37.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.10.30.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.6.1.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.5.0", "packages": [ {"name":"neuron-cc","version":"1.13.5.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, 
{"name":"neuronx-cc","version":"2.2.0.73","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"aws-neuronx-k8-plugin","version":"2.1.12.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.1.12.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hooks","version":"2.1.14.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.5.6.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.5.3.2.5.6.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.6.5.2.5.6.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.7.3.2.5.6.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.2.2.5.6.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.8.2.1.2.0","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-model-server-neuronx","version":"1.15.0.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.5.4.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.6.3.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.7.0.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuronx","version":"2.8.0.2.5.6.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuron","version":"2.4.6.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.11.0.2.5.8.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.12.1.2.5.8.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.10.2.2.5.8.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.7.1.2.5.8.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.8.1.2.5.8.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.9.1.2.5.8.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuronx","version":"1.11.0.1.2.0","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.11.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"mx_neuron","version":"1.8.0.2.2.43.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"aws-neuronx-dkms","version":"2.6.33.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.10.34.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, 
{"name":"aws-neuronx-runtime-lib","version":"2.10.27.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.5.19.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.6.1.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.10.27.0","supported_instances":["inf1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.4.0", "packages": [ {"name":"neuron-cc","version":"1.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"neuronx-cc","version":"2.2.0.73","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"aws-neuronx-k8-plugin","version":"2.1.2.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.1.2.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hooks","version":"2.1.2.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.5.3.2.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.6.3.2.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.7.1.2.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.0.2.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.8.2.1.2.0","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-model-server-neuron","version":"1.15.0.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuron","version":"2.5.4.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuron","version":"2.6.3.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuron","version":"2.7.0.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuron","version":"2.8.0.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuron","version":"2.4.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.7.1.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.8.1.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.9.1.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.10.2.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.11.0.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuronx","version":"1.11.0.1.2.0","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, 
{"name":"mx_neuron","version":"1.8.0.2.2.2.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"aws-neuronx-dkms","version":"2.6.5.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.10.17.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.10.15.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.5.16.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.2.51.0","supported_instances":["inf1"],"supported_python_versions":[]} ]}, {"neuron_version":"2.3.0", "packages": [ {"name":"neuron-cc","version":"1.11.7.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"neuronx-cc","version":"2.1.0.76","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"aws-neuronx-k8-plugin","version":"2.0.1.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-k8-scheduler","version":"2.0.1.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-oci-hooks","version":"2.0.1.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"tensorflow-neuron","version":"1.15.5.2.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"tensorflow-neuron","version":"2.5.3.2.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.6.3.2.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.7.1.2.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuron","version":"2.8.0.2.3.0","supported_instances":["inf1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-neuronx","version":"2.8.2.1.1.0","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"tensorflow-model-server-neuron","version":"1.15.0.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuron","version":"2.5.4.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuron","version":"2.6.3.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuron","version":"2.7.0.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"tensorflow-model-server-neuron","version":"2.8.0.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"tensorboard-plugin-neuron","version":"2.4.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"torch-neuron","version":"1.7.1.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.8.1.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.9.1.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, 
{"name":"torch-neuron","version":"1.10.2.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuron","version":"1.11.0.2.3.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"torch-neuronx","version":"1.11.0.1.1.1","supported_instances":["trn1"],"supported_python_versions":["3.7","3.8"]}, {"name":"mxnet_neuron","version":"1.5.1.1.10.0.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"mx_neuron","version":"1.8.0.2.2.2.0","supported_instances":["inf1"],"supported_python_versions":["3.7"]}, {"name":"aws-neuronx-dkms","version":"2.5.41.0","supported_instances":["inf1","trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-collectives","version":"2.9.86.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"efa-installer","version":"na","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuronx-runtime-lib","version":"2.9.64.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"aws-neuron-tools","version":"2.1.4.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"aws-neuronx-tools","version":"2.4.14.0","supported_instances":["trn1"],"supported_python_versions":[]}, {"name":"neuronperf","version":"1.3.0.0","supported_instances":["inf1"],"supported_python_versions":[]}, {"name":"libnrt.so","version":"2.2.51.0","supported_instances":["inf1"],"supported_python_versions":[]} ]} ] } ================================================ FILE: src/helperscripts/neuron-releases-manifest.json ================================================ { "repos": { "whl": "https://pip.repos.neuron.amazonaws.com/", "rpm": "https://yum.repos.neuron.amazonaws.com/", "deb": "https://apt.repos.neuron.amazonaws.com/" }, "manifest_date": "2022-12-12", "manifest_version": "1.0.1", "dlami_conda_env": { "tensorflow": { "1.15.5": [ "aws_neuron_tensorflow_p36", "aws_neuron_tensorflow_p36" ], "2.1.4": [ "None", "None" ], "2.2.3": [ "None", "None" ], "2.3.4": [ "None", "None" ], "2.4.3": [ "None", "None" ], "2.5.1": [ "None", "None" ], "2.5.2": [ "None", "None" ], "2.5.3": [ "None", "None" ], "2.6.3": [ "None", "None" ], "2.6.5": [ "None", "None" ], "2.7.1": [ "None", "None" ], "2.7.3": [ "None", "None" ], "2.8.0": [ "None", "None" ], "2.8.2": [ "None", "None" ] }, "pytorch": { "1.5.1": [ "None", "aws_neuron_pytorch_p36" ], "1.6.0": [ "None", "aws_neuron_pytorch_p36" ], "1.7.1": [ "None", "aws_neuron_pytorch_p36" ], "1.8.1": [ "aws_neuron_pytorch_p36", "aws_neuron_pytorch_p36" ], "1.9.1": [ "None", "aws_neuron_pytorch_p36" ], "1.10.1": [ "None", "aws_neuron_pytorch_p36" ], "1.10.2": [ "None", "None" ], "1.11.0": [ "None", "None" ] }, "mxnet": { "1.5.1": [ "aws_neuron_mxnet_p36", "aws_neuron_mxnet_p36" ], "1.8.0": [ "None", "aws_neuron_mxnet_p36" ] } }, "latest_version_of_maintained_packages": { "runtime-server": { "framework": false, "package-name": "aws-neuron-runtime", "package-version": "1.6.24.0", "neuron-version": "1.15.2" }, "mxnet-1.5.1": { "framework": true, "package-name": "mxnet_neuron", "package-version": "1.5.1.1.6.5.1", "neuron-version": "1.16.0" } }, "fal_supported_runtime": { "tensorflow": { "1.15.5": { "neuron-rtd": [ "0.0.0.0", "1.15.5.1.6.10.0" ], "libnrt": [ "1.15.5.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.1.4": { "neuron-rtd": [ "0.0.0.0", "2.1.4.1.6.10.0" ], "libnrt": [ "2.1.4.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.2.3": { "neuron-rtd": [ "0.0.0.0", "2.2.3.1.6.10.0" ], "libnrt": [ "2.2.3.2.0.0.0", 
"99.99.99.99.99.99.99" ] }, "2.3.3": { "neuron-rtd": [ "0.0.0.0", "99.99.99.99.99.99.99" ], "libnrt": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ] }, "2.3.4": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.3.4.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.4.2": { "neuron-rtd": [ "0.0.0.0", "99.99.99.99.99.99.99" ], "libnrt": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ] }, "2.4.3": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.4.3.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.5.0": { "neuron-rtd": [ "0.0.0.0", "99.99.99.99.99.99.99" ], "libnrt": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ] }, "2.5.1": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.5.1.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.5.2": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.5.1.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.5.3": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.5.1.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.6.3": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.5.1.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.6.5": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.5.1.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.7.1": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.5.1.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.7.3": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.5.1.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.8.0": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.5.1.2.0.0.0", "99.99.99.99.99.99.99" ] }, "2.8.2": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "2.5.1.2.0.0.0", "99.99.99.99.99.99.99" ] } }, "pytorch": { "1.5.1": { "neuron-rtd": [ "0.0.0.0", "1.5.1.1.5.21.0" ], "libnrt": [ "1.5.1.1.5.21.1", "99.99.99.99.99.99.99" ] }, "1.7.1": { "neuron-rtd": [ "0.0.0.0", "1.7.1.1.5.21.0" ], "libnrt": [ "1.7.1.1.5.21.1", "99.99.99.99.99.99.99" ] }, "1.8.1": { "neuron-rtd": [ "0.0.0.0", "1.8.1.1.5.21.0" ], "libnrt": [ "1.8.1.1.5.21.1", "99.99.99.99.99.99.99" ] }, "1.9.1": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "1.9.1.0.0.0.0", "99.99.99.99.99.99.99" ] }, "1.10.1": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "1.9.1.0.0.0.0", "99.99.99.99.99.99.99" ] }, "1.10.2": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "1.9.1.0.0.0.0", "99.99.99.99.99.99.99" ] }, "1.11.0": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "1.9.1.0.0.0.0", "99.99.99.99.99.99.99" ] }, "1.12.1": { "neuron-rtd": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ], "libnrt": [ "1.9.1.0.0.0.0", "99.99.99.99.99.99.99" ] } }, "mxnet": { "1.5.1": { "neuron-rtd": [ "0.0.0.0", "99.99.99.99.99.99.99" ], "libnrt": [ "99.99.99.99.99.99.99", "99.99.99.99.99.99.99" ] }, "1.8.0": { "neuron-rtd": [ "0.0.0.0", "1.8.0.1.3.4.0" ], "libnrt": [ "1.8.0.1.3.4.1", "99.99.99.99.99.99.99" ] } } }, "latest_release": { "inf1": { "version": "2.8.0" } }, "neuron_versions": { "2.6.0": { "python_ver": [ "3.7" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuronx-dkms": { "install_on_compute_instance": false, "versions": { "2.6.33.0": { "main_version": true, "pre_install_cmds": [], 
"post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.10.27.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuronx-k8-plugin": { "install_on_compute_instance": false, "versions": { "2.1.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuronx-k8-scheduler": { "install_on_compute_instance": false, "versions": { "2.1.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuronx-tools": { "install_on_compute_instance": false, "versions": { "2.6.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.13.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.6.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.7.1.2.5.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.5.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.2.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.11.0.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.12.1.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.3.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.6.5.2.5.6.0": { 
"main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.7.3.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.8.2.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuronx": { "install_on_compute_instance": false, "versions": { "1.15.0.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.4.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.6.3.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.7.0.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.8.0.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.4.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.10.11.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.2.43.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "2.7.0": { "python_ver": [ "3.7" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuronx-dkms": { "install_on_compute_instance": false, "versions": { "2.7.15.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.10.27.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuronx-k8-plugin": { "install_on_compute_instance": false, "versions": { "2.1.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { 
"aws-neuronx-k8-scheduler": { "install_on_compute_instance": false, "versions": { "2.1.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuronx-tools": { "install_on_compute_instance": false, "versions": { "2.7.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.13.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.6.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.7.1.2.5.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.5.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.2.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.11.0.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.12.1.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.3.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.6.5.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.7.3.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.8.2.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuronx": { "install_on_compute_instance": false, "versions": { "1.15.0.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], 
"package_type": [ "deb", "rpm" ] }, "2.5.4.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.6.3.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.7.0.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.8.0.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.4.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.10.11.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.2.43.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "2.8.0": { "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuronx-dkms": { "install_on_compute_instance": false, "versions": { "2.7.33.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.10.30.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuronx-k8-plugin": { "install_on_compute_instance": false, "versions": { "2.1.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuronx-k8-scheduler": { "install_on_compute_instance": false, "versions": { "2.1.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuronx-tools": { "install_on_compute_instance": false, "versions": { "2.8.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.13.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ 
"tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.6.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.7.1.2.5.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.5.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.2.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.11.0.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.12.1.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.6.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.3.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.6.5.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.7.4.2.6.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.8.4.2.6.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.9.3.2.6.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.10.1.2.6.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuronx": { "install_on_compute_instance": false, "versions": { "1.15.0.2.6.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.4.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.6.3.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.7.4.2.6.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], 
"package_type": [ "deb", "rpm" ] }, "2.8.4.2.6.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.9.3.2.6.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.10.1.2.6.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.4.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.10.11.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.2.43.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } }, "instance_support": [ "inf1" ], "python_ver": [ "3.7" ] }, "2.5.0": { "python_ver": [ "3.7" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuronx-dkms": { "install_on_compute_instance": false, "versions": { "2.6.33.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.10.27.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuronx-k8-plugin": { "install_on_compute_instance": false, "versions": { "2.1.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuronx-k8-scheduler": { "install_on_compute_instance": false, "versions": { "2.1.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuronx-tools": { "install_on_compute_instance": false, "versions": { "2.5.19.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.13.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": 
{ "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.6.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.7.1.2.5.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.5.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.2.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.11.0.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.12.1.2.5.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.3.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.6.5.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.7.3.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.8.2.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuronx": { "install_on_compute_instance": false, "versions": { "1.15.0.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.4.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.6.3.2.5.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.7.0.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.8.0.2.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.4.6.0": { "main_version": true, 
"pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.10.11.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.2.43.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "2.4.0": { "python_ver": [ "3.7" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuronx-dkms": { "install_on_compute_instance": false, "versions": { "2.6.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.51.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuronx-k8-plugin": { "install_on_compute_instance": false, "versions": { "2.1.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuronx-k8-scheduler": { "install_on_compute_instance": false, "versions": { "2.1.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.5.16.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.11.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.7.1.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.3.0.0": { "main_version": true, 
"pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.2.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.11.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.6.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.7.1.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.8.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.4.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.6.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.7.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.8.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.4.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.10.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.2.2.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "2.3.0": { "python_ver": [ "3.7" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { 
"framework": false, "packages": { "aws-neuronx-dkms": { "install_on_compute_instance": false, "versions": { "2.5.41.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.51.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuronx-k8-plugin": { "install_on_compute_instance": false, "versions": { "2.0.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuronx-k8-scheduler": { "install_on_compute_instance": false, "versions": { "2.0.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.1.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.11.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.7.1.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.2.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.11.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.6.3.2.3.0.0": { "main_version": 
false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.7.1.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.8.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.4.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.6.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.7.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.8.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.4.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.10.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.2.2.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.19.2": { "python_ver": [ "3.7" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.3.26.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.51.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.9.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { 
"install_on_compute_instance": false, "versions": { "1.9.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.1.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.11.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.7.1.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.2.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.11.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.6.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.7.1.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.8.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.4.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, 
"2.6.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.7.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.8.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.4.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.10.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.2.2.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.19.1": { "python_ver": [ "3.7" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.3.11.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.51.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.9.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.9.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.1.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.11.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": 
{ "1.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.7.1.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.2.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.11.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.6.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.7.1.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.8.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.4.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.6.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.7.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.8.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.4.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.10.0.0": { "main_version": false, 
"pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.2.2.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.19.0": { "python_ver": [ "3.7" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.3.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.51.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.9.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.9.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.1.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.11.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.7.1.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.2.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, 
"1.11.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.6.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.7.1.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.8.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.4.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.6.3.2.3.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.7.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.8.0.2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.4.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.10.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.2.2.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.18.0": { "python_ver": [ "3.7" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.2.14.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { 
"framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.51.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.8.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.8.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.0.790.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.10.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.2.2.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.2.2.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.2.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.1.2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.2.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.3.2.2.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.6.3.2.2.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.7.1.2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, 
"tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.2.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.4.2.2.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.6.3.2.2.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.7.0.2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.3.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.9.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.2.2.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3.us-west-2.amazonaws.com/1.8.0/aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx-1.8.0.2-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.17.2": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.2.13.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.31.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.7.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.7.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.0.623.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { 
"install_on_compute_instance": true, "versions": { "1.9.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.1.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.2.1.7.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.2.1.7.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.1.7.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.1.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.1.2.1.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.1.14.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.1.4.2.1.14.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.2.3.2.1.14.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.3.4.2.1.14.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.4.3.2.1.14.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.2.2.1.14.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.1.14.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.1.4.2.1.14.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.2.3.2.1.14.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.3.4.2.1.14.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.4.3.2.1.14.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" 
], "package_type": [ "deb", "rpm" ] }, "2.5.3.2.1.14.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.8.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.1.5.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.17.1": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.2.13.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.31.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.7.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.7.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.0.623.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.9.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.1.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { 
"1.5.1.2.1.7.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.2.1.7.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.1.7.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.1.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.1.2.1.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.1.13.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.1.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.2.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.3.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.4.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.2.2.1.13.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.1.13.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.1.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.2.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.3.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.4.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.3.2.1.13.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": 
true, "versions": { "1.5.1.1.8.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.1.5.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.17.0": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.2.13.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.31.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.7.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.7.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.0.623.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.9.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.1.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.2.1.7.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.2.1.7.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.1.7.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.1.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ 
"bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.10.1.2.1.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.1.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.1.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.2.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.3.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.4.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.2.2.1.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.1.6.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.1.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.2.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.3.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.4.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.3.2.1.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.8.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.1.5.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.16.3": 
{ "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.2.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.18.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.7.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.7.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.0.494.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.8.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.0.85.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.2.0.536.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.2.0.536.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.0.536.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.0.536.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.0.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.1.4.2.0.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.2.3.2.0.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], 
"format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.3.4.2.0.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.4.3.2.0.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.1.2.0.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.0.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.1.4.2.0.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.2.3.2.0.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.3.4.2.0.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.4.3.2.0.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.2.2.0.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.7.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.0.290.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.16.2": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.2.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.18.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": 
false, "versions": { "1.7.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.7.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.0.327.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.8.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.0.85.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.2.0.468.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.2.0.468.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.0.468.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.0.468.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.1.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.2.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.3.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.4.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.1.2.0.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.0.4.0": { "main_version": false, "pre_install_cmds": [], 
"post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.1.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.2.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.3.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.4.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.2.2.0.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.7.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.0.276.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.16.1": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.2.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.18.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.7.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.7.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.0.327.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", 
"neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.7.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.0.85.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.2.0.392.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.2.0.392.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.0.392.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.0.392.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.1.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.2.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.3.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.4.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.1.2.0.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.1.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.2.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.3.4.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.4.3.2.0.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] 
}, "2.5.2.2.0.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.7.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.2.0.276.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.16.0": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.2.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "libnrt": { "framework": false, "packages": { "libnrt": { "install_on_compute_instance": false, "versions": { "2.2.15.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "lib" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.7.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.7.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "2.0.277.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.7.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "neuronperf": { "framework": false, "packages": { "neuronperf": { "install_on_compute_instance": false, "versions": { "1.0.85.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.2.0.318.0": { "main_version": false, 
"pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.2.0.318.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.2.0.318.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.9.1.2.0.318.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.2.0.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.1.4.2.0.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.2.3.2.0.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.3.4.2.0.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.4.3.2.0.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.1.2.0.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.2.0.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.1.4.2.0.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.2.3.2.0.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.3.4.2.0.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.4.3.2.0.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.2.2.0.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.7.0.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { 
"install_on_compute_instance": true, "versions": { "1.8.0.2.0.271.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.15.2": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.1.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.6.24.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.6.22.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.6.22.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.6.21.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.7.25.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.6.13.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.5.21.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.5.21.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.1.5.21.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.6.10.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" 
], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.1.4.1.6.10.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.2.3.1.6.10.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.3.3.1.6.10.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.4.2.1.6.10.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.0.1.6.10.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.1.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.6.10.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.1.4.1.6.10.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.2.2.1.6.10.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.3.0.1.6.10.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.4.1.1.6.10.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.1.1.6.10.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.6.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.1.3.4.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.15.1": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.1.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": 
{ "install_on_compute_instance": false, "versions": { "1.6.24.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.6.22.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.6.22.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.6.21.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.7.25.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.6.13.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.5.21.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.5.21.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.1.5.21.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.1.4.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.2.3.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.3.3.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.4.2.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.0.1.6.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { 
"framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.1.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.1.4.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.2.2.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.3.0.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.4.1.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.1.1.6.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.6.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.1.3.4.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.15.0": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.0.450.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.6.19.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.6.17.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.6.17.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { 
"aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.6.16.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.7.20.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.6.13.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.5.21.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.5.21.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.1.5.21.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.1.4.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.2.3.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.3.3.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.4.2.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] }, "2.5.0.1.6.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.1.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.1.4.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.2.2.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], 
"package_type": [ "deb", "rpm" ] }, "2.3.0.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.4.1.1.6.8.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] }, "2.5.1.1.6.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.6.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.1.3.4.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.14.2": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "2.0.386.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.6.9.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.6.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.6.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.6.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.7.10.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.5.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, 
"packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.5.12.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.5.12.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.1.5.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.5.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.1.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.5.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.6.1.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.1.3.0.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.14.1": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "1.5.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.6.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.6.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.6.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" 
] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.6.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.7.4.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.5.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.5.12.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.5.12.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.1.5.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.5.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.1.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.5.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.6.1.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.1.3.0.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.14.0": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "1.5.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], 
"content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.5.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.6.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.6.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.5.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.6.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.4.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.4.1.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.4.1.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.8.1.1.4.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.4.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.1.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.4.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { 
"install_on_compute_instance": true, "versions": { "1.5.1.1.5.1.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.1.2.1.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.13.0": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "1.4.9.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.4.17.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.5.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.5.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.4.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.5.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.3.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.3.5.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.3.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.3.3.0": { "main_version": true, 
"pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-plugin-neuron": { "install_on_compute_instance": false, "versions": { "2.0.29.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.3.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet_neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.4.4.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } }, "mx_neuron": { "install_on_compute_instance": true, "versions": { "1.8.0.1.1.2.0": { "main_version": true, "pre_install_cmds": [ "wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl", "pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl" ], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.12.3": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "1.4.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.4.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.4.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.4.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.4.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.4.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, 
"versions": { "1.2.11.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.2.24.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.2.24.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.2.9.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.2.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.2.9.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.3.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.12.2": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "1.4.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.4.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.4.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.4.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.4.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { 
"aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.4.12.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.2.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.2.16.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.2.16.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.2.9.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.2.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.2.9.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.3.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.12.1": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "1.4.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.4.9.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.4.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.4.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ 
"deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.4.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.4.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.2.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.2.15.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.2.15.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.2.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.2.6.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.2.8.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.3.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.12.0": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "1.4.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.4.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.4.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], 
"format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.4.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.4.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.4.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.2.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.2.3.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.2.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.5.1.2.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.2.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.3.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.11.0": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { "aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "1.3.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.3.1.0": { 
"main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.3.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.3.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.3.2.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.3.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.1.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.1.7.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] }, "1.7.1.1.1.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.4.1.1.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.1.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.1.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.2.1.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } }, "1.10.0": { "python_ver": [ "3.6" ], "instance_support": [ "inf1" ], "arch": [ "x86_64" ], "components": { "driver": { "framework": false, "packages": { 
"aws-neuron-dkms": { "install_on_compute_instance": false, "versions": { "1.2.3.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin", "src" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-server": { "framework": false, "packages": { "aws-neuron-runtime": { "install_on_compute_instance": false, "versions": { "1.2.5.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-plugin": { "framework": false, "packages": { "aws-neuron-k8-plugin": { "install_on_compute_instance": false, "versions": { "1.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "k8-scheduler": { "framework": false, "packages": { "aws-neuron-k8-scheduler": { "install_on_compute_instance": false, "versions": { "1.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "runtime-base": { "framework": false, "packages": { "aws-neuron-runtime-base": { "install_on_compute_instance": false, "versions": { "1.2.0.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-rtd" ], "package_type": [ "deb", "rpm" ] } } } } }, "tools": { "framework": false, "packages": { "aws-neuron-tools": { "install_on_compute_instance": false, "versions": { "1.2.7.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "neuron-monitor", "neuron-cli", "neuron-top", "neuron-htop" ], "package_type": [ "deb", "rpm" ] } } } } }, "compiler": { "framework": false, "packages": { "neuron-cc": { "install_on_compute_instance": true, "versions": { "1.0.24045.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": "whl" } } } } }, "pytorch": { "framework": true, "packages": { "torch-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.0.1978.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "torch-neuron" ], "package_type": [ "whl" ] } } } } }, "tensorflow": { "framework": true, "packages": { "tensorflow-neuron": { "install_on_compute_instance": true, "versions": { "1.15.4.1.0.2168.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorboard": { "framework": false, "packages": { "tensorboard-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.0.615.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } }, "tensorflow-model-server": { "framework": false, "packages": { "tensorflow-model-server-neuron": { "install_on_compute_instance": false, "versions": { "1.15.0.1.0.2168.0": { "main_version": true, "pre_install_cmds": [], "post_install_cmds": [], "format": [ "bin" ], "content": [ "tbd" ], "package_type": [ "deb", "rpm" ] } } } } }, "mxnet": { "framework": true, "packages": { "mxnet-neuron": { "install_on_compute_instance": true, "versions": { "1.5.1.1.1.88.0": { "main_version": false, "pre_install_cmds": [], "post_install_cmds": [], "format": [ 
"bin" ], "content": [ "tbd" ], "package_type": [ "whl" ] } } } } } } } } } ================================================ FILE: src/helperscripts/neuron-setup-example.py ================================================ from neuronsetuphelper import neuron_setup_helper nr_setup=neuron_setup_helper(manifest_file='default',neuron_version='latest') setup_cmd = nr_setup.instructions(framework='tensorflow',action='Install',os='ubuntu',ami='non-dlami',mode='develop',framework_version='latest') print (setup_cmd) ================================================ FILE: src/helperscripts/neuronsetuphelper.py ================================================ import json import argparse from packaging.version import Version, parse ######################################## # neuron_setup_helper ######################################## class neuron_release_info: def __init__(self): self.release_frameworks_all = {} self.release_frameworks_main = {} self.release_packages_all ={} self.release_package_main={} self.release_frameworks_list=[] self.release_components_list = [] self.release_tf_package_to_model_server_package={} self.release_os_install_list =[] self.python_ver="" # release_frameworks_all # Desc: Dictionary - all framewors included in the release # example: 'pytorch-1.5.1': {'framework': 'pytorch', 'package': 'torch-neuron', 'version': '1.5.1.1.5.3.0', 'main': False, 'framework_version': '1.5.1', 'package_name': 'torch-neuron-1.5.1.1.5.3.0', 'pre_install_cmds': [], 'post_install_cmds': []} # release_frameworks_all = {} # release_frameworks_main # Desc: Dictionary - the main frameworks in each rlease (single version of the same framework) # example: 'mxnet': {'framework': 'mxnet-1.8.0', 'package': 'mx_neuron', 'version': '1.8.0.1.3.0.0', 'framework_version': '1.5.1', 'full_package_name': 'mx_neuron-1.8.0.1.3.0.0', 'pre_install_cmds': ['wget https://aws-mx-pypi.s3-us-west-2.amazonaws.com/1.8.0/aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl', 'pip install aws_mx_cu110-1.8.0-py2.py3-none-manylinux2014_x86_64.whl'], 'post_install_cmds': []} # release_frameworks_main = {} # release_packages_all # Desc: Dictionary - all packages included in the release # example: 'aws-neuron-dkms-1.5.0.0': {'component': 'driver', 'package': 'aws-neuron-dkms', 'version': '1.5.0.0', 'main': True, 'pre_install_cmds': [], 'post_install_cmds': []} # release_packages_all ={} # release_package_main # Desc: Dictionary - only single package from each component # example: 'driver': {'package': 'aws-neuron-dkms', 'version': '1.5.0.0', 'full_package_name': 'aws-neuron-dkms-1.5.0.0', 'pre_install_cmds': [], 'post_install_cmds': []} # release_package_main={} # list of all framewoks included in the specific neuron release # release_frameworks_list=[] # list of all neuron components included in the specific neuron release # release_components_list = [] # dictionary to correlate tf version with model server version # release_tf_package_to_model_server_package = {} # list of all Neuron versions included in the manifest neuron_ver_list = [] # release_os_install_list =[] dlami_conda_env= {} package_formal_name= { "compiler":"Neuron Compiler", "tensorflow":"Neuron TensorFlow", "pytorch":"Neuron PyTorch", "mxnet":"Neuron MXNet", "runtime-server":"Neuron Runtime server", "libnrt":"Neuron Runtime library", "runtime-base":"Neuron Runtime base", "driver":"Neuron Driver", "tools":"Neuron Tools", "tensorboard":"Neuron TensorBoard", "tensorflow-model-server":"Neuron TensorFlow model server" } ######################################## # 
parse_arguments ######################################## def cli_parse_arguments(): __name__='neuron-install-helper.py' parser = argparse.ArgumentParser(prog=__name__ ,usage='\npython3 %(prog)s --list {neuron_versions,packages,components,frameworks} [--neuron-version=X.Y.Z] [--file FILE] \n' +'python3 %(prog)s --install {pytorch,tensorflow,mxnet} [--neuron-version=X.Y.Z] [--framework-version=FRAMEWORK-X.Y.Z] [options]\n' +'python3 %(prog)s --install {driver,runtime,tools} [--neuron-version=X.Y.Z] [options]\n' +'python3 %(prog)s --update {pytorch,tensorflow,mxnet} [--framework-version=framework-X.Y.Z] [options]\n' +'python3 %(prog)s --update {driver,runtime,tools} [options]\n' +'options= [--file FILE] [--ami {dlami,non-dlami}] [--os {ubuntu,amazonlinux}]\n' ,description='Installer helper for Neuron SDK') group = parser.add_mutually_exclusive_group(required=True) parser.add_argument("--neuron-version",metavar='X.Y.Z') group.add_argument("--list",choices=['neuron_versions','packages','components','frameworks']) group.add_argument("--install",choices=['pytorch','tensorflow','mxnet']) group.add_argument("--update",choices=['pytorch','tensorflow','mxnet']) parser.add_argument("--mode",choices=['develop','compile','deploy'],default='develop') parser.add_argument("--framework-version",metavar='framework-X.Y.Z') parser.add_argument("--os",choices=['ubuntu','amazonlinux'],default='ubuntu',help='default=ubuntu') parser.add_argument("--ami",choices=['dlami','non-dlami'],default='non-dlami',help='default=non-dlami') parser.add_argument("--file",default='neuron-releases-manifest.json',help='default=neuron-releases-manifest.json') return parser.parse_args() def enumerate_release_manifest(nr_setup, in_neuron_version): ######################################## # Enumerate the Json file ######################################## if nr_setup.file==None: nr_setup.file='neuron-releases-manifest.json' try: read_file = open(nr_setup.file, "r") except: print(__name__,": error:","Can't open " + nr_setup.file + " ") exit(-1) neuron_releases = json.load (read_file) latest_neuron_version = neuron_releases["latest_release"]["inf1"]["version"] nr_setup.dlami_conda_env = neuron_releases["dlami_conda_env"] nr_setup.fal_supported_runtime = neuron_releases["fal_supported_runtime"] if (in_neuron_version == None) | (in_neuron_version == 'latest'): neuron_version=latest_neuron_version else: neuron_version = in_neuron_version for n_ver in neuron_releases["neuron_versions"]: neuron_ver_list.append(n_ver) for neuron_release_ver in neuron_releases["neuron_versions"]: m_release=neuron_releases["neuron_versions"][neuron_release_ver]["components"] n_info=neuron_release_info() n_info.python_ver= neuron_releases["neuron_versions"][neuron_release_ver]["python_ver"][0] for component_name in m_release: if m_release[component_name]["framework"]==False: n_info.release_components_list.append(component_name) m_packages=m_release[component_name]["packages"] for package_name in m_packages: for package_ver in m_packages[package_name]["versions"]: m_package_ver=m_packages[package_name]["versions"][package_ver] full_package_name=package_name+'-'+package_ver n_info.release_packages_all[full_package_name]= {"component":component_name,"package":package_name,"version":package_ver,"main":m_package_ver["main_version"],"pre_install_cmds":m_package_ver["pre_install_cmds"],"post_install_cmds":m_package_ver["post_install_cmds"],"package_type":m_package_ver["package_type"]} if m_package_ver["main_version"]: 
n_info.release_package_main[component_name]={"package":package_name,"version":package_ver,"full_package_name":full_package_name,"pre_install_cmds":m_package_ver["pre_install_cmds"],"post_install_cmds":m_package_ver["post_install_cmds"],"package_type":m_package_ver["package_type"]} if m_release[component_name]["framework"]: ver_digits = package_ver.rsplit('.') fw_ver=ver_digits[0]+'.'+ver_digits[1]+'.'+ver_digits[2] fw_name_ver=component_name+'-'+fw_ver if m_release[component_name]["framework"]: n_info.release_components_list.append(fw_name_ver) n_info.release_frameworks_list.append(fw_name_ver) if m_package_ver["main_version"]: n_info.release_frameworks_main[component_name]={"framework":fw_name_ver,"package":package_name,"version":package_ver,"framework_version":fw_ver,"package_name":full_package_name,"full_package_name":full_package_name,"pre_install_cmds":m_package_ver["pre_install_cmds"],"post_install_cmds":m_package_ver["post_install_cmds"],"package_type":m_package_ver["package_type"]} n_info.release_frameworks_all[fw_name_ver]={"framework":component_name,"package":package_name,"version":package_ver,"main":m_package_ver["main_version"],"framework_version":fw_ver,"package_name":full_package_name,"pre_install_cmds":m_package_ver["pre_install_cmds"],"post_install_cmds":m_package_ver["post_install_cmds"],"package_type":m_package_ver["package_type"]} if 'driver' in n_info.release_components_list: n_info.release_os_install_list.append('driver') if 'runtime-server' in n_info.release_components_list: n_info.release_os_install_list.append('runtime-server') if 'tools' in n_info.release_components_list: n_info.release_os_install_list.append('tools') if 'tensorflow-model-server' in n_info.release_components_list: n_info.release_os_install_list.append('tensorflow-model-server') # correlate TF and TF model server versions for pkg in n_info.release_packages_all.keys(): if n_info.release_packages_all[pkg]['component'] == 'tensorflow': package_ver=n_info.release_packages_all[pkg]['version'] ver_digits = package_ver.rsplit('.') tf_small_ver=ver_digits[0]+'.'+ver_digits[1] for pkg2 in n_info.release_packages_all.keys(): if n_info.release_packages_all[pkg2]['component'] == 'tensorflow-model-server': package_ver=n_info.release_packages_all[pkg2]['version'] ver_digits = package_ver.rsplit('.') tf_model_server_small_ver=ver_digits[0]+'.'+ver_digits[1] if tf_model_server_small_ver==tf_small_ver: n_info.release_tf_package_to_model_server_package[pkg]=pkg2 break nr_setup.releases_info[neuron_release_ver]=n_info try: m_release=neuron_releases["neuron_versions"][neuron_version]["components"] except: print(__name__,": error: ","Version " + neuron_version + " is not a Neuron version or it is not supported") exit(-1) return (neuron_version,latest_neuron_version) ################ # Sanity Checks ################ def cli_validate(update,neuron_version,framework_version,is_latest_neuron,ami): # --update_cmd Sanity check # When choosing update, it always updates to latest, so neuron_version should not be provided if (update!=None) & (is_latest_neuron == False): print (__name__,": error: ","--update always updates to the latest Neuron version, can't specify a Neuron version") exit(-1) #if neuron_version != None: # if ami == 'dlami': # print (__name__,": error: ","--neuron_version should not be specified together with --ami=dlami") # exit(-1) if (framework_version != None): if (framework_version not in nr_setup.releases_info[neuron_version].release_frameworks_list): print (__name__,": error: "," " + framework_version + " is not a
supported framework") exit(-1) ######################################## # version to tuple ######################################## def versiontuple(v): filled = [] for point in v.split("."): filled.append(point.zfill(8)) return tuple(filled) ######################################## # --list command ######################################## def cli_list_cmd(nr_setup, neuron_version, list): str ='' if (list == 'neuron_versions'): str += '\nList of Neuron release versions supported by this helper:\n' + '\n' for ver in neuron_ver_list: str += 'neuron-'+ver + '\n' #TODO: add "[main]" label to main packages if (list == 'packages'): str += '\nList of Neuron packages included in Neuron release version ' + neuron_version + ':\n' + '\n' for package in nr_setup.releases_info[neuron_version].release_packages_all: if len( nr_setup.releases_info[neuron_version].release_packages_all[package]['package_type']): #FIXME Runtime library hardcode print if (nr_setup.releases_info[neuron_version].release_packages_all[package]["component"] == 'libnrt'): str += nr_setup.releases_info[neuron_version].release_packages_all[package]["component"] +' : \t' + \ "libnrt.so (version "+ \ nr_setup.releases_info[neuron_version].release_packages_all[package]["version"] + ")" + '\n' else: str += nr_setup.releases_info[neuron_version].release_packages_all[package]["component"] +' : \t' + package + '\n' if (list == 'components'): str += '\nList of Neuron components included in Neuron release version ' + neuron_version + ':\n' + '\n' for comp in nr_setup.releases_info[neuron_version].release_components_list: str += comp + '\n' #TODO: add "[main]" label to main frameworks if (list == 'frameworks'): str += '\nList of frameworks included in Neuron release version ' + neuron_version + ':\n' + '\n' for fw in nr_setup.releases_info[neuron_version].release_frameworks_all: str += nr_setup.releases_info[neuron_version].release_frameworks_all[fw]["framework"] +' : \t' + fw + '\n' return str ######################################## # Print configuration ######################################## def hlpr_print_config(nr_setup, neuron_version): str = '' str += '\n' str += '###########################################################################' + '\n' str += '# ' + nr_setup.action + ' ' + nr_setup.framework + ' ' if (nr_setup.framework_version != 'latest') & (nr_setup.framework_version != None): str += '(' + nr_setup.framework_version + ')' + ' ' if nr_setup.action == 'Update': str += 'from latest Neuron version ' + neuron_version else: str += 'from Neuron version ' + neuron_version str += '\n# ' str += 'On ' if (nr_setup.os == 'ubuntu'): str += 'Ubuntu ' elif (nr_setup.os == 'amazonlinux'): str += 'Amazon Linux ' if (nr_setup.ami == 'dlami'): str += 'DLAMI' else: str += 'AMI' str += ' for ' if (nr_setup.mode == 'compile'): str += 'compilation on compute instance' elif (nr_setup.mode == 'develop'): str += 'development on inf1 instance' elif (nr_setup.mode == 'deploy'): str += 'deployment on inf1 instance' str += '\n' str += '###########################################################################' + '\n' str += '\n' return str ################################### # Build Pip command ################################### def hlpr_build_pip_command(nr_setup, neuron_version, component,include_compiler,optional): package_dict= nr_setup.releases_info[neuron_version].release_package_main if (nr_setup.framework_version==None): fw_package_dict= nr_setup.releases_info[neuron_version].release_frameworks_main fw_comp=component else: fw_package_dict= 
nr_setup.releases_info[neuron_version].release_frameworks_all fw_comp=nr_setup.framework_version pip_cmd_prefix='' pip_cmd ='' if nr_setup.action=='Install': pip_cmd_prefix = 'pip install ' else: pip_cmd_prefix = 'pip install --upgrade ' cmd=pip_cmd_prefix if (component == 'mxnet') | (component == 'pytorch') | (component == 'tensorflow'): # Framework installation if (component == 'mxnet') | (component == 'pytorch'): pip_cmd += cmd + fw_package_dict[fw_comp]['package'] if (nr_setup.is_latest_neuron==False) | (nr_setup.force_versions == True): pip_cmd += '=='+fw_package_dict[fw_comp]['version'] elif (nr_setup.is_latest_neuron==True)&(nr_setup.framework_version!=None): pip_cmd += '=='+fw_package_dict[fw_comp]['framework_version']+'.*' elif (component == 'tensorflow'): if (parse(neuron_version)>=parse('2.99.99')): os_cmd += '\n' os_cmd += '################################################################################################################\n' os_cmd += '# To install or update to Neuron versions 2.99.99 and newer from previous releases:'+ '\n' if (nr_setup.os=='ubuntu'): os_cmd += '# - Uninstall aws-neuron-dkms by calling \`sudo apt-get remove aws-neuron-dkms -y\`'+ '\n' elif (nr_setup.os=='amazonlinux'): os_cmd += '# - Uninstall aws-neuron-dkms by calling \`sudo dnf remove aws-neuron-dkms -y\`'+ '\n' os_cmd += '# - DO NOT skip \'aws-neuronx-dkms\' install or upgrade step, you MUST install or upgrade to latest Neuron driver'+ '\n' os_cmd += '################################################################################################################\n' elif (parse(neuron_version)>=parse('1.19.1')): os_cmd += '\n' os_cmd += '################################################################################################################\n' os_cmd += '# To install or update to Neuron versions 1.19.1 and newer from previous releases:'+ '\n' os_cmd += '# - DO NOT skip \'aws-neuron-dkms\' install or upgrade step, you MUST install or upgrade to latest Neuron driver'+ '\n' os_cmd += '################################################################################################################\n' # Update header files if driver should be installed or updated if (comp=='driver'): os_cmd += hlpr_os_headers_update(nr_setup) if nr_setup.os=='ubuntu': os_cmd_prefix = 'sudo apt-get install ' elif (nr_setup.action=='Install')&(nr_setup.os=='amazonlinux'): os_cmd_prefix = 'sudo dnf install ' elif (nr_setup.action=='Update')&(nr_setup.os=='amazonlinux'): os_cmd_prefix = 'sudo dnf update ' if comp in nr_setup.releases_info[neuron_version].release_os_install_list: # install only if there is a package associated with the component if (len(pkg_dict[key]['package_type']) != 0): #os_cmd = build_os_command(cmd=os_cmd_prefix,component=comp,is_latest_release=is_latest_neuron) os_cmd += '\n' if (optional==False): os_cmd += '# ' + nr_setup.action + ' ' + package_formal_name[comp] else: os_cmd += '# Optional: ' + nr_setup.action + ' ' + package_formal_name[comp] if (nr_setup.is_latest_neuron==False)&(nr_setup.os=='ubuntu'): os_cmd += '\n' os_cmd += '# If you are downgrading from a newer version, please add the \'--allow-downgrades\' option to \'sudo apt-get install\' ' if (nr_setup.is_latest_neuron==False)&(nr_setup.os=='amazonlinux'): os_cmd += '\n' os_cmd += '# If you are downgrading from a newer version, please remove the existing package using \'sudo dnf remove\' before installing the older package' os_cmd += '\n' # Amazon Linux DLAMI will not allow updating tensorflow-model-server and aws-neuron-dkms without adding
sudo dnf versionlock delete if ((comp=='tensorflow-model-server') | (comp=='driver')) & (nr_setup.ami == 'dlami') & (nr_setup.os == 'amazonlinux'): os_cmd += 'sudo dnf versionlock delete ' os_cmd += pkg_dict[key]['package'] os_cmd += '\n' os_cmd += os_cmd_prefix + pkg_dict[key]['package'] # Amazon Linux (yum/dnf) package versions are pinned with a hyphen, not an equals sign version_key = "=" if (nr_setup.os=='amazonlinux'): version_key = "-" if (nr_setup.is_latest_neuron==False) | (nr_setup.force_versions): os_cmd += version_key + pkg_dict[key]['version'] elif (pkg!=None): if ( nr_setup.releases_info[neuron_version].release_package_main[comp]['version']!= nr_setup.releases_info[neuron_version].release_packages_all[pkg]['version']): os_cmd += version_key + pkg_dict[key]['version'] # Ubuntu DLAMI will not allow updating tensorflow-model-server and aws-neuron-dkms without adding --allow-change-held-packages if ((comp=='tensorflow-model-server') | (comp=='driver')) & (nr_setup.ami == 'dlami') & (nr_setup.os == 'ubuntu'): os_cmd += ' --allow-change-held-packages' os_cmd += ' -y' os_cmd += '\n' # Update header files if driver should be installed or updated if (comp=='driver'): os_cmd += '\n' os_cmd += '####################################################################################\n' os_cmd += '# Warning: If the Linux kernel is updated as a result of the OS package update'+ '\n' if (parse(neuron_version)>=parse('2.99.99')): os_cmd += '# the Neuron driver (aws-neuronx-dkms) should be re-installed after reboot'+ '\n' else: os_cmd += '# the Neuron driver (aws-neuron-dkms) should be re-installed after reboot'+ '\n' os_cmd += '####################################################################################\n' if (comp=='tools'): if (parse(neuron_version)>=parse('2.99.99')): os_cmd += '\n' os_cmd += '################################################################################################################\n' os_cmd += '# To install or update to Neuron versions 2.99.99 and newer from previous releases:'+ '\n' if (nr_setup.os=='ubuntu'): os_cmd += '# - Uninstall aws-neuron-tools by calling \`sudo apt-get remove aws-neuron-tools -y\`'+ '\n' elif (nr_setup.os=='amazonlinux'): os_cmd += '# - Uninstall aws-neuron-tools by calling \`sudo dnf remove aws-neuron-tools -y\`'+ '\n' os_cmd += '################################################################################################################\n' return os_cmd ######################################## ## Installation / Update instructions ######################################## def hlpr_instructions(nr_setup, neuron_version): cmd_string = '' setup_mode=nr_setup.mode # look for a conda environment for this framework version for fw_env in nr_setup.dlami_conda_env: if fw_env != nr_setup.framework: continue fw_ver_conda_env=nr_setup.dlami_conda_env[fw_env] for conda_env_fw_ver in fw_ver_conda_env: if (conda_env_fw_ver == nr_setup.fw_package_dict[nr_setup.fw_comp]['framework_version']): nr_setup.conda_env=nr_setup.dlami_conda_env[fw_env][conda_env_fw_ver][0] nr_setup.generic_conda_env=nr_setup.dlami_conda_env[fw_env][conda_env_fw_ver][1] break # look up which runtime works with this framework version fal_rtd=False fal_libnrt=False for fw in nr_setup.fal_supported_runtime: if fw != nr_setup.framework: continue if fw == nr_setup.framework: if (nr_setup.framework_version == None): fw_ver= nr_setup.releases_info[neuron_version].release_frameworks_main[nr_setup.framework]['framework_version'] fal_version=
nr_setup.releases_info[neuron_version].release_frameworks_main[nr_setup.framework]['version'] else: fw_ver= nr_setup.releases_info[neuron_version].release_frameworks_all[nr_setup.framework_version]['framework_version'] fal_version= nr_setup.releases_info[neuron_version].release_frameworks_all[nr_setup.framework_version]['version'] fal_supported_rtd=nr_setup.fal_supported_runtime[fw][fw_ver]['neuron-rtd'] fal_supported_libnrt=nr_setup.fal_supported_runtime[fw][fw_ver]['libnrt'] if (parse(fal_version) >= parse(fal_supported_rtd[0])) & \ (parse(fal_version) <= parse(fal_supported_rtd[1])): fal_rtd=True elif (parse(fal_version) >= parse(fal_supported_libnrt[0])) & \ (parse(fal_version) <= parse(fal_supported_libnrt[1])): fal_libnrt=True if nr_setup.conda_env == "None": dlami_ev_exists=False else: dlami_ev_exists=True #cmd_string += hlpr_print_config(nr_setup, neuron_version) if (nr_setup.framework_version==None): fw_package_dict= nr_setup.releases_info[neuron_version].release_frameworks_main fw_comp=nr_setup.framework else: fw_package_dict= nr_setup.releases_info[neuron_version].release_frameworks_all fw_comp=nr_setup.framework_version if (nr_setup.framework !=None): #if install or update # If we are not using DLAMI if (nr_setup.ami=='non-dlami') | \ ((nr_setup.ami=='dlami') & \ ( (nr_setup.action == 'Update') | \ (dlami_ev_exists==False) | \ (nr_setup.is_latest_neuron==False)) \ ): if (nr_setup.ami=='dlami') & (dlami_ev_exists==False): cmd_string += '\n' cmd_string += '# Note: There is no DLAMI Conda environment for this framework version'+ '\n' cmd_string += '# Framework will be installed/updated inside a Python environment'+ '\n' if (setup_mode == 'develop') | (setup_mode == 'deploy'): if (nr_setup.action =='Install')&(nr_setup.ami!='dlami'): # For first install, set up the Neuron OS packages repo (dnf or apt) cmd_string += hlpr_os_packages_first_setup(nr_setup) # Always update to latest OS packages cmd_string += hlpr_os_packages_update(nr_setup) cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version, comp='driver',optional=False,pkg=None) #FIXME Temporary check for MXNET 1.5 in maintenance mode if (neuron_version == "1.16.0") & (nr_setup.framework=="mxnet")& \ (fw_package_dict[fw_comp]['framework_version']=="1.5.1"): cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version="1.15.2", comp='runtime-server',optional=False,pkg=None) elif (fal_rtd): cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version, comp='runtime-server',optional=False,pkg=None) #if mode = develop, install tools if (setup_mode == 'develop'): cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version, comp='tools',optional=False,pkg=None) if (nr_setup.framework == 'tensorflow'): cmd_string += hlpr_build_pip_command(nr_setup, neuron_version, component='tensorboard',include_compiler=False,optional=False) if (nr_setup.action =='Install'): cmd_string += hlpr_os_export_path(nr_setup) if (nr_setup.ami=='non-dlami') | \ ((nr_setup.ami=='dlami')&(nr_setup.generic_conda_env=="None")): if (nr_setup.action =='Install'): # For first install, install python venv and activate a venv cmd_string += hlpr_pip_install_create_python_venv(nr_setup, neuron_version) elif (nr_setup.action =='Update'): # For subsequent updates, activate the venv used for the initial install cmd_string += hlpr_pip_activate_python_venv(nr_setup, neuron_version) elif (nr_setup.ami=='dlami'): cmd_string += hlpr_framework_dlami_activate(nr_setup) # Set up the Neuron pip package repos cmd_string += hlpr_pip_repos_setup() # Now install framework if (setup_mode == 'deploy'):
# do not install compiler when deploying cmd_string += hlpr_framework_compiler_setup(nr_setup, neuron_version, include_compiler=False) else: # install compiler when mode = develop or mode = compile cmd_string += hlpr_framework_compiler_setup(nr_setup, neuron_version, include_compiler=True) #if mode != compile, install model server if (setup_mode != 'compile'): if (nr_setup.framework == 'tensorflow'): if (nr_setup.framework_version==None): tf_package= nr_setup.releases_info[neuron_version].release_frameworks_main[nr_setup.framework]['package_name'] else: tf_package= nr_setup.releases_info[neuron_version].release_frameworks_all[nr_setup.framework_version]['package_name'] cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version, comp='tensorflow-model-server',optional=True,pkg= nr_setup.releases_info[neuron_version].release_tf_package_to_model_server_package[tf_package]) # if running DLAMI elif (nr_setup.ami=='dlami'): if (nr_setup.action =='Install'): cmd_string += '\n' cmd_string += '# Neuron is pre-installed on the Deep Learning AMI (DLAMI), the latest DLAMI version may not include the latest Neuron version '+ '\n' cmd_string += '# To update to the latest Neuron version, follow the "Update to latest release" instructions in the Neuron documentation'+ '\n' # WARNING: Exception # Starting with Neuron 1.16.0, a new kernel driver is needed to work with Runtime 2.x (library mode) if (parse(neuron_version)>=parse('1.16.0')): if (setup_mode == 'develop') | (setup_mode == 'deploy'): cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version, comp='driver',optional=False,pkg=None) #FIXME Temporary check for MXNET 1.5 in maintenance mode if (neuron_version == "1.16.0") & (nr_setup.framework=="mxnet")& \ (fw_package_dict[fw_comp]['framework_version']=="1.5.1"): cmd_string += hlpr_os_comp_setup_cmd(nr_setup, neuron_version="1.15.2", comp='runtime-server',optional=False,pkg=None) cmd_string += '\n' cmd_string += hlpr_framework_dlami_activate(nr_setup) return cmd_string ######################################## # neuron_setup_helper ######################################## class neuron_setup_helper: def __init__(self, manifest_file,neuron_version): # All Neuron releases self.releases_info = {} if (manifest_file== None) | (manifest_file== 'default') : self.file = 'neuron-releases-manifest.json' else: self.file = manifest_file ver_tuple = enumerate_release_manifest(nr_setup=self,in_neuron_version=neuron_version) self.neuron_version = ver_tuple[0] self.latest_neuron_version = ver_tuple[1] self.conda_env="" self.python_ver="" self.generic_conda_env="" if self.neuron_version == self.latest_neuron_version: self.is_latest_neuron=True else: self.is_latest_neuron=False if (self.is_latest_neuron) & (neuron_version !=None) & (neuron_version !='latest'): # The user explicitly specified the version, although it is the latest version # in this case the instructions will include the exact versions of the packages self.force_versions=True else: self.force_versions=False def instructions(self,framework,action,framework_version,os,ami,mode): self.framework=framework self.action=action self.mode=mode self.os=os self.ami=ami if (framework_version=='latest'): self.framework_version=None else: self.framework_version=framework_version setup_cmd = "" if (self.framework_version==None): self.fw_package_dict= self.releases_info[self.neuron_version].release_frameworks_main self.fw_comp=self.framework else: self.fw_package_dict= self.releases_info[self.neuron_version].release_frameworks_all self.fw_comp=self.framework_version
setup_cmd=hlpr_instructions(self,self.neuron_version) return setup_cmd if __name__ == '__main__': setup_cmd ='' args = cli_parse_arguments() nr_setup=neuron_setup_helper(manifest_file=args.file,neuron_version=args.neuron_version) cli_validate(update=args.update,neuron_version=nr_setup.neuron_version,framework_version=args.framework_version,is_latest_neuron=nr_setup.is_latest_neuron,ami=args.ami) if (args.list): setup_cmd += cli_list_cmd(nr_setup=nr_setup,neuron_version=nr_setup.neuron_version, list=args.list) else: if (args.install != None)|(args.update !=None): if args.install: framework=args.install action = 'Install' elif args.update: framework=args.update action = 'Update' else: action = None framework=None setup_cmd += nr_setup.instructions(framework=framework,action=action,framework_version=args.framework_version,os=args.os,ami=args.ami,mode=args.mode) print (setup_cmd) ================================================ FILE: src/helperscripts/release-manifest-def.py ================================================ neuron_releases={ "repos":{ "whl":"_url", # url of the wheel repo "rpm":"_url", # url of the rpm repo (yum) "deb":"_url", # url of the debian repo (apt) }, "manifest_date": "_date", "manifest_version":"_ver", # Will increment when the format changes "latest_release":{ "_instance":{ # can be "inf1", "trn1", etc.. "version":"_ver" # latest neuron release that supports the _instance } }, "neuron_versions":{ # all neuron release versions supported by this manifest "_neuron_version":{ # Neuron release version entry e.g. "1.14.0" "python_ver": ["_ver"], # list of python versions supported by this neuron release, e.g. "3.6" "instance_support": ["_instance"], # list of instances supported by this neuron release "arch":["_arch"], # list of architectures supported by this neuron release (e.g. x86) "components":{ # all components included in this neuron release # (e.g. compiler, driver, pytorch ...) "_component_name":{ # component entry (e.g. driver, compiler) "framework":_boolean, # is this component a framework? # needed since there are differences in versioning and content etc .. "packages":{ # all packages of this component that are included in this release # e.g. mxnet supports mx_neuron and mxnet-neuron "_package_name":{ # package entry (e.g. mx_neuron) "install_on_compute_instance":_boolean, # can this package be installed on a compute instance? "versions":{ # all versions of the specific package # e.g. torch-neuron may include multiple versions "_ver":{ # package version entry (e.g. 1.4.1.0) "pre_install_cmds":["_cmd"], # a list of commands to call before installing # the package, e.g. when a plugin needs to install the # framework first, as in mx_neuron "post_install_cmds":["_cmd"], # a list of commands to call after installing the package "format":["_format"], # package format (e.g. bin or src) "content":["_content"], # package content # (e.g. tools include neuron-top, neuron monitor etc .. ) "package_type":["_type"] # list of package types supported (e.g. whl, rpm, deb) } } } } } } } }, "softwarelifecycle":{ # Status of neuron software releases (supported, maintained, deprecated) # Releases that are not under "maintained" or "deprecated" should be considered "supported" "maintained":{ # Releases that are being maintained, no active development, bug fixes can be provided # releases can be a Neuron release, a component (e.g. runtime), or a framework (e.g.
pytorch-1.5.x) "neuron_versions":{ # Neuron versions that are under maintenance status "from":"_ver", # from neuron release version "to":"_ver" # to neuron release version }, "components":{ # Components that are under maintenance status "_component_name":{ # packages in that component "_package_name":{ # package entry "from":"_ver", # from version "to":"_ver" # to version } } }, "frameworks":{ # Frameworks that are under maintenance status "pytorch":{ # PyTorch versions that are under maintenance status "from":"_ver", # from version "to":"_ver" # to version }, "tensorflow":{ # TensorFlow versions that are under maintenance status "from":"_ver", # from version "to":"_ver" # to version }, "mxnet":{ # MXNet versions that are under maintenance status "from":"_ver", # from version "to":"_ver" # to version } } }, "deprecated":{ # Releases that are deprecated, no bug fixes # format similar to "maintained" section }, }, "compatability": { # compatibility section "_component_name": { # component entry "_package_name": { # package entry "_ver_to__ver": { # compatibility entry "from": "_ver", # from version "to": "_ver", # to version "instance_support": [ # instance compatibility "_instance" ], "arch": [ # arch compatibility "_arch" ], "components": { # components compatibility section "_component_name": { # component entry "_package_name": { # package entry "from": "_ver", # from version "to": "_ver" # to version } } } } } } } } ================================================ FILE: src/k8/bert_service.yml ================================================ --- kind: Service apiVersion: v1 metadata: name: inf-k8s-test labels: app: inf-k8s-test spec: ports: - name: http-tf-serving port: 8500 targetPort: 8500 - name: grpc-tf-serving port: 9000 targetPort: 9000 selector: app: inf-k8s-test role: master type: ClusterIP --- kind: Deployment apiVersion: apps/v1 metadata: name: inf-k8s-test labels: app: inf-k8s-test role: master spec: replicas: 1 # Number of desired replicas. Increase to desired number. selector: matchLabels: app: inf-k8s-test role: master template: metadata: labels: app: inf-k8s-test role: master spec: volumes: - name: sock emptyDir: {} containers: - name: inf-k8s-test image: tf-serving-ctr imagePullPolicy: IfNotPresent command: ["/bin/sh","-c"] # Pull model from s3, then start tensorflow_model_server_neuron with the model. args: - "aws s3 sync s3:///bert /tmp/bert && \ tensorflow_model_server_neuron --port=9000 --rest_api_port=8500 --model_name=bert_mrpc_hc_gelus_b4_l24_0926_02 --model_base_path=/tmp//bert/" # Open grpc and rest API ports ports: - containerPort: 8500 - containerPort: 9000 # Arbitrary resource requirements resources: limits: cpu: 4 memory: 4Gi aws.amazon.com/neuron: 1 # desired number of Inferentia devices. requests: cpu: "1" memory: 1Gi aws.amazon.com/neuron: 1 # desired number of Inferentia devices.
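The Deployment above serves the BERT model through the standard TensorFlow Serving interfaces: REST on port 8500 and gRPC on port 9000. As a quick smoke test, the REST predict endpoint can be exercised once the ClusterIP Service is reachable, for example via kubectl port-forward svc/inf-k8s-test 8500:8500. The sketch below assumes that port-forward is running; the request payload is a hypothetical placeholder, since the real feature names and shapes (token ids, masks, and so on) depend on how the BERT SavedModel was exported:

# Minimal REST smoke test for the bert_service.yml Deployment above.
# Assumes: kubectl port-forward svc/inf-k8s-test 8500:8500 is running locally.
import json
import urllib.request

MODEL_NAME = "bert_mrpc_hc_gelus_b4_l24_0926_02"  # matches --model_name in the container args
URL = "http://localhost:8500/v1/models/" + MODEL_NAME + ":predict"

# Placeholder payload: the feature name "input_ids" and the length 128 are
# illustrative assumptions; they must match the SavedModel's serving signature.
payload = {"instances": [{"input_ids": [0] * 128}]}

req = urllib.request.Request(URL, data=json.dumps(payload).encode("utf-8"),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read().decode("utf-8")))

The gRPC port (9000) exposed by the same Service serves the equivalent prediction API for clients that need it; the REST path is simply the lightest-weight check.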
================================================ FILE: src/k8/k8s-neuron-device-plugin-rbac.yml ================================================ # rbac.yaml --- kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: neuron-device-plugin rules: - apiGroups: - "" resources: - nodes verbs: - get - list - watch - apiGroups: - "" resources: - events verbs: - create - patch - apiGroups: - "" resources: - pods verbs: - update - patch - get - list - watch - apiGroups: - "" resources: - nodes/status verbs: - patch - update --- apiVersion: v1 kind: ServiceAccount metadata: name: neuron-device-plugin namespace: kube-system --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: neuron-device-plugin namespace: kube-system roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: neuron-device-plugin subjects: - kind: ServiceAccount name: neuron-device-plugin namespace: kube-system ================================================ FILE: src/k8/k8s-neuron-device-plugin.yml ================================================ # https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/ apiVersion: apps/v1 kind: DaemonSet metadata: name: neuron-device-plugin-daemonset namespace: kube-system spec: selector: matchLabels: name: neuron-device-plugin-ds updateStrategy: type: RollingUpdate template: metadata: # Uncomment the annotation below if k8s version is 1.13 or lower # annotations: # scheduler.alpha.kubernetes.io/critical-pod: "" labels: name: neuron-device-plugin-ds spec: serviceAccount: neuron-device-plugin tolerations: - key: CriticalAddonsOnly operator: Exists - key: aws.amazon.com/neuron operator: Exists effect: NoSchedule # Mark this pod as a critical add-on; when enabled, the critical add-on # scheduler reserves resources for critical add-on pods so that they can # be rescheduled after a failure. 
# See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/ priorityClassName: "system-node-critical" affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: # Uncomment following matchExpressions if using k8s 1.16 or lower #- matchExpressions: # - key: "beta.kubernetes.io/instance-type" # operator: In # values: # - inf1.xlarge # - inf1.2xlarge # - inf1.6xlarge # - inf1.24xlarge # - inf2.xlarge # - inf2.8xlarge # - inf2.24xlarge # - inf2.48xlarge # - trn1.2xlarge # - trn1.32xlarge # - trn1n.32xlarge - matchExpressions: - key: "node.kubernetes.io/instance-type" operator: In values: - inf1.xlarge - inf1.2xlarge - inf1.6xlarge - inf1.24xlarge - inf2.xlarge - inf2.8xlarge - inf2.24xlarge - inf2.48xlarge - trn1.2xlarge - trn1.32xlarge - trn1n.32xlarge containers: # Find all neuron-device-plugin images at https://gallery.ecr.aws/neuron/neuron-device-plugin - image: public.ecr.aws/neuron/neuron-device-plugin:2.22.4.0 imagePullPolicy: Always name: neuron-device-plugin env: - name: KUBECONFIG value: /etc/kubernetes/kubelet.conf - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName securityContext: allowPrivilegeEscalation: false capabilities: drop: ["ALL"] volumeMounts: - name: device-plugin mountPath: /var/lib/kubelet/device-plugins - name: infa-map mountPath: /run volumes: - name: device-plugin hostPath: path: /var/lib/kubelet/device-plugins - name: infa-map hostPath: path: /run ================================================ FILE: src/k8/k8s-neuron-monitor-daemonset.yml ================================================ apiVersion: apps/v1 kind: DaemonSet metadata: name: neuron-monitor namespace: neuron-monitor labels: app: neuron-monitor version: v1 spec: selector: matchLabels: app: neuron-monitor template: metadata: labels: app: neuron-monitor version: v1 spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: kubernetes.io/os operator: In values: - linux - key: node.kubernetes.io/instance-type operator: In values: - trn1.2xlarge - trn1.32xlarge - trn1n.32xlarge - inf1.xlarge - inf1.2xlarge - inf1.6xlarge - inf2.xlarge - inf2.8xlarge - inf2.24xlarge - inf2.48xlarge containers: - name: neuron-monitor image: public.ecr.aws/neuron/neuron-monitor:1.3.0 ports: - containerPort: 8000 command: - "/opt/bin/entrypoint.sh" args: - "--port" - "8000" - "--neuron-monitor-config" - "/opt/aws/neuron/bin/neuron-monitor.conf" resources: limits: cpu: 500m memory: 256Mi requests: cpu: 256m memory: 128Mi env: - name: GOMEMLIMIT value: 160MiB securityContext: privileged: true ================================================ FILE: src/k8/k8s-neuron-scheduler-configmap.yml ================================================ apiVersion: v1 data: policy.cfg: | { "kind": "Policy", "apiVersion": "v1", "extenders": [ { "urlPrefix": "http://127.0.0.1:32700", "filterVerb": "filter", "bindVerb": "bind", "enableHttps": false, "nodeCacheCapable": true, "managedResources": [ { "name": "aws.amazon.com/neuron", "ignoredByScheduler": false }, { "name": "aws.amazon.com/neurondevice", "ignoredByScheduler": false }, { "name": "aws.amazon.com/neuroncore", "ignoredByScheduler": false } ], "ignorable": false } ] } kind: ConfigMap metadata: name: scheduler-policy namespace: kube-system ================================================ FILE: src/k8/k8s-neuron-scheduler-eks.yml ================================================ # rbac.yaml --- kind: ClusterRole apiVersion: 
rbac.authorization.k8s.io/v1 metadata: name: k8s-neuron-scheduler rules: - apiGroups: - "" resources: - nodes verbs: - get - list - watch - apiGroups: - "" resources: - nodes/status verbs: - update - patch - get - list - watch - apiGroups: - "" resources: - events verbs: - create - patch - apiGroups: - "" resources: - pods verbs: - update - patch - get - list - watch - apiGroups: - "" resources: - bindings - pods/binding verbs: - create --- apiVersion: v1 kind: ServiceAccount metadata: name: k8s-neuron-scheduler namespace: kube-system --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: k8s-neuron-scheduler namespace: kube-system roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: k8s-neuron-scheduler subjects: - kind: ServiceAccount name: k8s-neuron-scheduler namespace: kube-system # deployment yaml --- kind: Deployment apiVersion: apps/v1 metadata: name: k8s-neuron-scheduler namespace: kube-system spec: replicas: 1 strategy: type: Recreate selector: matchLabels: app: neuron-scheduler component: k8s-neuron-scheduler template: metadata: labels: app: neuron-scheduler component: k8s-neuron-scheduler annotations: scheduler.alpha.kubernetes.io/critical-pod: '' spec: serviceAccount: k8s-neuron-scheduler schedulerName: my-scheduler containers: - name: neuron-scheduler-exp # Find all neuron-scheduler images at https://gallery.ecr.aws/neuron/neuron-scheduler image: public.ecr.aws/neuron/neuron-scheduler:2.22.4.0 env: - name: PORT value: "12345" # service.yaml --- apiVersion: v1 kind: Service metadata: name: k8s-neuron-scheduler namespace: kube-system labels: app: neuron-scheduler component: k8s-neuron-scheduler spec: ports: - port: 12345 name: http targetPort: 12345 selector: # select app=ingress-nginx pods app: neuron-scheduler component: k8s-neuron-scheduler ================================================ FILE: src/k8/k8s-neuron-scheduler.yml ================================================ # rbac.yaml --- kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: k8s-neuron-scheduler rules: - apiGroups: - "" resources: - nodes verbs: - get - list - watch - apiGroups: - "" resources: - events verbs: - create - patch - apiGroups: - "" resources: - pods verbs: - update - patch - get - list - watch - apiGroups: - "" resources: - bindings - pods/binding verbs: - create --- apiVersion: v1 kind: ServiceAccount metadata: name: k8s-neuron-scheduler namespace: kube-system --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: k8s-neuron-scheduler namespace: kube-system roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: k8s-neuron-scheduler subjects: - kind: ServiceAccount name: k8s-neuron-scheduler namespace: kube-system # deployment yaml --- kind: Deployment apiVersion: apps/v1 metadata: name: k8s-neuron-scheduler namespace: kube-system spec: replicas: 1 strategy: type: Recreate selector: matchLabels: app: neuron-scheduler component: k8s-neuron-scheduler template: metadata: labels: app: neuron-scheduler component: k8s-neuron-scheduler annotations: scheduler.alpha.kubernetes.io/critical-pod: '' spec: hostNetwork: true tolerations: - effect: NoSchedule operator: Exists key: node-role.kubernetes.io/master - effect: NoSchedule operator: Exists key: node.cloudprovider.kubernetes.io/uninitialized nodeSelector: node-role.kubernetes.io/master: "" serviceAccount: k8s-neuron-scheduler containers: - name: neuron-scheduler # Find all neuron-scheduler images at 
https://gallery.ecr.aws/neuron/neuron-scheduler image: public.ecr.aws/neuron/neuron-scheduler:2.22.4.0 env: - name: PORT value: "12345" # service.yaml --- apiVersion: v1 kind: Service metadata: name: k8s-neuron-scheduler namespace: kube-system labels: app: neuron-scheduler component: k8s-neuron-scheduler spec: type: NodePort ports: - port: 12345 name: http targetPort: 12345 nodePort: 32700 selector: # select app=ingress-nginx pods app: neuron-scheduler component: k8s-neuron-scheduler ================================================ FILE: src/k8/k8s-ultraserver-init-script.sh ================================================ #!/bin/bash MPI_HOST_FILE=/etc/mpi/hostfile NEURON_ULTRASERVER_MODE_UNSET=0 NEURON_ULTRASERVER_MODE_X4=1 NEURON_ULTRASERVER_MODE_X2H=2 NEURON_ULTRASERVER_MODE_X2V=3 NEURON_ULTRASERVER_MODE_X1=4 ULTRASERVER_INIT_DIR=/root/ultraserver_init SORTED_NODES_FILE=$ULTRASERVER_INIT_DIR/sorted_nodes.txt FQDN_MODE_FILE=$ULTRASERVER_INIT_DIR/fqdn_mode.txt ENV_VARS_FILE=$ULTRASERVER_INIT_DIR/us_env_vars.txt NEW_HOST_FILE=$ULTRASERVER_INIT_DIR/new_hostfile export NEURON_ULTRASERVER_SERVER_ID_DEFAULT_VALUE="0000000000000000" export NEURON_ULTRASERVER_NODE_ID_DEFAULT_VALUE=-1 export NEURON_GLOBAL_TOPOID0_HOST="" export NUM_WORKERS=0 cat /dev/null > $SORTED_NODES_FILE cat /dev/null > $FQDN_MODE_FILE cat /dev/null > $ENV_VARS_FILE cat /dev/null > $NEW_HOST_FILE save_sorted_node_list() { # Gather ultraserver information from each worker node mpirun --allow-run-as-root \ --mca orte_keep_fqdn_hostnames 1 \ -host $ip_list \ -x NEURON_ULTRASERVER_SERVER_ID_DEFAULT_VALUE \ -x NEURON_ULTRASERVER_NODE_ID_DEFAULT_VALUE \ -x NEURON_ULTRASERVER_NODE_CONFIG \ sh -c ' if [ -f "/sys/class/neuron_device/server_id_${NEURON_ULTRASERVER_NODE_CONFIG}" ]; then NEURON_ULTRASERVER_SERVER_ID=$(cat /sys/class/neuron_device/server_id_${NEURON_ULTRASERVER_NODE_CONFIG}) else NEURON_ULTRASERVER_SERVER_ID=$NEURON_ULTRASERVER_SERVER_ID_DEFAULT_VALUE fi if [ -f "/sys/class/neuron_device/node_id_${NEURON_ULTRASERVER_NODE_CONFIG}" ]; then NEURON_ULTRASERVER_NODE_ID=$(cat /sys/class/neuron_device/node_id_${NEURON_ULTRASERVER_NODE_CONFIG}) else NEURON_ULTRASERVER_NODE_ID=$NEURON_ULTRASERVER_NODE_ID_DEFAULT_VALUE fi FQDN=$(hostname --fqdn) echo $NEURON_ULTRASERVER_SERVER_ID:$NEURON_ULTRASERVER_NODE_ID:$FQDN ' | sort -t':' -k1,1 -k2,2 -k3,3 > $SORTED_NODES_FILE # Set the topology ids for each worker node local i=0 while IFS= read -r line; do echo "${i}:${line}" ((i++)) done < $SORTED_NODES_FILE > temp && mv temp $SORTED_NODES_FILE NEURON_GLOBAL_TOPOID0_HOST=$(head -n1 $SORTED_NODES_FILE | cut -d: -f4) } validate_node_config() { while read -r server_id; do # Server id and node id are only valid for node configs > 1 if [ $NEURON_ULTRASERVER_NODE_CONFIG -ne 1 ]; then # Validate server id exists if [ "$server_id" = "$NEURON_ULTRASERVER_SERVER_ID_DEFAULT_VALUE" ]; then echo "$NEURON_ULTRASERVER_NODE_CONFIG-node config is not supported" exit 1 fi # Validate there is the correct amount of nodes that share the same server id count=$(grep "$server_id" "$SORTED_NODES_FILE" | wc -l) if [ $count -ne $NEURON_ULTRASERVER_NODE_CONFIG ]; then echo "Error: Incorrect number of nodes with server id $server_id, need $NEURON_ULTRASERVER_NODE_CONFIG nodes but saw $count" exit 1 fi # Validate all the node ids are unique node_ids_count=$(grep "$server_id" "$SORTED_NODES_FILE" | cut -d':' -f3 | sort | uniq | wc -l) if [ $node_ids_count -ne $NEURON_ULTRASERVER_NODE_CONFIG ]; then echo "Error: Found $node_ids_count unique node IDs, expected 
$NEURON_ULTRASERVER_NODE_CONFIG" exit 1 fi fi while IFS=':' read -r tid sid nid fqdn; do # Validate mode is valid for each node modes="${fqdn_modes_map[$fqdn]}" if [ $NEURON_ULTRASERVER_NODE_CONFIG -eq 4 ]; then if echo "$modes" | grep -q "\b$NEURON_ULTRASERVER_MODE_X4\b"; then mode=$NEURON_ULTRASERVER_MODE_X4 else echo "Error: Node $fqdn does not support 4-node config" exit 1 fi elif [ $NEURON_ULTRASERVER_NODE_CONFIG -eq 2 ]; then if echo "$modes" | grep -q "\b$NEURON_ULTRASERVER_MODE_X2V\b"; then mode=$NEURON_ULTRASERVER_MODE_X2V elif echo "$modes" | grep -q "\b$NEURON_ULTRASERVER_MODE_X2H\b"; then mode=$NEURON_ULTRASERVER_MODE_X2H else echo "Error: Node $fqdn does not support 2-node config" exit 1 fi else mode=$NEURON_ULTRASERVER_MODE_X1 fi # Save each worker node's environments variables to a file echo "${tid}:${mode}:${sid}:${nid}:${fqdn}" >> "$ENV_VARS_FILE" done < <(grep "$server_id" "$SORTED_NODES_FILE") done < <(cut -d':' -f2 "$SORTED_NODES_FILE" | sort | uniq) } reorder_hostfile() { # Check if files exist if [ ! -f "$MPI_HOST_FILE" ] || [ ! -f "$SORTED_NODES_FILE" ]; then echo "Error: One or both input files do not exist" exit 1 fi # Extract FQDNs from SORTED_NODES_FILE and reorder entries while IFS=: read -r _ _ _ fqdn; do # Remove .cluster.local suffix clean_fqdn=${fqdn%.cluster.local} # Find the matching line in original file while read -r line; do if [[ "$line" == "$clean_fqdn"* ]]; then echo "$line" >> "$NEW_HOST_FILE" break fi done < "$MPI_HOST_FILE" done < "$SORTED_NODES_FILE" } # Validate node config if [ -z "${NEURON_ULTRASERVER_NODE_CONFIG}" ]; then NEURON_ULTRASERVER_NODE_CONFIG=4 fi if [ $NEURON_ULTRASERVER_NODE_CONFIG -ne 1 ] && [ $NEURON_ULTRASERVER_NODE_CONFIG -ne 2 ] && [ $NEURON_ULTRASERVER_NODE_CONFIG -ne 4 ]; then echo "Error: Invalid ultraserver node config: $NEURON_ULTRASERVER_NODE_CONFIG. Must be 1, 2, or 4." exit 1 fi echo "Using $NEURON_ULTRASERVER_NODE_CONFIG-node config" echo -e "\nCurrent hostfile:" cat $MPI_HOST_FILE # Read the file, extract the first column, resolve IPs, and build the comma-separated string ip_list="" while read line; do ip=$(getent hosts "$line" | awk '{print $1}') if [ -z "$ip" ]; then echo "error: Unable to resolve IP address for host: $line" exit 1 fi if [ -z "$ip_list" ]; then ip_list="$ip" else ip_list="${ip_list},${ip}" fi done < <(cut -d' ' -f1 $MPI_HOST_FILE) echo "Worker pod IPs:" "$ip_list" # Count unique IPs from ip_list and store in NUM_WORKERS NUM_WORKERS=$(echo "$ip_list" | tr -cd ',' | wc -c) NUM_WORKERS=$((NUM_WORKERS + 1)) echo "Number of worker nodes: $NUM_WORKERS" # Validate that the number of workers is a multiple of the node config if [ $((NUM_WORKERS % NEURON_ULTRASERVER_NODE_CONFIG)) -ne 0 ]; then echo "Error: Invalid number of worker nodes for $NEURON_ULTRASERVER_NODE_CONFIG-node config: $NUM_WORKERS." 
exit 1 fi # Create a map of workers to their possible ultraserver modes mpirun --allow-run-as-root \ --mca orte_keep_fqdn_hostnames 1 \ -host $ip_list \ sh -c ' FQDN=$(hostname --fqdn) NEURON_ULTRASERVER_MODE=$(cat /sys/class/neuron_device/ultraserver_mode) echo $FQDN:$NEURON_ULTRASERVER_MODE ' | sort -t':' -k1 > $FQDN_MODE_FILE declare -A fqdn_modes_map while IFS=':' read -r fqdn mode; do fqdn_modes_map["$fqdn"]="$mode" done < $FQDN_MODE_FILE (echo "FQDN:Modes" && cat $FQDN_MODE_FILE) | tr ':' ' ' # Validate worker nodes echo -e "\nSorted nodes:" save_sorted_node_list (echo "TOPO_ID:SERVER_ID:NODE_ID:FQDN" && cat $SORTED_NODES_FILE) | tr ':' ' ' echo -e "\nNEURON_GLOBAL_TOPOID0 node will be: $NEURON_GLOBAL_TOPOID0_HOST" validate_node_config # Update hostlist echo -e "\nUpdated hostfile:" reorder_hostfile cat $NEW_HOST_FILE # Write environment variables to each worker node for line in `cat $ENV_VARS_FILE`; do IFS=':' read -r topo_id mode server_id node_id fqdn <<< "$line" export mode server_id node_id fqdn topo_id mpirun --allow-run-as-root \ --mca orte_keep_fqdn_hostnames 1 \ -host $fqdn \ -x topo_id \ -x NEURON_GLOBAL_TOPOID0_HOST \ -x mode \ -x server_id \ -x node_id \ sh -c ' sed -i "/^NEURON_GLOBAL_TOPOID=/d" /etc/environment sed -i "/^NEURON_GLOBAL_TOPOID0_HOST=/d" /etc/environment sed -i "/^NEURON_RT_ULTRASERVER_MODE=/d" /etc/environment sed -i "/^NEURON_RT_ULTRASERVER_SERVER_ID=/d" /etc/environment sed -i "/^NEURON_RT_ULTRASERVER_NODE_ID=/d" /etc/environment echo "NEURON_GLOBAL_TOPOID=$topo_id" >> /etc/environment echo "NEURON_GLOBAL_TOPOID0_HOST=$NEURON_GLOBAL_TOPOID0_HOST" >> /etc/environment echo "NEURON_RT_ULTRASERVER_MODE=$mode" >> /etc/environment echo "NEURON_RT_ULTRASERVER_SERVER_ID=$server_id" >> /etc/environment echo "NEURON_RT_ULTRASERVER_NODE_ID=$node_id" >> /etc/environment echo "Node $(hostname --fqdn): Variables set and persisted" echo "NEURON_GLOBAL_TOPOID=$topo_id" echo "NEURON_GLOBAL_TOPOID0_HOST=$NEURON_GLOBAL_TOPOID0_HOST" echo "NEURON_RT_ULTRASERVER_MODE=$mode" echo "NEURON_RT_ULTRASERVER_SERVER_ID=$server_id" echo "NEURON_RT_ULTRASERVER_NODE_ID=$node_id" ' done ================================================ FILE: src/k8/my-scheduler.yml ================================================ apiVersion: v1 kind: ServiceAccount metadata: name: my-scheduler namespace: kube-system --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: my-scheduler-as-kube-scheduler subjects: - kind: ServiceAccount name: my-scheduler namespace: kube-system roleRef: kind: ClusterRole name: system:kube-scheduler apiGroup: rbac.authorization.k8s.io --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: my-scheduler-as-volume-scheduler subjects: - kind: ServiceAccount name: my-scheduler namespace: kube-system roleRef: kind: ClusterRole name: system:volume-scheduler apiGroup: rbac.authorization.k8s.io --- kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: my-scheduler rules: - apiGroups: - "" resources: - configmaps verbs: - get - list - watch - apiGroups: - coordination.k8s.io resources: - leases verbs: - create - get - list - update --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: my-scheduler namespace: kube-system roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: my-scheduler subjects: - kind: ServiceAccount name: my-scheduler namespace: kube-system --- apiVersion: v1 kind: ConfigMap metadata: name: my-scheduler-config namespace: kube-system data: 
my-scheduler-config.yaml: | apiVersion: kubescheduler.config.k8s.io/v1 kind: KubeSchedulerConfiguration profiles: - schedulerName: my-scheduler extenders: - urlPrefix: 'http://k8s-neuron-scheduler.kube-system.svc.cluster.local:12345' filterVerb: filter bindVerb: bind enableHTTPS: false nodeCacheCapable: true managedResources: - name: 'aws.amazon.com/neuron' ignoredByScheduler: false - name: 'aws.amazon.com/neuroncore' ignoredByScheduler: false - name: 'aws.amazon.com/neurondevice' ignoredByScheduler: false ignorable: false leaderElection: leaderElect: true resourceNamespace: kube-system resourceName: my-scheduler --- apiVersion: apps/v1 kind: Deployment metadata: labels: component: scheduler tier: control-plane name: my-scheduler namespace: kube-system spec: selector: matchLabels: component: scheduler tier: control-plane replicas: 1 template: metadata: labels: component: scheduler tier: control-plane version: second spec: serviceAccountName: my-scheduler containers: - args: - --config=/etc/kubernetes/my-scheduler/my-scheduler-config.yaml - --leader-elect=true - --v=2 command: - /usr/local/bin/kube-scheduler image: public.ecr.aws/eks-distro/kubernetes/kube-scheduler:v1.28.5-eks-1-28-latest # or use below for your version of k8s # image: registry.k8s.io/kube-scheduler: livenessProbe: httpGet: path: /healthz port: 10259 scheme: HTTPS initialDelaySeconds: 15 name: kube-second-scheduler readinessProbe: httpGet: path: /healthz port: 10259 scheme: HTTPS resources: requests: cpu: '0.1' securityContext: privileged: false volumeMounts: - name: config-volume mountPath: /etc/kubernetes/my-scheduler hostNetwork: false hostPID: false volumes: - name: config-volume configMap: name: my-scheduler-config ================================================ FILE: src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery-config.yml ================================================ apiVersion: v1 data: kernel-monitor.json: | { "plugin": "kmsg", "logPath": "/dev/kmsg", "lookback": "5m", "bufferSize": 10, "source": "kernel-monitor", "conditions": [ { "type": "NeuronHealth", "reason": "NeuronHasNoError", "message": "Neuron has no error" } ], "rules": [ { "type": "permanent", "condition": "NeuronHealth", "reason": "NeuronHasError_SRAM_UNCORRECTABLE_ERROR", "pattern": ".* NEURON_HW_ERR=SRAM_UNCORRECTABLE_ERROR .*" }, { "type": "permanent", "condition": "NeuronHealth", "reason": "NeuronHasError_NC_UNCORRECTABLE_ERROR", "pattern": ".* NEURON_HW_ERR=NC_UNCORRECTABLE_ERROR .*" }, { "type": "permanent", "condition": "NeuronHealth", "reason": "NeuronHasError_HBM_UNCORRECTABLE_ERROR", "pattern": ".* NEURON_HW_ERR=HBM_UNCORRECTABLE_ERROR .*" }, { "type": "permanent", "condition": "NeuronHealth", "reason": "NeuronHasError_DMA_ERROR", "pattern": ".* NEURON_HW_ERR=DMA_ERROR .*" } ] } kind: ConfigMap metadata: name: node-problem-detector-config namespace: neuron-healthcheck-system ================================================ FILE: src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery-rbac.yml ================================================ apiVersion: v1 kind: ServiceAccount metadata: name: node-problem-detector namespace: neuron-healthcheck-system --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: npd-binding roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:node-problem-detector subjects: - kind: ServiceAccount name: node-problem-detector namespace: neuron-healthcheck-system --- apiVersion: rbac.authorization.k8s.io/v1 kind: 
ClusterRole metadata: labels: kubernetes.io/bootstrapping: rbac-defaults name: system:node-problem-detector rules: - apiGroups: - "" resources: - nodes verbs: - get - apiGroups: - "" resources: - nodes/status verbs: - patch - apiGroups: - "" - events.k8s.io resources: - events verbs: - create - patch - update ================================================ FILE: src/k8/neuron-problem-detector/k8s-neuron-problem-detector-and-recovery.yml ================================================ apiVersion: apps/v1 kind: DaemonSet metadata: name: node-problem-detector namespace: neuron-healthcheck-system labels: app: node-problem-detector spec: selector: matchLabels: app: node-problem-detector template: metadata: labels: app: node-problem-detector spec: affinity: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: "node.kubernetes.io/instance-type" operator: In values: - inf1.xlarge - inf1.2xlarge - inf1.6xlarge - inf1.24xlarge - inf2.xlarge - inf2.8xlarge - inf2.24xlarge - inf2.48xlarge - trn1.2xlarge - trn1.32xlarge - trn1n.32xlarge containers: - name: node-problem-detector command: - /node-problem-detector - --logtostderr - --config.system-log-monitor=/config/kernel-monitor.json image: registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.19 ports: - containerPort: 20257 resources: limits: cpu: 10m memory: 80Mi requests: cpu: 10m memory: 80Mi imagePullPolicy: Always securityContext: privileged: true env: - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName volumeMounts: - name: log mountPath: /var/log readOnly: true - name: kmsg mountPath: /dev/kmsg readOnly: true # Make sure node problem detector is in the same timezone # with the host. - name: localtime mountPath: /etc/localtime readOnly: true - name: config mountPath: /config readOnly: true - name: node-recovery command: - /bin/sh - -c - "sleep 60 && /scripts/check-health.py" image: public.ecr.aws/neuron/neuron-node-recovery:1.3.0 resources: limits: cpu: 10m memory: 150Mi requests: cpu: 10m memory: 150Mi imagePullPolicy: Always env: - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName - name: ENABLE_RECOVERY value: "false" serviceAccountName: node-problem-detector volumes: - name: log # Config `log` to your system log directory hostPath: path: /var/log/ - name: kmsg hostPath: path: /dev/kmsg - name: localtime hostPath: path: /etc/localtime - name: config configMap: name: node-problem-detector-config defaultMode: 0555 items: - key: kernel-monitor.json path: kernel-monitor.json tolerations: - effect: NoSchedule operator: Exists - effect: NoExecute operator: Exists ================================================ FILE: src/libnrt/README.md ================================================ # NeuronX Runtime Header Files ## Overview The NeuronX Runtime Library provides C APIs for initializing the Neuron hardware, loading models and input data, executing iterations on loaded models, and retrieving output data. This library is provided to customers via a shared object (libnrt.so) that is installed through the `aws-neuronx-runtime-lib` package. This directory exposes the header files that customers can use to write custom applications utilizing the NeuronX Runtime Library. ## File Location These header files will be installed to the user's system under `/opt/aws/neuron/include` when installing the `aws-neuronx-runtime-lib` package and the `libnrt.so` library is installed under the `/opt/aws/neuron/lib` directory. 
## Experimental Headers

The following files contain experimental function declarations and are subject to change in future releases.

- nrt_async.h
- nrt_async_sendrecv.h
- nrt_experimental.h

## Documentation

https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-api-guide.html

================================================ FILE: src/libnrt/include/ndl/ndl.h ================================================
/*
 * Copyright 2020-2021, Amazon.com, Inc. or its affiliates. All Rights Reserved
 */
#pragma once

/* The system header names were lost in extraction; stdint.h, stdbool.h,
 * stddef.h and pthread.h are assumed from the types used below. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <pthread.h>

#include "neuron_driver_shared.h"

#ifdef __cplusplus
extern "C" {
#endif

typedef enum NQ_DEV_TYPE {
    NQ_DEV_TYPE_NEURON_CORE = 0,
    NQ_DEV_TYPE_TOPSP,
    NQ_DEV_TYPE_MAX,
} ndl_nq_dev_t;

#define NEURON_MAX_DEVICES MAX_NEURON_DEVICE_COUNT
#define NEURON_DEVICE_PREFIX "/dev/neuron"
#define NEURON_DRIVER_LIBRARY_MAJOR 1
#define NEURON_DRIVER_LIB_MINOR 0
#define MAX_HBM_PER_DEVICE 4
#define DRIVER_VERSION_MAX_SIZE 32

typedef struct ndl_version_info {
    uint16_t driver_major_version;   // Major version of the driver
    uint16_t driver_minor_version;   // Minor version of the driver
    char driver_full_version[DRIVER_VERSION_MAX_SIZE];
    uint16_t library_major_version;  // Major version of the library
    uint16_t library_minor_version;  // Minor version of the library
} ndl_version_info_t;

/** Get version info.
 *
 * @param[out] version - Buffer to store the version information.
 *
 * @return 0 on success.
 *         -1 on failure to read the driver version.
 */
int ndl_get_version(ndl_version_info_t *version);

/** Gets the range of compatible versions.
 *
 * @param min_compatible_version [out] - Lowest supported version
 * @param max_compatible_version [out] - Highest supported version
 *
 * @return 0 on success.
 */
int ndl_get_compatible_version(uint32_t *min_compatible_version, uint32_t *max_compatible_version);

typedef struct ndl_device_init_param {
    bool initialize_device;  // if set to true, device is initialized as part of open()
    int num_dram_regions;    // splits device DRAMs into given number of regions
    bool map_hbm;            // if set to true, HBM will be mapped during device open
} ndl_device_init_param_t;

#define NDL_COPY_BUF_SIZE (2ull * 1024 * 1024)

typedef struct ndl_copy_buf {
    uint64_t mem_handle;
    void *mmap_va;
    pthread_mutex_t lock;
} ndl_copy_buf_t;

// Maximum neuron devices supported on a system.
#define MAX_NEURON_DEVICE_COUNT 64
// Maximum neuron cores per device
#define MAX_NC_PER_DEVICE 8

typedef struct ndl_device {
    uint8_t device_index;  // Device Index
    uint8_t device_type;   // Device Type (V1, V2..)
uint16_t device_revision; // Revision id of board uint8_t connected_device_count; // Number of devices connected to this device uint8_t connected_devices[MAX_NEURON_DEVICE_COUNT]; // Array of devices(IDs) connected to this device uint64_t csr_base[2]; // BAR0/BAR2 base uint64_t csr_size[2]; // BAR0/BAR2 size ndl_copy_buf_t cpy_bufs[MAX_NC_PER_DEVICE]; // MMAP buffers for efficiently copying data in/out of the device void *hbm_va[MAX_HBM_PER_DEVICE]; // HBM virtual addresses size_t hbm_size; // HBM sizes uint32_t hbm_va_cnt; // Number of active HBM regions uint32_t shift_hbm_size; // Cached number of bits to shift uint64_t hbm_offset[MAX_HBM_PER_DEVICE]; // HBM offsets uint8_t context[]; // Library reserved fields } ndl_device_t; typedef struct ndl_device_nc { ndl_device_t *device; uint32_t nc_id; } ndl_device_nc_t; typedef struct ndl_device_context { int nd_fd; } ndl_device_context_t; typedef struct ndl_mem_info { ndl_device_t *device; __u64 driver_handle; uint64_t pa; uint64_t mmap_offset; uint64_t size; uint32_t align; void *mmap_va; uint32_t host_memory; int nc_id; } ndl_mem_info_t; typedef struct ndl_notification_context { union { uint8_t nc_id; // neuron core index uint8_t nq_dev_id; // notification device index }; ndl_nq_dev_t nq_dev_type; // notification device type uint8_t nq_type; // type of the notification queue uint8_t engine_index; // engine index uint32_t size; // size of the NQ in bytes int fd; // file descriptor of /dev/ndX/ncY/nqZ uint64_t offset; //mmap offset in the nd uint64_t mem_handle; void *va; // mmapped address ndl_mem_info_t *mem_info; // NQ memory info } ndl_notification_context_t; /** * Called by app the first time when it accesses the device. * * @param[in] device_index - device index that is to be opened * @param[in] num_tdram_regions - number of tdram regions * @param[out] device - device specific information * * @return 0 on success. * -1 on failure */ int ndl_open_device(int device_index, ndl_device_init_param_t *params, ndl_device_t **device); /** * Called by app when it is done. After this, device cannot be accessed * * @param[in] device - Device to close. * * @return 0 on success. * -1 on failure */ int ndl_close_device(ndl_device_t *device); /** * Get all the device index * * @param[out] device_indexes - Buffer to store device indexes. * @param[in] device_indexes_size - Size of the buffer in dwords. * * @return Number of devices found. */ int ndl_available_devices(int *device_indexes, int device_indexes_size); /** Read from one or more registers. * * @param device[in] - Device handle. * @param bar[in] - BAR to read. * @param addresses[in] - Array of register addresses. * @param count[in] - Number of registers in the array. * @param buffer[out] - Buffer to store read data. * * @return 0 on success. */ int ndl_bar_read(ndl_device_t *device, uint8_t bar, uint64_t *addresses, uint32_t count, uint32_t *buffer); /** Write to one or more registers. * * @param device[in] - Device handle. * @param bar[in] - BAR to write. * @param addresses[in] - Array of register addresses. * @param count[in] - Number of registers in the array. * @param data[in] - Data to write. * * @return 0 on success. */ int ndl_bar_write(ndl_device_t *device, uint8_t bar, uint64_t *addresses, uint32_t count, uint32_t *data); /** Read hw counters from one or more addresses * * @param device[in] - Device handle. * @param addresses[in] - Array of register addresses. * @param count[in] - Number of registers in the array. * @param buffer[out] - Buffer to store read data. 
* * @return 0 on success. */ int ndl_read_hw_counters(ndl_device_t *device, uint64_t *addresses, uint32_t count, uint32_t *data); /** * Retrieves the cached HBM virtual address for the specified device. * * @param device[in] - Device handle. * @param hbm_idx[in] - HBM index. * @param va[out] - Resulting virtual address. * @param size[out] - Size of the HBM * * @return 0 on success, -EINVAL on failure, and -ENOENT when there are no more entries to be found. */ int ndl_get_hbm_va(ndl_device_t *device, int hbm_idx, void **va, size_t *size); /** Allocates memory. * * @param device[in] - Device to be associated with the allocation. * @param size[in] - Number of bytes to allocate. * @param host_memory[in] - If true allocate from host memory instead of using device memory. * @param dram_channel[in] - DRAM channel to use in the device memory. * @param dram_region[in] - DRAM region to use in the device memory. * @param nc_id[in] - NC ID to use in the device * @param mem_alloc_type[in]- Type of memory allocation * @param mem_handle[out] - Allocated memory handle would be stored here. * * @return 0 on success. */ int ndl_memory_alloc(ndl_device_t *device, size_t size, uint64_t align, uint32_t host_memory, uint32_t dram_channel, uint32_t dram_region, uint32_t nc_id, uint32_t mem_alloc_type, uint64_t *mem_handle); /** Given a mem handle gets it PA - HACK to be removed * @param mem_handle[in] - Memory handle * @parama pa[out] - Physical address of handle * * @return the PA */ int ndl_memory_get_pa(uint64_t mem_handle, uint64_t *pa); /** Map given m memory handle into virtual address space. * * @param mem_handle[in] - Handle to map. * @param va[out] - Resulting virtual address. * * @return 0 on success */ int ndl_memory_map(uint64_t mem_handle, void **va); /** Unmap given memory handle from virtual address space. * * @param mem_handle[in] - Handle to unmap. * * @return 0 on success */ int ndl_memory_unmap(uint64_t mem_handle); /** Frees already allocated memory. * * @param mem_handle[in] - Memory handle to be freed. * * @return 0 on success. */ int ndl_memory_free(uint64_t mem_handle); /** Copy data from buffer to mem_handle. * * @param mem_handle[in] - Handle on which data needs to be copied in. * @param buffer - Buffer from which data needs to be copied. * @param offset - Offset in the mem handle. * @param size - Size in bytes to be copied. * * @return 0 on success. */ int ndl_memory_buf_copyin(uint64_t mem_handle, void *buffer, uint64_t offset, size_t size); /** Copy data from mem_handle to buffer. * * @param mem_handle[in] - Handle from which data needs to be copied out. * @param buffer - Buffer to which data needs to be copied. * @param offset - Offset in the mem handle. * @param size - Size in bytes to be copied. * * @return 0 on success. */ int ndl_memory_buf_copyout(uint64_t mem_handle, void *buffer, uint64_t offset, size_t size); /** Copy data from buffer to mem_handle (zero copy, buffer is pinned and used directly). * * @param mem_handle[in] - Handle on which data needs to be copied in. * @param buffer - Buffer from which data needs to be copied. * @param offset - Offset in the mem handle. * @param size - Size in bytes to be copied. * * @return 0 on success. */ int ndl_memory_buf_zerocopyin(uint64_t mem_handle, void *buffer, uint64_t offset, size_t size, int qid, uint32_t bar4_wr_threshold); /** Copy data from mem_handle to buffer (zero copy, buffer is pinned and used directly). * * @param mem_handle[in] - Handle from which data needs to be copied out. 
* @param buffer - Buffer to which data needs to be copied. * @param offset - Offset in the mem handle. * @param size - Size in bytes to be copied. * @param qid - H2T queue to use. NEURON_DMA_H2T_DEFAULT_QID is default * * @return 0 on success. */ int ndl_memory_buf_zerocopyout(uint64_t mem_handle, void *buffer, uint64_t offset, size_t size, int qid); /** Batch transfer data between host buffers and device memory. * * @param mem_handle[in] - Device memory handle * @param ops[in] - Array of batch operations * @param num_ops[in] - Number of operations in batch * @param direction[in] - Transfer direction (0=write to device, 1=read from device) * @param qid[in] - H2T queue to use (-1 for default) * * @return 0 on success. */ int ndl_memory_buf_batch_copy(neuron_memcpy_batch_t *batches, uint64_t num_batches, uint32_t direction, int qid); /** Copy data from buffer to addr in engine. * * @param device[in] - Device information. * @param nc_id [in] - Neuron core id. * @param dst [in] - Address on which data needs to be copied in. * @param buffer - Buffer from which data needs to be copied. * @param offset - Offset in the mem handle. * @param size - Size in bytes to be copied. * @param qid - H2T queue to use. NEURON_DMA_H2T_DEFAULT_QID is default * * @return 0 on success. */ int ndl_program_engine(ndl_device_t *device, uint32_t nc_id, uint64_t dst, void *buffer, uint64_t offset, size_t size); /** Memset the given memhandle with passed byte value * * @param src_mem_handle[in]- Handle which needs to be filled with byte value * @param offset - Src Offset in the mem handle. * @param value - Byte value to fill the memory with * @param size - Size in bytes to be copied. * * @return 0 on success. */ int ndl_memset(const uint64_t addr, uint64_t offset, const int value, const size_t size); /** Copy data between mem_handles. * * @param src_mem_handle[in]- Handle from which data needs to be copied out. * @param dst_mem_handle[in]- Handle from which data needs to be copied to. * @param src_offset - Src Offset in the mem handle. * @param dst_offset - Dest Offset in the mem handle. * @param size - Size in bytes to be copied. * * @return 0 on success. */ int ndl_memory_copy(uint64_t src_mem_handle, uint64_t dst_mem_handle, uint64_t src_offset, uint64_t dst_offset, size_t size); /** Copy data between mem_handles asynchronously. * * @param src_mem_handle[in] - Handle from which data needs to be copied out. * @param dst_mem_handle[in] - Handle from which data needs to be copied to. * @param src_offset - Src Offset in the mem handle. * @param dst_offset - Dest Offset in the mem handle. * @param size - Size in bytes to be copied. * @param prefetch_addr - Host destination address associate with copy out operation to prefetch * @param wait_handle [in/out] - wait_handle [in] is for prev request, [out] is handle for this request * * @return 0 on success. */ int ndl_memory_copy_as(uint64_t src_mem_handle, uint64_t dst_mem_handle, uint64_t src_offset, uint64_t dst_offset, size_t size, uint64_t prefetch_addr, int * wait_handle); /** Copy data between mem_handles. * * @param mem_handle[in] - Handle from which data for this tran (either src or dst) * @param wait_handle - wait_handle for an async dma * * @return 0 on success. */ int ndl_memory_copy_as_wait(uint64_t mem_handle, int wait_handle); /** Set the dma engine state * * @param device_index[in] - Device index. * @param eng_id[in] - Eng ID that is initialized. * @param state[in] - State that is set UDMA_NORMAL/UDMA_DISABLE etc * * @return 0 on success. 
*/ int ndl_dma_eng_set_state(int device_index, uint32_t eng_id, uint32_t state); /** Get the dma engine state * * @param device_index[in] - Device index. * @param eng_id[in] - Engine index which status needs to be collected. * @param state[out] - Buffer to store engine state. * * @return 0 on success. */ int ndl_dma_eng_get_state(int device_index, uint32_t eng_id, struct neuron_dma_eng_state *state); /** Get DMA queue state * * @param device_index[in] - Device index. * @param eng_id [in] - DMA engine index. * @param qid [in] - DMA queue index. * @param tx [out] - Tx queue state. * @param rx [out] - Rx queue state. * * @return 0 on success. */ int ndl_dma_queue_get_state(int device_index, uint8_t eng_id, uint8_t qid, struct neuron_dma_queue_state *tx, struct neuron_dma_queue_state *rx); /** Copy DMA descriptors to userspace. * * This API needs root privilege. * * @param device_index[in] - Device index. * @param eng_id [in] - DMA engine index. * @param qid [in] - DMA queue index. * @param type [in] - Type of the queue. * @param index [in] - Start descriptor index. * @param count [in] - Number of descriptor needs to be copied. * @param buffer [out] - Buffer to store the descriptors. * * @return 0 on success. */ int ndl_dma_descriptor_copyout(int device_index, uint8_t eng_id, uint8_t qid, enum neuron_dma_queue_type type, uint32_t start_index, uint32_t count, void *buffer); /** Initialize the dma queue for a given engine * * @param device_index[in] - Device index * @param eng_id[in] - Engine for which the queue is initialized * @param qid[in] - Queue id that needs to be initialized * @param tx_desc_count[in] - number of tx desc's need to be allocated * @param rx_desc_count[in] - number of rx desc's need to be allocated * @param tx_handle[in] - TX mem handle * @param rx_handle[in] - RX mem handle * @param rxc_handle[in] - Completion mem handle * * @return 0 on success. */ int ndl_dma_queue_init(int device_index, uint32_t eng_id, uint32_t qid, uint32_t tx_desc_count, uint32_t rx_desc_count, uint64_t tx_handle, uint64_t rx_handle, uint64_t rxc_handle, uint32_t axi_port); struct ndl_queue_init { __u32 eng_id; // [in] DMA engine index __u32 qid; // [in] Queue index in the DMA engine __u32 tx_desc_count; // [in] number of tx desc's need to be allocated __u32 rx_desc_count; // [in] number of rx desc's need to be allocated __u64 tx_handle; // [in] mem handle for the tx ring __u64 rx_handle; // [in] mem handle for the rx ring __u64 rxc_handle; // [in] mem handle for the rxc ring __u32 axi_port; // [in] axi port }; #define MAX_NDL_QUEUE_INIT_BATCH 256 struct ndl_queue_init_batch { __u32 count; struct ndl_queue_init entries[MAX_NDL_QUEUE_INIT_BATCH]; }; /** Initialize a batch of dma queues * * @param device_index[in] - Device index * @param batch[in] - Batch of dma queue initialization requests * * @return 0 on success. */ int ndl_dma_queue_init_batch(int device_idx, struct ndl_queue_init_batch *batch); /** Release the dma queue for a given engine - only used in tests * * @param device_index[in] - Device index * @param eng_id[in] - Engine for which the queue is initialized * @param qid[in] - Queue id that needs to be initialized * * @return 0 on success. 
 */
int ndl_dma_queue_release(int device_index, uint32_t eng_id, uint32_t qid);

/** Starts DMA by copying the given number of descriptors, or prefetches s2m
 *
 * @param device_index[in] - Device index
 * @param eng_id[in] - Engine for which the queue is initialized
 * @param qid[in] - Queue id that needs to be initialized
 * @param tx_desc_count[in] - number of tx descriptors to copy; may be 0 when called for s2m prefetch
 * @param rx_desc_count[in] - number of rx descriptors to copy
 *
 * @return 0 on success.
 */
int ndl_dma_queue_copy_start(int device_index, uint32_t eng_id, uint32_t qid, uint32_t tx_desc_count,
                             uint32_t rx_desc_count);

/** Acks the completed descriptor count for the eng/queue - only used in tests
 *
 * @param device_index[in] - Device index
 * @param eng_id[in] - Engine for which the queue is initialized
 * @param qid[in] - Queue id that needs to be initialized
 * @param count[in] - Number of descriptors to ack
 *
 * @return 0 on success.
 */
int ndl_dma_ack_completed_desc(int device_index, uint32_t eng_id, uint32_t qid, uint32_t count);

/** Copy data from buffer to mem_handle. The buffer holds dma descriptors.
 *
 * @param mem_handle[in] - Handle to which data needs to be copied.
 * @param buffer[in] - Buffer from which data needs to be copied; holds dma descriptors.
 * @param offset[in] - Offset in the mem handle.
 * @param num_descs[in] - Number of descriptors to copy
 * @param queue_type[in] - Queue from which to copy descriptors.
 *
 * @return 0 on success.
 */
int ndl_dma_copy_descriptors(uint64_t mem_handle, void *buffer, uint64_t offset, uint32_t num_descs,
                             enum neuron_dma_queue_type queue_type);

/** Reset given NCs within a device.
 *
 * @param device_index[in] - Device to reset.
 * @param nc_map[in] - NCs to reset (-1 to reset entire device)
 * @param request_id[out] - ID for this reset request
 *
 * @return 0 on success.
 */
int ndl_reset_ncs(int device_index, int nc_map, uint32_t *request_id);

/** Register a callback with NRT to warn/nudge users when hitting a soft incompatibility
 *
 * @param callback - the callback function
 * @return int - 0 on success, otherwise on failure
 */
int ndl_register_soft_incompat_callback(void (*callback)(const char *));

/** Waits for readiness of given NCs within a device.
 *
 * @param device_index[in] - Device index.
 * @param request_id[in] - ID of the reset request to wait on
 * @param result[out] - Buffer to store the result.
 *                      If the device is ready then this would be set to 1.
 *
 * @return 0 on success.
 */
int ndl_ready_ncs(int device_index, uint32_t request_id, uint8_t *result);

/** Get info on all the apps that are currently using the device; the caller needs to free the returned info (*info)
 *
 * @param device[in] - Device
 * @param info[out] - Pointer to a pointer which will hold app data, needs to be deallocated by caller
 * @param count[out] - Number of entries in neuron_app_info
 *
 * @return 0 - on success
 */
int ndl_get_all_apps_info(ndl_device_t *device, struct neuron_app_info **info, size_t *count,
                          uint16_t apps_info_flags);

/** Increment a semaphore in Neuron Core.
 *
 * @param device[in] - Device
 * @param nc_index[in] - Neuron Core index
 * @param semaphore_index[in] - Semaphore which needs to be incremented.
 * @param value[in] - Value to increment by.
 *
 * @return 0 on success
 */
int ndl_nc_semaphore_increment(ndl_device_t *device, int nc_index, uint32_t semaphore_index, uint32_t value);

/** Decrement a semaphore in Neuron Core.
 *
 * @param device[in] - Device
 * @param nc_index[in] - Neuron Core index
 * @param semaphore_index[in] - Semaphore which needs to be decremented.
 * @param value[in] - Value to decrement by.
 *
 * @return 0 on success
 */
int ndl_nc_semaphore_decrement(ndl_device_t *device, int nc_index, uint32_t semaphore_index, uint32_t value);

/** Get semaphore value in Neuron Core.
 *
 * @param device[in] - Device
 * @param nc_index[in] - Neuron Core index
 * @param semaphore_index[in] - Semaphore index.
 * @param value[out] - Buffer where the read value would be stored.
 *
 * @return 0 on success
 */
int ndl_nc_semaphore_read(ndl_device_t *device, int nc_index, uint32_t semaphore_index, uint32_t *value);

/** Write given value into the semaphore in Neuron Core.
 *
 * @param device[in] - Device
 * @param nc_index[in] - Neuron Core index
 * @param semaphore_index[in] - Semaphore index.
 * @param value[in] - Value to write.
 *
 * @return 0 on success
 */
int ndl_nc_semaphore_write(ndl_device_t *device, int nc_index, uint32_t semaphore_index, uint32_t value);

/** Get event value in Neuron Core.
 *
 * @param device[in] - Device
 * @param nc_index[in] - Neuron Core index
 * @param event_index[in] - Event index.
 * @param value[out] - Buffer where the read value would be stored.
 *
 * @return 0 on success
 */
int ndl_nc_event_get(ndl_device_t *device, int nc_index, uint32_t event_index, uint32_t *value);

/** Set an event in Neuron Core.
 *
 * @param device[in] - Device
 * @param nc_index[in] - Neuron Core index
 * @param event_index[in] - Event index.
 * @param value[in] - Value to write.
 *
 * @return 0 on success
 */
int ndl_nc_event_set(ndl_device_t *device, int nc_index, uint32_t event_index, uint32_t value);

/** Configure notification queue
 *
 * A Neuron device has multiple neuron cores and TOP_SPs. If nq_dev_type is
 * NQ_DEV_TYPE_NEURON_CORE, nq_dev_index conveys the neuron core index. In case of
 * NQ_DEV_TYPE_NEURON_TOPSP, nq_dev_index means the TOP_SP index.
 *
 * @param device[in] - Device
 * @param nq_dev_id[in] - Notification device index
 * @param nq_dev_type[in] - Notification device type
 * @param nq_type[in] - Notification queue type
 * @param engine_index[in] - Engine index
 * @param size[in] - Size in bytes
 * @param on_host_memory[in] - If true, NQ is created on host memory
 * @param dram_channel - If NQ is created on device, DRAM channel to use
 * @param dram_region - If NQ is created on device, DRAM region to use
 * @param context[out] - Resulting NQ context.
 *
 * @return 0 on success.
 */
int ndl_notification_init(ndl_device_t *device, int nq_dev_id, ndl_nq_dev_t nq_dev_type, uint8_t nq_type,
                          uint8_t engine_index, uint32_t size, bool on_host_memory, uint32_t dram_channel,
                          uint32_t dram_region, uint64_t *notification_context);

/** Configure notification queue with the option to force re-allocate/re-size
 *
 * A Neuron device has multiple neuron cores and TOP_SPs. If nq_dev_type is
 * NQ_DEV_TYPE_NEURON_CORE, nq_dev_index conveys the neuron core index. In case of
 * NQ_DEV_TYPE_NEURON_TOPSP, nq_dev_index means the TOP_SP index.
* * @param device[in] - Device * @param nq_dev_id[in] - Notification device index * @param nq_dev_type[in] - Notification device type * @param nq_type[in] - Notification queue type * @param engine_index[in] - Engine index * @param size[in] - Size in bytes * @param on_host_memory[in] - If true, NQ is created on host memory * @param dram_channel - If NQ is created on device, DRAM channel to use * @param dram_region - If NQ is created on device, DRAM region to use * @param force_alloc_mem - If true, force allocate new memory (and delete already allocated memory, if any) * @param context[out] - Resulting NQ context. * * @return 0 on success. */ int ndl_notification_init_with_realloc(ndl_device_t *device, int nq_dev_id, ndl_nq_dev_t nq_dev_type, uint8_t nq_type, uint8_t engine_index, uint32_t size, bool on_host_memory, uint32_t dram_channel, uint32_t dram_region, bool force_alloc_mem, uint64_t *notification_context); /** Returns mem_handle associated with the NQ * * @param notification_context[in] - Notification context * @param mem_handle[out] - Notification's memory handle would be stored here. * * @return 0 on success, 1 on failure */ int ndl_notification_get_mem_handle(uint64_t notification_context, uint64_t *mem_handle); /** Returns size associated with the NQ * * @param notification_context[in] - Notification context * @param size[out] - Notification's size would be stored here. * * @return 0 on success, 1 on failure */ int ndl_notification_get_size(uint64_t notification_context, uint32_t *size); /** Maps NQ to virtual address. * * @param notification_context[in] - Notification context. * @param va [out] - Virtual address where the mapping is done. * @return 0 on success */ int ndl_notification_map(uint64_t notification_context, void **va); /** Stops and destroys already configured notification queue. * * @param notification_context[in] - Notification context. * * @return 0 on success. */ int ndl_notification_destroy(uint64_t notification_context); /** Makes neuron ds available for use and returns a valid pointer in **data and a valid size in *size * * @param device[in] - Device * @param pid[in] - PID for this NDS (if 0 it allocates a new one) * @param data[out] - Will contain a valid pointer to the datastore * @param size[out] - Will contain a valid size for the datastore * * @return 0 on success. */ int ndl_nds_open(ndl_device_t *device, int32_t pid, void **data, size_t *size); /** Decreases ref count for the given pid * * @param device - Device * @param pid - PID owning the datastore * @param data - Pointer to datastore raw data (returned by ndl_nds_open) * @param size - Size of datastore (returned by ndl_nds_open) * * @return 0 on success. */ int ndl_nds_close(ndl_device_t *device, int32_t pid, void *data, size_t size); /** Enter inference critical section. * * @param device[in] - Device * @param nc_index[in] - Neuron core index * @param uuid[in] - UUID of the model expected to be loaded * * This function would fail if the UUID is different or PID * which loaded the UUID is different. * * @return 0 on success, -1 on failure. */ int ndl_crwl_reader_enter(ndl_device_t *device, int nc_index, struct neuron_uuid uuid); /** Exit inference critical section. * * @param device[in] - Device * @param nc_index[in] - Neuron core index * @param uuid[in] - UUID of the model expected to be loaded * * @return 0 on success, -1 on failure. */ int ndl_crwl_reader_exit(ndl_device_t *device, int nc_index, struct neuron_uuid uuid); /** Enter model load critical section. 
* * @param device[in] - Device * @param nc_index[in] - Neuron core index * @param uuid[in] - UUID of the model to be loaded * * @return 0 on success, -1 on failure. */ int ndl_crwl_writer_enter(ndl_device_t *device, int nc_index, struct neuron_uuid uuid); /** Exit model load critical section and enter inference critical section. * * @param device[in] - Device * @param nc_index[in] - Neuron core index * @param uuid[in] - UUID of the loaded model * * @return 0 on success, -1 on failure. */ int ndl_crwl_writer_downgrade(ndl_device_t *device, int nc_index, struct neuron_uuid uuid); /** Find given number of free NCs and mark them as used. * * @param nc_count[in] - Number of free neuron cores needed. * @param start_nc[in] - From where to start the free core search. * @param end_nc[in] - Last NC where to stop the free core search. * @param max_nc_available[out] - Maximum number of free cores available. * @param bitmap[out] - Bitmap of marked neuron core indexes. * @param size[in] - size of the bitmap in bytes * * @return 0 on success, -1 on failure. */ int ndl_crwl_nc_range_mark(uint32_t nc_count, uint32_t start_nc, uint32_t end_nc, uint32_t *max_nc_available, uint64_t *bitmap, size_t size); /** Unmark neuron cores as free. * * @param bitmap[in] - Bitmap of marked neuron core indexes. * @param size[in] - size of the bitmap in bytes * * @return 0 on success, -1 on failure. */ int ndl_crwl_nc_range_unmark(uint64_t *bitmap, size_t size); /** Gets the info for the copy buffer for copying data to/from device * * To dma data in and out of the device, app needs a host dram buffer allocated * by the driver. Allocating this every-time is expensive especially if we want * a bigger copy size. To avoid this performance penalty, applications can use * this preallocated buffer. * * @param device[in] - Device * @param nc_id[in] - nc id the copy buffer is from * @param cpy_buf[out] - Pointer to copy buffer * * @return 0 on success */ int ndl_get_copy_buf(ndl_device_t *device, uint32_t nc_id, ndl_copy_buf_t **cpy_buf); /** Set the neuron core init state * Initially the state is set to started and then app intializes the neuron core. Then * it sets the state to completed. If any other app tries to set the state to started when it * is already started then this routine will block until the init is done or timeout * * @param device[in] - Device * @param state[in] - State that will be state * @param new_state[out] - State after the set is done * * @return 0 on success, -1 on failure. */ int ndl_nc_init_set_state(ndl_device_t *device, uint32_t nc_id, uint32_t state, uint32_t *new_state); /** Gets the state of model start. If this is the first model that will be loaded in the nc. * * @param device[in] - Device * @param nc_id[in] - nc id * @param started_count[out] - number of times model started in that nc * * @return 0 on success, -1 on failure. 
*/ int ndl_nc_model_started_count(ndl_device_t *device, uint32_t nc_id, uint64_t *started_count); /** Gets the architecture & revision of the board * * @param architecture[out] - Architecture of the board * @param revision[out] - Revision of the board * * @return 0 on success */ int ndl_get_board_info(uint32_t *architecture, uint32_t *revision); /** Gets BDF for a device - only for devices opened by the calling process - DEPRECATED don't use * * @param bus_num[out] - Bus number for this device * @param pci_slot[out] - PCI slot for this device * @param dev_func[out] - Device function for this device * * @return 0 on success */ int ndl_get_device_bdf(int device_index, uint32_t *bus_num, uint8_t *pci_slot, uint8_t *dev_func); /** * @brief Get the anonymous file-descriptor of dma-buf associated with * a Neuron device memory region if it was registered for EFA peer direct * * @param addr[in] - Device buffer virtual address * @param size[in] - Device buffer size (in bytes) * @param fd[out] - dma-buf fd * * @return 0 on success */ int ndl_get_dmabuf_fd(uint64_t addr, uint64_t size, int* fd); /** Gets BDF for a device * * @param device_index[in] - Neuron device index * @param domain[out] - PCIe domain for the device * @param bus_num[out] - Bus number for the device * @param pci_slot[out] - PCI slot for the device * @param dev_func[out] - Device function for the device * * @return 0 on success */ int ndl_get_device_bdf_ext(int device_index, uint32_t *domain, uint32_t *bus_num, uint8_t *pci_slot, uint8_t *dev_func); /** retrieve offset/size where to mmap around a physical address * * @param device[in] - Neuron device * @param pa[in] - physical address in device mem to retrieve mc mmap info for * @param mmap_offset[out] - mmap offset * @param mem_handle[out] - The handle for the given physical address. * Set to 0 when using backwards compatible interface with old drivers. * @param size[out] - size * */ int ndl_mem_get_mc_mmap_info(ndl_device_t *device, uint64_t pa, uint64_t *mmap_offset, uint64_t *size, uint64_t *mem_handle); /** mmap a bar region into user address * * @param device[in] - Neuron device * @param block[in] - block type containing the resource * @param block_id[in] - id of the block if is more than one block * @param resource[in] - resource the caller wants to mmap * @param va[out] - virual address of the resource * @param size[out] - size of the resource * */ int ndl_mmap_bar_region( ndl_device_t *device, enum neuron_dm_block_type block, uint32_t block_id, enum neuron_dm_resource_type resource, void ** va, uint64_t * size); /** Close all cached FDs * */ void ndl_device_cached_fd_close_all(void); /** Log an error message to kernel messages/serial console * * @param str[in] - The error message * @param size[in] - The size of the error message including null terminator * @param action[in] - Additional action to perform * * @return On success: 0 * On failure: -1 and: * * errno == EFAULT when size is too large * * errno == EBADMSG when str is not null terminated */ int ndl_printk(char *str, uint32_t size, uint32_t action); /** get the host device id for an open device (for containers) * * @param device[in] - Neuron device * @param host_device_id[out] - host device id * */ int ndl_get_host_device_id(ndl_device_t *device, uint32_t *host_device_id); /** return device id to routing id mapping table along with number of entries in the table * * @param count[in/out] - [in] size of map in entries. 
/** get the host device id for an open device (for containers)
 *
 * @param device[in] - Neuron device
 * @param host_device_id[out] - host device id
 */
int ndl_get_host_device_id(ndl_device_t *device, uint32_t *host_device_id);

/** return the device id to routing id mapping table along with the number of entries in the table
 *
 * @param count[in/out] - [in] size of map in entries, [out] # entries returned
 * @param host_did_to_rid_map[out] - map of host device id to routing ids
 */
int ndl_get_host_device_id_to_rid_map(uint32_t *count, uint32_t *host_did_to_rid_map);

int ndl_dump_device_allocation_info(ndl_device_t *device, uint32_t hbm_index, struct neuron_ioctl_mem_chunk_info *data, uint32_t *num_entries);

/** ask the driver to dump neuron core process info
 *
 * @param nc_id[in] - neuron core to dump process info for
 * @param filter_log_owner[in] - only dump log entries for the owner pid of the neuron core
 * @param log_dump_limit[in] - max number of log entries to dump
 */
int ndl_dump_nc_pid_info(uint32_t nc_id, bool filter_log_owner, uint32_t log_dump_limit);

/** write a value to the entire HBM accessible to Neuron (so excludes the firmware carveout)
 *
 * @param hbm_index - HBM to write to
 * @param init_val - value to write
 */
int ndl_hbm_scrub_start(ndl_device_t *device, uint32_t nc_id, uint32_t hbm_index, uint32_t axi_port, uint32_t init_val);
int ndl_hbm_scrub_wait(ndl_device_t *device, uint32_t nc_id, uint32_t hbm_index);

/** Gets the tpb mapping.
 *
 * @param map[out] - Location to store the mapping information
 * @param max_num_entries[in] - Maximum number of entries we can store in `map`
 * @param mapping_version[in] - Flavor of mapping to get from the driver
 *
 * @return 0 on success
 */
int ndl_get_logical_to_physical_nc_map(struct neuron_ioctl_nc_map *map, uint32_t max_num_entries, enum neuron_ioctl_nc_mapping_type mapping_version);

/** return pod information
 *
 * @param pod_type[out] - type of pod
 * @param pod_sz[out] - size of the pod
 */
int ndl_pod_info(uint32_t * pod_type, uint32_t * pod_sz);

/** return pod election state
 *
 * @param state[out] - election state
 */
int ndl_pod_election_state(uint32_t * state);

/** return pod mapping information.
 *
 * @param node_id[out] - node id of the pod node. -1 if the node is not part of a configured pod
 */
int ndl_pod_mapping_info(int * node_id);

/** return pod status
 *
 * @param pod_id[out] - pod id. Only valid if the node is configured as a pod
 * @param state[out] - state of the pod election
 * @param pod_type[out] - type of pod
 * @param pod_sz[out] - size of the pod. 0 if the node is not part of a pod
 * @param node_id[out] - node id of the pod node. -1 if the node is not part of a configured pod
 * @param mode[out] - current operating mode
 * @param modes_supported[out] - supported operating modes
 */
int ndl_pod_status(uint8_t *pod_id, uint32_t *state, uint32_t *pod_type, uint32_t *pod_sz, int *node_id, enum neuron_ultraserver_mode *mode, uint32_t *modes_supported);

/** control pod election state
 *
 * @param ctrl[in] - control request (enum neuron_pod_ctrl_req)
 * @param mode[in] - requested operating mode
 * @param timeout[in] - timeout for the control operation
 * @param state[out] - state of the pod election
 */
int ndl_pod_ctrl(uint32_t ctrl, enum neuron_ultraserver_mode mode, uint32_t timeout, uint32_t *state);

int ndl_alloc_contiguous_scratchpad(ndl_device_t *device, uint64_t size, uint32_t hbm_index, uint32_t nc_id, uint64_t *mem_handle);
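/*
 * Illustrative pairing (not part of this header): allocate a contiguous
 * scratchpad, then map the full span through the returned handle via
 * ndl_memory_map_contiguous_scratchpad() declared below; size, hbm and nc
 * are placeholders supplied by the caller:
 *
 *   uint64_t mh;
 *   void *va;
 *   if (ndl_alloc_contiguous_scratchpad(dev, size, hbm, nc, &mh) == 0)
 *       ndl_memory_map_contiguous_scratchpad(mh, &va, size);
 */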
/** Similar to ndl_memory_map; the only difference is that a contiguous
 * scratchpad var may span multiple contiguous memchunks, so the size of the
 * memory mapping can differ from the size of just the first contiguous
 * memchunk.
 *
 * @param mem_handle[in] - Handle to map.
 * @param va[out] - Resulting virtual address.
 * @param size[in] - Size to map
 *
 * @return 0 on success
 */
int ndl_memory_map_contiguous_scratchpad(uint64_t mem_handle, void **va, uint64_t size);

/** Set performance profile
 *
 * @param device[in] - Device handle.
 * @param profile[in] - Performance profile to set.
 *
 * @return 0 on success.
 */
int ndl_set_performance_profile(ndl_device_t *device, uint32_t profile);

/** Enable or disable throttling notifications
 *
 * @param device[in] - Device handle.
 * @param enable[in] - true to enable, false to disable.
 *
 * @return 0 on success.
 */
int ndl_enable_throttling_notifications(ndl_device_t *device, bool enable);

bool ndl_feature_supported(int nd_fd, uint64_t feature);

/** dynamically allocate h2t queues (rings)
 *
 * @param device[in] - Neuron device
 * @param nc_id[in] - neuron core to allocate h2t queues for
 * @param copy_queue_cnt[in] - number of h2t copy queues to allocate
 * @param service_queue_cnt[in] - number of service queues to allocate
 * @param copy_queue_bmap[out] - bitmap of the allocated copy queues
 * @param service_queue_bmap[out] - bitmap of the allocated service queues
 * @param copy_default_queue[out] - default h2t copy queue
 */
int ndl_h2t_dma_queue_alloc(ndl_device_t *device, uint32_t nc_id, uint32_t copy_queue_cnt, uint32_t service_queue_cnt, uint32_t *copy_queue_bmap, uint32_t *service_queue_bmap, uint32_t *copy_default_queue);

/** free dynamically allocated h2t queues
 *
 * @param device[in] - Neuron device
 * @param nc_id[in] - neuron core to free queues for
 * @param queue_bmap[in] - bitmap of queues to free
 */
int ndl_h2t_dma_queue_free(ndl_device_t *device, uint32_t nc_id, uint32_t queue_bmap);

/** control metrics posting behavior
 *
 * @param device[in] - Neuron device
 * @param mode[in] - how to modify posting behavior (enable or disable periodic posting)
 */
int ndl_metrics_ctrl(ndl_device_t *device, enum neuron_metrics_mode mode);

/** get the Neuron device and HBM index pointed to by a VA
 *
 * @param va[in] - VA of Neuron memory
 * @param device_index[out] - Neuron device
 * @param hbm_index[out] - HBM index
 */
int ndl_get_va_placement(const void *va, int *device_index, int *hbm_index);

/**
 * arbitrary size bitmap support
 */
#define NBM_NR_BITS(t) (sizeof(t)*8)
#define NBM_NR_ENT(nr,t) (((nr)+NBM_NR_BITS(t)-1) / NBM_NR_BITS(t))

static inline uint32_t nbitmap_test_bit(uint32_t nr, uint64_t *addr)
{
	return (addr[nr/NBM_NR_BITS(*addr)] & (1ull << (nr % NBM_NR_BITS(*addr)))) != 0ull;
}

static inline void nbitmap_set_bit(uint32_t nr, uint64_t *addr)
{
	addr[nr/NBM_NR_BITS(*addr)] |= (1ull << (nr % NBM_NR_BITS(*addr)));
}

static inline uint32_t nbitmap_ffs1(uint32_t nr, uint64_t *addr)
{
	int i;
	for (i=0; i < NBM_NR_ENT(nr, *addr); i++) {
		uint32_t x = __builtin_ffsl(addr[i]);
		if (x)
			return i * NBM_NR_BITS(*addr) + x;
	}
	return 0;
}

static inline uint32_t nbitmap_popcount(uint32_t nr, uint64_t *addr)
{
	int i;
	uint32_t cnt = 0;
	for (i=0; i < NBM_NR_ENT(nr, *addr); i++) {
		cnt += __builtin_popcountll(addr[i]);
	}
	return cnt;
}

static inline void nbitmap_clr_bit(uint32_t nr, uint64_t *addr)
{
	addr[nr/NBM_NR_BITS(*addr)] &= ~(1ull << (nr % NBM_NR_BITS(*addr)));
}
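/*
 * Sketch of the intended use (illustrative only): callers size the bitmap
 * with NBM_NR_ENT and pass it to the helpers above, e.g. for the marked-core
 * bitmaps used by ndl_crwl_nc_range_mark()/ndl_crwl_nc_range_unmark():
 *
 *   uint64_t bm[NBM_NR_ENT(128, uint64_t)] = {0};  // room for 128 bits
 *   nbitmap_set_bit(5, bm);
 *   if (nbitmap_test_bit(5, bm)) {
 *       // bit 5 is set; total population is nbitmap_popcount(128, bm) == 1
 *   }
 *   nbitmap_clr_bit(5, bm);
 */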
#ifdef __cplusplus
}
#endif

================================================
FILE: src/libnrt/include/ndl/neuron_driver_shared.h
================================================
/*
 * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved
 */
#ifndef NEURON_DRIVER_SHARED_H
#define NEURON_DRIVER_SHARED_H

#include <linux/types.h>
#include "neuron_driver_shared_tensor_batch_op.h"

enum neuron_driver_feature_flag {
	NEURON_DRIVER_FEATURE_DMABUF = 1ull << 0,
	NEURON_DRIVER_FEATURE_ASYNC_DMA = 1ull << 1,
	NEURON_DRIVER_FEATURE_BATCH_DMAQ_INIT = 1ull << 2,
	NEURON_DRIVER_FEATURE_BIG_CORE_MAPS = 1ull << 3,
	NEURON_DRIVER_FEATURE_MEM_ALLOC_TYPE = 1ull << 4,
	NEURON_DRIVER_FEATURE_HBM_SCRUB = 1ull << 5,
	NEURON_DRIVER_FEATURE_MEM_ALLOC64 = 1ull << 6,
	NEURON_DRIVER_FEATURE_CONTIGUOUS_SCRATCHPAD = 1ull << 7,
	NEURON_DRIVER_FEATURE_ZEROCOPY = 1ull << 8,
};

// FIXME this should be more generic - like node type.
enum { NEURON_POD_TYPE_NONE = 0, NEURON_POD_TYPE_P2P, NEURON_POD_TYPE_SWITCH };

enum {
	NEURON_POD_E_STATE_NOT_STARTED = 0,
	NEURON_POD_E_STATE_IN_PROGRESS,
	NEURON_POD_E_STATE_ULTRASERVER,
	NEURON_POD_E_STATE_FAILED,
	// TODO we currently don't discriminate between failed and single node (todo for diagnostic/debug purposes)
	NEURON_POD_E_STATE_SINGLE_NODE,
};

enum neuron_pod_ctrl_req {
	NEURON_NPE_POD_CTRL_REQ_POD = 0,         // request pod state to pod (on-demand election request)
	NEURON_NPE_POD_CTRL_REQ_SINGLE_NODE = 1, // request pod state to single node
	NEURON_NPE_POD_CTRL_REQ_KILL = 2,        // request to kill the election
	NEURON_NPE_POD_CTRL_SET_MODE = 3,        // request to set the ultraserver mode
};

enum neuron_ultraserver_mode {
	NEURON_ULTRASERVER_MODE_UNSET = 0, // no configuration set
	NEURON_ULTRASERVER_MODE_X4 = 1,    // 4 node US configuration
	NEURON_ULTRASERVER_MODE_X2H = 2,   // 2 node US configuration using horizontal links
	NEURON_ULTRASERVER_MODE_X2V = 3,   // 2 node US configuration using vertical links
	NEURON_ULTRASERVER_MODE_X1 = 4,    // 1 node US configuration (standalone)
};

enum neuron_metrics_mode {
	NEURON_METRICS_MODE_PERIODIC_ENABLE = 0,  // enable periodic posting
	NEURON_METRICS_MODE_PERIODIC_DISABLE = 1, // disable periodic posting
};

#define NEURON_NC_MAP_DEVICE (0xffffffff)

enum neuron_dma_queue_type {
	NEURON_DMA_QUEUE_TYPE_TX = 0,     // transmit queue
	NEURON_DMA_QUEUE_TYPE_RX,         // receive queue
	NEURON_DMA_QUEUE_TYPE_COMPLETION, // completion queue
};

enum neuron_cinit_state {
	NEURON_CINIT_STATE_STARTED = 1, // Core Init is initiated
	NEURON_CINIT_STATE_COMPLETED,   // Core Init is completed successfully
	NEURON_CINIT_STATE_INVALID      // Core Init is not valid
};

struct neuron_dma_eng_state {
	__u32 revision_id; // revision id
	__u32 max_queues;  // maximum queues supported
	__u32 num_queues;  // number of queues configured
	__u32 tx_state;    // Tx state
	__u32 rx_state;    // Rx state
};

struct neuron_dma_queue_state {
	__u32 hw_status;            // hardware status
	__u32 sw_status;            // software status
	__u64 base_addr;            // base address of the queue
	__u32 length;               // size of the queue
	__u32 head_pointer;         // hardware pointer index
	__u32 tail_pointer;         // software pointer index
	__u64 completion_base_addr; // completion queue base address
	__u32 completion_head;      // completion head
};

enum neuron_dma_h2t_ctx_handle_type {
	NEURON_DMA_H2T_CTX_HANDLE_NONE = -1,  // no handle - used as prev handle to start an async dma
	NEURON_DMA_H2T_CTX_HANDLE_SYNC = 0,   // handle for doing synchronous DMA
	NEURON_DMA_H2T_CTX_HANDLE_ASYNC1 = 1, // first of two async handles
	NEURON_DMA_H2T_CTX_HANDLE_ASYNC2 = 2, // second of two async handles
	NEURON_DMA_H2T_CTX_HANDLE_CNT = 3     // number of dma handles
};

/*
 * H2T DMA Default Queue id
 */
#define NEURON_DMA_H2T_DEFAULT_QID (-1)
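/*
 * Illustrative check (not part of this header; assumes nd_fd is the open
 * device fd): the neuron_driver_feature_flag values above are single bits,
 * so a caller can gate optional paths with ndl_feature_supported() from
 * ndl.h, e.g.:
 *
 *   if (ndl_feature_supported(nd_fd, NEURON_DRIVER_FEATURE_DMABUF)) {
 *       // dma-buf registration paths such as ndl_get_dmabuf_fd() are usable
 *   }
 */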
/*
 * NOTE: In runtime version 5, this enum was passed in as a bool instead -
 * true if top_sp and false if NC. Match the enum values to the bool to
 * maintain compatibility with older runtimes. Do not change these values
 * until the min compatibility version is updated to >=6.
 */
enum NQ_DEVICE_TYPE { NQ_DEVICE_TYPE_NEURON_CORE = 0, NQ_DEVICE_TYPE_TOPSP, NQ_DEVICE_TYPE_MAX };

enum NQ_TYPE {
	NQ_TYPE_TRACE = 0,  /**< Implicit notifications generated during execution. */
	NQ_TYPE_NOTIFY,     /**< Explicit notifications generated by NOTIFY instruction */
	NQ_TYPE_EVENT,      /**< Notifications triggered by event set/clear operations. */
	NQ_TYPE_ERROR,      /**< Notifications triggered by an error condition. */
	NQ_TYPE_TRACE_DMA,  /**< Implicit notifications generated by DMA transfers.*/
	NQ_TYPE_THROTTLE,   /**< Notifications triggered by throttling activity. */
	NQ_TYPE_MAX
};

/**
 * memory mapping enums for selecting which bar0 resources to map.
 * Bar0 mmap'ing is restricted to a limited set of regions.
 * Resources are selected by block type, block id and resource within the block.
 * TPB 1 State buffer, for example - where type is TPB, block id is 1 and
 * resource is state buffer.
 * NEURON_DM_RESOURCE_ALL resource mapping is restricted to read only.
 */
enum neuron_dm_block_type {
	NEURON_DM_BLOCK_INVALID = -1, // invalid - tags last entry in the table
	NEURON_DM_BLOCK_TPB = 0,
	NEURON_DM_BLOCK_TOPSP = 1,
	NEURON_DM_BLOCK_HBM = 2
};

enum neuron_dm_resource_type {
	NEURON_DM_RESOURCE_SEMAPHORE = 0, // resource to mmap is the semaphore region
	NEURON_DM_RESOURCE_ALL = 1,       // resource to mmap is the entire block (read only). Only available for TOPSP
	NEURON_DM_RESOURCE_SBUF = 2,      // resource to mmap is the state buffer
	NEURON_DM_RESOURCE_DMEM = 3       // resource to mmap is device memory
};

struct neuron_uuid {
	__u8 value[32];
};

#define NEURON_MAX_PROCESS_PER_DEVICE 16 // 2 per core (arbitrary, but needs to be a small number for fast lookup)
#define APP_INFO_PID_NC_LOCK_INFO (1)
#define APP_INFO_PID_MEM_USAGE (1 << 1)
#define APP_INFO_ALL (0xF)
#define APP_INFO_MAX_MODELS_PER_DEVICE (4)
#define NDS_INVALID_ID (-1)

struct neuron_app_info {
	__s32 pid;        // PID of this app
	__u8 nc_lock_map; // NCs which are locked by it (one bit set for each locked NC)
	struct neuron_uuid uuid_data[APP_INFO_MAX_MODELS_PER_DEVICE]; // UUIDs running for this app for each neuroncore
	size_t host_mem_size;   // Amount of host memory used by this PID
	size_t device_mem_size; // Amount of device memory used by this PID
};

typedef union nmetric_version {
	struct {
		__u64 build_num : 32;
		__u64 minor_ver : 8;
		__u64 major_ver : 8;
		__u64 reserved : 16;
	};
	__u64 all;
} nmetric_version_t;

struct neuron_ioctl_mem_chunk_info {
	__u64 pa;
	__u64 size;
	__u32 mem_type;
};

// Max number of entries this version of the driver
// will ever give back to the user
#define NEURON_NC_MAP_MAX_ENTRIES 128

enum neuron_ioctl_nc_mapping_type {
	NEURON_IOCTL_NC_MAPPING_TYPE_V0 = 0, // seng swap mapping
};

struct neuron_ioctl_nc_map_entry {
	__u32 device_id;
	__u32 device_nc_idx;
};

struct neuron_ioctl_nc_map {
	__u32 num_entries;
	struct neuron_ioctl_nc_map_entry mappings[];
};

/* A batch of copy operations */
typedef struct neuron_memcpy_batch {
	__u64 mem_handle;        // [in] Source or destination memory handle from/to which data needs to be copied.
	__u64 mem_handle_offset; // [in] Memory offset of the memory handle
	const nrt_tensor_batch_op_t *ops_ptr; // [in] Pointer to array of operations
	__u32 num_ops;           // [in] Number of neuron_memcpy_op operations.
	__u16 bar4_wr_threshold; // [in] Threshold below which we will use bar4 direct write vs. DMA. Subject to driver limits.
	__u16 flags;             // [in] TBD.
	void *context;           // [in] TBD - opaque context pointer passed back in the completion queue
} neuron_memcpy_batch_t;

/*
 * Memory allocation categories for sysfs counters
 */
typedef enum {
	NEURON_MEMALLOC_TYPE_UNKNOWN_HOST, // only for old runtimes, do not use elsewhere
	NEURON_MEMALLOC_TYPE_CODE_HOST,
	NEURON_MEMALLOC_TYPE_TENSORS_HOST,
	NEURON_MEMALLOC_TYPE_CONSTANTS_HOST,
	NEURON_MEMALLOC_TYPE_MISC_HOST,
	NEURON_MEMALLOC_TYPE_NCDEV_HOST,
	NEURON_MEMALLOC_TYPE_NOTIFICATION_HOST,
	NEURON_MEMALLOC_TYPE_UNKNOWN_DEVICE, // only for old runtimes, do not use elsewhere
	NEURON_MEMALLOC_TYPE_CODE_DEVICE,
	NEURON_MEMALLOC_TYPE_TENSORS_DEVICE,
	NEURON_MEMALLOC_TYPE_CONSTANTS_DEVICE,
	NEURON_MEMALLOC_TYPE_SCRATCHPAD_DEVICE,
	NEURON_MEMALLOC_TYPE_MISC_DEVICE,
	NEURON_MEMALLOC_TYPE_NCDEV_DEVICE,
	NEURON_MEMALLOC_TYPE_COLLECTIVES_DEVICE,
	NEURON_MEMALLOC_TYPE_SCRATCHPAD_NONSHARED_DEVICE,
	NEURON_MEMALLOC_TYPE_NOTIFICATION_DEVICE,
	NEURON_MEMALLOC_TYPE_DMA_RINGS_HOST,
	NEURON_MEMALLOC_TYPE_DMA_RINGS_DEVICE,
	NEURON_MEMALLOC_TYPE_CONTIGUOUS_SCRATCHPAD_DEVICE, // uses same sysfs counter as NEURON_MEMALLOC_TYPE_SCRATCHPAD_DEVICE
	NEURON_MEMALLOC_TYPE_MAX
} mem_alloc_category_t;

/*
 * NDS stats
 * Note:
 * To add a new counter type inside the enum,
 * 1. you need to manually decrease NDS_ND_COUNTER_RESERVED or NDS_NC_COUNTER_RESERVED by 1
 * 2. you need to update NDS_ND_COUNTER_COUNT or NDS_NC_COUNTER_COUNT
 * To prevent compatibility issues, always append the new counter type to the end of the enum
 */
#define NDS_ND_COUNTER_RESERVED 18
// Device counter types
enum {
	NDS_ND_COUNTER_RUNTIME_VERSION,
	NDS_ND_COUNTER_FRAMEWORK_VERSION,
	NDS_ND_COUNTER_FAL_VERSION,
	NDS_ND_COUNTER_FEATURE_BITMAP,
	NDS_ND_COUNTER_MIN_NEFF_VERSION,
	NDS_ND_COUNTER_MAX_NEFF_VERSION,
	// memory usage counters
	NDS_ND_COUNTER_MEM_USAGE_CODE_HOST,
	NDS_ND_COUNTER_MEM_USAGE_TENSORS_HOST,
	NDS_ND_COUNTER_MEM_USAGE_CONSTANTS_HOST,
	NDS_ND_COUNTER_MEM_USAGE_SCRATCHPAD_HOST,
	NDS_ND_COUNTER_MEM_USAGE_MISC_HOST,
	NDS_ND_COUNTER_DYNAMIC_SYSFS_METRIC_BITMAP,
	NDS_ND_COUNTER_DEVICE_CLUSTER_ID,
	NDS_ND_COUNTER_COUNT = NDS_ND_COUNTER_DEVICE_CLUSTER_ID + NDS_ND_COUNTER_RESERVED + 1
};

#define NDS_NC_COUNTER_RESERVED 0
// Neuroncore counter types
enum {
	NDS_NC_COUNTER_TIME_IN_USE = 0,
	NDS_NC_COUNTER_INFER_COMPLETED,
	NDS_NC_COUNTER_INFER_COMPLETED_WITH_ERR,
	NDS_NC_COUNTER_INFER_COMPLETED_WITH_NUM_ERR,
	NDS_NC_COUNTER_INFER_TIMED_OUT,
	NDS_NC_COUNTER_INFER_INCORRECT_INPUT,
	NDS_NC_COUNTER_INFER_FAILED_TO_QUEUE,
	// these must be in this specific order -
	// runtime assumes these are offset by
	// error code
	NDS_NC_COUNTER_ERR_GENERIC,
	NDS_NC_COUNTER_ERR_NUMERICAL,
	NDS_NC_COUNTER_ERR_MODEL,
	NDS_NC_COUNTER_ERR_TRANSIENT,
	NDS_NC_COUNTER_ERR_HW,
	NDS_NC_COUNTER_ERR_RT,
	NDS_NC_COUNTER_LATENCY_DEVICE,
	NDS_NC_COUNTER_LATENCY_TOTAL,
	NDS_NC_COUNTER_NC_TIME,
	// these are new counters;
	// they shall be placed at the
	// end so their offsets are always
	// greater than the old counters.
	// This will ensure
	// new runtime + old driver will
	// write to reserved sections and not
	// break anything
	NDS_NC_COUNTER_GENERIC_FAIL,
	NDS_NC_COUNTER_ERR_RESOURCE,
	NDS_NC_COUNTER_ERR_RESOURCE_NC,
	NDS_NC_COUNTER_ERR_INVALID,
	NDS_NC_COUNTER_ERR_UNSUPPORTED_NEFF_VERSION,
	NDS_NC_COUNTER_CC_TIME,
	NDS_NC_COUNTER_MEM_USAGE_CODE_DEVICE,
	NDS_NC_COUNTER_MEM_USAGE_TENSORS_DEVICE,
	NDS_NC_COUNTER_MEM_USAGE_CONSTANTS_DEVICE,
	NDS_NC_COUNTER_MEM_USAGE_SCRATCHPAD_DEVICE,
	NDS_NC_COUNTER_MEM_USAGE_MISC_DEVICE,
	NDS_NC_COUNTER_MODEL_LOAD_COUNT,
	NDS_NC_COUNTER_INFERENCE_COUNT,
	NDS_NC_COUNTER_MAC_COUNT,
	NDS_NC_COUNTER_OOB,
	NDS_NC_COUNTER_COUNT = NDS_NC_COUNTER_OOB + NDS_NC_COUNTER_RESERVED + 1
};

#define NDS_MAX_NEURONCORE_COUNT (4)
#define NDS_EXT_MAX_NEURONCORE_COUNT (12)

// Additional NC storage
// | NDS_EXT_NC_COUNTER_COUNT | ... | NDS_EXT_NC_COUNTER_COUNT | (x NDS_MAX_NEURONCORE_COUNT) - this will only store the 'overflow' from the original counters
// | NDS_NC_COUNTER_COUNT + NDS_EXT_NC_COUNTER_COUNT | ... (x NDS_EXT_MAX_NEURONCORE_COUNT) - this will store complete data for additional NCs (up to a max of 16)
#define NDS_EXT_NC_COUNTER_ADDED_RESERVED 54

// Indexes of the NC counter extensions start at NDS_NC_COUNTER_COUNT, not at 0
enum {
	NDS_EXT_NC_COUNTER_HW_ERR_COLLECTIVES = NDS_NC_COUNTER_COUNT,
	NDS_EXT_NC_COUNTER_HW_ERR_HBM_UE,
	NDS_EXT_NC_COUNTER_HW_ERR_NC_UE,
	NDS_EXT_NC_COUNTER_HW_ERR_DMA_ABORT,
	NDS_EXT_NC_COUNTER_ERR_SW_NQ_OVERFLOW,
	NDS_EXT_NC_COUNTER_ERR_SW_SEMAPHORE_ERROR,
	NDS_EXT_NC_COUNTER_ERR_SW_EVENT_ERROR,
	NDS_EXT_NC_COUNTER_ERR_SW_PSUM_COLLISION,
	NDS_EXT_NC_COUNTER_ERR_SW_SEQUENCER_FATAL,
	NDS_EXT_NC_COUNTER_HW_ERR_REPAIRABLE_HBM_UE,
	NDS_EXT_NC_COUNTER_LAST,
	NDS_EXT_NC_COUNTER_COUNT = NDS_EXT_NC_COUNTER_LAST - NDS_NC_COUNTER_COUNT + NDS_EXT_NC_COUNTER_ADDED_RESERVED
};

#define NDS_TOTAL_NC_COUNTER_COUNT (NDS_NC_COUNTER_COUNT + NDS_EXT_NC_COUNTER_COUNT) // 31 original + 64 extended = 95 counters

typedef struct nds_header {
	char signature[4]; // Fixed signature: 'n', 'd', 's', 0
	int version;       // Version of the datastore's format
} nds_header_t;

/* --------------------------------------------
 * NDS shared data offsets
 * --------------------------------------------
 */
#define NDS_HEADER_START (0)
#define NDS_HEADER_SIZE (sizeof(nds_header_t))
#define NDS_ND_COUNTERS_START (NDS_HEADER_START + NDS_HEADER_SIZE)
#define NDS_ND_COUNTERS_SIZE (NDS_ND_COUNTER_COUNT * sizeof(uint64_t))
#define NDS_ND_COUNTERS(base_addr) ((uint64_t *)(base_addr + NDS_ND_COUNTERS_START))

// original NC counter section
#define NDS_NEURONCORE_COUNTERS_COUNT (NDS_NC_COUNTER_COUNT)
#define NDS_NEURONCORE_COUNTERS_START (NDS_ND_COUNTERS_START + NDS_ND_COUNTERS_SIZE)
#define NDS_NEURONCORE_COUNTERS_SIZE (NDS_NEURONCORE_COUNTERS_COUNT * NDS_MAX_NEURONCORE_COUNT * sizeof(uint64_t))
#define NDS_NEURONCORE_COUNTERS(base_addr, nc_index) ((uint64_t *)(base_addr + NDS_NEURONCORE_COUNTERS_START) + (nc_index * NDS_NEURONCORE_COUNTERS_COUNT))
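/*
 * Layout sketch (illustrative only): given the mmap'd datastore base address
 * (a char* here, an assumption about the caller's mapping), the offset macros
 * above index per-core counter slots directly:
 *
 *   char *base = ...;  // mapped NDS region
 *   uint64_t *nc0 = NDS_NEURONCORE_COUNTERS(base, 0);
 *   uint64_t infers_done = nc0[NDS_NC_COUNTER_INFER_COMPLETED];
 */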
// additional NC counter section at the end of all existing structures in the datastore (i.e. after NDS_PROCESS_EXT_INFO)
// NDS_PROCESS_EXT_INFO_START + NDS_PROCESS_EXT_INFO_SIZE = 44588 (hardcoded because it's easier than moving all the structs here and taking sizeof of them)
#define NDS_EXT_NC_COUNTER_COUNT_OLD (65)
#define NDS_TOTAL_NC_COUNTER_COUNT_OLD (96)
#define NDS_EXT_NEURONCORE_COUNTERS_SIZE_OLD (NDS_EXT_NC_COUNTER_COUNT_OLD * NDS_MAX_NEURONCORE_COUNT * sizeof(uint64_t))
#define NDS_EXT_NEURONCORE_NC_DATA_SIZE_OLD (NDS_TOTAL_NC_COUNTER_COUNT_OLD * NDS_EXT_MAX_NEURONCORE_COUNT * sizeof(uint64_t))
#define NDS_EXT_SECTION_SIZE_OLD (NDS_EXT_NEURONCORE_COUNTERS_SIZE_OLD + NDS_EXT_NEURONCORE_NC_DATA_SIZE_OLD)
#define NDS_EXT_OFFSET_OLD (44588)
#define NDS_EXT_ALIGNMENT (64)
#define NDS_ALIGN(v) ((v) + (-(v) & (NDS_EXT_ALIGNMENT - 1)))
#define NDS_EXT_OFFSET (NDS_ALIGN(NDS_EXT_OFFSET_OLD + NDS_EXT_SECTION_SIZE_OLD))
#define NDS_EXT_NEURONCORE_COUNTERS_COUNT (NDS_EXT_NC_COUNTER_COUNT) // number of extended counters
#define NDS_EXT_NEURONCORE_COUNTERS_START (NDS_EXT_OFFSET)
#define NDS_EXT_NEURONCORE_COUNTERS_SIZE (NDS_EXT_NC_COUNTER_COUNT * NDS_MAX_NEURONCORE_COUNT * sizeof(uint64_t))
#define NDS_EXT_NEURONCORE_COUNTERS(base_addr, nc_index) ((uint64_t *)(base_addr + NDS_EXT_NEURONCORE_COUNTERS_START) + (nc_index * NDS_EXT_NC_COUNTER_COUNT))

// additional NC data for extra Neuron Cores (12 extra sets which include all 95 counters + 1 for padding)
#define NDS_EXT_NEURONCORE_NC_DATA_PADDING (1) // 1 added as padding for 64 byte alignment per NC
#define NDS_EXT_NEURONCORE_NC_DATA_COUNT (NDS_TOTAL_NC_COUNTER_COUNT + NDS_EXT_NEURONCORE_NC_DATA_PADDING) // full set of counters (base + extended) + padding
#define NDS_EXT_NEURONCORE_NC_DATA_START (NDS_ALIGN(NDS_EXT_NEURONCORE_COUNTERS_START + NDS_EXT_NEURONCORE_COUNTERS_SIZE))
#define NDS_EXT_NEURONCORE_NC_DATA_SIZE (NDS_EXT_MAX_NEURONCORE_COUNT * NDS_EXT_NEURONCORE_NC_DATA_COUNT * sizeof(uint64_t))
#define NDS_EXT_NEURONCORE_NC_DATA(base_addr, nc_index) ((uint64_t *)(base_addr + NDS_EXT_NEURONCORE_NC_DATA_START) + (nc_index * NDS_EXT_NEURONCORE_NC_DATA_COUNT))

#endif // NEURON_DRIVER_SHARED_H

================================================
FILE: src/libnrt/include/ndl/neuron_driver_shared_tensor_batch_op.h
================================================
/*
 * Shared tensor batch operation between runtime and driver.
 */
#ifndef NEURON_DRIVER_SHARED_TENSOR_BATCH_OP_H
#define NEURON_DRIVER_SHARED_TENSOR_BATCH_OP_H

#ifdef __KERNEL__
#include <linux/types.h>
typedef __u64 nrt_tensor_batch_offset_t;
typedef __u64 nrt_tensor_batch_size_t;
#else
#include <stdint.h>
typedef uint64_t nrt_tensor_batch_offset_t;
typedef uint64_t nrt_tensor_batch_size_t;
#endif

typedef struct nrt_tensor_batch_op {
	nrt_tensor_batch_offset_t offset;
	nrt_tensor_batch_size_t size;
	void *buffer;
} nrt_tensor_batch_op_t;

#endif // NEURON_DRIVER_SHARED_TENSOR_BATCH_OP_H

================================================
FILE: src/libnrt/include/nrt/ndebug_stream.h
================================================
/*
 * Copyright 2025, Amazon.com, Inc. or its affiliates. All Rights Reserved
 */
/**
 * Overview:
 * The `ndebug_stream` APIs provide applications a way to consume debug events from the runtime (see
 * `ndebug_stream_event_type_t` for the different event types). These debug events are emitted by the
 * runtime per Logical Neuron Core and can be used by applications to get information on events that
 * occurred on the device (i.e. prints, breakpoints, etc.).
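 *
 * A minimal consumption loop might look like the sketch below (illustrative
 * only; assumes POSIX poll() and that the payload is released with free()):
 *
 *   int fd;
 *   if (nrt_debug_client_connect(0, &fd) == NRT_SUCCESS) {
 *       struct pollfd p = { .fd = fd, .events = POLLIN };
 *       while (poll(&p, 1, -1) > 0) {
 *           ndebug_stream_event_header_t hdr;
 *           void *payload;
 *           if (nrt_debug_client_read_one_event(fd, &hdr, &payload) != NRT_SUCCESS)
 *               break;
 *           // ... dispatch on hdr.type ...
 *           free(payload);  // caller owns the payload
 *       }
 *       nrt_debug_client_connect_close(fd);
 *   }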
 *
 * Connecting, polling, and consuming:
 * Applications that want to consume debug events first need to connect to a Logical Neuron Core's debug stream via a call to
 * `nrt_debug_client_connect`. Once a client is connected to a core's debug stream, the runtime will push debug events emitted
 * by the Logical Neuron Core to the stream for clients to consume. To be notified of emitted debug events, clients can utilize the
 * polling APIs provided by the Linux kernel. The `stream_fd` handle obtained from `nrt_debug_client_connect` is a typical Linux
 * file descriptor and can be passed into any Linux polling API. It is important to note, though, that while the `stream_fd` is pollable,
 * all other non-polling-related functionality must go through the provided `nrt_debug_client*` APIs. For example, the stream contents
 * can only be accessed from the `nrt_debug_client_read*` API(s); any other method of accessing the stream data leads to
 * undefined/undesirable behavior.
 *
 * Closing a Connection:
 * Once a connection is no longer needed, clients can close the connection using the `nrt_debug_client_connect_close` API.
 *
 * Events:
 * Events consist of a header describing the payload type, and a payload representing the contents of the event. Events can be consumed by
 * clients via the `nrt_debug_client_read*` API(s).
 *
 * Notes:
 * * These APIs do not allow for interprocess communication. Debug events are only pushed to the process that owns the Logical Neuron Core.
 * * These APIs do not provide thread safety for multiple threads accessing the SAME stream (thread safety for different streams is guaranteed).
 * * There can only be one outstanding connection per stream. Any attempt to initialize multiple connections will result in an error.
 * * Events are only emitted AFTER a client connects to a Logical Neuron Core's stream. Any event that would have been emitted before connecting
 *   to the stream is dropped.
 * * Events will be dropped if the number of unconsumed events in a stream exceeds the stream's buffer size. Clients must consume events fast
 *   enough to prevent dropped events. Additionally, clients can configure the stream's buffer size via the `NEURON_RT_DEBUG_STREAM_BUFFER_SIZE`
 *   environment variable. The buffer size currently defaults to 64K debug events.
 */
#pragma once

#include <stdint.h>
#include "nrt/nrt_status.h"

#ifdef __cplusplus
extern "C" {
#endif

typedef enum ndebug_stream_event_type {
	NDEBUG_STREAM_EVENT_TYPE_INVALID = 0,
	NDEBUG_STREAM_EVENT_TYPE_DEBUG_TENSOR_READ = 1,
} ndebug_stream_event_type_t;

typedef struct ndebug_stream_event_header {
	uint64_t data_size;
	uint32_t type;
	char reserved[52];
} ndebug_stream_event_header_t;

typedef struct ndebug_stream_payload_debug_tensor_read {
	char prefix[512];
	uint32_t logical_nc_id;
	uint32_t pipe;
	char tensor_dtype[16];
	uint64_t tensor_shape[8];
	uint64_t tensor_data_size;
	char reserved0[416];
	char tensor_data[];
} ndebug_stream_payload_debug_tensor_read_t;

/** Establish a connection to a specified Logical Neuron Core's debug stream.
 *
 * @param logical_nc_idx[in] - Core's debug stream to connect to.
 * @param stream_fd[out] - Connection handle used to reference and interact with the stream.
 *
 * @return NRT_SUCCESS on success.
 *
 * @note Only one client can connect to a Logical Neuron Core's stream at any given time.
 * Attempts to connect to a stream with multiple clients will result in a NRT_INVALID
 * return status.
 */
NRT_STATUS nrt_debug_client_connect(int logical_nc_idx, int *stream_fd);

/** Closes a connection created by `nrt_debug_client_connect`
 *
 * @param stream_fd[in] - Connection handle to close.
 */
void nrt_debug_client_connect_close(int stream_fd);

/** Consumes a single event from the stream.
 *
 * @param stream_fd[in] - Stream to consume an event from
 * @param header[out] - Consumed event's header. See `ndebug_stream_event_header_t`.
 * @param payload[out] - Consumed event's payload. See `ndebug_stream_payload*` and `ndebug_stream_event_type_t`.
 *        **IMPORTANT**: it is the user's responsibility to free this payload pointer.
 *
 * @return NRT_SUCCESS on success.
 *
 * @note This function must be called from the same process that owns the Logical Neuron Core. Calling this
 * function from any other process results in undefined behavior.
 */
NRT_STATUS nrt_debug_client_read_one_event(int stream_fd, ndebug_stream_event_header_t *header, void **payload);

#ifdef __cplusplus
}
#endif

================================================
FILE: src/libnrt/include/nrt/nds/neuron_ds.h
================================================
/*
 * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved
 */
#pragma once

#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#ifdef __cplusplus
extern "C" {
#endif

// Main NDS object handle
typedef void *nds_obj_handle_t;

// NDS object types
#define OBJECT_TYPE_MODEL_NODE_INFO (0)
#define OBJECT_TYPE_PROCESS_INFO (1)
#define OBJECT_TYPE_PROCESS_INFO_EXT (2)

// Model-related structs
#define MODEL_MEM_USAGE_LOCATION_COUNT 2
/*
 * Number of slots for mem_usage_type in Neuron Datastore (also used by tools)
 *
 * In the current version of the neuron datastore's format, there are only 12 slots for storing
 * memory usage type, so we aggregate them using the same logic as for the 'per NC' memory tracker.
 * Monitor has always aggregated them even further by adding them together, so we aren't breaking any feature.
 *
 * For usage type definitions, go to "inc/tdrv/dma_mem_usage_type.h"
 */
enum {
	NDS_DMA_MEM_USAGE_SLOT_CODE,
	NDS_DMA_MEM_USAGE_SLOT_TENSORS,
	NDS_DMA_MEM_USAGE_SLOT_CONSTANTS,
	NDS_DMA_MEM_USAGE_SLOT_SCRATCHPAD,
	NDS_DMA_MEM_USAGE_SLOT_MISC,
	NDS_DMA_MEM_USAGE_SLOT_COUNT = 12 // do not change
};

// Aggregated data for all chunks of the same type/location
typedef struct nds_mem_usage_info {
	size_t total_size;    // Total size
	uint32_t chunk_count; // Number of chunks that make up the total size
} nds_mem_usage_info_t;

// Loaded model node information
typedef struct nds_model_node_info {
	uint32_t model_id;      // parent model id
	uint32_t model_node_id; // node id
	char name[256];         // model name
	char uuid[16];          // uuid
	uint8_t nc_index;       // nc index
	uint8_t sg_index;       // subgraph index
} nds_model_node_info_t;

// Loaded model node memory usage information
typedef struct nds_model_node_mem_usage_info {
	// MODEL_MEM_USAGE_LOCATION_COUNT per each usage type
	nds_mem_usage_info_t model_mem_usage[MODEL_MEM_USAGE_LOCATION_COUNT][NDS_DMA_MEM_USAGE_SLOT_COUNT];
} nds_model_node_mem_usage_info_t;

// Version information
typedef struct nds_version_info {
	uint8_t major;
	uint8_t minor;
	uint32_t build;
} nds_version_info_t;

// Process information-related struct
typedef struct nds_process_info {
	int8_t framework_type;
	char tag[32];
	nds_version_info_t framework_version;
	nds_version_info_t fal_version;
	nds_version_info_t runtime_version;
} nds_process_info_t;

// Extended process information
typedef struct nds_process_info_ext {
	char tag[256];
} nds_process_info_ext_t;

typedef struct nds_instance nds_instance_t;
typedef struct ndl_device ndl_device_t;

// Feature bitmap's bit index information
typedef enum feature_bitmap_bit_index {
	BIT_INDEX_TEST_FEATURE = 0,
	BIT_INDEX_MULTICORE_FEATURE = 1,
	BIT_INDEX_COUNT = BIT_INDEX_MULTICORE_FEATURE + 1
} feature_bitmap_bit_index_t;

/** Opens NDS for the given pid. If pid == 0, it is acquired for the current PID
 * and opened in read-write mode. If pid != 0, it is acquired for the provided PID
 * and opened read-only.
 *
 * @param device[in] - ndl_device used to open this NDS
 * @param pid[in] - pid for which to open the NDS; if 0, it's opened as r/w for the current process
 * @param inst[out] - address of a pointer which will contain the instance handle
 *
 * @return non zero in case of error
 */
int nds_open(ndl_device_t *device, pid_t pid, nds_instance_t **inst);

/** Releases the NDS instance and frees the data associated with it (mandatory for readers)
 *
 * @param inst[in] - NDS instance to close
 *
 * @return non zero in case of error; the pointer gets deleted regardless
 */
int nds_close(nds_instance_t *inst);

/* --------------------------------------------
 * NDS Neuroncore Counters
 * --------------------------------------------
 */
/** Increments a simple per-nc counter
 *
 * @param inst[in] - NDS instance
 * @param pnc_index[in] - Neuroncore index
 * @param counter_index[in] - Counter index
 * @param increment[in] - Amount to increment
 *
 * @return 0 on success.
 */
int nds_increment_nc_counter(nds_instance_t *inst, int pnc_index, uint32_t counter_index, uint64_t increment);
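/*
 * Illustrative flow (not part of this header): a writer opens its own NDS
 * (pid 0 => current process, read-write) and bumps a per-NC counter; readers
 * would pass a target pid instead and get read-only access:
 *
 *   nds_instance_t *inst;
 *   if (nds_open(dev, 0, &inst) == 0) {
 *       nds_increment_nc_counter(inst, 0, NDS_NC_COUNTER_INFER_COMPLETED, 1);
 *       nds_close(inst);
 *   }
 */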
/** Decrements a simple per-nc counter
 *
 * @param inst[in] - NDS instance
 * @param pnc_index[in] - Neuroncore index
 * @param counter_index[in] - Counter index
 * @param decrement[in] - Amount to decrement
 *
 * @return 0 on success.
 */
int nds_decrement_nc_counter(nds_instance_t *inst, int pnc_index, uint32_t counter_index, uint64_t decrement);

/** Gets a simple per-nc counter
 *
 * @param inst[in] - NDS instance
 * @param pnc_index[in] - Neuroncore index
 * @param counter_index[in] - Counter index
 * @param value[out] - Counter value
 *
 * @return 0 on success.
 */
int nds_get_nc_counter(nds_instance_t *inst, int pnc_index, uint32_t counter_index, uint64_t *value);

/** Sets a simple per-nc counter
 *
 * @param inst[in] - NDS instance
 * @param pnc_index[in] - Neuroncore index
 * @param counter_index[in] - Counter index
 * @param value[in] - Value to set the counter to
 *
 * @return 0 on success.
 */
int nds_set_nc_counter(nds_instance_t *inst, int pnc_index, uint32_t counter_index, uint64_t *value);

/* --------------------------------------------
 * NDS Neuron Device Counters
 * --------------------------------------------
 */
/** Increments a simple per-nd counter - may overflow
 *
 * @param inst[in] - NDS instance
 * @param counter_index[in] - Counter index
 * @param increment[in] - Amount to increment
 *
 * @return 0 on success.
 */
int nds_increment_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t increment);

/** Decrements a simple per-nd counter - may overflow
 *
 * @param inst[in] - NDS instance
 * @param counter_index[in] - Counter index
 * @param decrement[in] - Amount to decrement
 *
 * @return 0 on success.
 */
int nds_decrement_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t decrement);

/** Bitwise inclusive OR operation on a counter
 *
 * @param inst[in] - NDS instance
 * @param counter_index[in] - Counter index
 * @param bit_index[in] - bit mask to OR into the counter (e.g. 1ull << bit index on the feature bitmap)
 *
 * @return 0 on success.
 */
int nds_or_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t bit_index);

/** Gets a simple per-nd counter
 *
 * @param inst[in] - NDS instance
 * @param counter_index[in] - Counter index
 * @param value[out] - Counter value
 *
 * @return 0 on success.
 */
int nds_get_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t *value);

/** Sets a simple per-nd counter
 *
 * @param inst[in] - NDS instance
 * @param counter_index[in] - Counter index
 * @param value[in] - Value to set the counter to
 *
 * @return 0 on success.
 */
int nds_set_nd_counter(nds_instance_t *inst, uint32_t counter_index, uint64_t *value);

/* --------------------------------------------
 * NDS objects
 * --------------------------------------------
 */
/** Writes an NDS object to the NDS memory
 *
 * @param obj[in] - NDS object handle
 *
 * @return 0 on success.
 */
int nds_obj_commit(nds_obj_handle_t obj);

/** Creates a new NDS object with the given type
 *
 * @param inst[in] - NDS instance
 * @param type[in] - type of object to create
 *
 * @return handle for the newly created object
 */
nds_obj_handle_t nds_obj_new(nds_instance_t *inst, int type);

/** Deletes an NDS object from NDS (and local memory)
 *
 * @param obj[in] - NDS object handle
 *
 * @return 0 on success.
 */
int nds_obj_delete(nds_obj_handle_t obj);

/** Casts this NDS object to a nds_model_node_info_t which can be used for r/w
 *
 * @param obj[in] - NDS object handle
 *
 * @return non-NULL on success.
 */
nds_model_node_info_t *nds_obj_handle_to_model_node_info(nds_obj_handle_t obj);
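/*
 * Illustrative object flow (not part of this header): create an object,
 * fill it in through the typed cast, then commit it to the datastore:
 *
 *   nds_obj_handle_t obj = nds_obj_new(inst, OBJECT_TYPE_MODEL_NODE_INFO);
 *   nds_model_node_info_t *mi = nds_obj_handle_to_model_node_info(obj);
 *   if (mi) {
 *       mi->model_id = 1;  // placeholder value
 *       nds_obj_commit(obj);
 *   }
 */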
/** Casts this NDS object to a nds_model_node_mem_usage_info_t which can be used for r/w
 *
 * @param obj[in] - NDS object handle
 *
 * @return non-NULL on success.
 */
nds_model_node_mem_usage_info_t *nds_obj_handle_to_model_node_mem_usage(nds_obj_handle_t obj);

/** Reads all model info data and returns it as an array (which must be deleted by the caller)
 *
 * @param inst[in] - NDS instance
 * @param models[out] - Pointer where to write the address of an array of length count containing object handles
 * @param count[out] - Number of models loaded (present in the models array)
 *
 * @return 0 on success.
 */
int nds_read_all_model_nodes(nds_instance_t *inst, nds_obj_handle_t **models, size_t *count);

/** Casts this NDS object to a nds_process_info_t which can be used for r/w
 *
 * @param obj[in] - NDS object handle
 *
 * @return non-NULL on success.
 */
nds_process_info_t *nds_obj_handle_to_process_info(nds_obj_handle_t obj);

/** Casts this NDS object to a nds_process_info_ext_t which can be used for r/w
 *
 * @param obj[in] - NDS object handle
 *
 * @return non-NULL on success.
 */
nds_process_info_ext_t *nds_obj_handle_to_process_info_ext(nds_obj_handle_t obj);

/** Reads process info and returns a nds_obj_handle
 *
 * @param inst[in] - NDS instance
 *
 * @return non-NULL on success.
 */
nds_obj_handle_t nds_read_process_info(nds_instance_t *inst);

/** Reads extended process info and returns a nds_obj_handle
 *
 * @param inst[in] - NDS instance
 *
 * @return non-NULL on success.
 */
nds_obj_handle_t nds_read_process_info_ext(nds_instance_t *inst);

#ifdef __cplusplus
}
#endif

================================================
FILE: src/libnrt/include/nrt/nec.h
================================================
/*
 * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved
 */
#pragma once

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <pthread.h>
#include <time.h>
#include "nrt/nrt_status.h"

#ifdef __cplusplus
extern "C" {
#endif

#define NEC_MAX_CHANNELS 32 /* matches MAXCHANNELS in NCCL */
#define NEC_MAX_NR_CHANNEL_CHUNKS 32 /* Channel buffers for reduce operation */
#define NEC_MAX_FOLD_N 16

/*
 * We can set max communicators to anything here, but ultimately we will be
 * limited by how much HW resource (such as TOP_SP semaphores or NX DRAM
 * space etc.) gets used up as the number of communicators goes up.
 */
#define NEC_MAX_COMM_N 12 /* Max supported replica-groups in NEFF */
#define NEC_MAX_NET_BUFFERS (2 * NEC_MAX_COMM_N) /* 2(hier & ring) x (# replica groups) */
#define NEC_CACHE_LINE_SIZE 128

/* Rank ID to denote network connector */
#define NEC_NET_CONNECTOR_RANK -1
/* MLA dev ID to denote network connector */
#define NEC_NET_MLA_DEV -1
/* MLA dev ID to denote POD connector */
#define NEC_POD_MLA_DEV -2
/* Rank ID to denote an unknown connector -> possibly not reachable */
#define NEC_UNKNOWN_RANK -3
/* MLA dev ID to denote an unknown connector -> possibly not reachable */
#define NEC_UNKNOWN_MLA_DEV -3

/* the number of hierarchical cc pipeline stages */
#define NEC_HIER_CC_PIPELINE_STAGE_N (3)

/* the max number of outgoing requests in the recv/send proxy */
#define NCCL_NET_NEURON_MAX_REQUESTS 128

/**
 * The maximum number of concurrent cc executions. As NCCL needs this
 * information, define the size in the common header file.
 */
#define NEC_MAX_STREAM_N 4

/**
 * The different types of ofi communicators that are in the netResources
 * object that is used in the recv/send proxy
 */
typedef enum ofi_comm_type {
	NET_SEND_COMM,
	NET_RECV_COMM,
	NET_RECV_LISTEN_COMM,
	LOCAL_RECV_COMM,
	LOCAL_SEND_COMM
} ofi_comm_type_t;

enum enc_comm_type { H_COMM_INTRA_ID = 0, H_COMM_INTER_ID = 1, H_COMM_MAX_ID };

/**
 * Neuron Elastic Collectives (NEC)
 *
 * This is the main component for Neuron Elastic Collectives in Neuron Runtime
 * (NRT). It provides collective operations to applications, offloaded by the
 * device, including collective comm init, receiving (post) operations,
 * building resources for the operation, triggering the operation and polling
 * for its completion.
 *
 * +-----------------------+
 * |    Collectives App    |
 * +-----------------------+
 * |  Collectives Library  |
 * +-----------------------+
 * |       NEC / NRT       |
 * +-----------------------+
 * |        DEVICE         |
 * +-----------------------+
 *
 * TODO: ENC will be renamed to NEC
 */

/* Translated from what KaenaDriver returns */
typedef enum nec_pod_type {
	NEC_POD_TYPE_NONE,
	NEC_POD_TYPE_P2P,
	NEC_POD_TYPE_SWITCH,
	NEC_POD_TYPE_INVALID
} nec_pod_type_t;

typedef struct enc_comm* nec_comm_t;
typedef struct enc_channel* nec_channel_t;
typedef uint64_t dma_addr_t;

struct enc_net_host_memory_index {
	union {
		volatile uint32_t index;
		char pad[NEC_CACHE_LINE_SIZE]; /* Avoid false-sharing */
	};
};

/**
 * Host memory structure for network transport
 *
 * The proxy-thread progress function first waits for the device to be ready by
 * polling the host index on fold 0 until it is (-1). Once (-1) is polled, the
 * proxy-thread resets the host index to 0 and notifies the device that the
 * proxy-thread is ready by incrementing the handshake semaphore by 1.
 *
 * On the sender side, the device increases the host index to post a buffer to
 * send to a remote device. The proxy-thread send progress function polls the
 * host index and sends posted buffers to the respective remote device. The
 * proxy-thread polls for send request completions and notifies the device of
 * these completions by increasing the send_complete semaphore by the number of
 * completed send requests. The device may, in response to this notification,
 * increase the host index further to post additional buffers to send.
 * The proxy-thread recognizes the last entry in the FIFO by the fact that it
 * is specially marked (See mark_fifo_end()).
 *
 * On the receiver side, the device increases the host index to post receive
 * buffers to be filled with data from a remote device. The proxy-thread recv
 * progress function polls the host index and posts the receive buffers to the
 * network plugin. The proxy-thread polls for receive completions and notifies
 * the device of these completions by increasing the recv_complete semaphore by
 * the number of completed recv requests. The device uses this notification to
 * know that data is available for processing in device memory. The device
 * may also, in response to this notification, increase the host index further
 * to post additional receive buffers. The proxy-thread recognizes the last
 * entry in the FIFO by the fact that it is specially marked.
 *
 * For the ring algorithm:
 * The sender's handshake and send_complete semaphores
 * are the send-credit semaphore.
 * The receiver's handshake and recv_complete semaphores are the recv-cnt
 * semaphore.
 *
 * For the mesh algorithm:
 * The handshake semaphore is the local-handshake event semaphore for both
 * sender and receiver.
 * The receiver's recv_complete semaphore is the broadcast event semaphore.
 * The sender's send_complete semaphore is the sync event semaphore.
 */
struct enc_net_host_memory {
	union {
		struct {
			struct enc_net_host_memory_index post_recv[NEC_MAX_FOLD_N];
		} recv;
		struct {
			struct enc_net_host_memory_index post_send[NEC_MAX_FOLD_N];
		} send;
	};
};

typedef struct enc_host_mem {
	void *mem_handle;
	void *va;
	dma_addr_t pa;
	size_t size;
} enc_host_mem_t;

typedef struct enc_host_mem_shared {
	enc_host_mem_t mem;
	int refcnt;
} enc_host_mem_shared_t;

/**
 * Network connector structure containing allocated resources for network transport
 */
struct enc_net_connector {
	int fold_n;
	enc_host_mem_t net_host_mem; /* Used to signal proxy thread */
	enc_host_mem_shared_t *dynamic_input_host_mem; /* Used to pass info only available during execution */
	/* Network transport buffer, allocated only for sender */
	void *devmem_res;
	void *nccl_mhandle;
	/* Address and mhandle for event semaphores and pre-registered buffers */
	void *inc_recv_sem_nccl_mhandle;
	uint32_t *inc_recv_sem_values_buffer;
	void *inc_recv_sem_values_buffer_mhandle;
	/*
	 * NCCL network connector data structure. When one proxy worker is used for
	 * the same type (recv or send) of network operation, connector information
	 * should be included in each transaction.
	 */
	void *nccl_connector;
};

typedef enum enc_pattern {
	ENC_PATTERN_RING,
	ENC_PATTERN_MESH,
	ENC_PATTERN_INVALID,
} enc_pattern_t;

typedef enum enc_net_connectivity {
	ENC_CONNECTIVITY_MESH,
	ENC_CONNECTIVITY_RDH,
	ENC_CONNECTIVITY_DEFAULT
} enc_net_connectivity_t;

struct enc_channel {
	/*
	 * Application parameters for init
	 */
	int id;
	enc_pattern_t pattern;
	/* Applicable only in case of a remote neighbor */
	struct enc_net_connector *net_recv; /* if receiving from a rank over the network */
	struct enc_net_connector *net_send; /* if sending to a rank over the network */
	/*
	 * Neuron Runtime context
	 */
	void *devmem_res;
	void *two_step_pod_mesh_devmem_res;
	/* Gateway buffer is allocated only when hybrid ring is supported */
	void *devmem_gw_buf_res;
	void *nccl_mhandle;
	dma_addr_t gw_recv_buffer;
	dma_addr_t gw_send_buffer;
	struct enc_channel_context *ch_ctx;
	struct encd_dma_channel *drv_channel;
};

struct enc_peer_info {
	int neuron_dev;
	int rid;
	int tpb_index;
	int pod_node_id;
};

typedef enum enc_topology_mode {
	ENC_TOPO_NULL = 0,
	ENC_TOPO_4_DEVS_IN_ROW,
	ENC_TOPO_4_DEVS_IN_COLUMN,
} enc_topology_mode_t;

struct enc_comm_info {
	int neuron_dev;
	int rank;
	int rank_n;
	int local_rank_n;
	int local_rack_rank_n;
	int node;
	int node_n;
	enc_topology_mode_t enc_topo_mode;
	/* Pod information received from NCCL */
	bool enable_pod;
	bool use_net; /* Whether a network interface is used or not with the communicator */
	int pod;
	int pod_n;
	int pod_node;
	int pod_node_n;
	struct enc_peer_info *peers;
};

struct enc_ring {
	int prev;
	int next;
	int *user_ranks; /* used by one_rank_per_device rings only */
	bool duplicate;
};

/* Kangaring */
#define NEC_KANGARING_MAX_NUM_RANKS (256)
#define KANGARING_NUM_SENG_PER_DEV (4)
#define KANGARING_NUM_TPB_PER_DEV (8)
#define KANGARING_MAX_SECONDARIES (3)

enum SEngine { S0 = 0, S1 = 1, S2 = 2, S3 = 3, SENGS_PER_DIE = 2, SENGS_PER_MLA = 4 };

struct enc_kangaring {
	int vnc; // virtual neuron core size
	int logical_path[NEC_KANGARING_MAX_NUM_RANKS]; // the logical kangaring path: p0 s0 p1 s1 ...
	int prev; // upstream
	int next; // downstream
	int port; // port to go to next
	/* In the VNC 2 case, this is the only peer. For primary ranks, it refers to their secondary rank;
	 * for secondary ranks, it refers to their primary rank.
	 * In the VNC 1 case, it refers specifically to the peer over rmtv with the same tpb index. */
	int peer_rmtv;
	/* In the VNC 1 case, we have these 2 additional peers.
	 * peer_rmtv2 refers to the peer over rmtv with a different tpb index.
	 * peer_local refers to the local peer with a different tpb index */
	int peer_rmtv2;
	int peer_local;
	int next_peer_rmtv; // next's peer over rmtv
	bool is_primary; // is self rank on the data path?
	bool is_next_pcie; // is the next primary reached via pcie or d2d?
	bool duplicate; // is this a duplicate channel?
	bool pattern2; // is pattern 2?
};

typedef enum metaring_type { RING, KANGARING, SINGLE_CYCLE_RING, RDH, INVALID_METARING } metaring_type_t;

struct enc_alg_metaring {
	int channel_n;
	struct enc_channel channels[NEC_MAX_CHANNELS];
	struct enc_ring ring_ranks[NEC_MAX_CHANNELS];
	struct enc_kangaring kangaring_ranks[NEC_MAX_CHANNELS];
	metaring_type_t type;
	/* Does the group contain only one rank per device? This variable is set to true when NCCL
	 * returns device-level H-cycles to runtime. In this case, we will parse that device H-cycle
	 * and generate ring paths on the runtime side. We do this because we need to enforce certain
	 * pre-defined patterns in the paths so that we avoid deadlocks between concurrent groups.
	 */
	bool one_rank_per_device;
	/* Hybrid ring is supported when the RG has 4 H-cycles of one_rank_per_device */
	bool is_hybrid_ring;
	bool tokens_exchanged; /* reinitialized tokens from old metaring config */
	bool deadlock_free_rank_list;
	struct enc_comm *comm; /* Backward reference to ENC comm */
	struct encd_alg_metaring *drv_alg;
	/* For use by src/tgt pairs only */
	bool skip_send;
	bool skip_recv;
};

/*
 * The order of the events matters here, so when adding a new event make sure the event is added
 * to the right section of the list:
 *
 * ENC_COMMON_NUM_EVENT_TYPE: contains all common events between RDH-Mesh or A2A-mesh
 * ENC_MESH_NUM_EVENT_TYPE-ENC_COMMON_NUM_EVENT_TYPE: contains events used by mesh
 * ENC_A2A_NUM_EVENT_TYPE-ENC_MESH_NUM_EVENT_TYPE: contains events used by A2A only
 * ENC_RDH_NUM_EVENT_TYPE-ENC_A2A_NUM_EVENT_TYPE: contains events used by RDH only
 */
typedef enum enc_mesh_event_type {
	EVT_SYNC,
	EVT_GLOBAL_HNDSHK,
	EVT_LOCAL_HNDSHK,
	EVT_INTER_GRP_BRDCST,
	EVT_FUNCTION_BARRIER_FIRST_COLL,
	EVT_FUNCTION_BARRIER_LAST_COLL,
	EVT_REDUCE_LOCAL_HNDSHK,
	EVT_INTRA_GRP_BRDCST,
	ENC_COMMON_NUM_EVENT_TYPE,
	ENC_MESH_NUM_EVENT_START = ENC_COMMON_NUM_EVENT_TYPE,
	EVT_REDUCE_COPY = ENC_COMMON_NUM_EVENT_TYPE,
	EVT_REDUCE_COPY_2,
	EVT_REDUCE_WRITE,
	EVT_INTER_GRP_BRDCST_2,
	EVT_LOCAL_AND_POD_GRP_BRDCST,
	EVT_LOCAL_AND_POD_GRP_BRDCST_2,
	ENC_MESH_NUM_EVENT_TYPE,
	ENC_A2A_NUM_EVENT_START = ENC_MESH_NUM_EVENT_TYPE,
	EVT_LOCAL_HNDSHK_1 = ENC_MESH_NUM_EVENT_TYPE,
	EVT_LOCAL_HNDSHK_2,
	EVT_GLOBAL_HNDSHK_1,
	EVT_INTER_GRP_BRDCST_1,
	EVT_INTRA_GRP_BRDCST_1,
	EVT_2DEV_BRDCST,
	EVT_2DEV_HNDSHK,
	EVT_COPY_FROM_HOST,
	ENC_A2A_NUM_EVENT_TYPE,
	ENC_RDH_NUM_EVENT_START = ENC_A2A_NUM_EVENT_TYPE,
	EVT_RH_STEP_0 = ENC_A2A_NUM_EVENT_TYPE,
	EVT_RH_STEP_1,
	EVT_RH_STEP_2,
	EVT_RH_STEP_3,
	EVT_RH_STEP_4,
	EVT_RH_STEP_5,
	EVT_RH_STEP_6,
	EVT_RH_STEP_7,
	EVT_RH_STEP_8,
	EVT_RH_STEP_9,
	EVT_RDH_LOCAL_HANDSHAKE = EVT_RH_STEP_9,
	EVT_RDH_AXES_HANDSHAKE,
	EVT_RD_STEP_0,
	EVT_RD_STEP_1,
	EVT_RD_STEP_2,
	EVT_RD_STEP_3,
	EVT_RD_STEP_4,
	EVT_RD_STEP_5,
	EVT_RD_STEP_6,
	EVT_RDH_AXES_HANDSHAKE_2,
	EVT_1DEV_RDH_STEP_1,
	EVT_1DEV_RDH_STEP_2,
	EVT_1DEV_RD_STEP_1,
	EVT_1DEV_RD_STEP_2,
	EVT_1DEV_RH_STEP_1,
	EVT_2DEV_RD_STEP_0,
	EVT_2DEV_RD_STEP_1,
	EVT_2DEV_RD_STEP_2,
	EVT_2DEV_RD_STEP_3,
	EVT_2DEV_RD_STEP_4,
	EVT_RDH_LOCAL_PEER_HANDSHAKE,
	ENC_RDH_NUM_EVENT_TYPE
	// We assume each event is used only once
	// Enforced by encd_init_mesh_event()
} enc_mesh_event_type_t;

#define ENC_MESH_MAX_NUM_EVENTS 64

#define KiB (1024)
#define MiB (1024 * KiB)
#define GiB (1024 * MiB)

struct enc_mesh_nbr_grp {
	int *ranks;
	int ranks_n;
};

struct enc_mesh_event {
	struct enc_mesh_nbr_grp src_neighbor_grp;
	struct enc_mesh_nbr_grp dst_neighbor_grp;
	bool valid;
	enc_mesh_event_type_t evt_type;
};

typedef enum enc_alg_mesh_type {
	ENC_ALG_FULL_MESH,
	ENC_ALG_GROUPED_MESH,
	ENC_ALG_MESH_TRN2,
	ENC_ALG_MESH_SWITCH,
	ENC_ALG_MESH_INVALID
} enc_alg_mesh_type_t;

/* TODO: In a separate commit we will change this to a cpp
 * file so we can have classes */
#define ENC_MAX_OP_TYPES (13)

struct enc_alg_mesh_subtype {
	struct enc_mesh_event events[ENC_MESH_MAX_NUM_EVENTS];
	int num_events;
	struct encd_alg_mesh_subtype *drv_mesh;
	struct enc_alg_mesh *mesh; /* backward reference */
	size_t op_max_limit[ENC_MAX_OP_TYPES]; /* upper limit below which we will use mesh */
	size_t op_min_limit[ENC_MAX_OP_TYPES]; /* lower limit above which we will use mesh */
	size_t op_max_limit_sbuf[ENC_MAX_OP_TYPES]; /* upper limit below which we will use mesh for 2D tensors */
	size_t op_min_limit_sbuf[ENC_MAX_OP_TYPES]; /* lower limit above which we will use mesh for 2D tensors */
	bool no_inplace_support;
	bool is_use_chnl_buffer; /* Whether the channel buffer will be used or not */
	bool is_rdh;
	bool is_single_step_mesh;
	bool is_two_step_pod_mesh;
	bool is_latency_opt;
	bool is_bw_opt;
	bool is_rmv_dst_routing;
	uint32_t alltoall_iteration;
};

#define ENC_MAX_MESH_SUBTYPES (20)
#define ENC_MESH_MAX_NUM_DEVICES (128)

struct enc_alg_mesh {
	enc_alg_mesh_type_t mesh_type;
	union {
		struct {
			uint32_t devid_to_rankid[ENC_MESH_MAX_NUM_DEVICES];
			/* Whether it is a single or a multi chip mesh */
			bool is_multi_chip;
		} trn2;
		struct {
			int num_non_net_node_local_groups;
		} trn1;
		struct {
			bool root_rank;
			int num_intra_group_roots;
			int local_root_ids[ENC_MESH_MAX_NUM_DEVICES];
			int global_root_ids[ENC_MESH_MAX_NUM_DEVICES];
		} inf2;
	};
	int group_id;
	int num_groups;
	/* Mesh uses only a single channel */
	struct enc_channel channel;
	struct enc_alg_mesh_subtype mesh_subtype[ENC_MAX_MESH_SUBTYPES];
	/* Holds the maximum amount of data a single group is allowed to deposit into
	 * the channel buffer. The definition of a group varies by platform type.
	 * On TRN1 and TRN2 a group currently consists of all or some ranks from a
	 * single chip, but on INF2 it refers to a collection of chips. The concept
	 * of a group exists to avoid traffic replication on the wire by combining
	 * input data from multiple ranks within a group before sending it outside
	 * of the group. Therefore at the destination side we only receive a single
	 * chunk of data per group. */
	size_t max_chbuf_space_per_group;
	/* Valid only for TRN2. On TRN2, to prevent AXI deadlock we avoid on-chip
	 * routing at the destination chip and deposit data in the HBM closest to
	 * the entry port. So the rank owning that HBM receives data on behalf of
	 * other ranks on that same chip. This is why we need to carve out dedicated
	 * channel buf space for each of the other s-engines on the same chip. */
	size_t max_chbuf_space_per_seng;
	/* Valid only for single step mesh where we directly copy the entire input
	 * buffer into another rank's channel buffer. */
	size_t max_chbuf_space_per_rank;
	/* Whether to use double buffering to skip the global handshake */
	bool double_buffer;
	/* Whether to build RDH */
	bool build_rdh;
	bool rdh_double_buffer;
	void *rdh_devmem_res; /* intra rdh channel buffer */
	bool use_2dev_proxy;
	bool tokens_exchanged; /* reinitialized tokens from old mesh config */
	bool use_net; /* Whether inter-node mesh with network proxy is used or not */
	/* Backward references to NCCL comm and general cluster info.
	 * These might come from enc_comm or enc_alg_hier */
	struct enc_nccl_comm_node *nccl_comm_node; /* Reference to NCCL comm */
	struct enc_comm_info *ci; /* General cluster information */
	struct enc_comm *comm; /* Backward reference to ENC comm */
	struct encd_alg_mesh *drv_alg;
	/*
	 * DMA mapped memory to host dedicated for A2Av metadata, available only
	 * during execution.
	 */
*/ enc_host_mem_t alltoallv_host_input; }; struct enc_alg_hier { struct { struct enc_nccl_comm_node *nccl_comm_node; struct enc_comm_info ci; struct enc_alg_metaring ring; struct enc_alg_metaring kangaring; struct enc_alg_mesh mesh; } intra; struct { struct enc_nccl_comm_node *nccl_comm_node; struct enc_comm_info ci; struct enc_alg_metaring ring; struct enc_alg_metaring rdh; struct enc_alg_mesh mesh; } inter; struct { struct { struct enc_nccl_comm_node *nccl_comm_node; struct enc_comm_info ci; struct enc_alg_metaring ring; } stage[NEC_HIER_CC_PIPELINE_STAGE_N]; } pipeline; void* devmem_res; /* Hierarchical Reduce Scatter uses intermediate buffer */ struct enc_comm *comm; /* Backward reference to ENC comm */ struct encd_alg_hier *drv_alg; }; /** * Comm info to query from NCCL */ typedef struct nccl_comm_info { /* General cluster information */ uint64_t cluster_id; // randomly generated id used to identify unique clusters in log metrics time_t epoch; // the epoch of the initial barrier at the start of a collectives execution. used when generating core dumps so that all ranks agree on a datetime. int neuron_dev; int rank; int rank_n; int local_rank_n; int local_rack_rank_n; int node; int node_n; bool enable_pod; bool use_net; /* Whether network interface is used or not with the communicator */ int pod; int pod_n; int pod_node; int pod_node_n; struct enc_peer_info *peers; /* Needs to be allocated before calling ncclGetCommInfo() or NULL if peers info is not needed */ /* Ring algorithm information */ int channel_n; struct enc_ring rings[NEC_MAX_CHANNELS]; /* Kangaring algorithm information */ int kangaring_channel_n; int* kangaring_paths[NEC_MAX_CHANNELS]; /* Hamiltonian cycles of MLAs, used to construct 1-rank-per-mla rings */ int mla_cycle_n; int* mla_cycles[NEC_MAX_CHANNELS]; } nccl_comm_info_t; typedef struct enc_nccl_comm_node { void *nccl_comm; char *key; size_t key_sz; /* Tracking the graph information in the nccl_comm. We can use * ncclGetCommInfo() but it's expensive. Instead, simply track the graph * information here. This flag can only changed from true to false. The * other way is not possible. */ bool disable_graph; bool global_nccl_comm_node; int refcnt; uint32_t stream_id; uint32_t context_id; uint32_t num_local_participants; uint32_t num_local_leaders; uint32_t my_local_leader; uint32_t *local_participants; uint32_t *local_leaders; struct bp_barrier *local_barrier; bool intra_pod_interface; /* When intra-pod interface is used, we can't skip exeuction barrier */ } enc_nccl_comm_node_t; /* Neuron Device information. This data structure is used to send the device information from NRT to * nccom for nccl communicator building. */ #define ENC_PROXY_HISTOGRAM_OUTPUT_PATH_LENGTH_MAX (128) typedef struct enc_proxy_histogram_config { bool enable; size_t bucket_usecs; size_t num_buckets; size_t per_neff_warmup; size_t warmup; char output_path[ENC_PROXY_HISTOGRAM_OUTPUT_PATH_LENGTH_MAX]; } enc_proxy_histogram_config_t; typedef struct enc_neuron_device_info { int nec_dev_id; int mla_idx; int tpb_idx; int host_device_id; int routing_id; uint64_t pod_id; nec_pod_type_t pod_type; uint32_t pod_node_id; uint32_t virtual_server_id; enc_proxy_histogram_config_t histogram_config; } enc_neuron_device_info_t; /** * Collective communicator corresponding to ncclComm structure * * enc_comm is the Collective Comm that holds all the necessary information to * execute an collective operation. 
 * This should be pre-set before operations are
 * posted, mainly because of the topology information built upon physical
 * connectivity. Collective operations are executed on multiple channels and a
 * channel is a path for data transfer along a pre-built topology.
 */
struct enc_comm {
    struct enc_nccl_comm_node *nccl_comm_node;  /* Reference to NCCL comm */
    struct enc_comm_info ci;                    /* General cluster information */
    int id;
    int stream_id;
    /*
     * Algorithms
     */
    struct enc_alg_metaring ring;
    struct enc_alg_metaring kangaring;
    struct enc_alg_metaring rdh;
    struct enc_alg_hier hier;
    struct enc_alg_mesh mesh;
    /**
     * Use these handles to share network connector buffers across NEFFs.
     * Only used in global comm. Other comms will refer to the global comm to reuse them.
     * We use net_conn_count to sequentially assign these reservations to network connectors
     * to make sure:
     * 1) different comms in a NEFF don't reuse the same buffer (for multi-stream cases)
     * 2) for each NEFF, we always start with index 0 and go up for the most overlap and
     *    reusability. We reset net_conn_count to 0 in enc_load_operations
     */
    int net_conn_count;
    void* net_connector_devmem_res[NEC_MAX_NET_BUFFERS];
    // TODO: nr_channel_chunks and chunk_size should not be a comm property anymore
    int nr_channel_chunks;      /* Channel buffer depth, applies to all channels */
    size_t chunk_size;          /* Unit of transfer, applies to all channels */
    struct encd_comm *drv_comm; /* Reference to driver comm */
    char topology[1024];        /* Used for debugging purposes only to print the topology in case of an error */
};

/**
 * Global communicator
 */
struct enc_glb_comm {
    uint32_t g_device_id;   /* Same as comm->rank */
    uint32_t g_device_cnt;  /* Same as comm->rank_n */
    uint32_t vtpb_idx;
    int nec_dev_id;
    int mla_idx;
    /* Absolute neuron device hw id. This is the ID that the driver exposes the
       neuron device on to the host system aka OS. Neuron devices are exposed to
       RT by a different ID in case docker remaps devices */
    int host_device_id;
    int routing_id;
    uint32_t virtual_server_id;
    nec_pod_type_t pod_type;
    uint32_t pod_node_id;
    uint32_t pod_sz;
    uint64_t pod_id;
    const char *root_comm_id;   /* By getenv in nrt_config */
    bool check_sigs;            /* By getenv in nrt_config */
    uint32_t *rank_nodes;       /* The node index of each rank */
    uint32_t *local_ranks;      /* The intra-node rank of each rank */
    enc_nccl_comm_node_t nccl_comm_node;    /* nccl_comm node can be used by any stream */
    struct bananaphone *local_rings;
    struct bp_handle *local_peer_handles;
    /**
     * A set of buffers containing values that are used to
     * increment semaphores over efa transactions.
     */
    uint32_t *inc_recv_sem_values_buffer;
    size_t inc_recv_sem_values_buffer_size;
    struct enc_comm comm;
    /* TODO: manage all the devmem reservations in a single place
     * Today we share the buffers under the below path:
     *   enc_glb_comm->comm->ring.channels[ring_channel_id].devmem_res
     * We need to move the above reservations and the one below to a
     * singleton class e.g.
     *   enc_glb_comm->devmem_res_pool */
    void* inter_rdh_devmem_res[NEC_MAX_STREAM_N];
    /* TODO: manage all the devmem reservations in a single place
     * this mem res is referred to by comm->rdh.rdh_devmem_res */
    void* intra_rdh_devmem_res[NEC_MAX_STREAM_N];
    void* mesh_devmem_res_per_rg[NEC_MAX_STREAM_N * NEC_MAX_COMM_N * H_COMM_MAX_ID];
    void* rdh_devmem_res_per_rg[NEC_MAX_STREAM_N * NEC_MAX_COMM_N];
    void *gateway_devmem_res[NEC_MAX_STREAM_N][NEC_MAX_CHANNELS];
    pthread_mutex_t gcomm_setup_mtx;
    void *proxy_queue;  // opaque pointer to enc_proxy_queue
    void *device_barrier_table;
};

/**
 * Network transport FIFOs
 *
 * The host send proxy should know the EFA buffer index, the offset in the buffer and the size of
 * each data transfer to send to the remote device, and the recv proxy
 * needs destination addresses for each data from the sender to submit network receive requests.
 * Send and recv proxy should know when to report the completion of using an
 * EFA buffer, and complete is used to notify it.
 *
 * Such information is recorded when an operation is loaded and becomes available on execution. Host
 * proxy uses these APIs to query the recorded FIFO.
 */

/**
 * A net_ops_info_t entry corresponds to a set of smaller operations that are defined by multiple
 * net_src_addr_t and net_dest_addr_t. These sub operations can correspond to different types of
 * actions, so store a net_addr_mark_t identifier in each net_src_addr_t or net_dest_addr_t entry
 * to denote the purpose of the sub-operation.
 */
typedef enum net_addr_mark {
    NET_TRANSFER,       /* Will drive data transfer over EFA */
    NET_OP_COMPLETE,    /* Will mark final completion of a collective operation */
    EXEC_COMPLETE       /* Will mark final completion of a collective load execution */
} net_addr_mark_t;

typedef struct net_src_addr {
    uint32_t net_op_idx;
    int complete;
    dma_addr_t dev_addr;
    void *host_addr;
    void *nccl_mhandle;
    uint32_t size;
    net_addr_mark_t mark;
    void* proxy_histogram_tag;
    /* Fields below are for mesh only */
    int dst_rank;       /* For local RDMA read */
    void *dst_addr;
    void *dst_mhandle;
} net_src_addr_t;

typedef struct net_dest_addr {
    uint32_t net_op_idx;
    int complete;
    dma_addr_t dev_addr;
    void *host_addr;
    void *nccl_mhandle;
    uint32_t size;
    net_addr_mark_t mark;
    /* Fields below are for mesh only */
    int src_rank;
} net_dest_addr_t;

typedef struct net_ops_info {
    uint16_t sema_shift_offset;
    bool early_send_completion;
    bool early_recv_posting;
    volatile uint32_t *inc_send_handshake;
    volatile uint32_t *inc_send_complete;
    volatile uint32_t *inc_recv_handshake;
    volatile uint32_t *inc_recv_complete;
    uint32_t tx_entry_cnt;
    uint32_t rx_entry_cnt;
    uint32_t net_idx_loop_size;
    uint32_t initial_send_credits;
    uint32_t ending_recv_credits;
    size_t data_type_sz;
    bool is_dynamic_send_recv_sz;
    bool variable_peer;
    bool add_to_histogram;
    /*
     * proxy uses this pointer to get connector information from transaction
     * saddr/daddr fifo entry of each operation.
     */
    void *enc_channel;
} net_ops_info_t;

/**
 * API for proxy-thread to increase handshake and send/recv semaphores by writing directly to the
 * memory mapped semaphore inc register.
 * For more information, see documentation on struct enc_net_host_memory definition.
 */
void nec_inc_semaphore(volatile uint32_t *sem_inc_addr, uint32_t val);

/**
 * API for proxy-thread to get dynamic send size and offset for the case where message
 * size is determined by data only available during execution.
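 *
 * Example (a minimal sketch; dyn_input, dst_rank and rank_n are hypothetical
 * values owned by the calling proxy loop, and error handling is omitted):
 *
 *   size_t sz  = nec_get_dynamic_send_size_bytes(dyn_input, sizeof(float), dst_rank, rank_n);
 *   size_t off = nec_get_dynamic_send_offset_bytes(dyn_input, sizeof(float), dst_rank, rank_n);
 *   // post a network send of sz bytes starting at byte offset off for dst_rank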
 */
size_t nec_get_dynamic_send_size_bytes(enc_host_mem_t *dyn_input, size_t data_type_sz, int dst_rank, int rank_n);
size_t nec_get_dynamic_send_offset_bytes(enc_host_mem_t *dyn_input, size_t data_type_sz, int dst_rank, int rank_n);
size_t nec_get_dynamic_recv_offset_bytes(enc_host_mem_t *dyn_input, size_t data_type_sz, int src_rank, int rank_n);
void nec_set_recv_size_bytes(enc_host_mem_t *dyn_input, size_t recv_size_bytes, size_t data_type_sz, int src_rank, int rank_n);

/**
 * Query device information
 */
int nec_get_device_count(int *available_devices_array, uint32_t array_size);
int nec_get_device_pci_bdf(int neuron_dev, uint32_t *domain, uint32_t *bus_num, uint8_t *pci_slot, uint8_t *dev_func);

/**
 * Query vcore size
 */
NRT_STATUS nec_get_virtual_core_size(uint32_t *virtual_core_size);

typedef struct nec_version_info {
    uint64_t major;
    uint64_t minor;
    uint64_t patch;
    uint64_t maintenance;
    char git_hash[16];
    uint64_t compatibility_version;
    // Any new fields added need to go here. The fields before this cannot be
    // changed, to maintain backward compatibility
    uint8_t future_fields[];
} nec_version_info_t;

NRT_STATUS nec_get_version_info(nec_version_info_t *version_info);
NRT_STATUS nec_build_port_and_rid_map(int local_nec_dev_id, int *mla_indexes, int *host_device_ids, int count);
bool nec_is_mla_available(int local_nec_dev_id, int mla_idx);
int nec_mla_idx_to_rid(int local_nec_dev_id, int mla_idx);
int nec_rid_to_mla_idx(int local_nec_dev_id, int rid);
int nec_get_peer_mla_idx(int local_nec_dev_id, int mla_idx, int port);
int nec_get_p2p_pod_peer_node(uint32_t nec_dev_id, int node, uint32_t port_distance, int *peer_node);
NRT_STATUS nec_pod_node_can_access_peer_node(nec_pod_type_t pod_type, uint32_t local_rid, uint32_t local_node_id,
                                             uint32_t remote_rid, uint32_t remote_node_id, int *can_access_peer);
void nec_ndl_printk(char *str, uint32_t size, uint32_t action);

#ifdef __cplusplus
}
#endif


================================================
FILE: src/libnrt/include/nrt/nrt.h
================================================
/*
 * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved
 */

#pragma once

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

// Use quoted includes in nrt headers including other nrt headers. Most clients
// (ptxla, jax, etc.) build with bazel, and bazel has issue with angle-brackets.
// See https://bazel.build/docs/bazel-and-cpp#include-paths for details.
#include "nrt/nrt_status.h"
#include "ndl/neuron_driver_shared_tensor_batch_op.h"

#ifdef __cplusplus
extern "C" {
#endif

/** Major and minor version of runtime. */
#define NRT_MAJOR_VERSION 2
#define NRT_MINOR_VERSION 0

typedef struct nrt_model nrt_model_t;
typedef struct nrt_tensor nrt_tensor_t;
typedef struct nrt_cc_context nrt_cc_context_t;

/**
 * WARNING: Do not change the value of existing enums!
 * These values will be used by libnrt consumers; we
 * cannot change the defines under them, only append.
 */
typedef enum {
    NRT_TENSOR_PLACEMENT_DEVICE,
    NRT_TENSOR_PLACEMENT_HOST,
    NRT_TENSOR_PLACEMENT_VIRTUAL,
} nrt_tensor_placement_t;

typedef enum {
    NRT_FRAMEWORK_TYPE_INVALID = 0,     // Invalid
    NRT_FRAMEWORK_TYPE_NO_FW = 1,       // Framework-less execution
    NRT_FRAMEWORK_TYPE_TENSORFLOW,      // Tensorflow
    NRT_FRAMEWORK_TYPE_PYTORCH,         // Pytorch
    NRT_FRAMEWORK_TYPE_MXNET,           // Mxnet
    NRT_FRAMEWORK_TYPE_PRECHECK,        // Neuron Node Precheck
} nrt_framework_type_t;

enum {
    NRT_INSTANCE_UNKNOWN = 0,
    NRT_INSTANCE_INF1 = 1,
    NRT_INSTANCE_TRN1 = 2,
    NRT_INSTANCE_TRN1N = 3,
    NRT_INSTANCE_INF2 = 4,
    NRT_INSTANCE_TRN2 = 5,
    NRT_INSTANCE_TRN2N = 6,
    NRT_INSTANCE_INF2E = 7,
    NRT_INSTANCE_TRN2P = 8,
    NRT_INSTANCE_TRN2U = 9,
    NRT_INSTANCE_TRN2E = 10,
    NRT_INSTANCE_TRN2EU = 11,
    NRT_INSTANCE_TRN2AC = 12,
    NRT_INSTANCE_TRN2UAC = 13,
    NRT_INSTANCE_TRN3 = 14,
    NRT_INSTANCE_TRN3PDS98 = 15
};

enum {
    NRT_INSTANCE_SIZE_1XL,
    NRT_INSTANCE_SIZE_2XL,
    NRT_INSTANCE_SIZE_4XL,
    NRT_INSTANCE_SIZE_6XL,
    NRT_INSTANCE_SIZE_8XL,
    NRT_INSTANCE_SIZE_24XL,
    NRT_INSTANCE_SIZE_32XL,
    NRT_INSTANCE_SIZE_48XL,
    NRT_INSTANCE_SIZE_3XL,
    // Note: Add new sizes right above this line to prevent breaking backward compatibility
    NRT_INSTANCE_SIZE_UNKNOWN,
    NRT_INSTANCE_SIZE_NUM = NRT_INSTANCE_SIZE_UNKNOWN,
};

typedef enum nrt_op_type {
    NRT_OP_ADD = 0x0,
    NRT_OP_FMA = 0x1,
    NRT_OP_MAX = 0x2,
    NRT_OP_MIN = 0x3,
    NRT_OP_INVALID = 0xF,
} nrt_op_type_t;

typedef enum nrt_dtype {
    NRT_DTYPE_UNKNOWN = 0x0,
    NRT_DTYPE_INVALID = 0x0,
    NRT_DTYPE_FP8_E3 = 0xD,
    NRT_DTYPE_FP8_E4 = 0xE,
    NRT_DTYPE_FP8_E5 = 0xF,
    NRT_DTYPE_FLOAT16 = 0x7,
    NRT_DTYPE_BFLOAT16 = 0x6,
    NRT_DTYPE_FLOAT32 = 0xA,
    NRT_DTYPE_FP32R = 0xB,
    NRT_DTYPE_UINT8 = 0x3,
    NRT_DTYPE_UINT16 = 0x5,
    NRT_DTYPE_UINT32 = 0x9,
    NRT_DTYPE_UINT64 = 0x1,
    NRT_DTYPE_INT8 = 0x2,
    NRT_DTYPE_INT16 = 0x4,
    NRT_DTYPE_INT32 = 0x8,
    NRT_DTYPE_INT64 = 0xC,
} nrt_dtype_t;

typedef enum nrt_cc_op_type {
    NRT_CC_ALLGATHER,
    NRT_CC_ALLREDUCE,
    NRT_CC_REDUCESCATTER
} nrt_cc_op_type_t;

typedef struct nrt_instance_info {
    uint32_t family;
    uint32_t size;
    char arch_name[16];
    char device_revision[8];
} nrt_instance_info_t;

NRT_STATUS nrt_get_instance_info(nrt_instance_info_t *info, size_t instance_info_len);

/** Initialize neuron runtime.
 *
 * @param framework[in] - Type of the framework.
 * @param fw_version[in] - Framework version as string (e.g. 2.1).
 * @param fal_version[in] - Framework Abstraction Layer version as string.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_init(nrt_framework_type_t framework, const char *fw_version, const char *fal_version);

/** Closes all the devices and cleans up the runtime state. */
void nrt_close();

/** Load given NEFF and place it in one or more neuron cores.
 *
 * @param neff_bytes[in] - Pointer to NEFF data.
 * @param size[in] - Length of the NEFF data.
 * @param vnc[in] - VNC index where the NEFF should be loaded (-1 means the runtime will automatically load in the first free VNC).
 * @param vnc_count[in] - DEPRECATED: always use -1
 * @param model[out] - Resulting model would be stored here.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_load(const void *neff_bytes, size_t size, int32_t vnc, int32_t vnc_count, nrt_model_t **model);

/** Load given NEFF for collective operations and place it in one or more neuron cores.
 *
 * If the global NCCL communicator was not previously created, we will create it inside this API with the
 * assumption that the global device id is the same as ctx_device_id and the global device count is the
 * same as ctx_device_count.
 *
 * @param neff_bytes[in] - Pointer to NEFF data.
 * @param size[in] - Length of the NEFF data.
 * @param vnc[in] - VNC index where the NEFF should be loaded (-1 means the runtime will automatically load in the first free VNC).
 * @param vnc_count[in] - DEPRECATED: always use -1
 * @param ctx_device_id[in] - Device ID relative to the number of devices participating in this NEFF
 * @param ctx_device_count[in] - Number of devices participating in collectives operations in this NEFF
 * @param model[out] - Resulting model would be stored here.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_load_collectives(const void *neff_bytes, size_t size, int32_t vnc, int32_t vnc_count,
                                uint32_t ctx_device_id, uint32_t ctx_device_count, nrt_model_t **model);

/** Unload given model and free up device and host resources.
 *
 * @param model - Model to unload.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_unload(nrt_model_t *model);

/** Get the number of VNCs used by a loaded model. (deprecated)
 *
 * @param model[in] - Model.
 * @param vnc_count[out] - The number of VNCs used by the model.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_get_model_nc_count(const nrt_model_t *model, uint32_t *vnc_count);

/** Get the number of VNCs used by a loaded model.
 *
 * @param model[in] - Model.
 * @param vnc_count[out] - The number of VNCs used by the model.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_get_model_vnc_count(const nrt_model_t *model, uint32_t *vnc_count);

/** Returns VirtualNeuronCores available in instance. (deprecated)
 *
 * @param vnc_count[out] - VirtualNeuronCores available in instance.
 *
 * @note This API can be called before nrt_init().
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_get_total_nc_count(uint32_t *vnc_count);

/** Returns VirtualNeuronCores available in instance.
 *
 * @param vnc_count[out] - VirtualNeuronCores available in instance.
 *
 * @note This API can be called before nrt_init().
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_get_total_vnc_count(uint32_t *vnc_count);

/** Returns VirtualNeuronCores visible to the application. (deprecated)
 *
 * @param vnc_count[out] - VirtualNeuronCores visible to the application.
 *
 * @note This API can be called before nrt_init().
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_get_visible_nc_count(uint32_t *vnc_count);

/** Returns VirtualNeuronCores visible to the application.
 *
 * @param vnc_count[out] - VirtualNeuronCores visible to the application.
 *
 * @note This API can be called before nrt_init().
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_get_visible_vnc_count(uint32_t *vnc_count);

/** A container to hold multiple tensors */
typedef void nrt_tensor_set_t;

/** Allocates a new tensor set.
 *
 * @param result[out] - Pointer to newly allocated tensor set would be stored here.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_allocate_tensor_set(nrt_tensor_set_t **result);

/** Destroys given tensor_set and frees memory.
 *
 * @param tensor_set[in] - Tensor set to be freed.
 *
 * @return None
 */
void nrt_destroy_tensor_set(nrt_tensor_set_t **tensor_set);

/** Add/replace given tensor to tensor set
 *
 * @param tensor_set[in] - Tensor set to which the tensor is added.
 * @param tensor_name[in] - Name of the tensor.
 * @param tensor[in] - Pointer to tensor. This pointer should be valid till nrt_destroy_tensor_set() is called.
 *
 * @return NRT_STATUS_SUCCESS on success.
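 *
 * Example (a minimal sketch; assumes "tensor" was already created with
 * nrt_tensor_allocate() and error handling is omitted for brevity):
 *
 *   nrt_tensor_set_t *inputs = NULL;
 *   nrt_allocate_tensor_set(&inputs);
 *   nrt_add_tensor_to_tensor_set(inputs, "x", tensor);
 *   ...
 *   nrt_destroy_tensor_set(&inputs);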
 */
NRT_STATUS nrt_add_tensor_to_tensor_set(nrt_tensor_set_t *tensor_set, const char *tensor_name, nrt_tensor_t *tensor);

/** Get a tensor from a tensor set.
 *
 * @param tensor_set[in] - Tensor set.
 * @param tensor_name[in] - Name of the tensor.
 * @param tensor[out] - Pointer to tensor would be stored here.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_get_tensor_from_tensor_set(nrt_tensor_set_t *tensor_set, const char *tensor_name, nrt_tensor_t **tensor);

/** Execute given model with given inputs and collect outputs.
 *
 * @param model[in] - Model to execute.
 * @param input_set[in] - Set of input tensors.
 * @param output_set[in] - Set of output tensors.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_execute(nrt_model_t *model, const nrt_tensor_set_t *input_set, nrt_tensor_set_t *output_set);

/** Execute given model with given inputs, repeat execution specified number of times and collect outputs.
 *
 * @param model[in] - Model to execute.
 * @param input_set[in] - Set of input tensors.
 * @param output_set[in] - Set of output tensors.
 * @param repeat_count[in] - Number of times to repeat execution.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_execute_repeat(nrt_model_t *model, const nrt_tensor_set_t *input_set, nrt_tensor_set_t *output_set,
                              int repeat_count);

/** Build (initialize and setup) NCCL global communicator.
 *
 * @param vnc[in] - Local VNC (within the instance)
 * @param g_device_id[in] - Global device id
 * @param g_device_count[in] - Max world size of all neffs that will be executed
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_build_global_comm(int32_t vnc, uint32_t g_device_id, uint32_t g_device_count);

/** Allocates a tensor that can be passed and used by a model for compute.
 *
 * @param tensor_placement[in] - Where the tensor would be allocated (device, host, or virtual memory)
 * @param vnc[in] - Virtual Neuron Core id to allocate the tensor on. Pass in -1 if allocating tensors on host memory.
 * @param size[in] - Size in bytes of the tensor to allocate.
 * @param name[in] - OPTIONAL. Name of the tensor.
 * @param tensor[out] - Pointer to newly created tensor will be stored here.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_tensor_allocate(nrt_tensor_placement_t tensor_placement, int vnc, size_t size, const char *name,
                               nrt_tensor_t **tensor);

/** Deallocates a tensor created by "nrt_tensor_allocate".
 *
 * @param tensor[in] - Deallocates given tensor.
 *
 * @return None
 */
void nrt_tensor_free(nrt_tensor_t **tensor);

/** Copies data from tensor to passed in buffer.
 *
 * @param tensor[in] - Tensor used to reference the tensor to read from.
 * @param buf[out] - Buffer used to store data read from the tensor.
 * @param offset[in] - Offset into the tensor to read from.
 * @param size[in] - Number of bytes to read.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_tensor_read(const nrt_tensor_t *tensor, void *buf, size_t offset, size_t size);

/** Copies data from passed in buffer to tensor.
 *
 * @param tensor[in/out] - Tensor used to reference the tensor to write to.
 * @param buf[in] - Buffer used to store data to write to the tensor.
 * @param offset[in] - Offset into the tensor to write to.
 * @param size[in] - Number of bytes to write.
 *
 * @return NRT_STATUS_SUCCESS on success.
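 *
 * Example (a minimal sketch; assumes "tensor" was allocated with at least
 * sizeof(data) bytes and error handling is omitted):
 *
 *   float data[4] = {0.f, 1.f, 2.f, 3.f};
 *   nrt_tensor_write(tensor, data, 0, sizeof(data));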
 */
NRT_STATUS nrt_tensor_write(nrt_tensor_t *tensor, const void *buf, size_t offset, size_t size);

/** A batch of tensor operations on a single tensor */
// the definition of nrt_tensor_batch_op_t is in neuron_driver_shared_tensor_batch_op.h
typedef struct nrt_tensor_batch {
    const nrt_tensor_t *tensor;         // Tensor handle
    const nrt_tensor_batch_op_t *ops;   // Array of operations for this tensor
    uint32_t num_ops;                   // Number of operations for this tensor
} nrt_tensor_batch_t;

/** Batch read data from multiple tensors.
 *
 * @param batches[in] - An array of batches, each of which describes operations on one tensor
 * @param num_batches[in] - Number of batches (tensors) in the array
 * @param unsafe[in] - If true, skip tensor tracking/blocking (use with caution)
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_tensor_read_batch(const nrt_tensor_batch_t *batches, uint64_t num_batches, bool unsafe);

/** Batch write data to multiple tensors.
 *
 * @param batches[in] - An array of batches, each of which describes operations on one tensor
 * @param num_batches[in] - Number of batches (tensors) in the array
 * @param unsafe[in] - If true, skip tensor tracking/blocking (use with caution)
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_tensor_write_batch(const nrt_tensor_batch_t *batches, uint64_t num_batches, bool unsafe);

/** Copies data between tensors.
 *
 * When copying between two device tensors, they must both be allocated on the SAME Neuron Core.
 * An NRT_INVALID will be returned in the failing case.
 *
 * @param src[in] - Tensor to copy from.
 * @param src_offset[in] - Offset into the source tensor to copy from.
 * @param dst[out] - Tensor to copy to.
 * @param dst_offset[in] - Offset into the destination tensor to copy to.
 * @param size[in] - Number of bytes to copy.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_tensor_copy(const nrt_tensor_t *src, size_t src_offset, nrt_tensor_t *dst, size_t dst_offset, size_t size);

/** Gets the size of the passed in tensor.
 *
 * @param tensor[in] - Tensor used to reference the tensor to get size of.
 *
 * @return Size of the tensor.
 */
size_t nrt_tensor_get_size(const nrt_tensor_t *tensor);

/** Set the memory + offset pointed to by tensor to value
 *
 * @param tensor[in] - allocated tensor
 * @param offset[in] - offset within the tensor
 * @param value[in] - value to set with
 * @param size[in] - size of memory to set
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_tensor_memset(nrt_tensor_t *tensor, uint64_t offset, int value, size_t size);

/** Allocates an empty tensor, i.e. the tensor structure w/o any attached storage
 *
 * @param name[in] - OPTIONAL. Name of the tensor.
 * @param tensor[out] - Pointer to newly created tensor will be stored here.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_tensor_allocate_empty(const char *name, nrt_tensor_t **tensor);

/** Attaches caller supplied buffer to a tensor. Any storage previously attached to the tensor is detached
 * and freed if it was owned by the tensor.
 * The buffer is supplied by the caller and must persist through the entire lifetime of the tensor.
 *
 * @param tensor[in] - Tensor
 * @param buffer[in] - Caller supplied buffer to use as tensor's storage
 * @param size[in] - Buffer Size
 *
 * @return NRT_STATUS_SUCCESS on success.
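 *
 * Example (a minimal sketch; the malloc'd 1 MiB buffer is hypothetical and
 * must outlive the tensor; error handling is omitted):
 *
 *   nrt_tensor_t *t = NULL;
 *   void *buf = malloc(1024 * 1024);
 *   nrt_tensor_allocate_empty("scratch", &t);
 *   nrt_tensor_attach_buffer(t, buf, 1024 * 1024);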
 */
NRT_STATUS nrt_tensor_attach_buffer(nrt_tensor_t *tensor, void *buffer, size_t size);

/** Creates a tensor to point to a slice of another tensor
 * does not do a deep copy, just points the "slice" tensor storage to the "source" tensor storage
 *
 * @param tensor_source[in] - Tensor to point at
 * @param offset[in] - Offset from the beginning of the source tensor to point at
 * @param size[in] - Size of the slice
 * @param name[in] - Optional name for the new tensor
 * @param tensor_slice[out] - Newly allocated tensor to point to the storage of the source tensor
 *
 */
NRT_STATUS nrt_tensor_allocate_slice(const nrt_tensor_t *tensor_source, size_t offset, size_t size, const char *name,
                                     nrt_tensor_t **tensor_slice);

/** Given a tensor get the virtual address.
 *
 * @param tensor[in] - Tensor for which the VA needs to be obtained
 *
 * @return va on success, NULL on failure.
 */
void *nrt_tensor_get_va(const nrt_tensor_t *tensor);

/** Returns on device allocation info for a tensor
 *
 * @param tensor[in] - Tensor for which the information needs to be obtained
 * @param alloc_info[out] - On device allocation information
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
typedef struct nrt_tensor_device_allocation_info {
    uint64_t physical_address;  // physical address in device memory space
    size_t size;                // allocation size, could be larger than the tensor size
    int hbm_index;              // which of the HBMs the tensor is placed in
} nrt_tensor_device_allocation_info_t;
NRT_STATUS nrt_tensor_get_device_allocation_info(const nrt_tensor_t *tensor,
                                                 nrt_tensor_device_allocation_info_t *alloc_info);

/**
 * @brief A Runtime API to check if a given output tensor is fully written/complete.
 * If timeout is given as unbounded, it emits a warning after the first 30 seconds.
 *
 * @param output_tensor: The given output tensor.
 * @param timeout: The maximum total duration to wait for tensor completion in microseconds.
 *                 If timeout is negative, the wait is unbounded and the caller is in charge of handling
 *                 the timeout behaviors. Otherwise, it checks completion until the timeout.
 * @param expected_completion_count: The number of completions expected by the caller.
 *
 * @return NRT_STATUS: It returns NRT_SUCCESS if the tensor is complete;
 *                     It returns NRT_INVALID if the output tensor is given as NULL;
 *                     It returns NRT_TIMEOUT if the tensor does not reach the expected_completion_count within the timeout.
 */
NRT_STATUS nrt_tensor_check_output_completion(const nrt_tensor_t *output_tensor, int64_t timeout,
                                              uint64_t expected_completion_count);

/**
 * @brief A Runtime API to reset the completion counter inside an output tensor to 0.
 *
 * @param output_tensor: The given output tensor.
 * @return NRT_STATUS: It returns NRT_SUCCESS if reset is successful;
 *                     It returns NRT_INVALID if the output tensor is given as NULL.
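 *
 * Example (a minimal sketch; waits up to 1 second for a single completion on a
 * hypothetical output tensor "out", then rearms the counter):
 *
 *   if (nrt_tensor_check_output_completion(out, 1000000, 1) == NRT_SUCCESS) {
 *       nrt_tensor_reset_output_completion(out);
 *   }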
 */
NRT_STATUS nrt_tensor_reset_output_completion(nrt_tensor_t *output_tensor);

/**
 * @brief Get the anonymous file-descriptor of dma-buf associated with
 * a Neuron device memory region if it was registered for EFA peer direct
 *
 * @param va[in] - Device buffer virtual address
 * @param size[in] - Device buffer size (in bytes)
 * @param fd[out] - dma-buf fd
 *
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrt_get_dmabuf_fd(uint64_t va, uint64_t size, int* fd);

/** Get the host-based device id from the device id presented to runtime (which may be a container-based device id)
 * @param neuron_dev[in] - device id
 * @param host_device_id[out] - host device id
 * @return NRT_SUCCESS if call was successful, NRT_INVALID otherwise
 */
NRT_STATUS nrt_host_device_id_get(int neuron_dev, uint32_t *host_device_id);

/** Return array of routing IDs indexed by host device ID. This is the definitive routing ID mapping provided by the driver
 * @param count[in/out] - [in] number of entries in the mapping table provided. [out] count of entries returned
 * @param host_did_to_rid_map[in] - table/map of routing IDs indexed by host device ID
 * @return NRT_SUCCESS if call was successful, NRT_INVALID otherwise
 */
NRT_STATUS nrt_host_device_id_rid_map_get(uint32_t *count, uint32_t *host_did_to_rid_map);

/**
 * Get the HBM virtual address and size for a specific HBM index.
 * @param device_id[in] - Device ID
 * @param hbm_idx[in] - HBM index
 * @param addr[out] - Pointer to store the virtual address
 * @param size[out] - Pointer to store the size of the HBM region
 * @return NRT_SUCCESS if call was successful and HBM region was mapped
 *         NRT_INVALID_HANDLE if there are no more HBM regions to map for this device
 *         NRT_INVALID if the interface isn't supported or for invalid parameters
 *         NRT_FAILURE for other errors
 */
NRT_STATUS nrt_get_hbm_mmap_va(int device_id, int hbm_idx, void **addr, size_t *size);

typedef struct nrt_vnc_memory_stats {
    size_t bytes_used;
    size_t bytes_limit;
    // NOTE: For backward compatibility, when making updates, don't delete existing fields, and
    // ALWAYS add to the end of this struct!
} nrt_vnc_memory_stats_t;

/** Get the NRT memory stats for a VNC.
 *
 * @param vnc[in] - Local VNC (within the instance)
 * @param stats[out] - Pointer to a nrt_vnc_memory_stats struct
 * @param stats_size_in[in] - Caller expected size of the nrt_vnc_memory_stats struct, for compatibility purposes
 * @param stats_size_out[out] - Library written size of the nrt_vnc_memory_stats struct, for compatibility purposes
 *
 * @return NRT_STATUS_SUCCESS on success.
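 *
 * Example (a minimal sketch; queries stats for VNC 0 and relies on the
 * size in/out arguments for forward/backward compatibility):
 *
 *   nrt_vnc_memory_stats_t stats;
 *   size_t out_size = 0;
 *   nrt_get_vnc_memory_stats(0, &stats, sizeof(stats), &out_size);
 *   // stats.bytes_used and stats.bytes_limit are valid if the call succeeded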
 */
NRT_STATUS nrt_get_vnc_memory_stats(uint32_t vnc, nrt_vnc_memory_stats_t *stats, size_t stats_size_in,
                                    size_t *stats_size_out);

/** Get BDF of the EFA device attached to a Neuron device identified by VA of HBM allocation on that device
 *
 * @param va[in] - VA of memory allocated on a Neuron device
 * @param efa_bdf[out] - a buffer (of sufficient size) to store BDF of the connected EFA device
 * @param len[in/out] - in: length of buffer (including NULL), out: length of string (excluding NULL)
 *
 * @return NRT_SUCCESS on success
 *         NRT_RESOURCE if the buffer is not large enough to store the BDF string
 *         NRT_FAILURE for other errors
 */
NRT_STATUS nrt_get_attached_efa_bdf(const void *va, char *efa_bdf, size_t *len);

/******************************
 *  Out-of-NEFF collectives   *
 ******************************/
typedef struct nrt_cc_comm {
    uint32_t *replica_group;    /* a list of participants */
    uint32_t rank;              /* my rank in the replica_group */
    uint32_t rank_n;            /* size of replica_group */
    uint32_t ctx_device_id;
    uint32_t ctx_device_count;
    uint32_t vnc;
} nrt_cc_comm_t;

typedef struct nrt_tensor_list {
    nrt_tensor_t **tensors;
    size_t num_tensors;
} nrt_tensor_list_t;

/** Build (initialize and setup) global communicator for host-driven collective operations.
 *
 * @param vnc[in] - Local VNC (within the instance)
 * @param g_device_id[in] - Global device id
 * @param g_device_count[in] - Max world size of all participating workers
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_cc_global_comm_init(uint32_t vnc, uint32_t g_device_id, uint32_t g_device_count);

#ifdef __cplusplus
}
#endif


================================================
FILE: src/libnrt/include/nrt/nrt_async.h
================================================
/*
 * Copyright 2025, Amazon.com, Inc. or its affiliates. All Rights Reserved
 */

#pragma once

#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

// Use quoted includes in nrt headers including other nrt headers. Most clients
// (ptxla, jax, etc.) build with bazel, and bazel has issue with angle-brackets.
// See https://bazel.build/docs/bazel-and-cpp#include-paths for details.
#include "nrt/nrt.h"

#ifdef __cplusplus
extern "C" {
#endif

// execution units
typedef enum {
    NRTA_XU_TENSOR_READ = 0,
    NRTA_XU_TENSOR_WRITE,
    NRTA_XU_TENSOR_OP,      // For tensor ops other than read and write
    NRTA_XU_COMPUTE,
    NRTA_XU_COLLECTIVES,
    // For new XU types, must only add after existing ones
    NRTA_XU_TYPE_NUM
} nrta_xu_t;

// nrta_seq_t's are monotonically increasing ids of executions.
// The first 16 bits are an Execution Unit ID, while the last
// 48 bits are a strictly ordered Sequence Number
typedef uint64_t nrta_seq_t;
typedef uint16_t nrta_xu_id_t;
#define NRTA_SEQ_NUM_MAX ((1ull << 48) - 1)
#define NRTA_SEQ_NUM_MASK NRTA_SEQ_NUM_MAX
#define NRTA_SEQ_GET_SEQ_NUM(seq_id) (seq_id & NRTA_SEQ_NUM_MASK)
#define NRTA_SEQ_GET_XU_ID(seq_id) (seq_id >> 48)

typedef struct nrta_error {
    nrta_seq_t seq_id;
    uint64_t error_code;    // NRT_STATUS, but typed as uint64 to ensure consistent representation across compilers
} nrta_error_t;
static_assert(sizeof(nrta_error_t) == 16, "nrta_error_t must be of size 16");

// data structure used to store errors encountered during execution
typedef struct nrta_error_tracker nrta_error_tracker_t;

/** Enqueues a tensor write request. Copies the data from a host buffer to a
 * tensor allocated on a Neuron device. Uses TENSOR_WRITE execution unit based
 * on the LNC that allocated the tensor.
 *
 * @param tensor[in] - Destination tensor
 * @param buf[in] - Host buffer containing source data
 * @param offset[in] - Offset into the tensor
 * @param size[in] - Number of bytes to write
 * @param queue[in] - XU queue to use
 * @param err[in] - error tracker
 * @param req_sequence[out] - Sequence number of the scheduled request
 *
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrta_tensor_write(nrt_tensor_t *tensor, const void *buf, uint64_t offset, uint64_t size, int queue,
                             nrta_error_tracker_t *err, nrta_seq_t *req_sequence);

/** Enqueues a tensor read request. Copies the data from a tensor allocated on a Neuron device
 * to a host buffer. Uses TENSOR_READ execution unit based
 * on the LNC that allocated the tensor.
 *
 * @param buf[in] - Destination Host buffer
 * @param tensor[in] - Source tensor
 * @param offset[in] - Offset into the tensor
 * @param size[in] - Number of bytes to read
 * @param queue[in] - XU queue to use
 * @param err[in] - error tracker
 * @param req_sequence[out] - Sequence number of the scheduled request
 *
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrta_tensor_read(void *buf, nrt_tensor_t *tensor, uint64_t offset, uint64_t size, int queue,
                            nrta_error_tracker_t *err, nrta_seq_t *req_sequence);

/** Enqueues a tensor copy request. Copies data between two tensors allocated
 * on the same Logical Neuron Core. Uses TENSOR_OP execution unit.
 *
 * NOTE: the tensors must remain allocated until the copy completes
 *
 * @param src[in] - Source tensor
 * @param src_offset[in] - Offset into the source tensor
 * @param dst[in] - Destination tensor
 * @param dst_offset[in] - Offset into the destination tensor
 * @param size[in] - Number of bytes to copy
 * @param queue[in] - XU queue to use
 * @param err[in] - error tracker
 * @param req_sequence[out] - Sequence number of the scheduled request
 *
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrta_tensor_copy(nrt_tensor_t *src, uint64_t src_offset, nrt_tensor_t *dst, uint64_t dst_offset,
                            uint64_t size, int queue, nrta_error_tracker_t *err, nrta_seq_t *req_sequence);

/** Schedules an asynchronous request to execute a model with specified inputs
 * and outputs. Uses COMPUTE execution unit of an LNC of the loaded model.
 *
 * @param model[in] - The model to schedule for execution
 * @param input[in] - Set of input tensors for the model
 * @param output[in] - Set of tensors to receive the outputs
 * @param queue[in] - XU queue to use, must be 0
 * @param err[in] - error tracker
 * @param req_sequence[out] - Sequence number of the scheduled request
 *
 * @return NRT_SUCCESS on successful preparation, appropriate error code otherwise
 */
NRT_STATUS nrta_execute_schedule(nrt_model_t *model, const nrt_tensor_set_t *input, nrt_tensor_set_t *output,
                                 int queue, nrta_error_tracker_t *err, nrta_seq_t *req_sequence);

/** Prepares collective context and HW configuration needed for collectives operation.
 * Allocates a collective context handle that is returned to the caller
 * which is freed in the schedule thread post CC op execution.
 *
 * @param comm[in] - Communicator containing the replica group
 * @param input[in] - Input tensor list
 * @param output[out] - Output tensor list
 * @param dtype[in] - Data type of elements
 * @param op[in] - Reduction operation (e.g., SUM, MAX) if applicable
 * @param cc_op[in] - Collective operation (e.g., ALLREDUCE, ALLGATHER)
 * @param cc_ctx[out] - Collective context
 *
 * @return NRT_SUCCESS on successful preparation, appropriate error code otherwise
 */
NRT_STATUS nrta_cc_prepare(nrt_cc_comm_t *comm, nrt_tensor_list_t *input, nrt_tensor_list_t *output,
                           nrt_dtype_t dtype, nrt_op_type_t op, nrt_cc_op_type_t cc_op, nrt_cc_context_t **cc_ctx);

/** Schedules an asynchronous request to execute collective operation
 *
 * @param cc_ctx[in] - Collective context
 * @param queue[in] - XU queue to use, must be 0
 * @param err[in] - error tracker
 * @param req_sequence[out] - Sequence number of the scheduled request
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrta_cc_schedule(nrt_cc_context_t **cc_ctx, int queue, nrta_error_tracker_t *err, nrta_seq_t *req_sequence);

// completion status

/** Checks completion status of a scheduled request
 *
 * @param seq[in] - Scheduled request sequence id
 * @param is_completed[out] - true if the request is completed, false otherwise
 *
 * @return NRT_SUCCESS if the request is completed, NRT_INVALID if the seq is not valid
 */
NRT_STATUS nrta_is_completed(nrta_seq_t seq, bool *is_completed);

/** Returns sequence number of the last completed request
 *
 * @param lnc[in] - LNC
 * @param xu[in] - XU
 * @param queue[in] - XU's queue
 * @param seq[out] - last completed sequence number
 *
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrta_get_sequence(uint32_t lnc, nrta_xu_t xu, int queue, nrta_seq_t *seq);

/** Returns a pollable file descriptor that is READABLE when the execution request
 * specified by seq is complete.
 *
 * Note that users should only use the `poll` family of functions and `close` on this file
 * descriptor. Any other FD function is invalid and can lead to undefined behavior.
 *
 * The file descriptor must be passed to `close` to free the handle once the handle is not
 * needed anymore.
 *
 * @param seq[in] - sequence to track completion
 * @param fd[out] - FD associated with the sequence.
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrta_get_completion_handle(nrta_seq_t seq, int *fd);

/** Creates an error tracker list
 *
 * @param lnc_idx[in] - Logical Neuron Core this list will be used for
 * @param error_tracker[out] - Created list.
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrta_error_tracker_create(uint32_t lnc_idx, nrta_error_tracker_t **error_tracker);

/** Frees an error tracker list
 *
 * @param error_tracker[in] - Error tracker list to free
 *
 */
void nrta_error_tracker_destroy(nrta_error_tracker_t *error_tracker);

/** Gets list of errors from error tracker list
 *
 * @param error_tracker[in] - Error tracker list to get errors from
 * @param list[out] - Array of errors obtained from the error tracker
 * @param error_count[out] - Number of errors in the list
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrta_error_tracker_get_list(nrta_error_tracker_t *error_tracker, const nrta_error_t **list,
                                       size_t *error_count);

#ifdef __cplusplus
}
#endif


================================================
FILE: src/libnrt/include/nrt/nrt_async_sendrecv.h
================================================
#pragma once

#include "nrt/nrt.h"
#include "nrt/nrt_status.h"

#ifdef __cplusplus
extern "C" {
#endif

typedef struct nrt_async_sendrecv_comm nrt_async_sendrecv_comm_t;
typedef struct nrt_async_sendrecv_request nrt_async_sendrecv_request_t;

/**
 * Get the maximum number of async sendrecv communicators per logical neuron core
 *
 * @param num[out] - The maximum number of async sendrecv communicators per logical neuron core
 * @return NRT_SUCCESS on success
 *         NRT_FAILURE for errors
 */
NRT_STATUS nrt_async_sendrecv_get_max_num_communicators_per_lnc(int* num);

/**
 * Get the maximum number of pending requests per async sendrecv communicator
 *
 * @param num[out] - The maximum number of pending requests per async sendrecv communicator
 * @return NRT_SUCCESS on success
 *         NRT_FAILURE for errors
 */
NRT_STATUS nrt_async_sendrecv_get_max_num_pending_request(int* num);

/** Initialize asynchronous tensor send and receive on logical neuron core
 *
 * Logical neuron core ID is the absolute ID of the logical core on
 * the host machine. The ID is unaffected by device remapping via
 * docker and selection of visible logical cores.
 *
 * This function may only be called when runtime is initialized. This
 * function must have a matching call to nrt_async_sendrecv_close() before
 * nrt_close() is called.
 * This function returns an error in case a preceding call to
 * nrt_async_sendrecv_close() on the logical neuron core returned an error.
 *
 * @param lnc[in] - Logical neuron core ID on the current server
 * @return NRT_SUCCESS if logical core has been initialized successfully
 *         NRT_FAILURE for errors
 */
NRT_STATUS nrt_async_sendrecv_init(int lnc);

/** Closes asynchronous tensor send and receive of logical neuron core and cleans up resources
 *
 * A call to this function must have a preceding matching call to
 * nrt_async_sendrecv_init(). After this function was invoked, all sendrecv
 * communicators and requests associated with this logical neuron core
 * are closed and cannot be accessed anymore; invoking functions with those
 * communicators or requests is undefined behavior.
 * Cases where this function is called and one of the communicators is
 * not connected yet are considered an error. Cases where this
 * function is called and send or receive requests are still inflight
 * are considered an error.
 *
 * @param lnc[in] - Logical neuron core ID on the current server
 * @return NRT_SUCCESS if logical core has been closed successfully
 *         NRT_FAILURE for errors
 */
NRT_STATUS nrt_async_sendrecv_close(int lnc);

/** Create send communicator
 *
 * Before the send communicator can be used to initiate sending a tensor,
 * a connection to the receive communicator must be established.
 * Use function nrt_async_sendrecv_test_comm() to test whether the connection is
 * established.
 * Async sendrecv for logical neuron core lnc must have been
 * initialized via a call to nrt_async_sendrecv_init() before this function is
 * invoked.
 * This function is thread-safe.
 *
 * @param peer_ip[in] - IP address of peer logical neuron core
 * @param peer_lnc[in] - Logical neuron core ID on the peer server
 * @param lnc[in] - Logical neuron core ID on the current server
 * @param send_comm[out] - Pointer to send communicator
 * @return NRT_SUCCESS if the communicator has been created successfully
 *         NRT_RESOURCE if the number of created communicators exceeds the limit of NRT_ASYNC_SENDRECV_MAX_NUM_COMMUNICATORS_PER_LNC
 *         NRT_FAILURE for other errors
 */
NRT_STATUS nrt_async_sendrecv_connect(const char* peer_ip, int peer_lnc, int lnc, nrt_async_sendrecv_comm_t** send_comm);

/** Create receive communicator
 *
 * Before the receive communicator can be used to initiate receiving a tensor,
 * a connection to the peer send communicator must be established. Use
 * function nrt_async_sendrecv_test_comm() to test whether the connection is
 * established.
 * Async sendrecv for logical neuron core lnc must have been
 * initialized via a call to nrt_async_sendrecv_init() before this function is
 * invoked.
 * This function is thread-safe.
 *
 * @param peer_ip[in] - IP address of peer logical neuron core
 * @param peer_lnc[in] - Logical neuron core ID on the peer server
 * @param lnc[in] - Logical neuron core ID on the current server
 * @param recv_comm[out] - Pointer to receive communicator
 * @return NRT_SUCCESS if the communicator has been created successfully
 *         NRT_RESOURCE if the number of created communicators exceeds the limit of NRT_ASYNC_SENDRECV_MAX_NUM_COMMUNICATORS_PER_LNC
 *         NRT_FAILURE for other errors
 */
NRT_STATUS nrt_async_sendrecv_accept(const char* peer_ip, int peer_lnc, int lnc, nrt_async_sendrecv_comm_t** recv_comm);

/** Test whether connection has been established
 *
 * @param comm[in] - The send or receive communicator
 * @param done[out] - True if connection to peer communicator is established
 * @return NRT_SUCCESS if test performed without error
 *         NRT_INVALID_HANDLE if handle is invalid
 *         NRT_TIMEOUT if the communicator fails to establish connection within time limit
 *         NRT_FAILURE for other errors
 */
NRT_STATUS nrt_async_sendrecv_test_comm(nrt_async_sendrecv_comm_t* comm, bool* done);

/** Asynchronously send a tensor
 *
 * This is a non-blocking function.
 *
 * This function is thread-safe. This function is only allowed to be
 * invoked on a communicator that is successfully tested to be
 * connected via a call to nrt_async_sendrecv_test_comm().
 *
 * @param tensor[in] - Tensor to send from
 * @param offset[in] - Offset into the tensor to send from
 * @param length[in] - Number of bytes to send
 * @param send_comm[in] - Send communicator
 * @param request[out] - Pointer to send request
 * @return NRT_SUCCESS on success
 *         NRT_INVALID_HANDLE if handle is invalid
 *         NRT_RESOURCE if the number of pending requests exceeds the limit of NRT_ASYNC_SENDRECV_MAX_NUM_PENDING_REQUEST
 *         NRT_FAILURE for other errors
 */
NRT_STATUS nrt_async_sendrecv_send_tensor(nrt_tensor_t* tensor, size_t offset, size_t length,
                                          nrt_async_sendrecv_comm_t* send_comm,
                                          nrt_async_sendrecv_request_t** request);

/** Asynchronously receive a tensor
 *
 * This is a non-blocking function.
 *
 * This function is thread-safe. This function is only allowed to be
 * invoked on a communicator that is successfully tested to be
 * connected via a call to nrt_async_sendrecv_test_comm().
 *
 * @param tensor[in] - Tensor to receive to
 * @param offset[in] - Offset into the tensor to receive to
 * @param length[in] - Number of bytes to read
 * @param recv_comm[in] - Receive communicator
 * @param request[out] - Pointer to receive request
 * @return NRT_SUCCESS on success
 *         NRT_INVALID_HANDLE if handle is invalid
 *         NRT_RESOURCE if the number of pending requests exceeds the limit of NRT_ASYNC_SENDRECV_MAX_NUM_PENDING_REQUEST
 *         NRT_FAILURE for other errors
 */
NRT_STATUS nrt_async_sendrecv_recv_tensor(nrt_tensor_t* tensor, size_t offset, size_t length,
                                          nrt_async_sendrecv_comm_t* recv_comm,
                                          nrt_async_sendrecv_request_t** request);

/** Test the completion status of an asynchronous request
 *
 * This function is thread-safe when invoked with different
 * requests. This function is not allowed to be invoked concurrently
 * by multiple threads with the same request at the same time. Once
 * this function has reported a request as completed, it must not be
 * invoked again with the same request.
 *
 * @param request[in] - Request to test
 * @param done[out] - Whether the request has completed
 * @param size[out] - Number of bytes sent/received
 * @return NRT_SUCCESS on success
 *         NRT_INVALID_HANDLE if handle is invalid
 *         NRT_TIMEOUT if the request fails to complete data transfer within time limit
 *         NRT_FAILURE for other errors
 */
NRT_STATUS nrt_async_sendrecv_test_request(nrt_async_sendrecv_request_t* request, bool* done, size_t* size);

/** Flush received messages to ensure full arrival in memory
 *
 * Ensures that received messages of successfully tested async sendrecv
 * receive operations prior to the call to this function have fully arrived in
 * memory after this function completes.
 *
 * @param lnc[in] - Receiving logical neuron core ID
 * @return NRT_SUCCESS if flush operation succeeded
 *         NRT_FAILURE for other errors
 */
NRT_STATUS nrt_async_sendrecv_flush(int lnc);

#ifdef __cplusplus
}
#endif


================================================
FILE: src/libnrt/include/nrt/nrt_experimental.h
================================================
/*
 * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved
 */

#pragma once

#include <stddef.h>
#include <stdint.h>

#include "nrt/nrt_status.h"
#include "nrt/nrt.h"

#ifdef __cplusplus
extern "C" {
#endif

/** Usage of a Tensor in the NEFF */
typedef enum nrt_tensor_usage {
    NRT_TENSOR_USAGE_INPUT = 0, // Tensor is used for ifmap
    NRT_TENSOR_USAGE_OUTPUT,    // Tensor is used for ofmap
} nrt_tensor_usage_t;

#define NRT_TENSOR_NAME_MAX 256

typedef struct nrt_tensor_info {
    char name[NRT_TENSOR_NAME_MAX]; // Name of the tensor
    nrt_tensor_usage_t usage;       // Type of the tensor
    size_t size;                    // Tensor size in bytes
    nrt_dtype_t dtype;              // data type
    uint32_t *shape;                // an array representing data shape
    uint32_t ndim;                  // the number of dimensions
} nrt_tensor_info_t;

typedef struct nrt_tensor_info_array {
    uint64_t tensor_count;              // Total number of tensors in the NEFF
    nrt_tensor_info_t tensor_array[];   // Array of tensor info
} nrt_tensor_info_array_t;

/* Function definition for async exec status callbacks */
typedef void (*NRT_ASYNC_EXEC_STATUS_CALLBACK)(void *params, uint32_t model_id, uint32_t vnc, uint64_t job_id,
                                               NRT_STATUS status);

/** Return input/output tensor information for a given model.
 *
 * @param model[in] - Model for which tensor information needs to be extracted.
 * @param tensor_info[out] - Pointer to store the result.
 *
 * @return NRT_STATUS_SUCCESS on success.
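 *
 * Example (a minimal sketch; assumes "model" was loaded with nrt_load() and
 * error handling is omitted):
 *
 *   nrt_tensor_info_array_t *info = NULL;
 *   nrt_get_model_tensor_info(model, &info);
 *   for (uint64_t i = 0; i < info->tensor_count; i++) {
 *       nrt_tensor_info_t *t = &info->tensor_array[i];
 *       // t->name, t->usage (input/output), t->size, t->dtype, t->shape, t->ndim
 *   }
 *   nrt_free_model_tensor_info(info);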
 */
NRT_STATUS nrt_get_model_tensor_info(nrt_model_t *model, nrt_tensor_info_array_t **tensor_info);

/** Return the instance count for this model handle (optimal number of concurrent threads that can call nrt_execute). (deprecated)
 *
 * @param model[in] - Model for which the instance count needs to be returned.
 * @param instance_count[out] - Pointer to store the result.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_get_model_instance_count(nrt_model_t *model, uint32_t *instance_count);

/** Free input/output tensor information for a given model.
 *
 * @param tensor_info[in] - Tensor information to free.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_free_model_tensor_info(nrt_tensor_info_array_t *tensor_info);

/** Enable tracing for all VNCs visible to the app
 *
 * @param trace_mem[in] - collect memory allocation info
 *
 * @return NRT_SUCCESS on success.
 */
NRT_STATUS nrt_trace_start(bool trace_mem);

/** Serialize all data and disable tracing
 *
 * @param filename[in] - filename to write to
 *
 * @return NRT_SUCCESS on success.
 */
NRT_STATUS nrt_trace_stop(const char *filename);

/** temporary, to be removed. See comment in neuron_nccl.cc */
void *nrt_get_libnccl_net(int *err, char *err_msg, size_t err_msg_size);

/** Structs to pass around ucode image info */
typedef struct nrt_ucode_img {
    uint8_t *bin;
    size_t size;
} nrt_ucode_img;

typedef struct nrt_ucode_info {
    nrt_ucode_img iram;
    nrt_ucode_img dram;
} nrt_ucode_info;

/** Specify pooling engine ucode iram and dram images that will get loaded by nrt_init().
 * To use this API, it MUST be called BEFORE nrt_init().
 * Swapping ucode after nrt_init() is NOT supported. Ucode images are only loaded once.
 * This API provides a temporary workaround for swapping ucode.
 */
NRT_STATUS nrt_set_pool_eng_ucode(const nrt_ucode_info *ucode_info);

/** Copies data to memory mapped Neuron device memory
 *
 * @param dest[in] - Pointer to destination memory (mmaped device memory)
 * @param src[in] - Pointer to source memory
 * @param size[in] - Copy size
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_memcpy_to_device(void *dest, const void *src, size_t size);

/** Register a return status callback to post exec status to when running in async exec mode.
 * Calling this multiple times will replace the previously registered callback.
 *
 * @param callback[in] - Callback to post nrt exec status to for async execution.
 * @param params[in] - Params for the async exec thread to pass to the callback upon
 *                     execution completion. Can be NULL.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_register_async_exec_callback(NRT_ASYNC_EXEC_STATUS_CALLBACK callback, void *params);

/** Implements a barrier by running a small all-reduce over all workers
 *
 * @param vnc[in] - local VNC (within the instance)
 * @param g_device_id[in] - global worker ID
 * @param g_device_count[in] - total number of workers
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_barrier(int32_t vnc, uint32_t g_device_id, uint32_t g_device_count);

/** Perform all-rank AllGather
 *
 * @param vnc[in] - local VNC (within the instance)
 * @param g_device_id[in] - global worker ID
 * @param g_device_count[in] - total number of workers
 * @param rank_input_size[in] - input size
 * @param input[in] - ptr to input data from this rank
 * @param output[out] - ptr to output buffer of size (g_device_count*rank_input_size)
 *
 * @return NRT_STATUS_SUCCESS on success.
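 *
 * Example (a minimal sketch for a hypothetical 4-worker job where "rank" is
 * this worker's global ID; error handling is omitted):
 *
 *   uint32_t value = rank;
 *   uint32_t gathered[4];
 *   nrt_all_gather(0, rank, 4, sizeof(uint32_t), &value, gathered);
 *   // gathered[i] now holds worker i's value on every rank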
 */
NRT_STATUS nrt_all_gather(int32_t vnc, uint32_t g_device_id, uint32_t g_device_count, uint32_t rank_input_size,
                          void *input, void *output);

/** Blocks caller until all queued executions on async worker thread are drained.
 *
 * @param vnc - VNC index to block on.
 *
 * @return NRT_STATUS_SUCCESS on success.
 */
NRT_STATUS nrt_async_drain_queued_execs(int32_t vnc);

typedef struct nrt_model_info {
    uint32_t vnc;
    // additional fields can be added here in the future
    // do not remove previously added fields because it will cause
    // memory corruption if the caller was compiled using a different
    // version of this header.
} nrt_model_info_t;

/** Returns information about loaded model
 *
 * @param model [in] - the model
 * @param info [out] - the information about the model
 * @param info_size_in [in] - the size of the info structure (used for version control)
 * @param info_size_out [out] - the number of bytes written (for version control)
 *
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrt_get_model_info(const nrt_model_t *model, nrt_model_info_t *info, size_t info_size_in,
                              size_t *info_size_out);

#ifdef __cplusplus
}
#endif


================================================
FILE: src/libnrt/include/nrt/nrt_profile.h
================================================
/*
 * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved
 */

#pragma once

#include "nrt/nrt.h"

#ifdef __cplusplus
extern "C" {
#endif

/** Enable profiling for a model
 *
 * @param model[in] - model to profile
 * @param filename[in] - output filename that will be used with nrt_profile_stop()
 *
 * @return NRT_SUCCESS on success.
 */
NRT_STATUS nrt_profile_start(nrt_model_t *model, const char *filename);

/** Collect results and disable profiling for a model
 *
 * @param filename[in] - output filename to save the NTFF profile to
 *
 * @return NRT_SUCCESS on success.
 */
NRT_STATUS nrt_profile_stop(const char *filename);

/** Options for continuous device profiling.
 *
 * Opaque struct used to preserve compatibility and enforce proper usage.
 * Use nrt_profile_continuous_options_set_* functions to set options.
 * Default options:
 *   - output_dir: "./output"
 *
 * Usage:
 *   nrt_profile_continuous_options_t *options;
 *   nrt_profile_continuous_options_allocate(&options);
 *   nrt_profile_continuous_options_set_output_dir(options, "./output");
 */
typedef struct nrt_profile_continuous_options nrt_profile_continuous_options_t;

/** Allocate memory for the nrt_profile_continuous_options_t struct and set all options to defaults.
 *
 * @param options[out] - pointer to a pointer to nrt_profile_continuous_options_t struct
 */
NRT_STATUS nrt_profile_continuous_options_allocate(nrt_profile_continuous_options_t **options);

/** Free up memory allocated for the options struct needed for continuous device profiling.
 *
 * @param options[in] - pointer to a nrt_profile_continuous_options struct
 *
 * @return NRT_SUCCESS on success.
 */
NRT_STATUS nrt_profile_continuous_options_free(nrt_profile_continuous_options_t *options);

/** Sets the output directory for results of continuous device profiling.
 *
 * The filename is set automatically.
 *
 * @param[in,out] options Pointer to the options struct.
 * @param[in] output_dir Path to the output directory.
 *
 * @return NRT_SUCCESS on success.
 */
NRT_STATUS nrt_profile_continuous_options_set_output_dir(nrt_profile_continuous_options_t *options,
                                                         const char *output_dir);

/** @brief Start continuous device profiling.
 *
 * When continuous device profiling is started, profiling is enabled for every model, but notifications
 * will only be serialized to disk when the user calls nrt_profile_continuous_save(). This gives
 * the user control over which profiles are saved to disk. When a profile is not saved, the overhead
 * of trace serialization and disk write is avoided. Continuous profiling is ideal for scenarios where you
 * only want to save profiles for specific executions. In this mode you do not need to call
 * nrt_profile_start() and nrt_profile_stop() because they are called internally. Continuous profiling
 * will not start if inspect device profiling is already enabled or async execution is enabled.
 *
 * @param options[in] - options to control continuous device profiling
 *
 * @return NRT_SUCCESS on success.
 */
NRT_STATUS nrt_profile_continuous_start(nrt_profile_continuous_options_t *options);

/** Save NTFF profile to disk for the latest model executed on requested NeuronCore.
 *
 * Output directory will be set according to the options passed into this function. The filenames of
 * NTFFs within the output directory are chosen automatically to avoid conflicts. Calling save does
 * not stop continuous profiling.
 *
 * @param vnc[in] - (start) NeuronCore id to collect profile for
 * @param options[in] - options to control continuous device profiling
 *
 * @return NRT_SUCCESS on success.
 */
NRT_STATUS nrt_profile_continuous_save(uint32_t vnc, nrt_profile_continuous_options_t *options);

/** Stops continuous device profiling.
 *
 * Calling stop does not save a profile.
 *
 * @return NRT_SUCCESS on success.
 */
NRT_STATUS nrt_profile_continuous_stop();

/* Begin tracing/profiling
 *
 * Users of this API must set options through environment variables:
 *
 * - NEURON_RT_INSPECT_ENABLE: Set to 1 to enable system and device profiles.
 *   For control over which profile types are captured, use NEURON_RT_INSPECT_SYSTEM_PROFILE
 *   and NEURON_RT_INSPECT_DEVICE_PROFILE.
 * - NEURON_RT_INSPECT_OUTPUT_DIR: The directory where captured profile data will be saved to.
 *   Defaults to ./output.
 * - NEURON_RT_INSPECT_SYSTEM_PROFILE: Set to 0 to disable the capture of system profiles.
 *   Defaults to 1 when NEURON_RT_INSPECT_ENABLE is set to 1.
 * - NEURON_RT_INSPECT_DEVICE_PROFILE: Set to 0 to disable the capture of device profiles.
 *   Defaults to 1 when NEURON_RT_INSPECT_ENABLE is set to 1.
 * - NEURON_RT_INSPECT_ON_FAIL: Set to 1 to enable dumping of device profiles in case of an error
 *   during graph execution. Defaults to 0.
 *
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrt_inspect_begin();

/* Stop tracing/profiling and dump profile data.
 * Does nothing if `duration` is given to nrt_inspect_begin() and already elapsed
 *
 * @return NRT_SUCCESS on success
 */
NRT_STATUS nrt_inspect_stop();

/** @brief Options for nrt_inspect_begin_with_options API.
 *
 * Opaque struct used to preserve compatibility and enforce proper usage.
 * Use nrt_inspect_config_set_* functions to set options or
 * nrt_inspect_config_set_defaults to use default options.
 *
 * Example Usage:
 *   nrt_inspect_config_t *options;
 *   nrt_inspect_config_allocate(&options);
 *   nrt_inspect_config_set_output_dir(options, "./output");
 */
typedef struct nrt_inspect_config nrt_inspect_config_t;

/** Allocate memory for the options structure which is needed to
 * start profiling using nrt_inspect_begin_with_options.
This will set all options to defaults. * * @param options[out] - pointer to a pointer to options nrt_inspect_config struct * */ NRT_STATUS nrt_inspect_config_allocate(nrt_inspect_config_t **options); /** @brief Sets all fields of the nrt_inspect_config structure to their default values. * * Default behavior after calling this function: * - Session ID: 1 * - Output directory: "./output" (when not explicitly set) * - Activity types: All activity types enabled (system_profile, device_profile, host_memory, cpu_util) * - System trace: All NeuronCores and event types enabled for capture * - Inspect mode: Disabled (profiles not captured automatically) * - Inspect on failure: Disabled (profiles not captured on execution failures) * * @param options[in,out] - Pointer to an nrt_inspect_config structure. * * @return NRT_SUCCESS on success * * @note The default values set here are NOT influenced by the environment variables. * If you are using the environment variables to set the values, you do not need to use this method * or any of the nrt_inspect_config_set_* functions. */ NRT_STATUS nrt_inspect_config_set_defaults(nrt_inspect_config_t *options); /** Free up memory allocated for the options structure which is needed to * start profiling using nrt_inspect_begin_with_options * * @param options[in] - pointer to an options nrt_inspect_config struct * @return NRT_SUCCESS on success */ NRT_STATUS nrt_inspect_config_free(nrt_inspect_config_t *options); /** * @brief Sets the session ID for the nrt_inspect_config_t which is needed to * start profiling using nrt_inspect_begin_with_options * * @param[in,out] options Pointer to the options structure. * @param[in] session_id Session ID to set. * @return NRT_SUCCESS on success */ NRT_STATUS nrt_inspect_config_set_session_id(nrt_inspect_config_t *options, int session_id); /** * @brief Sets the output directory for results of * profiling using nrt_inspect_begin_with_options * * @param[in,out] options Pointer to the options structure. * @param[in] output_dir Path to the output directory. Must be a valid non-empty string * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters, NRT_RESOURCE for memory allocation failure. * * @note The function makes an internal copy of the string, so the caller * does not need to keep the original string alive. * @note Call nrt_inspect_config_free() to properly clean up allocated memory. */ NRT_STATUS nrt_inspect_config_set_output_dir(nrt_inspect_config_t *options, const char *output_dir); /** * @brief Sets max number of system trace events that can be stored across all ring buffers * * @param[in,out] options Pointer to the options structure. * @param[in] sys_trace_max_events_per_nc Max number of system trace events that can be stored across all ring buffers. * @return NRT_SUCCESS on success */ NRT_STATUS nrt_inspect_config_set_sys_trace_max_events_per_nc(nrt_inspect_config_t *options, uint64_t sys_trace_max_events_per_nc); /** * @brief Sets system trace capture enabled for a specific NeuronCore. * Ring buffers won't be allocated for disabled NeuronCores. * * @param[in,out] options Pointer to the options structure. * @param[in] nc_idx Index of the NeuronCore. * @param[in] enabled Boolean value to enable or disable system trace capture.
* @return NRT_SUCCESS on success */ NRT_STATUS nrt_inspect_config_set_capture_enabled_for_nc(nrt_inspect_config_t *options, uint32_t nc_idx, bool enabled); /** * @brief Sets system trace capture enabled for a specific event type. * Disabling unneeded event types can save memory and reduce output size. * @param[in,out] options Pointer to the options structure. * @param[in] event_type Event type string. * @param[in] enabled Capture enabled flag. * @return NRT_SUCCESS on success * * @note Event type must be a string from the list of supported event types. To get the list of supported event types, * use nrt_sys_trace_get_event_types in the nrt_sys_trace.h header file. */ NRT_STATUS nrt_inspect_config_set_capture_enabled_for_event_type_string(nrt_inspect_config_t *options, const char *event_type, bool enabled); /** * @brief Enable both system and device profiling for normal execution * * When disabled (default), no profiles are captured during normal execution. * This flag controls whether profiles are captured automatically for each execution. * Note: If both enable_inspect and enable_inspect_on_fail are false, no profiling occurs. * * @param[in,out] options Pointer to the options structure. * @param[in] enable_inspect Boolean value to enable or disable inspect profiling. * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters. */ NRT_STATUS nrt_inspect_config_set_enable_inspect(nrt_inspect_config_t *options, bool enable_inspect); /** * @brief Enable dumping of device profiles in case of execution failures * * When enabled, device profiles will be captured and saved when graph execution fails. * This is disabled by default. If both enable_inspect and enable_inspect_on_fail are false, * no profiling occurs at all. * * @param[in,out] options Pointer to the options structure. * @param[in] enable_inspect_on_fail Boolean value to enable or disable inspect on failure. * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters. */ NRT_STATUS nrt_inspect_config_set_enable_inspect_on_fail(nrt_inspect_config_t *options, bool enable_inspect_on_fail); /** * Begin tracing/profiling with configurable options * * @param[in] options - A pointer to an nrt_inspect_config struct containing configuration options * for profiling. Use nrt_inspect_config_set_* functions to set options. * If NULL is passed, default options will be used. * @return NRT_SUCCESS on success * * @note This API ignores all the NEURON_RT_INSPECT_* environment variables. * If you are using the environment variables to set the values, you do not need to use this method * or any of the nrt_inspect_config_set_* functions. Use nrt_inspect_begin() instead. */ NRT_STATUS nrt_inspect_begin_with_options(nrt_inspect_config_t *options); /** * @brief Returns all available activity type strings * * This function allocates and returns an array of all supported activity type * strings. The caller is responsible for freeing both the individual strings * and the array itself, or can use nrt_inspect_config_free_activity_types(). * * @param[out] activity_types Pointer to store the allocated array of activity type strings. * @param[out] count Pointer to store the number of activity types returned. * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters, * NRT_RESOURCE for memory allocation failure.
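 *
 * A hypothetical caller-side sketch (not part of the original header; error
 * handling omitted for brevity):
 *
 *   const char **types = NULL;
 *   size_t count = 0;
 *   if (nrt_inspect_config_get_all_activity_types(&types, &count) == NRT_SUCCESS) {
 *       for (size_t i = 0; i < count; ++i) {
 *           printf("%s\n", types[i]);        // e.g. "system_profile", "device_profile", ...
 *       }
 *       nrt_inspect_config_free_activity_types(types, count);
 *   }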
*/ NRT_STATUS nrt_inspect_config_get_all_activity_types(const char ***activity_types, size_t *count); /** * @brief Returns the currently enabled activity type strings * * This function examines the enabled_activities bitmask in the configuration * and returns an array of strings for only the currently enabled activity types. * The caller is responsible for freeing both the individual strings and the array itself, * or can use nrt_inspect_config_free_activity_types(). * * @param[in] options Pointer to the options structure. * @param[out] activity_types Pointer to store the allocated array of enabled activity type strings. * @param[out] count Pointer to store the number of enabled activity types returned. * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters, * NRT_RESOURCE for memory allocation failure. */ NRT_STATUS nrt_inspect_config_get_enabled_activity_types(nrt_inspect_config_t *options, const char ***activity_types, size_t *count); /** * @brief Free the activity types array allocated by nrt_inspect_config_get_all_activity_types * or nrt_inspect_config_get_enabled_activity_types. * This function properly frees both the array and all individual strings. * * @param[in] activity_types Pointer to the activity types array to be freed. * @param[in] count Number of activity types in the array. */ void nrt_inspect_config_free_activity_types(const char **activity_types, size_t count); /** * @brief Sets or clears a specific activity type in the configuration * * This function enables or disables a specific activity type by name. It converts * the activity type string to the corresponding enum value and updates the * enabled_activities bitmask accordingly. * * @param[in,out] options Pointer to the options structure. * @param[in] activity_type String name of the activity type. Valid values are: * "system_profile", "device_profile", "host_memory", * "cpu_util", "all" * @param[in] enabled True to enable the activity, false to disable it. * @return NRT_SUCCESS on success, NRT_INVALID for invalid parameters or unknown activity type. */ NRT_STATUS nrt_inspect_config_set_activity(nrt_inspect_config_t *options, const char *activity_type, bool enabled); #ifdef __cplusplus } #endif ================================================ FILE: src/libnrt/include/nrt/nrt_status.h ================================================ /* * Copyright 2021, Amazon.com, Inc. or its affiliates. All Rights Reserved */ #pragma once #ifdef __cplusplus extern "C" { #endif // NOTE: if making changes here please also keep // KaenaTools: pkg/rt/rt.go in sync typedef enum { NRT_SUCCESS = 0, NRT_FAILURE = 1, // non-specific failure; don't use if there is a more descriptive type NRT_INVALID = 2, // e.g. invalid NEFF, bad instruction, bad DMA descriptor, input tensor name/size does not match the model, etc.
// TODO invalid_handle is no longer useful because handles are not passed in nrt API // remove NRT_INVALID_HANDLE = 3, // make this one explicit instead of using more generic INVALID_INPUT because it could be a common caller mistake NRT_RESOURCE = 4, // failed to allocate a resource for requested operation // TODO separate exec timeout from others NRT_TIMEOUT = 5, // operation timed out NRT_HW_ERROR = 6, // Hardware failure NRT_QUEUE_FULL = 7, // not enough space in the execution input queue NRT_LOAD_NOT_ENOUGH_NC = 9, // Failed to allocate enough NCs for loading a NEFF NRT_UNSUPPORTED_NEFF_VERSION = 10, // Unsupported version of NEFF // DO NOT USE - keep for backward compat NRT_FAIL_HOST_MEM_ALLOC = 11, // failed to allocate host memory // Unique retcodes to help the caller identify when nrt apis are called outside the scope of nrt_init() and nrt_close() NRT_UNINITIALIZED = 13, NRT_CLOSED = 14, NRT_QUEUE_EMPTY = 15, // Accessed a queue with no data NRT_EXEC_UNIT_UNRECOVERABLE = 101, // Encountered a fatal error and Execution Unit is in limbo, cannot recover. Need to reset NRT_EXEC_BAD_INPUT = 1002, // invalid input has been submitted to exec() NRT_EXEC_COMPLETED_WITH_NUM_ERR = 1003, // execution was completed with numerical errors (produced NaN) NRT_EXEC_COMPLETED_WITH_ERR = 1004, // execution was completed with other errors, // either logical - event double clear, or physical - parity error NRT_EXEC_NC_BUSY = 1005, // the neuron core is locked (in use) by another model/process NRT_EXEC_OOB = 1006, // one or more indirect memcopies and/or embedding updates are out of bound NRT_COLL_PENDING = 1100, // collective operation is still pending // classify different types of execution hangs/timeouts. For unknown/generic hang, use NRT_TIMEOUT. NRT_EXEC_HW_ERR_COLLECTIVES = 1200, // Stuck in collectives op (missing notification(s)). Possibly caused by a hardware error on another worker. NRT_EXEC_HW_ERR_HBM_UE = 1201, // An HBM encountered an unrepairable uncorrectable error and produced incorrect results. NRT_EXEC_HW_ERR_NC_UE = 1202, // An on-chip memory of a NeuronCore encountered a parity error and produced incorrect results. NRT_EXEC_HW_ERR_DMA_ABORT = 1203, // A DMA engine encountered an unrecoverable error. NRT_EXEC_SW_NQ_OVERFLOW = 1204, // Software notification queue overflow. NRT_EXEC_HW_ERR_REPAIRABLE_HBM_UE = 1205, // An HBM encountered a repairable uncorrectable error and produced incorrect results. NRT_NETWORK_PROXY_FAILURE = 1206, // EFA network proxy operation failed. } NRT_STATUS; const char *nrt_get_status_as_str(NRT_STATUS status); #ifdef __cplusplus } #endif ================================================ FILE: src/libnrt/include/nrt/nrt_sys_trace.h ================================================ /* * Copyright 2025, Amazon.com, Inc. or its affiliates. All Rights Reserved */ #pragma once #include #ifdef __cplusplus extern "C" { #endif /* * This is a public interface used by both the fetch api (which allows near * real-time querying of captured events), and inspect profiling (which saves * captured events to disk), as well as other profiling functions. */ //------------------------------------------------ // Section: System Trace Capture //------------------------------------------------ typedef struct nrt_sys_trace_config nrt_sys_trace_config_t; /** Allocate memory for the options structure which is needed to * start profiling using nrt_sys_trace_start. This will set all options to * defaults.
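 *
 * A hypothetical end-to-end capture sketch (caller-side code, not part of the
 * original header; the event count of 100000 is an arbitrary illustration, and
 * inference is assumed to run elsewhere in the process while capture is active):
 *
 *   nrt_sys_trace_config_t *cfg;
 *   nrt_sys_trace_config_allocate(&cfg);
 *   nrt_sys_trace_config_set_max_events_per_nc(cfg, 100000);
 *   nrt_sys_trace_start(cfg);
 *   // ... run inference; events accumulate in the per-NeuronCore ring buffers ...
 *   nrt_sys_trace_stop();
 *   nrt_sys_trace_config_free(cfg);
 *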
The reason we use an _allocate function is so that users don't need * to know the size or implementation details of the config struct. * * @param options[in] - pointer to a pointer to options nrt_sys_trace_config struct * */ NRT_STATUS nrt_sys_trace_config_allocate(nrt_sys_trace_config_t **options); /** Set all fields of the nrt_sys_trace_config structure to their default values. * * @param options[in,out] - Pointer to an nrt_sys_trace_config structure. */ void nrt_sys_trace_config_set_defaults(nrt_sys_trace_config_t *options); /** Free up memory allocated for the options structure which is needed to * start profiling using nrt_sys_trace_start * * @param options[in] - pointer to an options nrt_sys_trace_config struct * */ void nrt_sys_trace_config_free(nrt_sys_trace_config_t *options); /** * @brief Sets the max number of events that can be stored in each per-NeuronCore ring buffer * * @param[in,out] options Pointer to the options structure. * @param[in] max_events_per_nc Max number of events that can be stored in each ring buffer. */ void nrt_sys_trace_config_set_max_events_per_nc(nrt_sys_trace_config_t *options, uint64_t max_events_per_nc); /** * @brief Sets system trace capture enabled for a specific NeuronCore. * Ring buffers won't be allocated for disabled NeuronCores. * Can save memory, reduce output size, and speed up trace processing. * @param[in,out] options Pointer to the options structure. * @param[in] nc_idx NeuronCore index. * @param[in] enabled Capture enabled flag. */ void nrt_sys_trace_config_set_capture_enabled_for_nc(nrt_sys_trace_config_t *options, uint32_t nc_idx, bool enabled); /** * @brief Sets system trace capture enabled for a specific event type. * Can save memory, reduce output size, and speed up trace processing. * @param[in,out] options Pointer to the options structure. * @param[in] event_type Event type string; possible values come from nrt_sys_trace_get_event_types * @param[in] enabled Capture enabled flag. */ NRT_STATUS nrt_sys_trace_config_set_capture_enabled_for_event_type(nrt_sys_trace_config_t *options, const char *event_type, bool enabled); /** * @brief Returns an allocated array of all valid event type strings. * @param[out] event_types Pointer to array of const char* (allocated). * @param[out] count Number of event types. * @return NRT_SUCCESS on success, error code otherwise. * @note The user is responsible for freeing the array and each string, or can use * nrt_sys_trace_free_event_types() for convenience. * * Example usage: * const char **event_types = NULL; * size_t count = 0; * NRT_STATUS status = nrt_sys_trace_get_event_types(&event_types, &count); * // Manual cleanup: * for (size_t i = 0; i < count; ++i) { * free((void*)event_types[i]); * } * free((void*)event_types); * // Or use convenience function: * nrt_sys_trace_free_event_types(event_types, count); */ NRT_STATUS nrt_sys_trace_get_event_types(const char ***event_types, size_t *count); /** * @brief Free the event types array allocated by nrt_sys_trace_get_event_types. * This function properly frees both the array and all individual strings. * * @param[in] event_types Pointer to the event types array to be freed. * @param[in] count Number of event types in the array. */ void nrt_sys_trace_free_event_types(const char **event_types, size_t count); /** * @brief Returns an allocated array of enabled event type strings for the given config. * @param[in] options Pointer to the nrt_sys_trace_config_t structure. * @param[out] event_types Pointer to array of const char* (allocated).
* @param[out] count Number of enabled event types. * @return NRT_SUCCESS on success, error code otherwise. * @note The user is responsible for freeing the array and each string. */ NRT_STATUS nrt_sys_trace_config_get_enabled_event_types(nrt_sys_trace_config_t *options, const char ***event_types, size_t *count); // Initialization for system trace capture including allocating memory for event ring buffers NRT_STATUS nrt_sys_trace_start(nrt_sys_trace_config_t *options); // Teardown for system trace capture including freeing allocated memory for event ring buffers NRT_STATUS nrt_sys_trace_stop(); //------------------------------------------------ // Section: System Trace Fetch //------------------------------------------------ typedef struct nrt_sys_trace_fetch_options nrt_sys_trace_fetch_options_t; NRT_STATUS nrt_sys_trace_fetch_options_allocate(nrt_sys_trace_fetch_options_t **options); void nrt_sys_trace_fetch_options_set_defaults(nrt_sys_trace_fetch_options_t *options); void nrt_sys_trace_fetch_options_free(nrt_sys_trace_fetch_options_t *options); // Max number of events to fetch per NeuronCore void nrt_sys_trace_fetch_options_set_max_events_per_nc(nrt_sys_trace_fetch_options_t *options, uint64_t max_events_per_nc); // Fetch events only for specified NeuronCore void nrt_sys_trace_fetch_options_set_nc_idx(nrt_sys_trace_fetch_options_t *options, uint64_t nc_idx); /** * Fetches system trace events from process memory and returns them as a JSON-formatted string. * Once events are fetched, they cannot be fetched again. * * @param[out] buffer On successful return, will point to a dynamically allocated, null-terminated * JSON string containing the trace events. Memory for the output buffer is * allocated internally; therefore, the caller should not allocate the buffer * before calling this function. The caller should initialize the buffer * pointer to NULL and, after a successful call, is responsible for * freeing the allocated memory by calling nrt_sys_trace_buffer_free(buffer). * * @param[out] written_size A pointer to a size_t variable that will be set to the number of bytes written * into the allocated buffer. * * @param[in] options Pointer to options such as max number of events to fetch. * * @return NRT_SUCCESS on success. * * Usage example: * char *buffer; * size_t written_size; * nrt_sys_trace_fetch_options_t *options; * nrt_sys_trace_fetch_options_allocate(&options); * nrt_sys_trace_fetch_options_set_nc_idx(options, 0); // Fetch events from NeuronCore 0 only instead of all * nrt_sys_trace_fetch_options_set_max_events_per_nc(options, 10000); // Fetch up to 10,000 events instead of all * nrt_sys_trace_fetch_events(&buffer, &written_size, options); * // or if you want to use the default options: * nrt_sys_trace_fetch_events(&buffer, &written_size, NULL); * // finally free the buffer when the events are no longer needed: * nrt_sys_trace_buffer_free(buffer); */ NRT_STATUS nrt_sys_trace_fetch_events(char **buffer, size_t *written_size, const nrt_sys_trace_fetch_options_t *options); /** Free the buffer allocated by nrt_sys_trace_fetch_events. Should be called after the events are no longer needed. * * @param buffer [in] - Pointer to buffer to be freed. */ void nrt_sys_trace_buffer_free(char *buffer); #ifdef __cplusplus } #endif ================================================ FILE: src/libnrt/include/nrt/nrt_version.h ================================================ /* * Copyright 2021, Amazon.com, Inc. or its affiliates.
All Rights Reserved */ #pragma once #ifdef __cplusplus extern "C" { #endif #define RT_VERSION_DETAIL_LEN 128 #define GIT_HASH_LEN 64 typedef struct nrt_version { uint64_t rt_major; uint64_t rt_minor; uint64_t rt_patch; uint64_t rt_maintenance; char rt_detail[RT_VERSION_DETAIL_LEN]; char git_hash[GIT_HASH_LEN]; } nrt_version_t; /** Get the NRT library version * * @param ver[out] - Pointer to nrt version struct * @param size[in] - Length of the data needed to be filled in the nrt_version struct * * @return NRT_SUCCESS on success. */ NRT_STATUS nrt_get_version(nrt_version_t *ver, size_t size); #ifdef __cplusplus } #endif ================================================ FILE: src/neuron-gatherinfo/LICENSE ================================================ Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: src/neuron-gatherinfo/clear_params_tfpb.py ================================================ import re import copy import argparse import tensorflow as tf import numpy as np import string from google.protobuf import text_format from tensorflow.core.framework import node_def_pb2 from tensorflow.core.framework import attr_value_pb2 from tensorflow.python.framework import tensor_util from tensorflow.tools.graph_transforms import TransformGraph def zero_const(node): val = tf.make_ndarray(node.attr.get("value").tensor) new_val = val * 0.0 new_tensor = tensor_util.make_tensor_proto(new_val, new_val.dtype, new_val.shape) node.attr["value"].CopyFrom(attr_value_pb2.AttrValue(tensor=new_tensor)) def ZeroAllConst(graphdef): sess = tf.compat.v1.Session(graph=tf.import_graph_def(graphdef)) const_by_name = {} node_by_name = {} for node in graphdef.node: node_by_name[node.name] = node if node.op == "Const": const_by_name[node.name] = node if node.op == "BiasAdd" or node.op == "MatMul" \ or node.op.startswith("Conv") \ or node.op.startswith("FusedBatchNorm"): for i in node.input: i_node = node_by_name[i] if i_node.op == "Const": zero_const(i_node) if i_node.op == "Identity": x_node = node_by_name[i_node.input[0]] if x_node.op == "Const": zero_const(x_node) return graphdef def load_graph(model_file): graph_def = tf.compat.v1.GraphDef() with open(model_file, "rb") as f: graph_def.ParseFromString(f.read()) return graph_def if __name__ == "__main__": parser = argparse.ArgumentParser(description="Zero-out parameters of BiasAdd, MatMul, Conv*, and FusedBatchNorm of TensorFlow frozen graph.") parser.add_argument("--graph", help="File name of frozen graph to be converted", required=True) parser.add_argument("--out_graph", help="File name to save converted frozen graph", required=True) args = 
parser.parse_args() graph_orig = load_graph(args.graph) graph_mod = ZeroAllConst(graph_orig) with tf.io.gfile.GFile(args.out_graph, "wb") as f: f.write(graph_mod.SerializeToString()) #with tf.io.gfile.GFile(args.out_graph + "txt", 'w') as f: # f.write(text_format.MessageToString(graph_mod)) ================================================ FILE: src/neuron-gatherinfo/mx_neuron_check_model.py ================================================ import os import json import sys import struct import argparse import subprocess from collections import Counter class neuron_parser: def __init__(self): self.parser = argparse.ArgumentParser() self.parser.add_argument('model_path', type=str, help='path prefix to MXNet model (the part before -symbol.json).') self.parser.add_argument('--show_names', action='store_true', help='list operation by name instead of summarizing by type (caution: this option will generate many lines of output for a large model).') self.parser.add_argument('--expand_subgraph', action='store_true', help='show subgraph operations.') self.parser_args = self.parser.parse_args() self.neuronop_info = {} self.total_pipeline_cores = 0 self.min_required_pipeline_cores = 0 path = self.parser_args.model_path if os.path.exists(path + '-symbol.json'): self.load_mxnet_model(path) elif os.path.isdir(path): self.load_tensorflow_model(path) else: raise RuntimeError('Cannot determine framework type from model path argument.') self.supported = self.get_neuron_supported() self.supported.extend(self.addl_support) for name, executable, (sg_nodetypes, sg_nodenames) in self.neuron_nodes: num_cores, requested_cores, _ = self.get_cores_from_executable(executable) self.neuronop_info[name] = (num_cores, requested_cores, sg_nodetypes, sg_nodenames) self.total_pipeline_cores += num_cores if num_cores > self.min_required_pipeline_cores: self.min_required_pipeline_cores = num_cores def get_neuron_supported(self): exec_cmd = ["neuron-cc", "list-operators", "--framework", self.framework] oplist = subprocess.check_output(' '.join(exec_cmd), shell=True) oplist = str(oplist, 'utf-8') oplist = oplist.split("\n") return oplist[:-1] # Remove the last element which is '' def get_tf_subgraph_types_names(self, node): from tensorflow.core.framework import graph_pb2 graph_def = graph_pb2.GraphDef() graph_def.ParseFromString(node.attr['graph_def'].s) sg_nodes = graph_def.node sg_nodes = [sg_node for sg_node in sg_nodes if sg_node.op not in self.excl_types] nodetypes = [sg_node.op for sg_node in sg_nodes] nodenames = [sg_node.name for sg_node in sg_nodes] return nodetypes, nodenames def load_tensorflow_model(self, path): import tensorflow as tf import tensorflow_hub as hub self.framework = 'TENSORFLOW' self.neuron_optype = "NeuronOp" self.excl_types = ['Placeholder', 'PlaceholderWithDefault', 'NoOp', 'Const', 'Identity', 'IdentityN', 'VarHandleOp', 'VarIsInitializedOp', 'AssignVariableOp', 'ReadVariableOp', 'StringJoin', 'ShardedFilename', 'SaveV2', 'MergeV2Checkpoints', 'RestoreV2'] self.addl_support = ['FusedBatchNormV3', 'BatchMatMulV2', 'AddV2', 'StopGradient', self.neuron_optype] model = hub.load(path) graph_def = model.graph.as_graph_def() nodes = graph_def.node nodes = [node for node in nodes if node.op not in self.excl_types] self.nodetypes = [node.op for node in nodes] self.nodenames = [node.name for node in nodes] self.neuron_nodes = [(node.name, node.attr['executable'].s, self.get_tf_subgraph_types_names(node)) for node in nodes if node.op == self.neuron_optype] def get_mx_subgraph_types_names(self, node): nodetypes 
= [] nodenames = [] for sg in node['subgraphs']: filtered_nodes = [sg_node for sg_node in sg['nodes'] if sg_node['op'] not in self.excl_types] nodetypes.extend([sg_node['op'] for sg_node in filtered_nodes]) nodenames.extend([sg_node['name'] for sg_node in filtered_nodes]) return nodetypes, nodenames def load_mxnet_model(self, path): import mxnet as mx if mx.__version__ != "1.5.1": try: import mx_neuron as mxn except ImportError: raise RuntimeError("Please install mx_neuron package.") self.framework = 'MXNET' self.neuron_optype = "_neuron_subgraph_op" self.excl_types = ['null'] self.addl_support = [self.neuron_optype] sym, args, auxs = mx.model.load_checkpoint(path, 0) nodes = json.loads(sym.tojson())["nodes"] nodes = [node for node in nodes if node['op'] not in self.excl_types] self.nodetypes = [node['op'] for node in nodes] self.nodenames = [node['name'] for node in nodes] neuron_nodes_tmp = [node for node in nodes if node['op'] == self.neuron_optype] self.neuron_nodes = [(node['name'], bytearray(args[node['name']+"_neuronbin"].asnumpy()), self.get_mx_subgraph_types_names(node)) for node in neuron_nodes_tmp] @staticmethod def get_cores_from_executable(executable): _NC_HEADER_SIZE = 544 header = executable[:_NC_HEADER_SIZE] info = list(struct.unpack('168xI304xI64B', header)) numCores = info.pop(0) numCoresRequested = info.pop(0) coresPerNode = info return numCores, numCoresRequested, coresPerNode # Display table of operation type or name and whether supported or not def print_node_type_info(self): self.cnt_total = len(self.nodetypes) self.cnt_supported = 0 if self.parser_args.show_names: widthn = max(max(map(len, self.nodenames)), 8) widtht = max(max(map(len, self.nodetypes)), 8) format_str = "{:<" + str(widthn) + "} {:<" + str(widtht) + "} {:<4}" pp = lambda x: print(format_str.format(*x)) pp(['Op Name', 'Op Type', 'Neuron Supported ?']) pp(['-------', '-------', '------------------']) for idx, opname in enumerate(self.nodenames): optype = self.nodetypes[idx] if optype in self.supported: pp([opname, optype, 'Yes']) self.cnt_supported += 1 for idx, opname in enumerate(self.nodenames): optype = self.nodetypes[idx] if optype not in self.supported: pp([opname, optype, 'No']) else: count = Counter(self.nodetypes) width = max(max(map(len, self.nodetypes)), 8) format_str = "{:<" + str(width) + "} {:<14} {:<4}" pp = lambda x: print(format_str.format(*x)) pp(['Op Type', 'Num Instances', 'Neuron Supported ?']) pp(['-------', '-------------', '------------------']) for key in count: if key in self.supported: pp([key, count[key], 'Yes']) self.cnt_supported += count[key] for key in count: if key not in self.supported: pp([key, count[key], 'No']) print() def print_subgraph_ops(self, sg_nodetypes, sg_nodenames): if self.parser_args.show_names: widthn = max(max(map(len, sg_nodenames)), 8) widtht = max(max(map(len, sg_nodetypes)), 8) format_str = "{:<" + str(widthn) + "} {:<" + str(widtht) + "}" pp = lambda x: print(' ', format_str.format(*x)) pp(['Op Name', 'Op Type']) pp(['-------', '-------']) for idx, opname in enumerate(sg_nodenames): optype = sg_nodetypes[idx] pp([opname, optype]) else: count = Counter(sg_nodetypes) width = max(max(map(len, sg_nodetypes)), 8) format_str = "{:<" + str(width) + "} {:<14}" pp = lambda x: print(' ', format_str.format(*x)) pp(['Op Type', 'Num Instances']) pp(['-------', '-------------']) for key in count: pp([key, count[key]]) def print_neuron_node_info(self): idx = 0 width = max(max(map(len, self.neuronop_info)), 14) format_str = "{:<" + str(width) + "} {:<14}" pp = lambda x: 
print(format_str.format(*x)) pp(['Subgraph Name', 'Num Pipelined NeuronCores']) pp(['-------------', '-------------------------']) core_cnt_list = [] for name, (num_cores, _, sg_nodetypes, sg_nodenames) in self.neuronop_info.items(): pp([name, num_cores]) core_cnt_list.append(num_cores) idx += 1 if self.parser_args.expand_subgraph: self.print_subgraph_ops(sg_nodetypes, sg_nodenames) print() def print_neuron_support_stats(self): print("* Total inference operations: {}".format(self.cnt_total)) print("* Total Neuron supported inference operations: {}".format(self.cnt_supported)) if self.cnt_total > 0: perc = self.cnt_supported / self.cnt_total * 100 else: perc = 0 print("* Percent of total inference operations supported by Neuron: {:.1f}".format(perc)) print() def print_common_desc(self): if self.parser_args.show_names: print("* Each line shows an operation name and whether the type of that operation is supported in Neuron.") else: print("* Each line shows an operation type, the number of instances of that type within the model,\n" \ "* and whether the type is supported in Neuron.") print("* Some operation types are excluded from the table because they are no-operations or training-related operations:\n", \ self.excl_types, "\n") def run(self): if len(self.neuronop_info) > 0: print("\n* Found {} Neuron subgraph(s) ({}(s)) in this compiled model.\n" \ "* Use this tool on the original uncompiled model to see Neuron supported operations.\n" \ "* The following table shows all operations, including Neuron subgraphs.".format(len(self.neuronop_info), self.neuron_optype)) self.print_common_desc() self.print_node_type_info() print('* Please run this model on an Inf1 instance with at least {} NeuronCore(s).'.format(self.min_required_pipeline_cores)) print("* The following list shows each Neuron subgraph with the number of pipelined NeuronCores used by the subgraph\n"\ "* (and subgraph operations if --expand_subgraph is used):\n") self.print_neuron_node_info() else: print("\n* The following table shows the supported and unsupported operations within this uncompiled model.") self.print_common_desc() self.print_node_type_info() self.print_neuron_support_stats() if __name__=='__main__': toolkit = neuron_parser() toolkit.run() ================================================ FILE: src/neuron-gatherinfo/neuron-gatherinfo.py ================================================ #!/usr/bin/env python3 # coding=utf-8 """ Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0 Program to gather information from a system """ import sys import os import argparse import shutil import subprocess import re ACTUAL_CMD = os.path.realpath(sys.argv[0]) USAGE_MSG = """ Usage: {} [options] This program is used to gather information from this system for analysis and debugging """.format(ACTUAL_CMD) EXCLUDE_FILES_BY_NAME = "weight files, model, NEFF (Neuron Executable File Format)" HELP_CC_FILES = """ Location of the neuron-cc generated files """ DEFAULT_CCFILES_LOCATION = "~/bin" SYSLOG_SEARCH_PATTERNS = r"nrtd|neuron|kernel:" EXTERNAL_CMDS = ["lscpu", "lshw", "lspci | grep -i Amazon", "neuron-cc --version", "neuron-ls", "top -b -n 1", "uname -a", "uptime", ] PROC_FILES = ["/proc/cmdline", "/proc/cpuinfo", "/proc/filesystems", "/proc/interrupts", "/proc/iomem", "/proc/loadavg", "/proc/meminfo", "/proc/modules", "/proc/mtrr", "/proc/version", ] HELP_ADDITIONAL_FILE_OR_DIR = """ Additional file or directory that the user wants to provide in the archive. 
The user can sanitize this file or directory before sharing """ INCLUDE_MSG = """ By default, only the lines containing (grep) patterns like '{}' from the syslog are copied. Other lines are excluded. Using this option allows the timestamp section of other lines to be included. The rest of the contents of the line itself are elided. Providing the timestamp section may provide time continuity while viewing the copied syslog file """.format(SYSLOG_SEARCH_PATTERNS) HELP_RT_FILES = """ Location of the neuron runtime generated files """ MISCINFO_FILE = 'miscinfo.txt' HELP_VERBOSE = """ Verbose mode displays commands executed and any additional information which may be useful in debugging the tool itself """ INCLUDE_EXTNS = ('.pb',) HELP_INCLUDE_EXTN_FILES = """ Include files with these extensions from the compiler work directory in the archive: {} """.format(INCLUDE_EXTNS) HELP_STDOUT = """ The file where the stdout of the compiler run was saved """ HELP_OUTDIR_MSG = """ The output directory where all the files and other information will be stored. The output will be stored as an archive as well as the actual directory where all the contents are copied. This will allow a simple audit of the files, if necessary. *** N O T E ***: Make sure that this directory has enough space to hold the files and resulting archive """ USERCMDFILE = "how-the-user-executed-the-script-{}.txt".format(os.path.basename(ACTUAL_CMD)) NEURONDUMPPROGRAM = "/opt/aws/neuron/bin/neuron-dump.py" NEURONDUMPFILE = os.path.splitext(os.path.basename(NEURONDUMPPROGRAM))[0] NEURON_ERRMSG = "Error: File {} doesn't exist, aws-neuron-tool package isn't installed?".format( NEURONDUMPPROGRAM) NEURON_INFO_TARBALL = "{}".format(os.path.splitext(os.path.basename(ACTUAL_CMD))[0]) NEURONTMPDIR = NEURON_INFO_TARBALL ARCHIVE_MSG = "\n\n\t******\n\tArchive created at:\n\t\t{}\n\tFrom directory:\n\t\t{}\n\t******\n\n" NOT_IMPLEMENTED_MSG = ", nothing to see here, folks (not implemented as yet)" # these are the only compiler-generated files that are included by default COMPILER_FILES = ['graph_def.neuron-cc.log', 'all_metrics.csv', 'hh-tr-operand-tensortensor.json'] COMPILER_FILES_USER_OPT_IN = ['exp_and_others.json', 'graph_def.neff', 'graph_def.pb', 'hh-spilled.json', 'hh-tr-accDN2virtDN.json', 'hh-tr-external-move.json', 'hh-tr-internal-move.json', 'hh-tr-removeDN.json', 'hh-transforms.json', 'wavegraph.json', 'hh.json', 'pass03_scheduling.json', 'relay_graph_opt_pre_color.txt', 'relay_graph_post_opt_kelp.txt', 'relay_graph_post_opt_unit_level.txt', 'relay_graph_pre_opt.txt', 'saved_model.pb', 'sch.json', 'sch_tmp.json', 'schedule_trace.json', 'wavegraph-bin.json'] MODEL_DATA_MSG = """ By using this option, the entire compiler work directory's contents will be included (excluding the {} files, unless an additional option is used). This would include model information, etc. 
The files that are included, by default, are these: {} """.format(INCLUDE_EXTNS, ", ".join(COMPILER_FILES)) MODEL_DATA_MSG_INFO = """ \t************************** \tBased on your command line option, we're also packaging these files: \t\t{} \tAnd this directory: {} \t************************** """ def get_os_version(): ''' function to obtain the Linux version Args: Output: Returns: string with value 'Ubuntu' or 'RedHat' ''' try: with open("/proc/version") as fdin: data = fdin.read() if data.find('Ubuntu') == -1: osver = 'RedHat' else: osver = 'Ubuntu' except FileNotFoundError: osver = 'Ubuntu' return osver def get_files(*, basedir, matchfiles, verbose): ''' function to get the files based on a base directory and file extension Args: basedir : base directory where files reside matchfiles : set of files to match verbose : flag to indicate if verbose messages need to be displayed Output: Returns: list of files found ''' myfiles = list() for dpath, _, files in os.walk(basedir): for mfile in files: if mfile in matchfiles: mfile = os.path.realpath(os.path.join(dpath, mfile)) if os.path.isfile(mfile): myfiles.append(mfile) else: if verbose: print("Warning: {} is not a file".format(mfile)) return myfiles def dump_compiler_info(*, outdir, location, allowmodel=False, addfldir=None, verbose=False): ''' function to gather the following information: Framework: - TensorFlow - MXNet - PyTorch Compiler: Args: outdir : output directory location : location of compiler-generated files allowmodel : if True, allow gathering of additional files verbose : flag to indicate if verbose messages need to be displayed Output: compiler-generated files copied to outdir Returns: ''' if location is not None: if allowmodel: # copy the entire directory try: shutil.copytree(location, os.path.join(outdir, os.path.basename(location)), ignore_dangling_symlinks=True) except shutil.Error: pass else: fileset = set(COMPILER_FILES) l1data = get_files(basedir=location, matchfiles=fileset, verbose=verbose) copy_files(outdir=outdir, basedir=location, filelist=l1data, verbose=verbose) if addfldir is not None: if os.path.isfile(addfldir): shutil.copy(addfldir, outdir) else: # directory copy try: shutil.copytree(addfldir, os.path.join(outdir, os.path.basename(addfldir)), ignore_dangling_symlinks=True) except shutil.Error: pass # print("Function: ", sys._getframe().f_code.co_name, # pylint: disable=W0212 # NOT_IMPLEMENTED_MSG) def copy_stdout(*, outdir, stdout, verbose): ''' function to copy the stdout file to the destination location Args: outdir : destination location (output directory) stdout : file containing the output of running neuron-cc verbose : flag to indicate if verbose messages need to be displayed Output: Returns: ''' if verbose: print("Copying {} to {}".format(stdout, outdir)) shutil.copy(stdout, outdir) def copy_syslog(*, outdir, include_flag=False, verbose): ''' function to copy contents of the syslog to the output directory Args: outdir : output directory location where the syslog's contents are to be copied include_flag : if True, include lines that do not match verbose : flag to indicate if verbose messages need to be displayed Output: copy of syslog's contents with just "Neuron-specific" lines Returns: ''' # syslog looks like this: # 2019-11-21T19:32:50.347183+00:00 ink neuron-rtd[17977]: nrtd[17977]: # The first regex (regex1) is used to match lines that we want to see in our copy regex1 = re.compile(r'^(\S+)\s.*?({})'.format(SYSLOG_SEARCH_PATTERNS)) regex2 = re.compile(r'^(\S+)\s') osver = get_os_version() if osver 
== 'Ubuntu': syslog = '/var/log/syslog' else: syslog = '/var/log/messages' try: with open(syslog) as fdin,\ open(os.path.join(outdir, 'copy-of-syslog'), 'w') as fdout: for line in fdin: match = regex1.search(line) if match is not None: fdout.write(line) else: if include_flag: match = regex2.match(line) if match is not None: # exclude the rest of the line fdout.write(match.group(1) + ' XXX contents elided XXX\n') else: print("Error in parsing this line: {}".format(line)) except FileNotFoundError: print("Error, {} not found".format(syslog)) def dump_rt_info(*, location, verbose): ''' function to dump the following information: - runtime - Framework (??) Args: location: location of runtime files verbose : flag to indicate if verbose messages need to be displayed Returns: list of info ''' # l1data = get_files(basedir=location, file_extn=('.sh')) print("Function: ", sys._getframe().f_code.co_name, # pylint: disable=W0212 NOT_IMPLEMENTED_MSG) def allow_capture_of_files(): ''' function to allow the capture of files from the customer's environment This is OFF by default and has to be explicitly enabled by the command-line option by the user Args: Output: Returns: ''' print("Function: ", sys._getframe().f_code.co_name, # pylint: disable=W0212 NOT_IMPLEMENTED_MSG) def add_additional_filters(filterfile): ''' function to apply additional filters to files that are being captured Args: filterfile : text file with patterns (regexs), one per line, to use as filters Output: Returns: ''' print("Function: ", sys._getframe().f_code.co_name, # pylint: disable=W0212 NOT_IMPLEMENTED_MSG) def dump_miscinfo(*, outdir, verbose): ''' function to dump miscellaneous information, including: - system info (uname -a) - package info (??? list of packages installed) - neuron-ls - neuron-top Args: outdir : output directory verbose : flag to indicate if verbose messages need to be displayed Output: Creates various reports in the outdir location Returns: ''' osver = get_os_version() if osver == 'Ubuntu': pkgcmds = ["apt list | egrep '^aws'", "pip list | egrep '^neuron|^numpy|^tensor|^scipy'"] else: pkgcmds = ["rpm -qa | egrep '^aws|^neuron|^numpy|^tensor|^scipy'"] cmds = EXTERNAL_CMDS + pkgcmds for cmd in cmds: cmdname = cmd.split(' ')[0] # get just the command name for creating the file cmdfile = os.path.join(outdir, "report-{}.txt".format(cmdname)) with open(cmdfile, "w") as fdout: if verbose: print("Running cmd: {} and capturing output in file: {}".format(cmd, cmdfile)) try: res = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True, shell=True) stdout, stderr = res.communicate() if stderr is not None: fdout.write("Error in executing cmd: {}\nError: {}\n".format(cmd, str(stderr))) else: fdout.write("Output from executing cmd: {}\n\n{}\n".format(cmd, str(stdout))) except (OSError, ValueError) as err: fdout.write("Error in executing cmd: {}\nError: {}\n".format(cmd, err)) def dump_proc_info(*, outdir, verbose): ''' function to dump information related to "/proc" Args: outdir : output directory verbose : flag to indicate if verbose messages need to be displayed Output: Creates various reports in the outdir location Returns: ''' for procfile in PROC_FILES: fname = procfile.split('/') # use the 2nd and 3rd items from this (canonical form) pfile = os.path.join(outdir, "report-{}-{}.txt".format(fname[1], fname[2])) if verbose: print("Copying contents of: {} to: {}".format(procfile, pfile)) try: with open(pfile, "w") as fdout, open(procfile) as fdin: fdout.write("Contents of 
{}\n\n".format(procfile)) fdout.write(fdin.read()) except FileNotFoundError: print("Error: file {} not found\n".format(procfile)) def sanity_check(options): ''' function to check if command-line arguments are valid Args: options : options from argparse parser Output: Returns: 0 : success 1 : failure ''' # the script has to be run as root or "sudo" if os.getuid() != 0: print("*** Rerun this script as user 'root' or as sudo **\n\n") return 1 outdir = options.outdir retval = 0 if os.path.isfile(outdir) or os.path.isdir(outdir): print("Error: {} already exists, please provide a non-existing directory".format(outdir)) retval = 1 if not os.path.isfile(options.stdout): print("Error: {} doesn't exist, please provide an existing file".format(options.stdout)) retval = 1 if options.addfldir is not None: if not os.path.isfile(options.addfldir) and not os.path.isdir(options.addfldir): print("Error: {} isn't a file nor a directory".format(options.addfldir)) retval = 1 for mydir in [options.ccdir, options.rtdir]: if mydir is not None and not os.path.isdir(mydir): print("Error: {} is not a directory, please provide a directory".format(mydir)) retval = 1 if options.allowmodel and options.ccdir is None: print("Error: you need to specify a compiler work directory along with the 'm' option") retval = 1 return retval def copy_files(*, outdir, basedir, filelist, verbose): ''' function to copy files from the original source area into the destination. This is also the place for any massaging or eliding of file contents Args: outdir : destination location basedir : base directory from where the files are to be copied filelist: list of files to be copied verbose : flag to indicate if verbose messages need to be displayed Output: Copy of files (possibly altered) from the source Returns: ''' for thisfile in filelist: myfile = '.' 
+ thisfile[len(basedir):] mydir = os.path.dirname(os.path.join(outdir, myfile)) if not os.path.isdir(mydir): os.makedirs(mydir) shutil.copy(thisfile, mydir, follow_symlinks=True) def write_miscinfo(*, outdir, data): ''' function to write out the contents of the miscellaneous commands Args: outdir : destination location data : list of strings to be stored in a file Output: MISCINFO_FILE created with the contents of the output of the various commands ''' flname = os.path.join(outdir, MISCINFO_FILE) with open(flname, "w") as fdout: fdout.write("\n".join(data)) def run_neuron_dump(outdir, verbose): ''' function to call the existing neuron-dump.py tool Args: outdir : destination location verbose : flag to indicate if verbose messages need to be displayed Output: tarball created by this tool Returns: ''' if not os.path.isfile(NEURONDUMPPROGRAM): print(NEURON_ERRMSG) return cmd = "{} -o {}".format(NEURONDUMPPROGRAM, os.path.join(outdir, NEURONDUMPFILE)) if verbose: print("Executing command: {}".format(cmd)) stdout = None # ensure stdout is defined even if Popen raises below try: res = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True, shell=True) stdout, stderr = res.communicate() if stderr is not None: print("Error in executing cmd: {}\nError: {}\n".format(cmd, str(stderr))) except (OSError, ValueError) as err: print("Error in executing cmd: {}\nError: {}\n".format(cmd, err)) if verbose: print("Output of cmd: {}\n{}".format(cmd, stdout)) def package_tarball(*, outdir, allowmodel, ccdir, verbose): ''' function to package everything into a tarball Args: outdir : output directory allowmodel : flag to indicate whether the user has allowed gathering of model data ccdir : compiler work directory (used in the informational message) verbose : flag to indicate if verbose messages need to be displayed Output: A tarball created in the directory one level above outdir (the directory provided by the user) Returns: ''' mytarball = os.path.join(os.path.split(outdir)[0], NEURON_INFO_TARBALL) if verbose: print("Creating archive: {}".format(mytarball)) archivefile = shutil.make_archive(mytarball, 'gztar', outdir) print(ARCHIVE_MSG.format(archivefile, outdir)) if allowmodel: print(MODEL_DATA_MSG_INFO.format("\n\t\t".join(COMPILER_FILES), ccdir)) def add_cmdline_args(): ''' function to add the command line arguments and options Args: Output: Returns: parser for cmd line ''' parser = argparse.ArgumentParser( formatter_class=argparse.RawDescriptionHelpFormatter, description=USAGE_MSG) parser.add_argument('--additionalfileordir', dest='addfldir', help=HELP_ADDITIONAL_FILE_OR_DIR, default=None) parser.add_argument('-c', '--compileroutdir', dest='ccdir', help=HELP_CC_FILES, default=None) parser.add_argument('-i', '--include', dest='includemismatch', help=INCLUDE_MSG, action='store_true', default=False) parser.add_argument('-f', '--filter', dest='filterfile', default=None) parser.add_argument('-m', "--modeldata", # data related to model, etc. 
will be gathered dest='allowmodel', action='store_true', help=MODEL_DATA_MSG, default=False) parser.add_argument('-o', '--out', dest='outdir', help=HELP_OUTDIR_MSG, required=True) parser.add_argument('-r', '--runtimeoutdir', dest='rtdir', help=HELP_RT_FILES, default=None) parser.add_argument('-s', '--stdout', dest='stdout', help=HELP_STDOUT, required=True) parser.add_argument('-v', '--verbose', dest='verbose', help=HELP_VERBOSE, action='store_true', default=False) return parser def main(): """ main function creates command-line option parser, sanity checks, and then executes code based on command-line options """ parser = add_cmdline_args() if len(sys.argv) == 1: parser.print_help() sys.exit(1) options = parser.parse_args() # append the directory where we'll create files to what the user provides options.outdir = os.path.realpath(os.path.join(options.outdir, NEURONTMPDIR)) if options.ccdir is not None: options.ccdir = os.path.realpath(options.ccdir) if options.addfldir is not None: options.addfldir = os.path.realpath(options.addfldir) if options.rtdir is not None: options.rtdir = os.path.realpath(options.rtdir) options.stdout = os.path.realpath(options.stdout) if sanity_check(options): parser.print_help() sys.exit(1) # create the base directory try: os.makedirs(options.outdir) except FileNotFoundError: print("Error in creating directory {}".format(options.outdir)) sys.exit(1) # if options.allow: # allow_capture_of_files() if options.filterfile is not None: add_additional_filters(os.path.realpath(options.filterfile)) # record the command as executed by the user with open(os.path.join(options.outdir, USERCMDFILE), "w") as fdout: fdout.write("Command executed as: {}\n".format(" ".join(sys.argv))) dump_compiler_info(outdir=options.outdir, location=options.ccdir, allowmodel=options.allowmodel, addfldir=options.addfldir, verbose=options.verbose) # Not being used now. 
neuron-dump.py would do this # dump_rt_info(location=options.rtdir, verbose=options.verbose) dump_miscinfo(outdir=options.outdir, verbose=options.verbose) dump_proc_info(outdir=options.outdir, verbose=options.verbose) copy_stdout(outdir=options.outdir, stdout=options.stdout, verbose=options.verbose) copy_syslog(outdir=options.outdir, include_flag=options.includemismatch, verbose=options.verbose) # run the existing tool neuron-dump.py as well run_neuron_dump(outdir=options.outdir, verbose=options.verbose) package_tarball(outdir=options.outdir, allowmodel=options.allowmodel, ccdir=options.ccdir, verbose=options.verbose) # change permissions for the directory and output os.system("chown -R {} {}".format(os.getlogin(), os.path.split(options.outdir)[0])) # write_miscinfo(outdir=options.outdir, data=l3) if __name__ == "__main__": main() ================================================ FILE: src/neuron-gatherinfo/tf_neuron_check_model.py ================================================ import os import json import sys import struct import argparse import subprocess from collections import Counter class neuron_parser: def __init__(self): self.parser = argparse.ArgumentParser() self.parser.add_argument('model_path', type=str, help='a TensorFlow SavedModel directory (currently supporting TensorFlow v1 SavedModel only).') self.parser.add_argument('--show_names', action='store_true', help='list operation by name instead of summarizing by type (caution: this option will generate many lines of output for a large model).') self.parser.add_argument('--expand_subgraph', action='store_true', help='show subgraph operations.') self.parser_args = self.parser.parse_args() self.neuronop_info = {} self.total_pipeline_cores = 0 self.min_required_pipeline_cores = 0 path = self.parser_args.model_path if os.path.exists(path + '-symbol.json'): self.load_mxnet_model(path) elif os.path.isdir(path): self.load_tensorflow_model(path) else: raise RuntimeError('Cannot determine framework type from model path argument.') self.supported = self.get_neuron_supported() self.supported.extend(self.addl_support) for name, executable, (sg_nodetypes, sg_nodenames) in self.neuron_nodes: num_cores, requested_cores, _ = self.get_cores_from_executable(executable) self.neuronop_info[name] = (num_cores, requested_cores, sg_nodetypes, sg_nodenames) self.total_pipeline_cores += num_cores if num_cores > self.min_required_pipeline_cores: self.min_required_pipeline_cores = num_cores def get_neuron_supported(self): exec_cmd = ["neuron-cc", "list-operators", "--framework", self.framework] oplist = subprocess.check_output(' '.join(exec_cmd), shell=True) oplist = str(oplist, 'utf-8') oplist = oplist.split("\n") return oplist[:-1] # Remove the last element which is '' def get_tf_subgraph_types_names(self, node): from tensorflow.core.framework import graph_pb2 graph_def = graph_pb2.GraphDef() graph_def.ParseFromString(node.attr['graph_def'].s) sg_nodes = graph_def.node sg_nodes = [sg_node for sg_node in sg_nodes if sg_node.op not in self.excl_types] nodetypes = [sg_node.op for sg_node in sg_nodes] nodenames = [sg_node.name for sg_node in sg_nodes] return nodetypes, nodenames def load_tensorflow_model(self, path): import tensorflow as tf import tensorflow_hub as hub self.framework = 'TENSORFLOW' self.neuron_optype = "NeuronOp" self.excl_types = ['Placeholder', 'PlaceholderWithDefault', 'NoOp', 'Const', 'Identity', 'IdentityN', 'VarHandleOp', 'VarIsInitializedOp', 'AssignVariableOp', 'ReadVariableOp', 'StringJoin', 'ShardedFilename', 'SaveV2', 
'MergeV2Checkpoints', 'RestoreV2'] self.addl_support = ['FusedBatchNormV3', 'BatchMatMulV2', 'AddV2', 'StopGradient', self.neuron_optype] model = hub.load(path) graph_def = model.graph.as_graph_def() nodes = graph_def.node nodes = [node for node in nodes if node.op not in self.excl_types] self.nodetypes = [node.op for node in nodes] self.nodenames = [node.name for node in nodes] self.neuron_nodes = [(node.name, node.attr['executable'].s, self.get_tf_subgraph_types_names(node)) for node in nodes if node.op == self.neuron_optype] def get_mx_subgraph_types_names(self, node): nodetypes = [] nodenames = [] for sg in node['subgraphs']: filtered_nodes = [sg_node for sg_node in sg['nodes'] if sg_node['op'] not in self.excl_types] nodetypes.extend([sg_node['op'] for sg_node in filtered_nodes]) nodenames.extend([sg_node['name'] for sg_node in filtered_nodes]) return nodetypes, nodenames def load_mxnet_model(self, path): import mxnet as mx if mx.__version__ != "1.5.1": try: import mxnetneuron as mxn except ImportError: raise RuntimeError("Please install mxnetneuron package.") self.framework = 'MXNET' self.neuron_optype = "_neuron_subgraph_op" self.excl_types = ['null'] self.addl_support = [self.neuron_optype] sym, args, auxs = mx.model.load_checkpoint(path, 0) nodes = json.loads(sym.tojson())["nodes"] nodes = [node for node in nodes if node['op'] not in self.excl_types] self.nodetypes = [node['op'] for node in nodes] self.nodenames = [node['name'] for node in nodes] neuron_nodes_tmp = [node for node in nodes if node['op'] == self.neuron_optype] self.neuron_nodes = [(node['name'], bytearray(args[node['name']+"_neuronbin"].asnumpy()), self.get_mx_subgraph_types_names(node)) for node in neuron_nodes_tmp] @staticmethod def get_cores_from_executable(executable): _NC_HEADER_SIZE = 544 header = executable[:_NC_HEADER_SIZE] info = list(struct.unpack('168xI304xI64B', header)) numCores = info.pop(0) numCoresRequested = info.pop(0) coresPerNode = info return numCores, numCoresRequested, coresPerNode # Display table of operation type or name and whether supported or not def print_node_type_info(self): self.cnt_total = len(self.nodetypes) self.cnt_supported = 0 if self.parser_args.show_names: widthn = max(max(map(len, self.nodenames)), 8) widtht = max(max(map(len, self.nodetypes)), 8) format_str = "{:<" + str(widthn) + "} {:<" + str(widtht) + "} {:<4}" pp = lambda x: print(format_str.format(*x)) pp(['Op Name', 'Op Type', 'Neuron Supported ?']) pp(['-------', '-------', '------------------']) for idx, opname in enumerate(self.nodenames): optype = self.nodetypes[idx] if optype in self.supported: pp([opname, optype, 'Yes']) self.cnt_supported += 1 for idx, opname in enumerate(self.nodenames): optype = self.nodetypes[idx] if optype not in self.supported: pp([opname, optype, 'No']) else: count = Counter(self.nodetypes) width = max(max(map(len, self.nodetypes)), 8) format_str = "{:<" + str(width) + "} {:<14} {:<4}" pp = lambda x: print(format_str.format(*x)) pp(['Op Type', 'Num Instances', 'Neuron Supported ?']) pp(['-------', '-------------', '------------------']) for key in count: if key in self.supported: pp([key, count[key], 'Yes']) self.cnt_supported += count[key] for key in count: if key not in self.supported: pp([key, count[key], 'No']) print() def print_subgraph_ops(self, sg_nodetypes, sg_nodenames): if self.parser_args.show_names: widthn = max(max(map(len, sg_nodenames)), 8) widtht = max(max(map(len, sg_nodetypes)), 8) format_str = "{:<" + str(widthn) + "} {:<" + str(widtht) + "}" pp = lambda x: print(' ', format_str.format(*x)) 
            pp(['Op Name', 'Op Type'])
            pp(['-------', '-------'])
            for idx, opname in enumerate(sg_nodenames):
                optype = sg_nodetypes[idx]
                pp([opname, optype])
        else:
            count = Counter(sg_nodetypes)
            width = max(max(map(len, sg_nodetypes)), 8)
            format_str = "{:<" + str(width) + "} {:<14}"
            pp = lambda x: print(' ', format_str.format(*x))
            pp(['Op Type', 'Num Instances'])
            pp(['-------', '-------------'])
            for key in count:
                pp([key, count[key]])

    def print_neuron_node_info(self):
        idx = 0
        width = max(max(map(len, self.neuronop_info)), 14)
        format_str = "{:<" + str(width) + "} {:<14}"
        pp = lambda x: print(format_str.format(*x))
        pp(['Subgraph Name', 'Num Pipelined NeuronCores'])
        pp(['-------------', '-------------------------'])
        core_cnt_list = []
        for name, (num_cores, _, sg_nodetypes, sg_nodenames) in self.neuronop_info.items():
            pp([name, num_cores])
            core_cnt_list.append(num_cores)
            idx += 1
            if self.parser_args.expand_subgraph:
                self.print_subgraph_ops(sg_nodetypes, sg_nodenames)
        print()

    def print_neuron_support_stats(self):
        print("* Total inference operations: {}".format(self.cnt_total))
        print("* Total Neuron supported inference operations: {}".format(self.cnt_supported))
        if self.cnt_total > 0:
            perc = self.cnt_supported / self.cnt_total * 100
        else:
            perc = 0
        print("* Percent of total inference operations supported by Neuron: {:.1f}".format(perc))
        print()

    def print_common_desc(self):
        if self.parser_args.show_names:
            print("* Each line shows an operation name and whether the type of that operation is supported in Neuron.")
        else:
            print("* Each line shows an operation type, the number of instances of that type within the model,\n"
                  "* and whether the type is supported in Neuron.")
        print("* Some operation types are excluded from the table because they are no-operations or training-related operations:\n",
              self.excl_types, "\n")

    def run(self):
        if len(self.neuronop_info) > 0:
            print("\n* Found {} Neuron subgraph(s) ({}(s)) in this compiled model.\n"
                  "* Use this tool on the original uncompiled model to see Neuron supported operations.\n"
                  "* The following table shows all operations, including Neuron subgraphs.".format(len(self.neuronop_info), self.neuron_optype))
            self.print_common_desc()
            self.print_node_type_info()
            print('* Please run this model on an Inf1 instance with at least {} NeuronCore(s).'.format(self.min_required_pipeline_cores))
            print("* The following list shows each Neuron subgraph with the number of pipelined NeuronCores used by the subgraph\n"
                  "* (and subgraph operations if --expand_subgraph is used):\n")
            self.print_neuron_node_info()
        else:
            print("\n* The following table shows the supported and unsupported operations within this uncompiled model.")
            self.print_common_desc()
            self.print_node_type_info()
            self.print_neuron_support_stats()


if __name__ == '__main__':
    toolkit = neuron_parser()
    toolkit.run()


================================================
FILE: src/neuronperf/LICENSE
================================================
AWS Neuron License Agreement

THIS IS AN AGREEMENT BETWEEN YOU AND AMAZON WEB SERVICES, INC. (WITH ITS AFFILIATES, "AWS" OR "WE") THAT GOVERNS YOUR USE OF THE AWS NEURON SOFTWARE (TOGETHER WITH ANY UPDATES AND UPGRADES TO IT, AND ACCOMPANYING DOCUMENTATION, THE “SOFTWARE”) THAT WE MAKE AVAILABLE TO YOU. IF YOU DOWNLOAD, INSTALL, OR USE THE SOFTWARE, YOU ACCEPT AND AGREE TO BE BOUND BY THIS AGREEMENT AND REPRESENT THAT YOU HAVE THE AUTHORITY TO BIND YOURSELF OR THE ENTITY YOU REPRESENT TO THIS AGREEMENT.

1.
Use of the Software We hereby grant you a personal, limited, nonexclusive, non-transferable, non-sublicenseable, revocable, royalty-free, worldwide license during the term of this Agreement to install and use the Software in connection with AWS Services. You may not use the Software if you do not have an account in good standing with AWS. Some components of the Software (whether developed by AWS or third parties) may also be governed by applicable open source software licenses located in the software component's source code. Your license rights with respect to these individual components are defined by the applicable open source software license, and nothing in this Agreement will restrict, limit, or otherwise affect any rights or obligations you may have, or conditions to which you may be subject, under such open source software licenses. “AWS Services” means each of the services made available by AWS as may be updated by AWS from time to time in its sole discretion at https://aws.amazon.com/service-terms/ and are subject to your AWS Customer Agreement or AWS Enterprise Agreement. 2. Limitations You may not, and you will not encourage, assist or authorize any other person to (a) sell, rent, lease, lend, loan, distribute, act as a service bureau, publicly communicate, transform, or sub-license the Software or otherwise assign any rights to the Software in whole or in part, (b) modify, alter, tamper with, repair, or otherwise create derivative works of the Software, (c) reverse engineer, disassemble, or decompile the Software or apply any other process or procedure to derive the source code of any software included in the Software, or (d) access or use the Software or the AWS Service in a way intended to avoid incurring fees or exceeding usage limits or quotas. All rights granted to you are conditioned on your continued compliance with this Agreement, and will immediately and automatically terminate if you do not comply with any term or condition of this Agreement or the AWS Customer Agreement or AWS Enterprise Agreement, including any failure to remit timely payment for the Software or the AWS Service. You will not use the Software with any software or other materials that are subject to licenses or restrictions (e.g., open source software licenses) that, when combined with the Software, would require us to disclose, license, distribute or otherwise make all or any part of such Software available to anyone. You will not remove, modify, or obscure any copyright, patent, trademark or other proprietary or attribution notices on or in any Software. 3. Reservation of Rights You may not use the Software for any illegal purpose. The Software is the intellectual property of AWS or its licensors. The structure, organization, and code of the Software are valuable trade secrets and AWS confidential information. The Software is protected by applicable law, including without limitation copyright laws and international treaty provisions. Except for the rights expressly granted to you in this Agreement, all right, title and interest in the Software are reserved and retained by AWS and our licensors. You do not acquire any intellectual property or other rights in the Software as a result of downloading, installing, or using the Software. 4. Updates In order to keep the Software up-to-date, we may offer automatic or manual updates at any time. If we elect to provide maintenance or support of any kind, we may terminate that maintenance or support at any time without notice to you. 5. 
Termination You may terminate this Agreement at any time by uninstalling and destroying all copies of the Software that are in your possession or control. This Agreement (including any rights granted to you under this Agreement) will immediately and automatically terminate without notice from us if (a) you fail to comply with any term or condition of this Agreement or any other agreement you have with AWS, or (b) you fail to make timely payment for any AWS Service. In the case of termination, you must cease all downloading, installation, and use of the Software and uninstall and destroy all copies of the Software that are in your possession or control. We may modify, suspend, discontinue, or terminate your right to use part or all of the Software at any time without notice to you, and in that event we may modify the Software to make it inoperable. AWS will not be liable to you should it exercise those rights. Our failure to insist upon or enforce your strict compliance with this Agreement will not constitute a waiver of any of our rights. No waiver of any provision of this Agreement shall be effective unless in writing. 6. Disclaimer of Warranties and Limitation of Liability a. YOU EXPRESSLY ACKNOWLEDGE AND AGREE THAT INSTALLATION AND USE OF, AND ANY OTHER ACCESS TO, THE SOFTWARE IS AT YOUR SOLE RISK. THE SOFTWARE IS DELIVERED TO YOU “AS IS” WITH ALL FAULTS AND WITHOUT WARRANTY OF ANY KIND, AND AWS, ITS LICENSORS AND DISTRIBUTORS, AND EACH OF THEIR RESPECTIVE AFFILIATES AND SUPPLIERS (COLLECTIVELY, THE “RELEASED PARTIES”) DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, ACCURACY, QUIET ENJOYMENT, AND NON-INFRINGEMENT. NO ORAL OR WRITTEN INFORMATION OR ADVICE GIVEN BY A RELEASED PARTY OR AN AUTHORIZED REPRESENTATIVE OF A RELEASED PARTY WILL CREATE A WARRANTY. THE LAWS OF CERTAIN JURISDICTIONS DO NOT ALLOW THE DISCLAIMER OF IMPLIED WARRANTIES. IF THESE LAWS APPLY TO YOU, SOME OR ALL OF THE ABOVE DISCLAIMERS, EXCLUSIONS, OR LIMITATIONS MAY NOT APPLY TO YOU, AND YOU MAY HAVE ADDITIONAL RIGHTS. b. TO THE EXTENT NOT PROHIBITED BY LAW, NO RELEASED PARTY WILL BE LIABLE TO YOU FOR ANY INCIDENTAL OR CONSEQUENTIAL DAMAGES FOR BREACH OF ANY EXPRESS OR IMPLIED WARRANTY, BREACH OF CONTRACT, NEGLIGENCE, STRICT LIABILITY, OR ANY OTHER LEGAL THEORY RELATED TO THE SOFTWARE, INCLUDING WITHOUT LIMITATION ANY DAMAGES ARISING OUT OF LOSS OF PROFITS, REVENUE, DATA, OR USE OF THE APPLICATION, EVEN IF A RELEASED PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. IN ANY CASE, ANY RELEASED PARTY’S AGGREGATE LIABILITY UNDER THE AGREEMENT WILL BE LIMITED TO $50.00. THE LAWS OF CERTAIN JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF INCIDENTAL OR CONSEQUENTIAL DAMAGES. IF THESE LAWS APPLY TO YOU, SOME OR ALL OF THE ABOVE EXCLUSIONS OR LIMITATIONS MAY NOT APPLY TO YOU, AND YOU MAY HAVE ADDITIONAL RIGHTS. 7. Indemnification You are liable for and will defend, indemnify, and hold harmless the Released Parties and their officers, directors, agents, and employees, from and against any liability, loss, damage, cost, or expense (including reasonable attorneys’ fees) arising out of your use of the Software, violation of the Agreement, violation of applicable law, or violation of any right of any person or entity, including without limitation intellectual property rights. 8. 
Compliance with Laws; Export Regulations You will comply with all export and re-export restrictions and regulations of the United States Department of Commerce and other United States and foreign agencies and authorities that may apply to the Software, and not to transfer, or encourage, assist, or authorize the transfer of the Software to a prohibited country or otherwise in violation of any applicable restrictions or regulations. 9. U.S. Government End Users The Software is provided to the U.S. Government as “commercial items,” “commercial computer software,” “commercial computer software documentation,” and “technical data” with the same rights and restrictions generally applicable to the Software. If you are using the Software on behalf of the U.S. Government and these terms fail to meet the U.S. Government’s needs or are inconsistent in any respect with federal law, you will immediately discontinue your use of the Software. The terms “commercial item,” “commercial computer software,” “commercial computer software documentation,” and “technical data” are defined in the Federal Acquisition Regulation and the Defense Federal Acquisition Regulation Supplement. 10. Amendment We may amend this Agreement at our sole discretion by posting the revised terms on the AWS website (aws.amazon.com) or within the Software. Your continued use of the Software after any amendment's effective date evidences your agreement to be bound by it. If you do not agree to a change, you must stop using the Software and terminate this Agreement. 13. Conflicts In the event of any conflict or inconsistency among the terms and conditions of this Agreement and the existing AWS Customer Agreement or your AWS Enterprise Agreement, such conflict or inconsistency will be resolved by giving precedence to this Agreement. 14. Entire Agreement and Severability This is the entire agreement between AWS and you regarding the Software and supersedes all prior understandings regarding such subject matter (including any Evaluation Agreement). If any term or condition of this Agreement is deemed invalid, void, or for any reason unenforceable, that part will be deemed severable and will not affect the validity and enforceability of any remaining term or condition. ================================================ FILE: src/neuronperf/README.md ================================================ # NeuronPerf A library for benchmarking machine learning models on accelerators. 
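
## Example

A minimal sketch of the intended workflow (the `neuronperf.torch` calls below are illustrative assumptions; see the documentation link below for authoritative usage):

```python
import torch
import neuronperf as npf
import neuronperf.torch  # framework submodule; assumes torch-neuron is installed

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
example = torch.zeros(1, 8)

# Compile for batch size 1, then benchmark the compiled artifact and print a summary.
index_filename = npf.torch.compile(model, [example], batch_sizes=[1])
reports = npf.torch.benchmark(index_filename, [example], batch_sizes=[1])
npf.print_reports(reports)
```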
## Documentation https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuronperf/index.html ================================================ FILE: src/neuronperf/build.sh ================================================ #!/bin/bash set -ex python3 -m pytest -vv \ --verbose \ --ignore=build/private \ --cov=neuronperf \ --cov-report term-missing \ --cov-report html:build/brazil-documentation/coverage \ --cov-report xml:build/brazil-documentation/coverage/coverage.xml \ --color=yes \ -x \ test \ -m "sanity or slow" python3 setup.py bdist_wheel --dist-dir build/pip/public/neuronperf ================================================ FILE: src/neuronperf/conf.py ================================================ """Sphinx configuration.""" import datetime import os import shutil from amazon_doc_utils import brazil_info # Get metadata from brazil brazil_version, intersphinx_factory = brazil_info.get( [brazil_info.PackageVersion, brazil_info.IntersphinxFactory] ) def run_apidoc(app): """Generate doc stubs using sphinx-apidoc.""" module_dir = os.path.join(app.srcdir, "../src/") output_dir = os.path.join(app.srcdir, "_apidoc") excludes = [] # Ensure that any stale apidoc files are cleaned up first. if os.path.exists(output_dir): shutil.rmtree(output_dir) cmd = [ "--separate", "--module-first", "--doc-project=API Reference", "-o", output_dir, module_dir, ] cmd.extend(excludes) try: from sphinx.ext import apidoc # Sphinx >= 1.7 apidoc.main(cmd) except ImportError: from sphinx import apidoc # Sphinx < 1.7 cmd.insert(0, apidoc.__file__) apidoc.main(cmd) def setup(app): """Register our sphinx-apidoc hook.""" app.connect("builder-inited", run_apidoc) # Sphinx configuration below. project = brazil_version.name version = brazil_version.mv release = brazil_version.full_version copyright = "{}, Amazon.com".format(datetime.datetime.now().year) intersphinx_mapping = intersphinx_factory.get_mapping() extensions = [ "sphinx.ext.autodoc", "sphinx.ext.intersphinx", "sphinx.ext.napoleon", "sphinx.ext.todo", "sphinx.ext.viewcode", ] source_suffix = ".rst" master_doc = "index" autoclass_content = "class" autodoc_member_order = "bysource" default_role = "py:obj" html_theme = "haiku" htmlhelp_basename = "{}doc".format(project) napoleon_use_rtype = False ================================================ FILE: src/neuronperf/model_neuron_b1.csv ================================================ n_models,workers_per_model,pipeline_size,batch_size,throughput_avg,throughput_peak,latency_ms_p0,latency_ms_p50,latency_ms_p90,latency_ms_p95,latency_ms_p99,latency_ms_p100,load_avg_ms,warmup_avg_ms,e2e_avg_ms,input_avg_ms,preprocess_avg_ms,postprocess_avg_ms,infer_avg_ms,worker_avg_s,total_infs,total_s,status,model_filename,multiprocess,multiinterpreter,device_type,instance_type 1,1,1,1,31346.0,31408.0,0.03,0.03,0.031,0.032,0.037,0.732,62.217,2.625,0.031,0.001,0.0,0.0,0.028,4.93,154704,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge 16,16,1,1,380604.75,380923.0,0.03,0.032,0.054,0.054,0.057,0.938,293.806,3.266,0.043,0.001,0.0,0.0,0.039,4.7,1799549,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge 1,2,1,1,51178.0,51319.0,0.035,0.036,0.037,0.039,0.047,1.13,114.118,2.713,0.037,0.001,0.0,0.0,0.033,4.88,248984,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge 16,32,1,1,381098.75,383905.0,0.03,0.058,0.067,0.073,0.121,48.07,303.916,4.42,0.08,0.001,0.0,0.0,0.074,4.69,1804925,5.0,finished,model_neuron_b1.pt,True,False,neuron,inf1.6xlarge ================================================ 
FILE: src/neuronperf/pyproject.toml ================================================ [tool.black] line-length = 100 [tool.isort] known_first_party = ["neuronperf"] [tool.pytest.ini_options] markers = [ "sanity", "slow", ] # required for compatibility with black: profile = "black" # To maintain consistency with other settings line_length = 100 ================================================ FILE: src/neuronperf/src/neuronperf/__init__.py ================================================ # -*- coding: utf-8 -*- """ NeuronPerf Library ~~~~~~~~~~~~~~~~~~ A library for benchmarking machine learning models on accelerators. :copyright: (c) 2022 Amazon Inc. :license: See LICENSE. """ from .__version__ import __title__, __description__, __url__, __version__ from .__version__ import __author__, __author_email__, __license__ from .__version__ import __copyright__ # setup logging first import logging _log_level = logging.DEBUG log = logging.getLogger(__name__) log.setLevel(_log_level) from .logging import _get_stream_handlers for handler in _get_stream_handlers(_log_level): log.addHandler(handler) from .benchmarking import compile, benchmark, set_verbosity from .cpu import cpu from .cpu.cpu import DummyModel from .reporting import CSV_COLS, PRINT_COLS, get_reports, print_reports, write_csv, write_json from .timing import timestamp_convert, Timer ================================================ FILE: src/neuronperf/src/neuronperf/__version__.py ================================================ __title__ = "neuronperf" __description__ = "A benchmarking library for machine learning accelerators." __url__ = "https://awsdocs-neuron.readthedocs-hosted.com/en/neuronperf" __version__ = "0.0.0.0" __author__ = "AWS" __author_email__ = "neuronperf@amazon.com" __license__ = "Proprietary" __copyright__ = "Copyright Amazon Web Services and its Affiliates. All rights reserved." ================================================ FILE: src/neuronperf/src/neuronperf/benchmarking.py ================================================ # -*- coding: utf-8 -*- """ neuronperf.benchmarking ~~~~~~~~~~~~~~~~~~~~~~~ Provides utility functions and classes that underlie the framework benchmarkers. """ from typing import Any, Callable, Dict, List, Union import collections import concurrent import concurrent.futures import copy import functools import logging import multiprocessing import os import psutil import subprocess import sys import tempfile import threading import time import traceback import dill from . import model_index from .compile_constants import NEURONCORE_PIPELINE_CORES, FAST_MATH, FAST_MATH_OPTIONS from .reporting import get_reports from .scripts import run_benchmark_file from .timing import Timer log = logging.getLogger(__name__) # Wrapper for sending back subprocess failure info. Needs to be at top level for pickle. BenchmarkerErrorWrapper = collections.namedtuple("BenchmarkerErrorWrapper", "trace") ERROR = "error" SUPPORTED_DEVICE_TYPES = ["neuron", "cpu", "cuda", "gpu"] # TODO: "tpu"] BENCHMARK_SECS = 120 class Benchmarker(threading.Thread): r""" :class:`benchmarking:Benchmarker` benchmarks a single model. This class is a `threading.Thread`. Call `start` to launch a non-blocking benchmarking thread. Calling `stop` will end the benchmarking and block until all subroutines complete. An object of this class may be serialized and sent to multiple subprocesses for parallel use. After benchmarking, results can be obtained with `results`. 
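
    Illustrative sketch (the `my_load_fn` and artifact name below are hypothetical;
    in normal use, `benchmark()` constructs and manages Benchmarkers internally)::

        benchmarker = Benchmarker(
            id=0,
            device_id=0,
            load_fn=my_load_fn,          # hypothetical: returns a loaded model
            model_filename="model.pt",   # hypothetical compiled artifact
            inputs=(example_input,),
            workers_per_model=2,
        )
        benchmarker.start()   # non-blocking; worker threads begin inferring
        time.sleep(30)        # benchmark window
        benchmarker.stop()    # signal workers and block until they finish
        results = benchmarker.results()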
""" def __init__( self, id: int, device_id: int, load_fn: Callable[[str], Any], model_filename: str, inputs, workers_per_model: int, env_setup_fn: Callable[[int, Dict, Any], None] = None, setup_fn: Callable[[int, Dict, Any], None] = None, preprocess_fn: Callable[[Any], Any] = None, postprocess_fn: Callable[[Any], Any] = None, dataset_loader_fn: Callable[[Any, int], Any] = None, model_class_name: str = None, model_class_file: str = None, ): super().__init__() self.id = id self.device_id = device_id self.load_fn = load_fn self.model_filename = model_filename self.inputs = inputs self.input_iter = None # Prepared in setup() self.input_lock = threading.Lock() self.workers_per_model = workers_per_model self.env_setup_fn = env_setup_fn self.setup_fn = setup_fn self.preprocess_fn = preprocess_fn self.postprocess_fn = postprocess_fn self.dataset_loader_fn = dataset_loader_fn self.model_class_name = model_class_name self.model_class_file = model_class_file # Mutable internal state. self.model = None self.benchmark_timer = Timer() self.env_setup_timer = Timer() self.setup_timer = Timer() self.load_timer = Timer() self.warmup_timer = Timer() self.input_timer = Timer() self.preprocess_timers = [Timer() for _ in range(workers_per_model)] self.infer_timers = [Timer() for _ in range(workers_per_model)] self.postprocess_timers = [Timer() for _ in range(workers_per_model)] self.e2e_timers = [Timer() for _ in range(workers_per_model)] self.worker_timers = [Timer() for _ in range(workers_per_model)] self.n_infs = [0] * workers_per_model self.process_id = 0 # set at launch time self.benchmarking = False self.benchmarking_lock = threading.Lock() self.status_lock = threading.Lock() self.status = "ready" self.error = None def _status(self, status, error=None): """Update internal status, unless a previous error has occurred.""" with self.status_lock: if self.status == ERROR: return self.status = status if error: self.error = error def next_input(self): self.input_lock.acquire() self.input_timer.start() try: return next(self.input_iter) finally: self.input_timer.stop() self.input_lock.release() def prepare_inputs(self): """Prepares input iterator; runs an optional custom setup function.""" if self.dataset_loader_fn: def input_iter(): dataset_loader = self.dataset_loader_fn(self.inputs, self.workers_per_model) while True: inputs = next(dataset_loader) yield inputs if isinstance(inputs, tuple) else (inputs,) self.input_iter = input_iter() else: def input_iter(): inputs = self.inputs if isinstance(self.inputs, tuple) else (self.inputs,) while True: yield inputs self.input_iter = input_iter() def load(self): """Loads the model that will be used for benchmarking.""" with self.load_timer: self.model = self.load_fn(self.model_filename, device_id=self.device_id) def warmup(self): """Warmup the model with a single e2e inference.""" with self.warmup_timer: inputs = self.next_input() if self.preprocess_fn: inputs = self.preprocess_fn(*inputs) outputs = self.model(*inputs if isinstance(inputs, tuple) else inputs) if self.postprocess_fn: self.postprocess_fn(outputs) self.n_infs[0] += 1 # track warmup infs in worker 0 def setup(self): """Perform all setup work prior to benchmarking.""" self.prepare_inputs() if self.env_setup_fn: with self.env_setup_timer: self.env_setup_fn() self.load() if self.setup_fn: with self.setup_timer: self.setup_fn(self.model) self.warmup() def infer(self, worker_id) -> tuple: """Execute a single inference.""" with self.e2e_timers[worker_id]: inputs = self.next_input() if self.preprocess_fn: with 
self.preprocess_timers[worker_id]: inputs = self.preprocess_fn(*inputs) with self.infer_timers[worker_id]: outputs = self.model(*inputs if isinstance(inputs, tuple) else inputs) if self.postprocess_fn: with self.postprocess_timers[worker_id]: outputs = self.postprocess_fn(outputs) return outputs def worker_thread(self, worker_id): """A single worker thread that runs inference until signalled to stop.""" n_infs = 0 try: log.debug(f"Benchmarker {self.id}, Worker {worker_id} started.") with self.worker_timers[worker_id]: while self.benchmarking and self.status != ERROR: self.infer(worker_id) n_infs += 1 if self.status == ERROR: log.debug( f"Benchmarker {self.id}, Worker {worker_id} stopped early due to an error after {n_infs} inferences." ) except StopIteration: pass except: trace = "".join(traceback.format_exception(*sys.exc_info())) log.error( f"Benchmarker {self.id}, Worker {worker_id} encountered an error during benchmarking:\n{trace}" ) self._status(ERROR, BenchmarkerErrorWrapper(trace)) finally: self.n_infs[worker_id] += n_infs log.debug( f"Benchmarker {self.id}, Worker {worker_id} finished after {self.n_infs[worker_id]} inferences." ) def run(self): with self.benchmarking_lock: if self.benchmarking: raise RuntimeError( f"Benchmarker {self.id} can't start because it is already running." ) self.benchmarking = True self._status("running") # Set our process id, now that we are launched. self.process_id = os.getpid() # Launch all workers and begin benchmarking. # If any individual worker reports an error, self.status will reflect # that after this method. with self.benchmark_timer: try: self.setup() except: trace = "".join(traceback.format_exception(*sys.exc_info())) log.error(f"Benchmarker {self.id} encountered an error during prep:\n{trace}") self._status(ERROR, BenchmarkerErrorWrapper(trace)) else: with concurrent.futures.ThreadPoolExecutor(max_workers=self.workers_per_model) as exe: for worker_id in range(self.workers_per_model): exe.submit(self.worker_thread, worker_id) # There are three ways to reach the next section: # 1. We ran out of benchmarking examples in a provided dataset (graceful quit on StopIteration). # 2. We were asked to stop(). # 3. We encountered an error. # In cases 1 and 3, we can acquire the lock, update our state if necessary, and quit. # In case 2, we already hold the lock, so we can skip this section and let stop() handle cleanup. if self.benchmarking_lock.acquire(blocking=False): try: self.benchmarking = False self._status("finished") finally: self.benchmarking_lock.release() def stop(self): # Setting self.benchmarking = False triggers workers to terminate gracefully. # We must hold the benchmarking_lock until the thread has joined to ensure # consistent use of the self.benchmarking flag. 
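        # (run() only tries to take this lock non-blocking on its way out; if stop()
        # already holds it, run() defers the final state transition to stop() below.)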
        with self.benchmarking_lock:
            if not self.benchmarking:
                return
            self._status("stopping")
            self.benchmarking = False
            self.join()
            self._status("finished")

    def results(self) -> dict:
        with self.benchmarking_lock:
            if self.benchmarking:
                raise RuntimeError("Cannot produce results until benchmarking has completed.")
            return {
                "id": self.id,
                "device_id": self.device_id,
                "workers_per_model": self.workers_per_model,
                "n_infs": sum(self.n_infs),
                "status": self.status,
                "process_id": self.process_id,
                "total_s": self.benchmark_timer.total_duration("s"),
                "timers": {
                    "env_setup": [self.env_setup_timer],
                    "setup": [self.setup_timer],
                    "load": [self.load_timer],
                    "input": [self.input_timer],
                    "warmup": [self.warmup_timer],
                    "preprocess": self.preprocess_timers,
                    "infer": self.infer_timers,
                    "postprocess": self.postprocess_timers,
                    "e2e": self.e2e_timers,
                    "worker": self.worker_timers,
                },
            }


class StatsThread(threading.Thread):
    """A thread to collect some system metrics during benchmarking."""

    def __init__(self, interval: float):
        super().__init__()
        self.interval = interval  # interval (in seconds) to collect metrics
        self.cpu_percents = []
        self.mem_percents = []
        self.running = True

    def run(self):
        while self.running:
            cpu_percent = psutil.cpu_percent(interval=self.interval, percpu=False)
            mem_percent = psutil.virtual_memory()[2]  # index 2 is the used-memory percentage
            self.cpu_percents.append(cpu_percent)
            self.mem_percents.append(mem_percent)

    def join(self, **kwargs):
        self.running = False
        super().join(**kwargs)


def _combine_results(results: List[dict]) -> dict:
    """Combines the results of multiple benchmarkers into a single results structure."""
    combined_results = {}
    for result in results:
        # workers_per_model should be the same across all benchmarkers, so we only need it once.
        combined_results.setdefault("workers_per_model", result["workers_per_model"])
        # If an error occurred anywhere, preserve it.
        combined_results["status"] = (
            result["status"] if combined_results.get("status", "") != ERROR else ERROR
        )
        combined_results["n_infs"] = combined_results.get("n_infs", 0) + result["n_infs"]
        # Keep the longest subprocess duration.
        combined_results["total_s"] = max(combined_results.get("total_s", 0), result["total_s"])
        # Concatenate all timing info.
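        # Every per-benchmarker Timer is appended to one flat list per phase
        # (load, infer, e2e, ...), so reporting can aggregate across processes.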
        timers = combined_results.get("timers", {})
        for k, v in result["timers"].items():
            timer_list = timers.get(k, [])
            timer_list.extend(v)
            timers[k] = timer_list
        combined_results["timers"] = timers
    return combined_results


def _get_num_workers(pipeline_size: int) -> int:
    """Returns a best-guess number of worker threads for a single benchmarking process."""
    return 2 if pipeline_size == 1 else pipeline_size - 1


def get_instance_type() -> str:
    """Try to determine the EC2 instance type from instance metadata."""
    try:
        import urllib.request

        with urllib.request.urlopen(
            "http://169.254.169.254/latest/meta-data/instance-type"
        ) as response:
            instance_type = response.read().decode("utf-8")
            log.debug("Automatically determined instance type: {}".format(instance_type))
            return instance_type
    except:
        return None


def _get_cost_per_hour(instance_type: str) -> float:
    # Hourly rates
    instancetype_to_cost = {
        "inf1.xlarge": 0.228,
        "inf1.2xlarge": 0.362,
        "inf1.6xlarge": 1.18,
        "inf1.24xlarge": 4.721,
    }
    try:
        return instancetype_to_cost[instance_type]
    except:
        # Just ignore unknown instance types for now
        return None


def _get_max_neuroncores(instance_type: str = None) -> int:
    """Try to obtain the maximum number of NeuronCores available on this instance."""
    instancetype_to_neuroncores = {
        "inf1.xlarge": 4,
        "inf1.2xlarge": 4,
        "inf1.6xlarge": 16,
        "inf1.24xlarge": 64,
    }
    try:
        if not instance_type:
            instance_type = get_instance_type()
        return instancetype_to_neuroncores[instance_type]
    except:
        num_cores = 2
        log.warning(f"Unknown Neuron device size. Assuming {num_cores} NeuronCores is the maximum.")
        return num_cores


def _get_num_gpus(instance_type: str = None) -> int:
    """Try to obtain the number of GPUs available on this instance."""
    instancetype_to_gpus = {
        "g4dn.xlarge": 1,
        "g4dn.2xlarge": 1,
        "g4dn.4xlarge": 1,
        "g4dn.8xlarge": 1,
        "g4dn.16xlarge": 1,
        "g4dn.12xlarge": 4,
        "g4dn.metal": 8,
        "g4ad.xlarge": 1,
        "g4ad.2xlarge": 1,
        "g4ad.4xlarge": 1,
        "g4ad.8xlarge": 2,
        "g4ad.16xlarge": 4,
        "p4d.24xlarge": 8,
    }
    try:
        if not instance_type:
            instance_type = get_instance_type()
        return instancetype_to_gpus[instance_type]
    except:
        log.warning("Unknown GPU device size. Assuming 1 GPU is available.")
        return 1


def _get_num_devices(device_type: str, instance_type: str = None) -> int:
    """Dispatch to a device-specific count; to be populated later for other instance types."""
    if device_type == "neuron":
        return _get_max_neuroncores(instance_type)
    elif device_type == "cpu":
        return multiprocessing.cpu_count()
    elif device_type == "cuda" or device_type == "gpu":
        return _get_num_gpus(instance_type)
    else:
        log.warning("An unknown device_type was passed: {}".format(device_type))
        return None


def _sanitize_inputs(inputs, batch_sizes: Union[int, List[int]], dataset_inputs=False) -> tuple:
    """Return inputs and batch_sizes with matching lengths, or throw an error."""
    if not isinstance(inputs, list):
        inputs = [inputs]
    if isinstance(batch_sizes, int):
        batch_sizes = [batch_sizes]
    if not batch_sizes:
        log.warning(
            "Batch sizes were not provided, so assuming 1 and only the first input will be benchmarked."
        )
        batch_sizes = [1]
    if not dataset_inputs:
        if len(batch_sizes) < len(inputs):
            delta = len(inputs) - len(batch_sizes)
            log.warning(
                "Received {} inputs, but only {} batch sizes. Discarding last {} inputs.".format(
                    len(inputs), len(batch_sizes), delta
                )
            )
            inputs = inputs[: len(batch_sizes)]
        elif len(inputs) < len(batch_sizes):
            delta = len(batch_sizes) - len(inputs)
            log.warning(
                "Received {} batch sizes, but only {} inputs. Discarding last {} batch sizes.".format(
                    len(batch_sizes), len(inputs), delta
                )
            )
            batch_sizes = batch_sizes[: len(inputs)]
    return inputs, batch_sizes


def set_verbosity(verbosity: int):
    r"""
    Controls the verbosity of NeuronPerf logging.

    :param int verbosity: 0 = error, 1 = info, 2 = debug
    """
    if 0 == verbosity:
        log.setLevel(logging.ERROR)
    elif 1 == verbosity:
        log.setLevel(logging.INFO)
    else:
        log.setLevel(logging.DEBUG)


def compile(
    compile_fn,
    model,
    inputs,
    batch_sizes: Union[int, List[int]] = None,
    pipeline_sizes: Union[int, List[int]] = None,
    performance_levels: Union[str, List[int]] = None,
    models_dir: str = "models",
    model_name: str = None,
    filename: str = None,
    compiler_args: dict = None,
    verbosity: int = 1,
    **kwargs,
) -> str:
    r"""
    Compiles the provided model with each provided example input, pipeline size, and performance level.

    :param model: The model to compile.
    :param list inputs: A list of example inputs.
    :param Union[int, List[int]] batch_sizes: A list of batch sizes that correspond to the example inputs.
    :param Union[int, List[int]] pipeline_sizes: A list of pipeline sizes to use. See :ref:`neuroncore-pipeline`.
    :param Union[int, List[int]] performance_levels: A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See :ref:`mixed-precision`.
    :param str models_dir: The directory where compilation artifacts will be stored.
    :param str model_name: An optional model name tag to apply to compiled artifacts.
    :param str filename: The name of the model index to write out. If not provided, a name will be generated and returned.
    :param dict compiler_args: Additional compiler arguments to be forwarded with every compilation.
    :param int verbosity: 0 = error, 1 = info, 2 = debug
    :return: A model index filename. If a configuration fails to compile, it will not be included in the index and an error will be logged.
    :rtype: str
    """
    # Set NeuronPerf logging verbosity.
    set_verbosity(verbosity)

    # Standardize arguments.
    if not pipeline_sizes:
        pipeline_sizes = [1]
    if not performance_levels:
        performance_levels = []
    if not compiler_args:
        compiler_args = {}
    if not model_name:
        if isinstance(model, str):
            model_name = model
        else:
            try:
                model_name = model.__name__
            except AttributeError:
                log.warning("Unable to determine a model name, using 'Model'.")
                model_name = "Model"
    if isinstance(pipeline_sizes, int):
        pipeline_sizes = [pipeline_sizes]
    if isinstance(performance_levels, int):
        performance_levels = [performance_levels]
    inputs, batch_sizes = _sanitize_inputs(inputs, batch_sizes)

    # Sanity check and sanitize compiler_args.
    if NEURONCORE_PIPELINE_CORES in compiler_args:
        if pipeline_sizes:
            log.warning(
                (
                    "You provided NeuronCore Pipeline Core sizes using both "
                    "compiler_args and pipeline_sizes. Ignoring flag in compiler_args."
                )
            )
        else:
            pipeline_sizes = [compiler_args[NEURONCORE_PIPELINE_CORES]]
        del compiler_args[NEURONCORE_PIPELINE_CORES]
    if FAST_MATH in compiler_args:
        if performance_levels:
            log.warning(
                (
                    f"You provided performance_levels and {FAST_MATH}. "
                    "Ignoring flag in compiler_args."
                )
            )
        del compiler_args[FAST_MATH]

    # Check if performance levels are within expected bounds.
    max_performance = max(FAST_MATH_OPTIONS)
    performance_levels_invalid = list(
        filter(
            lambda level: level < min(FAST_MATH_OPTIONS) or level > max_performance,
            performance_levels,
        )
    )
    if performance_levels_invalid:
        log.warning(
            "You provided some invalid performance_levels. Ignoring: {}".format(
                performance_levels_invalid
            )
        )
        performance_levels = [
            level for level in performance_levels if level not in performance_levels_invalid
        ]
    # If we still have no values, set default to max performance.
    if not performance_levels:
        performance_levels.append(max_performance)

    # Create standard output dir, if it doesn't exist.
    os.makedirs(models_dir, exist_ok=True)

    # Compile all requested model combinations.
    model_idxs = []

    # TODO: Support appending to existing index by filtering already-compiled configs.
    def make_index():
        """Create a model index file that contains info about all compiled models."""
        index = model_index.append(*model_idxs)
        # Return the name of the new index file.
        return model_index.save(index, filename=filename)

    compile_idx = 1
    n_compiles = len(inputs) * len(pipeline_sizes) * len(performance_levels)
    for input_idx, example_input in enumerate(inputs):
        batch_size = batch_sizes[input_idx]
        for pipeline_size in pipeline_sizes:
            for performance_level in performance_levels:
                _compiler_args = copy.copy(compiler_args)
                _compiler_args[FAST_MATH] = FAST_MATH_OPTIONS[performance_level]
                if pipeline_size != 1:
                    _compiler_args[NEURONCORE_PIPELINE_CORES] = str(pipeline_size)
                # Construct a more informative model name with some config info
                model_name_ex = "{}_b{}_p{}_{}".format(
                    model_name,
                    batch_size,
                    pipeline_size,
                    model_index.generate_id(),
                )
                log.info(
                    (
                        f"Compiling batch size {batch_size} for {pipeline_size} NeuronCore(s) with performance level "
                        f"{performance_level}/{max_performance}. [{compile_idx}/{n_compiles}]"
                    )
                )
                status = "ready"
                timer = Timer()
                with timer:
                    try:
                        model_filename = compile_fn(
                            model,
                            example_input,
                            models_dir,
                            model_name_ex,
                            compiler_args=_compiler_args,
                            **kwargs,
                        )
                        status = "finished"
                    except KeyboardInterrupt:
                        status = "error"
                        model_filename = None
                        log.error("Compilation interrupted, terminating.")
                        return make_index()
                    except:
                        status = "error"
                        model_filename = None
                        log.exception(
                            (
                                f"Failed to compile input={input_idx}, "
                                f"batch_size={batch_size}, "
                                f"pipeline_size={pipeline_size}, "
                                f"performance_level={performance_level}."
                            )
                        )
                    finally:
                        model_idx = model_index.create(
                            model_filename,
                            model_name=model_name,
                            batch_size=batch_size,
                            pipeline_size=pipeline_size,
                            performance_level=performance_level,
                            compile_s=round(timer.total_duration("s"), 2),
                            status=status,
                        )
                        model_idxs.append(model_idx)
                filename = make_index()
                compile_idx += 1
    return filename


def run_benchmarker(benchmarker, duration, pipe=None):
    def _send(results):
        if pipe:
            pipe.send(results)
            pipe.close()
        else:
            return results

    try:
        log.debug(f"Benchmarker {benchmarker.id} started.")
        check_freq = 0.1  # Check progress every 0.1 seconds.
        start_time = time.time()
        benchmarker.start()
        elapsed = 0
        while (elapsed < duration) and benchmarker.benchmarking:
            elapsed = time.time() - start_time
            remaining = max(0, duration - elapsed)
            time.sleep(min(check_freq, remaining))
        benchmarker.stop()
    except:
        trace = "".join(traceback.format_exception(*sys.exc_info()))
        error = BenchmarkerErrorWrapper(trace)
        return _send(error)
    else:
        results = benchmarker.results() if benchmarker.status != ERROR else benchmarker.error
        return _send(results)
    finally:
        log.debug(f"Benchmarker {benchmarker.id} finished.")


def _run_benchmarker_new_interpreter(benchmarker, duration):
    """
    This function is a workaround for frameworks that cannot be safely forked.
    The premise is to launch a new Python interpreter and run benchmarking
    from within the new interpreter.
    It works by writing serialized benchmarkers to temporary files, and then
    launching run_benchmark_file.py. The script writes back serialized results.
    """
    # Temporary serialization workaround. This attribute is inherited from Thread.
    # TODO: Separate data from benchmarking.
    setattr(benchmarker, "_stderr", None)

    script = run_benchmark_file.__file__

    # Serialize the benchmarker to a file.
    f = tempfile.NamedTemporaryFile(delete=False)
    log.debug("Dumping Benchmarker {} to file '{}'.".format(benchmarker.id, f.name))
    try:
        dill.dump(benchmarker, f)
    except dill.PicklingError:
        raise dill.PicklingError(
            (
                "NeuronPerf was unable to serialize the benchmarker. This is probably because your model "
                "could not be serialized. Make sure to use top-level classes instead of locals. You may "
                "need to wrap your model and manually load it using Python's importlib."
            )
        )
    f.close()

    # Run the benchmarking script in a clean Python process.
    command = [
        sys.executable,
        script,
        f.name,
        str(duration),
    ]
    # If we are manually loading a model class file in subprocesses, we need to let them know.
    if benchmarker.model_class_name and benchmarker.model_class_file:
        command.append(f"--model_class_name={benchmarker.model_class_name}")
        command.append(f"--model_class_file={benchmarker.model_class_file}")
    proc = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding="utf-8"
    )

    # Interpreter and framework overhead add a delay to processing. We should ensure
    # that during multiinterpreter benchmarking, sufficient time is waited for results.
    timeout = 60 + duration
    try:
        outs, errs = proc.communicate(timeout=timeout)
        with open(f.name, "rb") as fp:
            result = dill.load(fp)
        if isinstance(result, BenchmarkerErrorWrapper):
            raise ChildProcessError(
                "Benchmarker {} encountered an error:\n{}".format(benchmarker.id, result.trace)
            )
        if isinstance(result, Benchmarker):
            # If we still have a benchmarker object instead of results, something
            # went wrong that wasn't handled by the benchmarker routine.
            from pathlib import Path

            path = Path(f.name)
            logs = os.path.join(path.parent, "neuronperf_error_{}".format(str(path.stem)))
            if os.path.exists(logs):
                with open(logs, "rt") as logs_fp:
                    err_logs = logs_fp.readlines()
                os.unlink(logs)
                raise ChildProcessError(
                    "Benchmarker {} failed. Logs from child process:\n{}".format(
                        benchmarker.id, "".join(err_logs)
                    )
                )
            else:
                raise ChildProcessError(
                    (
                        "Benchmarker {} failed and no error logs were found. A child process may have "
                        "aborted. To obtain a stack trace, try running a single configuration inside a "
                        "single process by passing multiprocess=False, multiinterpreter=False"
                    ).format(benchmarker.id)
                )
        return result
    except subprocess.TimeoutExpired:
        proc.kill()
        raise ChildProcessError(
            "Benchmarker {} stopped responding after {} seconds.".format(benchmarker.id, timeout)
        )
    finally:
        os.unlink(f.name)


def _run_benchmarkers_multiprocess(
    benchmarkers: List[Benchmarker], duration: int, benchmark_func=run_benchmarker
) -> dict:
    results = []
    # Hand each benchmarker object to a subprocess.
    pipes, procs = [], []
    for benchmarker in benchmarkers:
        parent_pipe, child_pipe = multiprocessing.Pipe()
        pipes.append(parent_pipe)
        proc = multiprocessing.Process(
            target=benchmark_func, args=(benchmarker, duration, child_pipe)
        )
        procs.append(proc)
    # Launch benchmarking.
    for proc in procs:
        proc.start()
    # Collect results.
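    # pipe.recv() below blocks until each child sends back either a results dict
    # or a BenchmarkerErrorWrapper carrying the formatted traceback.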
for id, (pipe, proc) in enumerate(zip(pipes, procs)): try: proc_result = pipe.recv() if isinstance(proc_result, BenchmarkerErrorWrapper): log.error("Child process encountered an error:\n{}".format(proc_result.trace)) raise ChildProcessError() proc.join() results.append(proc_result) except KeyboardInterrupt: log.error("Benchmarking interrupted, terminating.") for proc in procs: proc.terminate() raise KeyboardInterrupt() except EOFError: log.error( ( f"Child process {id} was killed by the host OS during benchmarking.\n" "You may have run out of memory.\n" "Verify that your model can perform inference without NeuronPerf or try n_models=1." ) ) return _combine_results(results) def _run_benchmarkers_multithreaded( benchmarkers: List[Benchmarker], duration: int, benchmark_func=run_benchmarker ) -> dict: results = [] timeout = 60 + duration # Add some time for setup overhead and cleanup. try: args = ((benchmarker, duration) for benchmarker in benchmarkers) with concurrent.futures.ThreadPoolExecutor(max_workers=len(benchmarkers)) as exe: results.extend(exe.map(lambda arg: benchmark_func(*arg), args, timeout=timeout)) for result in results: if isinstance(result, BenchmarkerErrorWrapper): raise RuntimeError("Worker thread encountered an error:\n{}".format(result.trace)) except concurrent.futures.TimeoutError: log.error("Benchmarking timed out after {} seconds.".format(timeout)) except KeyboardInterrupt: raise KeyboardInterrupt("Benchmarking interrupted, terminating.") return _combine_results(results) def run_benchmarkers( benchmarkers: List[Benchmarker], duration: int, stats_interval: float = 0.5, multiprocess: bool = True, multiinterpreter: bool = False, ) -> dict: results = {} # Launch a background thread to collect system stats during benchmarking. stats_thread = StatsThread(stats_interval) stats_thread.start() try: if multiinterpreter: if not sys.executable: raise ValueError( ( "Unable to benchmark in multi-interpreter mode because " "the Python interpreter cannot be located (sys.executable is empty)." ) ) # We can safely re-use the multithreaded path here by using a custom benchmarking # function that spawns fresh interpreters. results = _run_benchmarkers_multithreaded( benchmarkers, duration, benchmark_func=_run_benchmarker_new_interpreter ) elif multiprocess: results = _run_benchmarkers_multiprocess(benchmarkers, duration) else: results = _run_benchmarkers_multithreaded(benchmarkers, duration) finally: stats_thread.join() results["cpu_percents"] = stats_thread.cpu_percents results["mem_percents"] = stats_thread.mem_percents return results def _get_env_setup_fn(benchmarker_id: int, benchmarker_config: dict, env_setup_fn): """Wrap an environment setup function with device-specific requirements.""" device_type = str(benchmarker_config["device_type"]).lower().strip() legacy = bool(os.environ.get("NEURONCORE_GROUP_SIZES")) if "neuron" == device_type: @functools.wraps(env_setup_fn) def _env_setup_fn(): import os id = benchmarker_id config = benchmarker_config pipeline_size = config["pipeline_size"] if config["multiprocess"] or config["multiinterpreter"]: # In multiprocess mode, need to specify the exact cores for the process. min_core = pipeline_size * id max_core = min_core + (pipeline_size - 1) visible_cores = f"{min_core}-{max_core}" if legacy: os.environ["NEURONCORE_GROUP_SIZES"] = str(pipeline_size) else: os.environ["NEURON_RT_VISIBLE_CORES"] = visible_cores else: # In multithreaded mode, all required cores are allocated in this process. 
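                # e.g. n_models=4 with pipeline_size=2 yields NEURON_RT_VISIBLE_CORES="0-7".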
                n_models = config["n_models"]
                if legacy:
                    os.environ["NEURONCORE_GROUP_SIZES"] = ",".join([str(pipeline_size)] * n_models)
                else:
                    os.environ["NEURON_RT_VISIBLE_CORES"] = "0-{}".format(
                        n_models * pipeline_size - 1
                    )
            # Finally, call any additional custom setup function provided.
            if env_setup_fn:
                env_setup_fn(id, config)

        return _env_setup_fn
    elif device_type == "cpu":
        return env_setup_fn
    elif device_type == "cuda" or device_type == "gpu":

        @functools.wraps(env_setup_fn)
        def _env_setup_fn():
            import os

            os.environ["CUDA_VISIBLE_DEVICES"] = str(benchmarker_id)
            if env_setup_fn:
                env_setup_fn(benchmarker_id, benchmarker_config)

        return _env_setup_fn
    else:
        log.warning(
            (
                f"NeuronPerf does not implement a proper environment setup for {device_type}. "
                "You may need to provide your own."
            )
        )
        return env_setup_fn


def _get_setup_fn(benchmarker_id: int, benchmarker_config: dict, setup_fn):
    """Wraps a customer-provided setup function with additional info from the benchmarker."""
    if not setup_fn:
        return None

    @functools.wraps(setup_fn)
    def _setup_fn(model):
        setup_fn(benchmarker_id, benchmarker_config, model)

    return _setup_fn


def _get_device_id(benchmarker_id: int, benchmarker_config: dict):
    """Calculate an appropriate device id for a benchmarker object."""
    device_id = benchmarker_id
    device_type = str(benchmarker_config["device_type"]).lower().strip()
    if device_type in SUPPORTED_DEVICE_TYPES:
        if not (benchmarker_config["multiprocess"] or benchmarker_config["multiinterpreter"]):
            device_id = benchmarker_id * benchmarker_config["pipeline_size"]
        return device_id
    else:
        log.warning(
            "Assuming device_id={} for benchmarker_id={} for unknown device_type={}".format(
                device_id, benchmarker_id, device_type
            )
        )
        return device_id


def benchmark(
    load_fn: Callable[[str, int], Any],
    model_filename: str,
    inputs: Any,
    batch_sizes: Union[int, List[int]] = None,
    duration: float = BENCHMARK_SECS,
    n_models: Union[int, List[int]] = None,
    pipeline_sizes: Union[int, List[int]] = None,
    performance_levels: Union[int, List[int]] = None,
    workers_per_model: Union[int, None] = None,
    env_setup_fn: Callable[[int, Dict], None] = None,
    setup_fn: Callable[[int, Dict, Any], None] = None,
    preprocess_fn: Callable[[Any], Any] = None,
    postprocess_fn: Callable[[Any], Any] = None,
    dataset_loader_fn: Callable[[Any, int], Any] = None,
    multiprocess: bool = True,
    multiinterpreter: bool = False,
    return_timers: bool = False,
    stats_interval: float = 0.5,
    device_type: str = "neuron",
    cost_per_hour: float = None,
    model_name: str = None,
    model_class_name: str = None,
    model_class_file: str = None,
    verbosity: int = 1,
) -> List[Dict]:
    r"""
    Benchmarks the model index or individual model using the provided inputs.
    If a model index is provided, additional fields such as ``pipeline_sizes`` and
    ``performance_levels`` can be used to filter the models to benchmark. The default
    behavior is to benchmark all configurations in the model index.
    Any additional compiler_args passed will be forwarded to the compiler on every invocation.

    :param Callable[[str, int], Any] load_fn: A function that accepts a model filename and device id, and returns a loaded model. This is automatically passed through the subpackage calls (e.g. ``neuronperf.torch.benchmark``).
    :param str model_filename: A path to a model index from compile or path to an individual model. For CPU benchmarking, a class should be passed that can be instantiated with a default constructor (e.g. ``MyModelClass``).
    :param list inputs: A list of example inputs. If the list contains tuples, they will be destructured on inference to support multiple arguments.
    :param Union[int, List[int]] batch_sizes: A list of ints indicating batch sizes that correspond to the inputs. Assumes 1 if not provided.
    :param float duration: The number of seconds to benchmark each model.
    :param Union[int, List[int]] n_models: The number of models to run in parallel. Default behavior runs 1 model and the max number of models possible, determined by a best effort from ``device_type``, instance size, or other environment state.
    :param Union[int, List[int]] pipeline_sizes: A list of pipeline sizes to use. See :ref:`neuroncore-pipeline`.
    :param Union[int, List[int]] performance_levels: A list of performance levels to try. Options are: 0 (max accuracy), 1, 2, 3 (max performance, default). See :ref:`mixed-precision`.
    :param Union[int, List[int]] workers_per_model: The number of workers to use per model loaded. If ``None``, this is automatically selected.
    :param Callable[[int, Dict], None] env_setup_fn: A custom environment setup function to run in each subprocess before model loading. It will receive the benchmarker id and config.
    :param Callable[[int, Dict, Any], None] setup_fn: A function that receives the benchmarker id, config, and model to perform last minute configuration before inference.
    :param Callable[[Any], Any] preprocess_fn: A custom preprocessing function to perform on each input before inference.
    :param Callable[[Any], Any] postprocess_fn: A custom postprocessing function to perform on each input after inference.
    :param bool multiprocess: When True, model loading is dispatched to forked subprocesses. Should be left alone unless debugging.
    :param bool multiinterpreter: When True, benchmarking is performed in a new python interpreter per model. All parameters must be serializable. Overrides multiprocess.
    :param bool return_timers: When True, the return of this function is a list of tuples ``(config, results)`` with detailed information. This can be converted to reports with ``get_reports(results)``.
    :param float stats_interval: Collection interval (in seconds) for metrics during benchmarking, such as CPU and memory usage.
    :param str device_type: This will be set automatically to one of the ``SUPPORTED_DEVICE_TYPES``.
    :param float cost_per_hour: The price of this device / hour. Used to estimate cost / 1 million infs in reports.
    :param str model_name: A friendly name for the model to use in reports.
    :param str model_class_name: Internal use.
    :param str model_class_file: Internal use.
    :param int verbosity: 0 = error, 1 = info, 2 = debug
    :return: A list of benchmarking results.
    :rtype: List[Dict]
    """
    # Set NeuronPerf logging verbosity.
    set_verbosity(verbosity)

    # --------------------------------------------
    # Input validation
    # --------------------------------------------
    # Validate that enough information was provided.
    if not load_fn:
        raise ValueError(
            "You should call benchmark() through a framework submodule, e.g. neuronperf.torch.benchmark()."
        )
    if not isinstance(model_filename, str):
        raise ValueError(
            "You must provide the path to a saved model or the path to a model index from neuronperf.compile()."
        )

    # Useful for debugging.
    if not multiprocess and not multiinterpreter:
        log.warning("Benchmarking in a single process.")

    # Standardize inputs.
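    # Scalar arguments are promoted to single-element lists below so that every
    # (batch size, n_models, pipeline size, ...) combination can be iterated uniformly.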
dataset_inputs = dataset_loader_fn is not None if (not dataset_inputs) and (not isinstance(inputs, list)): inputs = [inputs] if isinstance(n_models, int): n_models = [n_models] if isinstance(pipeline_sizes, int): pipeline_sizes = [pipeline_sizes] if isinstance(performance_levels, int): performance_levels = [performance_levels] if workers_per_model is None: workers_per_model = [] elif isinstance(workers_per_model, int): workers_per_model = [workers_per_model] if duration < BENCHMARK_SECS: log.warning("Results may be unreliable with short test durations.") # If the model_filename is JSON, attempt to interpret it as a model index. index = None if model_filename.endswith(model_index.MODEL_INDEX_SUFFIX): index = model_index.load(model_filename) # If we loaded a model_index, ensure provided inputs are compatible # and use it to refine the benchmarking combinations we will run. if index: # Extract a model name from the index, if possible. if not model_name: model_name = index["model_name"] # If batch_sizes, pipeline_sizes and/or performance_levels were provided, # treat them as filters on the index. A value of None is treated as no filter. # See the docs for model_index.filter(). index = model_index.filter( index, status="finished", # only take compiled models batch_size=batch_sizes, # select all requested batch sizes pipeline_size=pipeline_sizes, performance_level=performance_levels, ) if 0 == len(index["model_configs"]): raise ValueError( "No models were found in the model index matching requested criteria. Check that compilation succeeded." ) # If a model index was provided without batch_sizes, extract the sizes from the index. if not batch_sizes: # Select unique batch_sizes in model index. batch_sizes = set(config["batch_size"] for config in index["model_configs"]) batch_sizes = sorted(list(batch_sizes)) # Validate batch sizes after attempting to extract from the model index. inputs, batch_sizes = _sanitize_inputs(inputs, batch_sizes, dataset_inputs) # If we still don't have a model name, use the filename. if not model_name: model_name = model_filename # If no pipeline_sizes are provided, we'll assume it's 1 for a single model unless told otherwise. if not pipeline_sizes: log.debug("Pipeline size was not specified, assuming 1.") pipeline_sizes = [1] # Assume max performance is desired. if not performance_levels: max_performance = max(FAST_MATH_OPTIONS) log.debug(f"Performance level was not specified, assuming {max_performance}.") performance_levels = [max_performance] # If a model was provided directly without a model index, build a dummy model index. # A single model can not possibly have been compiled for more than 1 configuration, # hence why we can assume index [0]. if not index: index = model_index.create( filename=model_filename, model_name=model_name, batch_size=batch_sizes[0], pipeline_size=pipeline_sizes[0], performance_level=performance_levels[0], ) model_configs = index["model_configs"] # -------------------------------------------- # Benchmarking # -------------------------------------------- # Estimate time remaining based on configs requested to run. # If n_models wasn't provided, the default benchmarks [min, max]. n_models_est = 2 if not n_models else len(n_models) # If workers_per_model wasn't provided, the default benchmarks [1, 2]. 
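    # (Each model config runs once per n_models value and once per workers_per_model
    # value, for `duration` seconds each; the product gives the rough total below.)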
n_models_est *= 2 if not workers_per_model else len(workers_per_model) secs_remaining = len(model_configs) * n_models_est * duration mins_remaining = None if secs_remaining < 60 else round(secs_remaining / 60.0, 1) etr = f"{mins_remaining} minutes" if mins_remaining else f"{int(round(secs_remaining))} seconds" log.info("Benchmarking '{}', ~{} remaining.".format(model_filename, etr)) # Try to determine instance type. instance_type = get_instance_type() if not instance_type: instance_type = "unknown" # Try to automatically determine the maximum number of devices available. max_devices = _get_num_devices(device_type, instance_type) log.debug("Automatically determined number of devices: {}".format(max_devices)) # Try to detect cost / hour for this device. if not cost_per_hour: cost_per_hour = _get_cost_per_hour(instance_type) # Run through all requested combinations and generate a report. # This will produce a list of tuples, (config, results). all_results = [] def make_reports(): """Helper to generate reports from available results.""" # If all_results was set, we return the unmodified benchmarking results. return all_results if return_timers else get_reports(all_results, cost_per_hour) for model_config in model_configs: batch_size = model_config["batch_size"] pipeline_size = model_config["pipeline_size"] # Determine the number of model copies for each benchmarking session. model_counts = n_models # If the user didn't provide n_models, choose reasonable defaults. if not model_counts: # Try to run a single model and the max models supported on this hardware. if max_devices and (max_devices // pipeline_size > 1): model_counts = [1, max_devices // pipeline_size] else: model_counts = [1] # If the user provided model counts and we determine they are too large, emit a warning. else: if max_devices: model_counts_too_large = list( filter( lambda model_count: model_count * pipeline_size > max_devices, model_counts ) ) if model_counts_too_large: log.warning( ( "Some values of n_models exceed the number of devices available: " f"{model_counts_too_large} > {max_devices}" ) ) # Compute number of workers for this pipeline size, if not specified. n_workers = workers_per_model if not n_workers: n_workers = [_get_num_workers(pipeline_size)] # 1 worker thread == min latency if 1 not in n_workers: n_workers.insert(0, 1) for _workers_per_model in n_workers: # We now know everything we need to benchmark. # 1. Build a comprehensive benchmarker config, # 2. build one benchmarker per model, # 3. run the benchmarkers in parallel, # 4. and collect the results for this configuration. for model_count in model_counts: # 1. Benchmarker config config = { "model_filename": model_config["filename"], "model_name": model_name, "device_type": device_type, "instance_type": instance_type, "batch_size": batch_size, "n_models": model_count, "workers_per_model": _workers_per_model, "pipeline_size": pipeline_size, "n_devices": model_count * pipeline_size, "performance_level": model_config["performance_level"], "multiprocess": multiprocess, "multiinterpreter": multiinterpreter, "stats_interval": str(stats_interval), "start_dts": time.strftime("%Y%m%d-%H%M%S"), "duration": str(duration), } # 2. 
                # 2. Build the benchmarkers
                benchmarkers = []
                for benchmarker_id in range(model_count):
                    benchmarker = Benchmarker(
                        id=benchmarker_id,
                        device_id=_get_device_id(benchmarker_id, config),
                        load_fn=load_fn,
                        model_filename=model_config["filename"],
                        inputs=inputs if dataset_inputs else inputs[batch_sizes.index(batch_size)],
                        workers_per_model=_workers_per_model,
                        env_setup_fn=_get_env_setup_fn(benchmarker_id, config, env_setup_fn),
                        setup_fn=_get_setup_fn(benchmarker_id, config, setup_fn),
                        preprocess_fn=preprocess_fn,
                        postprocess_fn=postprocess_fn,
                        dataset_loader_fn=dataset_loader_fn,
                        model_class_name=model_class_name,
                        model_class_file=model_class_file,
                    )
                    benchmarkers.append(benchmarker)

                # 3. Run benchmarkers in parallel
                log.debug("Running model config: {}".format(config))
                try:
                    results = run_benchmarkers(
                        benchmarkers,
                        duration,
                        stats_interval=stats_interval,
                        multiprocess=multiprocess,
                        multiinterpreter=multiinterpreter,
                    )
                    # 4. Collect results
                    config["stop_dts"] = time.strftime("%Y%m%d-%H%M%S")
                    all_results.append((config, results))
                except KeyboardInterrupt:
                    # If we are interrupted, return whatever we have on hand.
                    return make_reports()
                except Exception:
                    # If something else goes wrong with the model, we should
                    # log this configuration and move on.
                    log.exception("Failure benchmarking config: {}".format(config))

    return make_reports()


================================================
FILE: src/neuronperf/src/neuronperf/compile_constants.py
================================================
# -*- coding: utf-8 -*-

"""
neuronperf.compile_constants
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Holds constants used at compile time.
"""

NEURONCORE_PIPELINE_CORES = "--neuroncore-pipeline-cores"

FAST_MATH = "--fast-math"
FAST_MATH_OPTIONS = {
    0: "none",
    1: "fp32-cast-matmult no-fast-relayout",
    2: "fp32-cast-matmult",
    3: "all",
}


================================================
FILE: src/neuronperf/src/neuronperf/cpu/__init__.py
================================================
from neuronperf.cpu.cpu import benchmark


================================================
FILE: src/neuronperf/src/neuronperf/cpu/cpu.py
================================================
# -*- coding: utf-8 -*-

"""
neuronperf.cpu
~~~~~~~~~~~~~~

Provides CPU support.
"""

import functools
import logging

from .. import benchmarking

log = logging.getLogger(__name__)


class DummyModel:
    def __call__(self, x):
        x *= 5
        x += 3
        return x


def benchmark(model_class, inputs, *args, **kwargs):
    if not isinstance(model_class, type):
        raise TypeError("For CPU benchmarking, you must provide a class to instantiate.")

    device_type = kwargs.pop("device_type", "cpu")
    multiinterpreter = kwargs.pop("multiinterpreter", False)
    if multiinterpreter:
        log.warning(
            "CPU + multiinterpreter is not yet fully supported. You need to provide "
            "a custom load_fn that can import your class and instantiate it."
        )

    # Create a custom load_fn that instantiates the model.
    def load_fn(*args, **kwargs):
        return model_class()

    kwargs["device_type"] = device_type
    kwargs["multiinterpreter"] = multiinterpreter
    return benchmarking.benchmark(
        load_fn,
        model_class.__name__,
        inputs,
        *args,
        **kwargs,
    )
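A minimal usage sketch for the CPU path above (not part of the repository); the calling convention mirrors the package's own test suite, and the durations and model counts are illustrative:

    import numpy as np
    import neuronperf
    import neuronperf.cpu

    # Benchmark a plain Python callable on CPU. The class itself (not an
    # instance) is passed so each worker can construct its own copy via load_fn.
    reports = neuronperf.cpu.benchmark(
        neuronperf.DummyModel,
        inputs=[np.array([1, 2, 3, 4])],
        duration=2,       # short duration; the library warns this may be unreliable
        n_models=1,
    )
    neuronperf.print_reports(reports)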
""" import logging FORMAT_STRING = '%(levelname)s:%(name)s - %(message)s' def _get_stream_handlers(level = logging.DEBUG): formatter = logging.Formatter(FORMAT_STRING) sh = logging.StreamHandler() sh.setLevel(logging.DEBUG) sh.setFormatter(formatter) return [sh] ================================================ FILE: src/neuronperf/src/neuronperf/model_index.py ================================================ # -*- coding: utf-8 -*- """ neuronperf.model_index ~~~~~~~~~~~~~~~~~~~~~~~ Provides utilities for working with model indexes. """ from typing import Any, List, Union import builtins import copy as copy_module import itertools import json import logging import os import pathlib import random import shutil from .__version__ import __version__ from .compile_constants import FAST_MATH_OPTIONS log = logging.getLogger(__name__) MODEL_INDEX_SUFFIX = ".json" def generate_id(length: int = 8): """Generate a random-enough sequence to append to model names and prevent collisions.""" id_chars = "abcdefghijklmnopqrstuvwxyz0123456789" new_id = [id_chars[random.randrange(len(id_chars))] for _ in range(length)] return "".join(new_id) def generate_name(model_name: str): """Generate a model index name from a model name.""" return model_name + "_" + generate_id() + MODEL_INDEX_SUFFIX def _create(model_name: str, compile_info: list) -> dict: if not isinstance(compile_info, list): log.exception( "Expected a list of compile info dicts, received '{}'.".format(str(type(compile_info))) ) model_index = { "NeuronPerf_version": __version__, "model_name": model_name, "model_configs": compile_info, } return model_index def create( filename: str, model_name: str = None, batch_size: int = 1, pipeline_size: int = 1, performance_level: int = max(FAST_MATH_OPTIONS), compile_s: float = None, status: str = "finished", ) -> dict: r""" Create a new model index from a pre-compiled model. :param str filename: The path to the compiled model. :param str model_name: A friendly name for the model. Will default to filename. :param int batch_size: The batch size at compilation for this model. :param int pipeline_size: The pipeline size used at compilation for this model. :param int performance_level: The performance level this model was compiled with. :param float compile_s: Seconds spent compiling. :param str status: A string describing compilation result. Can be "finished" or "error". :return: A new dictionary representing a model index. :rtype: dict """ if not model_name: model_name = filename compile_info = [ { "filename": filename, "batch_size": batch_size, "pipeline_size": pipeline_size, "performance_level": performance_level, "compile_s": compile_s, "status": status, } ] return _create(model_name, compile_info) def delete(filename: str): """Deletes the model index and all associated models referenced by the index.""" if not os.path.exists(filename): log.warning("Asked to delete '{}', but it can't be located.".format(filename)) return # Load the index configs = load(filename)["model_configs"] # Remove all referenced models model_filenames = map(lambda x: x["filename"], itertools.chain(configs)) for model_filename in model_filenames: log.debug(f"Deleting '{model_filename}'.") if os.path.exists(model_filename): if os.path.isdir(model_filename): shutil.rmtree(model_filename) else: os.remove(model_filename) # Finally, remove the model index itself log.debug(f"Deleting '{filename}'") os.remove(filename) def copy(old_index: Union[str, dict], new_index: str, new_dir: str) -> str: r""" Copy an index to a new location. 
def copy(old_index: Union[str, dict], new_index: str, new_dir: str) -> str:
    r"""
    Copy an index to a new location.

    Will rename ``old_index`` to ``new_index`` and copy all model files into
    ``new_dir``, updating the index paths. This is useful for pulling individual
    models out of a pool. Returns the path to the new index.
    """
    os.makedirs(new_dir, exist_ok=True)
    index = _sanitize(old_index)[0].copy()
    configs = index["model_configs"]
    for config in configs:
        path = pathlib.Path(config["filename"])
        config["filename"] = str(shutil.copy2(path, new_dir))
    return save(index, new_index)


def move(old_index: str, new_index: str, new_dir: str) -> str:
    """This is the same as ``copy`` followed by ``delete`` on the old index."""
    index = copy(old_index, new_index, new_dir)
    delete(old_index)
    return index


def _sanitize(*model_indexes: Union[str, dict]) -> List[dict]:
    r"""
    Helper function to load indexes if strings are provided.
    If already loaded, this is a no-op.
    """
    if not model_indexes:
        raise ValueError("No model indexes were provided.")
    indexes = []
    # Load any paths provided and sanity check all inputs.
    for index in model_indexes:
        if not index:
            raise ValueError("An empty value was received, but expected a model index.")
        if isinstance(index, str):
            index = load(index)
        if not isinstance(index, dict):
            raise TypeError("Expected a model index, but received '{}'.".format(str(type(index))))
        if not len(index) > 0:
            raise ValueError("Received an empty model index.")
        indexes.append(index)
    # Check versions are all the same, and emit a warning if they aren't.
    versions = set(map(lambda x: x["NeuronPerf_version"], indexes))
    if len(versions) > 1:
        log.warning("Received model indexes with different versions: '{}'.".format(str(versions)))
    # Ensure model names are matching.
    model_name = indexes[0]["model_name"]
    if not all(model_name == index["model_name"] for index in indexes):
        model_names = list(set(map(lambda x: x["model_name"], indexes)))
        log.warning("Received model indexes with different model names: {}".format(model_names))
    return indexes


def append(*model_indexes: Union[str, dict]) -> dict:
    r"""
    Appends the model indexes non-destructively into a new model index,
    without modifying any of the internal data. This is useful if you have
    benchmarked multiple related models and wish to combine their respective
    model indexes into a single index.

    Model name will be taken from the first index provided.
    Duplicate configs will be filtered.

    :param Union[str, dict] model_indexes: Model indexes or paths to model indexes to combine.
    :return: A new dictionary representing the combined model index.
    :rtype: dict
    """
    indexes = _sanitize(*model_indexes)
    # Extract the model configs from the indexes
    config_iter = map(lambda index: copy_module.deepcopy(index["model_configs"]), indexes)
    # Combine the model configs
    combined = list(itertools.chain.from_iterable(config_iter))
    # Split unique and duplicate configs
    duplicate = []
    unique = []
    for config in combined:
        if config in unique:
            duplicate.append(config)
        else:
            unique.append(config)
    if len(duplicate) > 0:
        log.warning(
            (
                f"There were {len(duplicate)} duplicate model configs "
                "filtered. The duplicates were:\n"
                "{}".format("\n".join(map(lambda c: str(c), duplicate)))
            )
        )
    # Build new index from configs
    return _create(indexes[0]["model_name"], unique)


def save(model_index: dict, filename: str = None, root_dir=None) -> str:
    r"""Save a NeuronPerf model index to a file."""
    if not filename:
        model_name = model_index["model_name"]
        filename = generate_name(model_name)
    if not filename.lower().endswith(MODEL_INDEX_SUFFIX):
        filename += MODEL_INDEX_SUFFIX
    if not root_dir:
        root_dir = "."
    try:
        with open(os.path.join(root_dir, filename), "w") as fp:
            json.dump(model_index, fp)
    except OSError:
        log.exception("Failed to write '{}'.".format(filename))
    return filename


def load(filename) -> dict:
    """Load a NeuronPerf model index from a file."""
    model_index = None
    try:
        with open(filename, "r") as fp:
            model_index = json.load(fp)
    except OSError:
        # file is probably not a model index
        log.exception("Failed to load model index '{}'".format(filename))
    else:
        from distutils.version import LooseVersion

        try:
            if LooseVersion(model_index["NeuronPerf_version"]) > LooseVersion(__version__):
                log.warning(
                    "Model index newer than NeuronPerf (version {} > {}). Try updating NeuronPerf.".format(
                        model_index["NeuronPerf_version"], __version__
                    )
                )
        except TypeError:
            log.warning(
                "Couldn't compare model index version ({}) to NeuronPerf version ({}), continuing anyway.".format(
                    model_index["NeuronPerf_version"], __version__
                )
            )
    return model_index


def filter_configs(configs, filter_name, filter_values) -> List:
    """Filters provided configs on specified filter and value and returns a new config list."""
    if filter_values is None:
        return configs.copy()
    # Filter on configs that have the filter_name and value is in filter_values
    if not isinstance(filter_values, list):
        filter_values = [filter_values]
    return list(
        builtins.filter(
            lambda config: filter_name in config and config[filter_name] in filter_values, configs
        )
    )


def filter(index: Union[str, dict], **kwargs) -> dict:
    r"""
    Filters provided model index on provided criteria and returns a new index.

    Each kwarg is a standard (k, v) pair, where k is treated as a filter name
    and v may be one or more values used to filter model configs.
    """
    index = _sanitize(index)[0].copy()
    # Filter each config on provided kwargs pairs.
    configs = index["model_configs"]
    for k, v in kwargs.items():
        configs = filter_configs(configs, k, v)
    index["model_configs"] = configs
    return index
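A minimal usage sketch for the model index utilities above (not part of the repository); the model file paths are hypothetical, and the calls mirror the package's own tests:

    import neuronperf

    # Build two single-config indexes for hypothetical compiled artifacts.
    idx_a = neuronperf.model_index.create("models/resnet_b1.pt", model_name="resnet", batch_size=1)
    idx_b = neuronperf.model_index.create("models/resnet_b4.pt", model_name="resnet", batch_size=4)

    # Combine them into one index; duplicate configs would be filtered automatically.
    combined = neuronperf.model_index.append(idx_a, idx_b)

    # Keep only the batch size 4 entry.
    only_b4 = neuronperf.model_index.filter(combined, batch_size=4)
    assert len(only_b4["model_configs"]) == 1

    # Persist and reload the combined index.
    index_file = neuronperf.model_index.save(combined, filename="resnet_index.json")
    reloaded = neuronperf.model_index.load(index_file)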
================================================
FILE: src/neuronperf/src/neuronperf/mxnet/__init__.py
================================================
from neuronperf.mxnet.mxnet import benchmark, compile


================================================
FILE: src/neuronperf/src/neuronperf/mxnet/mxnet.py
================================================
# -*- coding: utf-8 -*-

"""
neuronperf.mxnet
~~~~~~~~~~~~~~~~

Provides Apache MXNet support.
"""

import contextlib
import functools
import os
import threading

# handle different API versions of mxnet
import mxnet as mx
from distutils.version import LooseVersion

if LooseVersion(mx.__version__) >= LooseVersion("1.8"):
    _mx_version = 1.8
    import mx_neuron as neuron
else:
    _mx_version = 1.5
    from mxnet.contrib import neuron

from .. import benchmarking


class _MXNetModelWrapper:
    def __init__(self, device_id, sym, args, aux):
        self.device_id = device_id
        self.sym = sym
        self.args = args
        self.aux = aux
        self.ctx = None
        self.exes = {}
        self.lock = threading.Lock()

    def __call__(self, inputs):
        # on the first inference, do prep work
        if not self.ctx:
            self.ctx = mx.neuron(self.device_id)
            # prepare inputs for model
            for k, v in inputs.items():
                inputs[k] = mx.nd.array(v)
            self.args.update(inputs)
        # obtain an executor for this thread
        thread_id = threading.get_ident()
        if thread_id not in self.exes:
            with self.lock:
                exe = self.sym.bind(
                    ctx=self.ctx, args=self.args, aux_states=self.aux, grad_req="null"
                )
                self.exes[thread_id] = exe
        else:
            exe = self.exes[thread_id]
        # run inference
        outputs = exe.forward(**inputs)
        mx.nd.waitall()
        return outputs[0]


@contextlib.contextmanager
def change_dir(new_dir):
    old_dir = os.getcwd()
    os.chdir(os.path.join(old_dir, new_dir))
    try:
        yield
    finally:
        os.chdir(old_dir)


def _load_fn(model_filename, **kwargs):
    device_id = kwargs.get("device_id", 0)
    sym, args, aux = mx.model.load_checkpoint(model_filename, 0)
    return _MXNetModelWrapper(device_id, sym, args, aux)


def _compile_fn(model, example_inputs, models_dir, model_name, **kwargs):
    _sym, _args, _aux = model
    model_filename = os.path.join(models_dir, model_name)
    compiler_args = kwargs.pop("compiler_args", {})
    # MXNet passes additional kwargs directly to compiler
    _sym, _args, _aux = neuron.compile(
        _sym,
        _args,
        _aux,
        example_inputs,
        **compiler_args,
    )
    with change_dir(models_dir):
        mx.model.save_checkpoint(model_name, 0, _sym, _args, _aux)
    return model_filename


def compile(model, inputs, *args, **kwargs):
    return benchmarking.compile(_compile_fn, model, inputs, *args, **kwargs)


def benchmark(model_filename, inputs, *args, **kwargs):
    env_setup_fn = kwargs.pop("env_setup_fn", lambda *_: None)

    # Use a custom setup function to handle MXNet concurrency requirements.
    @functools.wraps(env_setup_fn)
    def _env_setup_fn(id, config):
        workers_per_model = str(config["workers_per_model"])
        os.environ["MXNET_CPU_TEMP_COPY"] = workers_per_model
        os.environ["MXNET_EXEC_NUM_TEMP"] = workers_per_model
        os.environ["MXNET_CPU_WORKER_NTHREADS"] = workers_per_model
        os.environ["MXNET_MP_WORKER_NTHREADS"] = workers_per_model
        # Remember to call any additional custom setup provided.
        env_setup_fn(id, config)

    kwargs["env_setup_fn"] = _env_setup_fn
    return benchmarking.benchmark(_load_fn, model_filename, inputs, *args, **kwargs)
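A minimal usage sketch for the MXNet path (not part of the repository), assuming an Inf1 environment with the Neuron MXNet plugin installed. The checkpoint prefix is hypothetical, and passing ``model_name`` through ``compile``'s kwargs is an assumption inferred from ``_compile_fn``'s parameters above, not a documented public signature:

    import mxnet as mx
    import neuronperf.mxnet

    # Hypothetical checkpoint prefix; load_checkpoint returns the (sym, args, aux)
    # tuple that neuronperf.mxnet.compile expects as its model argument.
    sym, args, aux = mx.model.load_checkpoint("my_model", 0)
    example_inputs = {"data": mx.nd.ones((1, 3, 224, 224))}

    # Compile, then benchmark the saved checkpoint. model_name is assumed to be
    # forwarded to _compile_fn by benchmarking.compile.
    model_filename = neuronperf.mxnet.compile((sym, args, aux), example_inputs, model_name="my_model")
    reports = neuronperf.mxnet.benchmark(model_filename, example_inputs)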
================================================
FILE: src/neuronperf/src/neuronperf/py.typed
================================================
# Marker file that indicates this package supports typing


================================================
FILE: src/neuronperf/src/neuronperf/reporting.py
================================================
# -*- coding: utf-8 -*-

"""
neuronperf.reporting
~~~~~~~~~~~~~~~~~~~~

Provides utilities for producing reports from benchmarking results.
"""

from typing import List

import csv
import itertools
import json
import logging
import time

import numpy as np

from . import __version__

log = logging.getLogger(__name__)

CSV_COLS = [
    "model_name",
    "n_models",
    "workers_per_model",
    "pipeline_size",
    "batch_size",
    "throughput_avg",
    "throughput_peak",
    "latency_ms_p0",
    "latency_ms_p50",
    "latency_ms_p90",
    "latency_ms_p95",
    "latency_ms_p99",
    "latency_ms_p100",
    "cpu_avg_percent",
    "cpu_percent_p50",
    "mem_avg_percent",
    "mem_percent_p50",
    "e2e_avg_ms",
    "infer_avg_ms",
    "total_infs",
    "total_s",
    "performance_level",
    "model_filename",
    "device_type",
    "instance_type",
    "cost_per_1m_inf",
]

PRINT_COLS = [
    "throughput_avg",
    "latency_ms_p50",
    "latency_ms_p99",
    "n_models",
    "pipeline_size",
    "workers_per_model",
    "batch_size",
    "model_filename",
]

REQUIRED_CONFIG_KEYS = [
    "multiprocess",
    "multiinterpreter",
    "device_type",
    "batch_size",
    "model_filename",
    "model_name",
    "n_models",
    "pipeline_size",
]

REQUIRED_RESULTS_KEYS = [
    "workers_per_model",
    "status",
    "timers",
    "n_infs",
    "total_s",
]


def _validate_config(config):
    for required_key in REQUIRED_CONFIG_KEYS:
        if required_key not in config:
            raise ValueError(
                (
                    f"Model config is missing required key '{required_key}'. "
                    f"Something probably went wrong during benchmarking. Provided:\n{config}"
                )
            )


def _validate_results(results):
    for required_key in REQUIRED_RESULTS_KEYS:
        if required_key not in results:
            raise ValueError(
                (
                    f"Benchmarking results are missing required key '{required_key}'. "
                    f"Something probably went wrong during benchmarking. Provided:\n{results}"
                )
            )


def _get_report_name(model_name: str) -> str:
    return "{}.results-{}".format(model_name, time.strftime("%Y%m%d-%H%M%S"))
""" report = {} config, results = benchmark_results _validate_config(config) _validate_results(results) try: report["NeuronPerf_version"] = __version__ # copy benchmarker info from config into report for k, v in config.items(): report[k] = v # number of intervals is the same across all stats, so we can use this as a proxy report["n_stats_intervals"] = len(results["cpu_percents"]) report["workers_per_model"] = results["workers_per_model"] report["status"] = results["status"] # timing stats report["load_avg_ms"] = np.fromiter( (t.avg("ms") for t in results["timers"]["load"]), float ).mean() report["input_avg_ms"] = np.fromiter( (t.avg("ms") for t in results["timers"]["input"]), float ).mean() report["warmup_avg_ms"] = np.fromiter( (t.avg("ms") for t in results["timers"]["warmup"]), float ).mean() report["env_setup_avg_ms"] = np.fromiter( (t.avg("ms") for t in results["timers"]["env_setup"]), float ).mean() report["setup_avg_ms"] = np.fromiter( (t.avg("ms") for t in results["timers"]["setup"]), float ).mean() report["preprocess_avg_ms"] = np.fromiter( (t.avg("ms") for t in results["timers"]["preprocess"]), float ).mean() report["infer_avg_ms"] = np.fromiter( (t.avg("ms") for t in results["timers"]["infer"]), float ).mean() report["postprocess_avg_ms"] = np.fromiter( (t.avg("ms") for t in results["timers"]["postprocess"]), float ).mean() report["e2e_avg_ms"] = np.fromiter( (t.avg("ms") for t in results["timers"]["e2e"]), float ).mean() report["worker_avg_s"] = round( np.fromiter((t.avg("s") for t in results["timers"]["worker"]), float).mean(), 2 ) report["total_infs"] = results["n_infs"] * config["batch_size"] report["total_s"] = round(results["total_s"], 2) percentiles = [0, 50, 90, 95, 99, 100] cpu_percents = np.fromiter(results["cpu_percents"], float) if cpu_percents.size > 2: cpu_percentiles = np.percentile(cpu_percents[1:-1], percentiles) report["cpu_avg_percent"] = cpu_percentiles.mean() for i, p in enumerate(percentiles): report[f"cpu_percent_p{p}"] = cpu_percentiles[i] mem_percents = np.fromiter(results["mem_percents"], float) if mem_percents.size > 2: mem_percentiles = np.percentile(mem_percents[1:-1], percentiles) report["mem_avg_percent"] = mem_percentiles.mean() for i, p in enumerate(percentiles): report[f"mem_percent_p{p}"] = mem_percentiles[i] # latency latencies = np.fromiter( itertools.chain.from_iterable(t.durations("ms") for t in results["timers"]["e2e"]), float, ) latency_percentiles = np.percentile(latencies, percentiles) for i, p in enumerate(percentiles): report["latency_ms_p{}".format(p)] = latency_percentiles[i] # bucketize ending timestamps end_timestamps = np.fromiter( itertools.chain.from_iterable(t.end_timestamps("s") for t in results["timers"]["e2e"]), float, ) bucket_ends = np.floor(end_timestamps / window_size) # group timestamps by window and correct for batch size _, bucket_counts = np.unique(bucket_ends, return_counts=True) bucket_counts *= config["batch_size"] # find max and normalize by window size report["throughput_peak"] = bucket_counts.max() / window_size report["throughput_avg"] = bucket_counts[1:-1].mean() / window_size if verbosity > 0: report["throughput_hist"] = bucket_counts if verbosity > 1: report["e2e_durations_ms"] = np.fromiter( (t.durations("ms") for t in results["timers"]["e2e"]), float ) # Try to estimte cost / inference if cost_per_hour: try: infs_per_hour = 3600 * report["throughput_avg"] report["cost_per_1m_inf"] = cost_per_hour * (1_000_000 / infs_per_hour) except: # We'll ignore this, as it's caused by a missing field that would have # 
                # already generated an earlier error log. We should continue
                # producing a report nonetheless.
                pass

        # Truncate floats to 3 places for readability.
        for key, value in report.items():
            if isinstance(value, float):
                report[key] = round(value, 3)
    except Exception:
        log.exception(
            (
                "Failed to produce a report from benchmarking results. "
                "Something probably went wrong during benchmarking."
            )
        )
    return report


def get_reports(results, cost_per_hour: float = None) -> List[dict]:
    r"""
    Summarizes and combines the detailed results from ``neuronperf.benchmark``,
    when run with ``return_timers=True``. One report dictionary is produced per
    model configuration benchmarked. The list of reports can be fed directly to
    other reporting utilities, such as ``neuronperf.write_csv``.

    :param results: Benchmarker results.
    :param float cost_per_hour: The cost / hour for this device.
    """
    reports = []
    for idx, (config, result) in enumerate(results):
        try:
            _validate_config(config)
            _validate_results(result)
        except ValueError:
            log.exception(f"Result {idx} is missing required information, skipping.")
            continue
        report = get_report((config, result), cost_per_hour)
        reports.append(report)
    return reports


def print_reports(reports: List[dict], cols=PRINT_COLS, sort_by="throughput_peak", reverse=False):
    r"""Print a subset of report cols to the terminal.

    :param reports: Results from ``get_reports``.
    :param cols: The columns in the report to be displayed.
    :param sort_by: Sort the cols by the specified key.
    :param reverse: Sort order.
    """
    if not reports:
        print("No reports were found. Did benchmarking succeed?")
        return
    # Print headers.
    col_width = max(map(lambda col: len(col), cols)) + 1
    row_format = "{{:<{}}}".format(col_width) * len(cols)
    print(row_format.format(*cols))
    # Extract all rows.
    rows = []
    for report in reports:
        row = []
        for col in cols:
            row.append(report[col] if col in report else "N/A")
        rows.append(row)
    # Sort rows by the specified key, if the key exists.
    if sort_by in cols:
        sort_index = cols.index(sort_by)
        rows = sorted(rows, key=lambda row: row[sort_index], reverse=reverse)
    # Print all rows.
    for row in rows:
        print(row_format.format(*row))


def write_csv(reports: List[dict], filename: str = None, cols=CSV_COLS):
    r"""Write a benchmarking report to a CSV file.

    :param reports: Results from ``get_reports``.
    :param filename: File name to write out. If not provided, generated from
        the model_name in the report and the current timestamp.
    :param cols: The columns in the report to be kept.
    """
    if not filename:
        filename = "{}.csv".format(_get_report_name(reports[0]["model_name"]))
    try:
        with open(filename, "w", newline="", encoding="utf-8") as csvfile:
            writer = csv.writer(csvfile)
            writer.writerow(cols)
            for idx, report in enumerate(reports):
                row = []
                for col in cols:
                    if col in report:
                        row.append(report[col] if report[col] is not None else "N/A")
                    else:
                        log.debug(f"Report {idx} is missing field '{col}'.")
                        row.append("N/A")
                writer.writerow(row)
        return filename
    except OSError:
        log.exception(f"Failed to write '{filename}'. Check that you have write permissions.")


def write_json(reports: List[dict], filename: str = None):
    if not filename:
        filename = "{}.json".format(_get_report_name(reports[0]["model_name"]))
    try:
        with open(filename, "w", encoding="utf-8") as jsonfile:
            json.dump(reports, jsonfile)
        return filename
    except OSError:
        log.exception(
            (
                f"Failed to write '{filename}'. Check that the report "
                "contains data and that you have write permissions."
            )
        )
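A minimal usage sketch for the reporting pipeline above (not part of the repository), mirroring the calls made in the package's test suite:

    import numpy as np
    import neuronperf
    import neuronperf.cpu

    # Collect raw (config, results) tuples by benchmarking with return_timers=True.
    benchmarker_results = neuronperf.cpu.benchmark(
        neuronperf.DummyModel,
        inputs=[np.array([1, 2, 3, 4])],
        duration=2,
        return_timers=True,
    )

    # Summarize into one report dict per configuration, then display and persist.
    reports = neuronperf.get_reports(benchmarker_results)
    neuronperf.print_reports(reports)
    csv_file = neuronperf.write_csv(reports)
    json_file = neuronperf.write_json(reports)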
================================================
FILE: src/neuronperf/src/neuronperf/scripts/__init__.py
================================================


================================================
FILE: src/neuronperf/src/neuronperf/scripts/run_benchmark_file.py
================================================
import argparse

import dill

import neuronperf


def main():
    parser = argparse.ArgumentParser(
        prog="benchmark",
        description="Run a serialized Benchmarker for a given `duration`. Upon "
        "success, overwrite `filename` with the updated Benchmarker.",
    )
    parser.add_argument("filename", type=str, help="The serialized Benchmarker")
    parser.add_argument("duration", type=float, help="The duration of each config (seconds)")
    parser.add_argument("--model_class_name", type=str, help="The name of a model class to load")
    parser.add_argument(
        "--model_class_file", type=str, help="Path to Python module defining model_class_name"
    )
    args = parser.parse_args()

    try:
        # If we were provided with a model class to import before deserialization,
        # we need to handle that now. The class will be manually imported.
        if args.model_class_name and args.model_class_file:
            import importlib.util

            spec = importlib.util.spec_from_file_location(
                args.model_class_name, args.model_class_file
            )
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            globals()[args.model_class_name] = getattr(module, args.model_class_name)

        # Load the benchmarker object
        with open(args.filename, "rb") as f:
            benchmarker = dill.load(f)

        # Execute the benchmarker
        result = neuronperf.benchmarking.run_benchmarker(benchmarker, args.duration)

        # Write the result back to the same file
        with open(args.filename, "wb") as f:
            dill.dump(result, f)
    except Exception:
        # Dump a traceback to a file for debugging.
        import os
        import sys
        import traceback
        from pathlib import Path

        path = Path(args.filename)
        filename = os.path.join(path.parent, "neuronperf_error_{}".format(path.stem))
        trace = "".join(traceback.format_exception(*sys.exc_info()))
        with open(filename, "wt") as err_fp:
            err_fp.write(trace)


if __name__ == "__main__":
    main()


================================================
FILE: src/neuronperf/src/neuronperf/tensorflow/__init__.py
================================================
from neuronperf.tensorflow.tensorflow import benchmark, compile


================================================
FILE: src/neuronperf/src/neuronperf/tensorflow/tensorflow.py
================================================
# -*- coding: utf-8 -*-

"""
neuronperf.tensorflow
~~~~~~~~~~~~~~~~~~~~~

Provides TensorFlow support.
"""

import itertools
import logging
import os
import threading

from .. import benchmarking

log = logging.getLogger(__name__)

_lock = threading.Lock()


def _load_fn(model_file, **kwargs):
    with _lock:
        import tensorflow as tf

        if tf.__version__.startswith("1"):
            return tf.contrib.predictor.from_saved_model(model_file)
        else:
            import tensorflow.keras as keras

            return keras.models.load_model(model_file)


def _compile_fn(model, inputs, models_dir, model_name, **kwargs):
    import tensorflow as tf
    import tensorflow.neuron as tfn

    model_filename = os.path.join(models_dir, model_name)
    # NeuronPerf provides compiler_args as a dictionary, but the framework expects a different format.
    compiler_args = kwargs.pop("compiler_args", {})
    if tf.__version__.startswith("1"):
        compiler_args_flattened = list(itertools.chain.from_iterable(compiler_args.items()))
        kwargs["compiler_args"] = compiler_args_flattened
        kwargs["model_feed_dict"] = inputs
        # For TF 1.x, the saved model path is expected instead of a loaded model.
        tfn.saved_model.compile(model, model_filename, **kwargs)
    else:
        if compiler_args:
            compiler_args_flattened = " ".join(
                ["{}={}".format(k, v) for k, v in compiler_args.items()]
            )
            os.environ["NEURON_CC_FLAGS"] = compiler_args_flattened
        else:
            os.environ["NEURON_CC_FLAGS"] = ""
        model_neuron = tfn.trace(model, inputs, **kwargs)
        model_neuron.save(model_filename)
    return model_filename


def compile(model, inputs, *args, **kwargs):
    return benchmarking.compile(_compile_fn, model, inputs, *args, **kwargs)


def benchmark(model_filename, inputs, *args, **kwargs):
    # tensorflow-neuron is not currently fork safe, so we work around this during
    # benchmarking by spawning a fresh interpreter session for each model we benchmark.
    if "multiinterpreter" in kwargs and not kwargs["multiinterpreter"]:
        log.warning(
            "Setting multiinterpreter=False is not safe with TensorFlow. Use at your own risk."
        )
    else:
        kwargs["multiinterpreter"] = True
    return benchmarking.benchmark(_load_fn, model_filename, inputs, *args, **kwargs)
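A minimal usage sketch for the TensorFlow path above (not part of the repository), assuming an Inf1 environment with tensorflow-neuron installed; the model and input shapes are illustrative, and wrapping the example input in a list follows the ``benchmark()`` convention of one input per batch size:

    import tensorflow as tf
    import neuronperf
    import neuronperf.tensorflow

    # An illustrative Keras model; any model supported by tensorflow-neuron tracing works.
    model = tf.keras.Sequential([tf.keras.layers.Dense(8, input_shape=(8,))])
    example = tf.zeros([1, 8])

    # Compile for Neuron, then benchmark the saved model. Note that
    # neuronperf.tensorflow.benchmark forces multiinterpreter=True for fork safety.
    model_filename = neuronperf.tensorflow.compile(model, [example])
    reports = neuronperf.tensorflow.benchmark(model_filename, [example])
    neuronperf.print_reports(reports)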
================================================
FILE: src/neuronperf/src/neuronperf/timing.py
================================================
# -*- coding: utf-8 -*-

"""
neuronperf.timing
~~~~~~~~~~~~~~~~~

Provides utility functions for timing and time unit conversions.
"""

from typing import Any, Callable

import sys
import time
import typing

import numpy as np

time_unit_ratios = {
    "ns": {"ns": 1, "us": 1e-3, "ms": 1e-6, "s": 1e-9},
    "us": {"ns": 1e3, "us": 1, "ms": 1e-3, "s": 1e-6},
    "ms": {"ns": 1e6, "us": 1e3, "ms": 1, "s": 1e-3},
    "s": {"ns": 1e9, "us": 1e6, "ms": 1e3, "s": 1},
}
supported_time_units = time_unit_ratios.keys()


def timestamp_convert(timestamps, input_time_unit: str, output_time_unit: str):
    """Convert timestamp(s) from one time unit to another.

    :param timestamps: A timestamp or iterable of timestamps.
    :param input_time_unit: A string specifying the input time unit.
    :param output_time_unit: A string specifying the output time unit.
    :returns: A single timestamp or container of timestamps in the output time unit.
    """
    try:
        ratio = time_unit_ratios[input_time_unit][output_time_unit]
    except KeyError:
        raise ValueError(f"Can't convert {input_time_unit} to {output_time_unit}")
    return timestamps * ratio


class Timer:
    def __init__(self, timer_fn: Callable[[], Any] = time.perf_counter, timer_unit: str = "s"):
        self.timer_fn = timer_fn
        self.timer_unit = timer_unit
        self._start = []
        self._end = []

    def __enter__(self):
        self.start()

    def __exit__(self, type, value, traceback):
        self.stop()

    def __delitem__(self, index):
        del self._start[index]
        del self._end[index]

    def __getitem__(self, index):
        # It's possible that start and end won't match if negative indices are used,
        # because the timer may have started and not stopped yet.
        if index < 0:
            index = index % len(self._end)
        return self._start[index], self._end[index]

    def __iter__(self):
        return zip(self._start, self._end)

    def __len__(self):
        return len(self._end)

    def __str__(self):
        return str(self.timestamps())

    def start(self):
        # If we've already started, consider this a request to restart.
        # This also handles partial timestamps due to a Timer-unrelated error.
        if len(self._start) > len(self._end):
            self._start.pop()
        self._start.append(self.timer_fn())

    def stop(self):
        # if we haven't started, ignore this
        if 0 == len(self._start):
            return
        self._end.append(self.timer_fn())

    def next(self):
        """Manually advance the timer to the next timestamp measurement."""
        self.stop()
        self.start()

    def reset(self):
        self._start.clear()
        self._end.clear()

    def insert(self, timestamps: tuple, time_unit: str):
        """Manually insert a timestamp pair. Does not affect ongoing timing.

        :param timestamps: Timestamp pair to insert.
        :param time_unit: The time unit of the incoming timestamps.
        """
        if len(timestamps) != 2 or not time_unit:
            raise ValueError()
        timestamps = timestamp_convert(np.array(timestamps), time_unit, self.timer_unit)
        self._start.insert(0, timestamps[0])
        self._end.insert(0, timestamps[1])

    def start_timestamps(self, time_unit: str = None):
        if not time_unit:
            return np.array(self._start)
        return timestamp_convert(np.array(self._start), self.timer_unit, time_unit)

    def end_timestamps(self, time_unit: str = None):
        if not time_unit:
            return np.array(self._end)
        return timestamp_convert(np.array(self._end), self.timer_unit, time_unit)

    def timestamps(self, time_unit: str = None):
        """Returns a list of pairs of timestamps (start, end).

        :param time_unit: The time unit of the output timestamp(s). `None` will use the timer's native unit.
        """
        starts, ends = self.start_timestamps(time_unit), self.end_timestamps(time_unit)
        return np.stack((starts[: len(ends)], ends), axis=-1)

    def durations(self, time_unit: str = None):
        """Returns an `ndarray` of timestamp deltas, optionally converted into a provided time unit.

        :param time_unit: The time unit of the output timestamp(s). `None` will use the timer's native unit.
        :returns: An `ndarray` of timestamp deltas.
        """
        starts, ends = self.start_timestamps(), self.end_timestamps()
        deltas = ends - starts[: len(ends)]
        # If no unit is requested, return deltas in the timer's native unit.
        return deltas if not time_unit else timestamp_convert(deltas, self.timer_unit, time_unit)

    def total_duration(self, time_unit: str = None):
        """Returns the total duration of all time measurements, optionally converted into a provided time unit.

        :param time_unit: The time unit of the output timestamp(s). `None` will use the timer's native unit.
        """
        starts, ends = self.start_timestamps(), self.end_timestamps()
        total = np.sum(ends - starts[: len(ends)])
        return total if not time_unit else timestamp_convert(total, self.timer_unit, time_unit)

    def avg(self, time_unit: str = None):
        """Returns the average duration, optionally converted into a provided time unit.

        :param time_unit: The time unit of the output timestamp(s). `None` will use the timer's native unit.
        :returns: The average duration.
        """
        return self.durations(time_unit).mean() if len(self._end) > 0 else 0
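A minimal usage sketch for the Timer above (not part of the repository), following the patterns exercised in the package's test suite:

    import time
    import neuronperf

    timer = neuronperf.Timer()

    # Time a few operations; each with-block records one (start, end) pair.
    for _ in range(3):
        with timer:
            time.sleep(0.01)

    print(len(timer))                 # number of completed measurements
    print(timer.durations("ms"))      # per-measurement deltas in milliseconds
    print(timer.total_duration("s"))  # total measured time in seconds
    print(timer.avg("ms"))            # average duration in milliseconds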
================================================
FILE: src/neuronperf/src/neuronperf/torch/__init__.py
================================================
from neuronperf.torch.torch import benchmark, compile


================================================
FILE: src/neuronperf/src/neuronperf/torch/torch.py
================================================
# -*- coding: utf-8 -*-

"""
neuronperf.torch
~~~~~~~~~~~~~~~~

Provides PyTorch support.
"""

import functools
import itertools
import logging
import math
import os
import types

import torch

from .. import benchmarking

log = logging.getLogger(__name__)


def _compile_fn(model, example_inputs, models_dir, model_name, **kwargs):
    """Compiles a model for Neuron."""
    import torch_neuron

    model_filename = os.path.join(models_dir, "{}.pt".format(model_name))
    model.eval()
    # NeuronPerf provides compiler_args as a dictionary, but the framework expects a different format.
    compiler_args = kwargs.get("compiler_args", {})
    compiler_args_flattened = list(itertools.chain.from_iterable(compiler_args.items()))
    kwargs["compiler_args"] = compiler_args_flattened
    model_neuron = torch.neuron.trace(
        model,
        example_inputs,
        **kwargs,
    )
    model_neuron.save(model_filename)
    return model_filename


def _load_fn(model_filename, **kwargs):
    import torch_neuron

    model = torch.jit.load(model_filename)
    model.eval()
    return model


def _class_load_fn(model_class, **kwargs):
    model = model_class()
    model.eval()
    return model


def compile(model, inputs, *args, **kwargs):
    return benchmarking.compile(_compile_fn, model, inputs, *args, **kwargs)


# See: https://pytorch.org/docs/stable/data.html#dataset-types
def _get_dataset_loader_fn(dataset, loop):
    def _worker_init_fn(worker_id):
        # This function will be called for each worker by torch.
        worker_info = torch.utils.data.get_worker_info()
        worker_id = worker_info.id
        num_workers = worker_info.num_workers
        dataset = worker_info.dataset  # the dataset copy in this worker process
        per_worker = int(math.ceil(len(dataset) / float(num_workers)))
        start = worker_id * per_worker
        end = min(start + per_worker, len(dataset))
        log.debug(
            "worker_id={}, num_workers={}, per_worker={}, start={}, end={}".format(
                worker_id, num_workers, per_worker, start, end
            )
        )

        # We monkey-patch the dataset __iter__ function to support a multi-worker config.
        def _iter(self, start, end, loop):
            if loop:
                return itertools.cycle(range(start, end))
            else:
                return iter(range(start, end))

        # Bind start/end/loop as keywords so that MethodType can still pass the
        # dataset through as `self`.
        __iter__ = functools.partial(_iter, start=start, end=end, loop=loop)
        dataset.__iter__ = types.MethodType(__iter__, dataset)

    def dataset_loader_fn(dataset, num_workers):
        return iter(
            torch.utils.data.DataLoader(
                dataset, num_workers=num_workers, worker_init_fn=_worker_init_fn
            )
        )

    return dataset_loader_fn


def benchmark(model_filename, inputs, *args, dataset_inputs=False, loop_dataset=False, **kwargs):
    # These functions may need to be overridden or wrapped, depending upon the config requested.
    load_fn = _load_fn
    setup_fn = kwargs.get("setup_fn", lambda *args, **kwargs: None)
    preprocess_fn = kwargs.get("preprocess_fn", lambda *args: (*args,))

    # If CUDA is requested, ensure it's available and provide smart wrappers for CUDA device loading.
    device_type = kwargs.get("device_type", None)
    use_cuda = device_type and ("cuda" in device_type.lower() or "gpu" == device_type.lower())
    if use_cuda:
        if not torch.cuda.is_available():
            raise ValueError(
                "You requested CUDA benchmarking, but torch is unable to locate a CUDA device."
            )

        # Must use multiinterpreter for CUDA.
        if "multiinterpreter" in kwargs and not kwargs["multiinterpreter"]:
            log.warning(
                (
                    "You set multiinterpreter to False, but it is required for safe CUDA benchmarking.\n"
                    "Your preference has been overridden so that benchmarking may continue."
                )
            )
        kwargs["multiinterpreter"] = True

        # If we received a non-string, use the class-based load function.
        if not isinstance(model_filename, str):
            # In GPU benchmarking, a model class is expected. This line is for clarity.
            model_class = model_filename
            if not isinstance(model_class, type):
                raise TypeError(
                    "GPU benchmarking expects a model class to be provided instead of a filename."
                )

            # We must also know the name of the file to import from, so that serialization can succeed.
            import inspect

            try:
                model_class_file = inspect.getfile(model_class)
                kwargs["model_class_file"] = model_class_file
                kwargs["model_class_name"] = model_class.__name__
            except Exception:
                raise ValueError(
                    (
                        "Your model class must be defined in a Python module so that it can be serialized properly.\n"
                        "Please add your model to a simple Python file along with any required imports."
                    )
                )

            @functools.wraps(_class_load_fn)
            def load_fn(*args, **kwargs):
                return _class_load_fn(model_class, **kwargs)

            # Now swap the class object for its name so the benchmarker still receives a string.
            model_filename = model_class.__name__

        # Wrap setup_fn so that it moves the model to the CUDA device.
        @functools.wraps(setup_fn)
        def _setup_fn(id, config, model):
            setup_fn(id, config, model)
            model.to("cuda")

        kwargs["setup_fn"] = _setup_fn

        # Wrap preprocess_fn with one that moves inputs to CUDA.
        @functools.wraps(preprocess_fn)
        def _preprocess_fn(*inputs):
            inputs = preprocess_fn(*inputs)
            # Tensor.to() is not in-place, so keep the returned tensors.
            return tuple(input.to("cuda") for input in inputs)

        kwargs["preprocess_fn"] = _preprocess_fn

    # When custom datasets are used, a loader function will need to be available in subprocesses.
    dataset_loader_fn = None
    if dataset_inputs:
        dataset_loader_fn = _get_dataset_loader_fn(inputs, loop_dataset)
    kwargs["dataset_loader_fn"] = dataset_loader_fn

    with torch.no_grad():
        return benchmarking.benchmark(
            load_fn,
            model_filename,
            inputs,
            *args,
            **kwargs,
        )
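A minimal usage sketch for the PyTorch path above (not part of the repository), assuming an Inf1 environment with torch-neuron installed; the module and shapes are illustrative, and wrapping the example input in a list follows the ``benchmark()`` convention of one input per batch size:

    import torch
    import torch.nn as nn
    import neuronperf
    import neuronperf.torch

    # A small illustrative module; any torch.neuron-traceable nn.Module works here.
    model = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
    example = torch.zeros(1, 8)

    # Trace/compile for Neuron, then benchmark the saved artifact.
    model_filename = neuronperf.torch.compile(model, [example])
    reports = neuronperf.torch.benchmark(model_filename, [example])
    neuronperf.print_reports(reports)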
filename = "dummy_index.json" if os.path.exists(filename): neuronperf.model_index.delete(filename) model_name = "Dummy" model_filename = os.path.join("models", "dummy.model") model_index = neuronperf.model_index.create(model_filename, model_name=model_name) neuronperf.model_index.save(model_index, filename=filename) assert os.path.exists(filename) model_index_loaded = neuronperf.model_index.load(filename) assert model_index_loaded == model_index assert model_index_loaded["model_name"] == model_name assert model_index_loaded["model_configs"][0]["batch_size"] == 1 neuronperf.model_index.delete(filename) assert not os.path.exists(filename) @pytest.mark.sanity def test_model_index_copy(): filename = "dummy_index.json" if os.path.exists(filename): neuronperf.model_index.delete(filename) model_filename = os.path.join("models", "dummy.model") os.makedirs("models", exist_ok=True) pathlib.Path(model_filename).touch() model_name = "Dummy" model_index = neuronperf.model_index.create(model_filename, model_name=model_name) neuronperf.model_index.save(model_index, filename=filename) # Test copy API using a pre-loaded model inndex neuronperf.model_index.copy(model_index, "new_index.json", "new_models") assert os.path.exists("models") assert os.path.exists(model_filename) assert os.path.exists("new_index.json") assert os.path.exists(os.path.join("new_models", "dummy.model")) new_index = neuronperf.model_index.load("new_index.json") assert new_index["model_configs"][0]["filename"] == os.path.join("new_models", "dummy.model") neuronperf.model_index.delete(filename) neuronperf.model_index.delete("new_index.json") shutil.rmtree("new_models") shutil.rmtree("models") @pytest.mark.sanity def test_model_index_copy_2(): filename = "dummy_index.json" if os.path.exists(filename): neuronperf.model_index.delete(filename) model_filename = os.path.join("models", "dummy.model") os.makedirs("models", exist_ok=True) pathlib.Path(model_filename).touch() model_name = "Dummy" model_index = neuronperf.model_index.create(model_filename, model_name=model_name) neuronperf.model_index.save(model_index, filename=filename) # Test copy API using a file neuronperf.model_index.copy(filename, "new_index.json", "new_models") assert os.path.exists("models") assert os.path.exists(model_filename) assert os.path.exists("new_index.json") assert os.path.exists(os.path.join("new_models", "dummy.model")) new_index = neuronperf.model_index.load("new_index.json") assert new_index["model_configs"][0]["filename"] == os.path.join("new_models", "dummy.model") neuronperf.model_index.delete(filename) neuronperf.model_index.delete("new_index.json") shutil.rmtree("new_models") shutil.rmtree("models") @pytest.mark.sanity def test_model_index_move(): filename = "dummy_index.json" if os.path.exists(filename): neuronperf.model_index.delete(filename) model_filename = os.path.join("models", "dummy.model") os.makedirs("models", exist_ok=True) pathlib.Path(model_filename).touch() model_name = "Dummy" model_index = neuronperf.model_index.create(model_filename, model_name=model_name) neuronperf.model_index.save(model_index, filename=filename) neuronperf.model_index.move(filename, "new_index.json", "new_models") assert not os.path.exists(filename) assert not os.path.exists(model_filename) assert os.path.exists("new_index.json") assert os.path.exists(os.path.join("new_models", "dummy.model")) new_index = neuronperf.model_index.load("new_index.json") assert new_index["model_configs"][0]["filename"] == os.path.join("new_models", "dummy.model") 
neuronperf.model_index.delete("new_index.json") shutil.rmtree("new_models") shutil.rmtree("models") @pytest.mark.sanity def test_model_index_append(): model_indexes = [ neuronperf.model_index.create(f"Dummy_{x}", model_name="Dummy") for x in range(10) ] combined_index = neuronperf.model_index.append(*model_indexes) # Assert that combination apparently did happen. assert len(combined_index["model_configs"]) == len(model_indexes) # Check that batch_sizes haven't been modified. assert all(1 == config["batch_size"] for config in combined_index["model_configs"]) # Test for duplicate filtering behavior model_indexes = [neuronperf.model_index.create("Dummy") for _ in range(10)] combined_index = neuronperf.model_index.append(*model_indexes) assert len(combined_index["model_configs"]) == 1 @pytest.mark.sanity def test_model_index_filter(): idx_1 = neuronperf.model_index.create("fake", performance_level=2, compile_s=1) idx_2 = neuronperf.model_index.create("fake2", compile_s=2) idx = neuronperf.model_index.append(idx_1, idx_2) filtered = neuronperf.model_index.filter(idx, filename="fake") print(filtered) assert 1 == len(filtered["model_configs"]) assert "fake" == filtered["model_name"] filtered = neuronperf.model_index.filter(idx, performance_level=2) assert 1 == len(filtered["model_configs"]) assert "fake" == filtered["model_name"] # None key should filter nothing filtered = neuronperf.model_index.filter(idx, compile_s=None) assert 2 == len(filtered["model_configs"]) @pytest.mark.sanity @pytest.mark.slow def test_benchmarker(): dummy_model = lambda x: None dummy_load = lambda path, device_id: dummy_model b = neuronperf.benchmarking.Benchmarker( id=0, device_id=0, load_fn=dummy_load, model_filename="test", inputs=[], workers_per_model=2 ) b.start() time.sleep(1.5) b.stop() assert b.status == "finished" assert all(n_infs > 100 for n_infs in b.n_infs) @pytest.mark.slow def test_benchmark_multithread(): benchmarker_results = neuronperf.cpu.benchmark( neuronperf.DummyModel, [np.array([1, 2, 3, 4])], duration=2, n_models=4, multiprocess=False, multiinterpreter=False, verbosity=2, return_timers=True, ) # Return value is a list of tuples: # [(config, results), (config, results), ...] # Each config is a dict. Each result is a dict. 
    # A single configuration without workers_per_model set will produce 2 results
    assert len(benchmarker_results) == 2
    for benchmarker_result in benchmarker_results:
        config, results = benchmarker_result
        assert "cpu_percents" in results
        assert "mem_percents" in results
        assert not config["multiprocess"]
        assert not config["multiinterpreter"]
        assert results["status"] == "finished"
        assert results["n_infs"] > 100


@pytest.mark.slow
def test_benchmark_multithread_2():
    dummy_model = lambda x: None
    dummy_load = lambda path, device_id: dummy_model
    reports = neuronperf.benchmark(
        load_fn=dummy_load,
        model_filename="dummy_filename",
        inputs=[[1]],
        duration=2,
        n_models=4,
        multiprocess=False,
        multiinterpreter=False,
        verbosity=2,
    )
    # A single configuration without workers_per_model set will produce 2 results
    assert len(reports) == 2
    report = reports[0]
    assert not report["multiprocess"]
    assert not report["multiinterpreter"]
    assert report["status"] == "finished"
    assert report["total_infs"] > 100


@pytest.mark.slow
def test_benchmark_multiprocess():
    n_models = 16
    benchmarker_results = neuronperf.cpu.benchmark(
        neuronperf.DummyModel,
        inputs=[np.array([1, 2])],
        batch_sizes=[1],
        duration=2,
        n_models=n_models,
        multiprocess=True,
        multiinterpreter=False,
        verbosity=2,
        return_timers=True,
    )
    # A single configuration without workers_per_model set will produce 2 results
    assert len(benchmarker_results) == 2
    # Extract the benchmarker results
    config, results = benchmarker_results[0]
    # Confirm that there is at least 1 timer / model for each benchmarker
    assert len(next(iter(results["timers"].values()))) >= n_models
    assert config["multiprocess"]
    assert not config["multiinterpreter"]
    assert results["status"] == "finished"
    assert results["n_infs"] > 100


@pytest.mark.slow
def test_benchmark_multiinterpreter():
    benchmarker_results = neuronperf.cpu.benchmark(
        neuronperf.DummyModel,
        inputs=[np.array([1, 2])],
        duration=2.5,
        n_models=2,
        multiprocess=False,
        multiinterpreter=True,
        verbosity=2,
        return_timers=True,
    )
    # A single configuration without workers_per_model set will produce 2 results
    assert len(benchmarker_results) == 2
    # Extract the benchmarker results
    config, results = benchmarker_results[0]
    assert config["multiinterpreter"]
    assert results["status"] == "finished"
    assert results["n_infs"] > 100


@pytest.mark.slow
def test_reporting():
    benchmarker_results = neuronperf.cpu.benchmark(
        neuronperf.DummyModel,
        inputs=[np.array([1, 2, 3, 4])],
        n_models=[1, 4],
        duration=2,
        verbosity=2,
        return_timers=True,
    )
    assert len(benchmarker_results) == 4
    reports = neuronperf.get_reports(benchmarker_results)
    assert len(reports) == len(benchmarker_results)
    assert all("total_infs" in report for report in reports)
    neuronperf.print_reports(reports)
    csv_file = neuronperf.write_csv(reports)
    os.remove(csv_file)
    json_file = neuronperf.write_json(reports)
    with open(json_file, "rt") as fp:
        json.load(fp)
    os.remove(json_file)

================================================ FILE: static/google673a8c4fbaa024d8.html ================================================ google-site-verification: google673a8c4fbaa024d8.html ================================================ FILE: static/robots.txt ================================================ User-agent: * Disallow: /en/v2.24.0/ Disallow: /en/v2.23.0/ Disallow: /en/v2.22.1/ Disallow: /en/v2.22.0/ Disallow: /en/v2.21.1/ Disallow: /en/v2.21.0/ Disallow: /en/v2.20.2/ Disallow: /en/v2.20.1/ Disallow: /en/v2.20.0/ Disallow: /en/v2.19.1/ Disallow: /en/v2.19.0/ Disallow: /en/v2.18.2/ Disallow: /en/v2.18.1/ Disallow: /en/v2.18.0/ Disallow:
/en/v2.17.0/ Disallow: /en/v2.16.1/ Disallow: /en/v2.16.0/ Disallow: /en/v2.15.2/ Disallow: /en/v2.15.1/ Disallow: /en/v2.15.0/ Disallow: /en/v2.14.1/ Disallow: /en/v2.14.0/ Disallow: /en/v2.13.2/ Disallow: /en/v2.13.1/ Disallow: /en/v2.13.0/ Disallow: /en/v2.12.2/ Disallow: /en/v2.12.1/ Disallow: /en/v2.12.0/ Disallow: /en/v2.11.0/ Disallow: /en/v2.10.0/ Disallow: /en/v2.9.0/ Disallow: /en/v2.8.0/ Disallow: /en/v2.7.0/ Disallow: /en/v2.6.0/ Disallow: /en/v2.5.0/ Disallow: /en/v2.4.0/ Disallow: /en/v2.3.0/ Disallow: /en/v1.19.2/ Disallow: /en/v1.19.1/ Disallow: /en/v1.19.0/ Disallow: /en/v1.18.0/ Disallow: /en/v1.17.2/ Disallow: /en/v1.17.1/ Disallow: /en/v1.17.0/ Disallow: /en/v1.16.3/ Disallow: /en/v1.16.2/ Disallow: /en/v1.16.1/ Disallow: /en/v1.16.0/ Disallow: /en/v1.15.2/ Disallow: /en/1.15.1/ Disallow: /en/1.15.0/ Disallow: /en/1.14.2/ Disallow: /en/1.14.1/ Disallow: /en/1.14.0/ Disallow: /en/1.13.0/ Disallow: /en/1.12.2/ Disallow: /en/1.12.1/ Disallow: /en/1.12.0/ Disallow: /en/1.11.0/ Sitemap: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/sitemap1.xml ================================================ FILE: static/sitemap1.xml ================================================ https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/index.html 2025-10-20 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/misc-customops.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/third-party-solutions.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/dlami/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/nki_faq.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuron-ubuntu20.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/mxnet-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax-neuronx.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/setup-troubleshooting.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/multiframework-dlami.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/setup-rocky-linux-9.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/troubleshooting.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuronx.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/index.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/index.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/ecs-flows.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/third-party-solutions.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/dlc-then-customize-devflow.html 2025-10-28 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/eks-flows.html 2025-10-09 
================================================
FILE: static/sitemap1.xml
================================================
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/index.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/misc-customops.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/third-party-solutions.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/dlami/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/nki_faq.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuron-ubuntu20.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/mxnet-neuron.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax-neuronx.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/setup-troubleshooting.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuron.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/multiframework-dlami.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/setup-rocky-linux-9.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/troubleshooting.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/torch-neuronx.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/index.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/ecs-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/third-party-solutions.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/dlc-then-customize-devflow.html 2025-10-28
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/eks-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/aws-batch-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/sagemaker-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/parallelcluster-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/ec2-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/index.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/releasecontent.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/2.29.0.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-developer-guide.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-configurable-parameters.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/configuration-guide.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/rn.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-troubleshoot.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/faq.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/dlc-then-ecs-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/dlc-then-customize-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/neo-then-hosting-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/dlc-then-k8s-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorial-docker-runtime1.0.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/dlc-then-ec2-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/faq-troubleshooting-releasenote.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/neuron-dra.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/ec2.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/dlc-then-eks-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/container-deployment-flows.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/ec2-then-ec2-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/k8.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/troubleshooting.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/developerflows.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/locate-neuron-dlc-image.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/faq.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/neuron-plugins.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/container-sm-hosting-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/getting-started.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/beta-participation.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/amazonq-getstarted.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/whats-new.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/troubleshooting.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/profiling-tools.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/monitoring-tools.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/sdk-policy.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/what-is-neuron.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/security.html 2026-02-13
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/tensorflow_serving_tutorial.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc/how-to-convolution-in-unet.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc/developer-guide.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc/faq.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc/api-reference-guide.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc/command-line-reference.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc/developer-guide.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuron-cc/faq.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF005.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF011.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/ESPP047.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF010.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF004.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF006.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF007.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF013.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF017.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EBVF030.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF016.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EHCA005.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EARG001.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF015.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF001.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EBIR023.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EUOC002.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EXTP004.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EOOM001.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/ESPP004.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EOOM002.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF018.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF024.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF031.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF019.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF022.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EVRF009.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/EXSP001.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/error-codes/ESFH002.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/compiler/neuronx-cc/api-reference-guide/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/calculator/neuron-calculator.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/neuron2-intro-faq.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/contributing-faq.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/onnx-faq.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/index.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/index.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/models/index.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/models/inference-inf1-samples.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/models/training-trn1-samples.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/models/inference-inf2-trn1-samples.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/index.html 2025-11-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/index.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/glossary.html 2025-10-28
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/oss/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/mxnet-neuron.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/torch-neuron-tab-training.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/tensorflow-neuron.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/docs-quicklinks.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/user-guide-quickstart.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/github-samples.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/torch-neuron.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/tab-inference-tensorflow-neuron.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/inference-quickstart.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/quick-start/training-quickstart.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/news-and-blogs/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trn1-arch.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/inferentia2.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trainium3.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/inf1-arch.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trainium2.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/inferentia.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trn2-arch.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trn3-arch.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/neuron-core-v4.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/neuron-core-v1.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/inf2-arch.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/trainium.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/neuron-core-v2.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-hardware/neuron-core-v3.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/index.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/rounding-modes.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/data-types.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/neuron-caching.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/custom-c++-operators.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/neuroncore-pipeline.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/neuroncore-batching.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/arch/neuron-features/logical-neuroncore-config.html 2026-04-08
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/inf2/inf2-performance.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/trn1/trn1-training-performance.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/trn1/trn1-inference-performance.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/benchmarks/inf1/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-transition-pytorch-trainium.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-multiframework-dlamis-inf1.html 2025-10-17
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-maintenance-nxdt-nxd-core-training.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-2-7-2-8.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-tensorflow-2-8-9.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-1-1-3.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-python38-no-longer-support.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/github-changes.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/sm-training-trn1-introduce.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-nemo-megatron.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-nemo.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-beta-pytorch-neuroncore-placement-apis.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pt2.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-nxdi-changes.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eol-megatron-lm.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-device-version.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-torch-neuronx-nki-jit.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-xla-bf16.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-llama3-2-checkpoint.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-jax-neuronx-nki-call.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-ubuntu-20-base.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eos-tensorflow-tutorial-inf.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-end-of-support-pytorch-2-6.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-maintenance-nxdi-nxd-core-inference.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-neurondevice.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eol-nemo-arg.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eos-tnx.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-python38.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-113.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-nki-library-kernel-migration.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eol-ubuntu-18.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler-v230.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-1-9.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-al2.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-u20-dlamis.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/sm-training-dlc-2.9.1.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-inf1-virtual-environments.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-maintenance-mxnet.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-nki-library-namespace-changes.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-profiling-api.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/neuron2-intro.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-nxd-examples.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-dlami-ubuntu-22-04.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/gpg-expiration.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-tensorflow-inf2.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-torch-neuron-versions.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-tensorboard-tools.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-component-change.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-nxd-examples.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-tensorflow1-x.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-package-change.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-mllama-checkpoint.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-end-of-support-nxdt-nxd-core.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eos-pt2-6.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-7-2-8-v229.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neuron-driver-support-inf1.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/dlami-neuron-2.10.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-end-of-support-neuronxcc-nki.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-tensorflow-inf2.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pt-versions.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/dlami-pytorch-introduce.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-torch-neuron.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/dlami-neuron-2.12.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-tf-versions.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-pytorch-2-1.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neuron-profiler-2.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/neuron250-packages-changes.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-neuron-det.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-vllm-v0.html 2026-02-26
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-tensorboard-plugin.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neurondevice.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-end-of-support-vllm-v0.html 2026-02-26
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-deprecation-nxd-path-trace-api.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-nki-jit-torch.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-maintenance-tf.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-python-3-9-eol.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-deprecation-transformer-flag.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neuron-det.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-nki-library-namespace-changes-2-28.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-tensorflow2-10.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-moving-samples.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eos-opt.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-megatronlm-2-13.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-neurondevice-version.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-block-dimension-nki.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-1.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announcement-end-of-support-parallel-model-trace.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-probuf.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-deprecation-containers-rtd.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-correction-neuron-driver-support-inf1.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/end-of-support-pt2.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/neuron230-packages-changes.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-al2.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-nxdt-nxd-core-training.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-maintenance-tnx.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-support-tensorflow1-x.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/neuron-rtd-eol.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/release-neuron2.4.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-no-longer-support-u20-dlc-dlami.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-jax-neuronx-nki-call.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eol-python-3-7.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-bf16-vars.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-pytorch-2-7-2-8.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-nki-namespace-migration.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-eos-dlami.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron2.x/announce-intent-eos-pt-version.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/eol-pt-15.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announce-eol-pt-before-1-8.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/eol-tf-21-24.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announce-eol-pt-1-5.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announce-eol-mx-before-1-5.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announce-eol-tf-before-2-5.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announcements.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/eol-ncgs-env_2.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/announcements/neuron1.x/announce-eol-tf-before-2-7.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-7.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/torch-neuronx-graph-partitioner-app-note.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/index.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-6.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/torch-neuronx-dataparallel-app-note.html 2025-10-28
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-8.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-9.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/introducing-pytorch-2-x.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuronx/migration-from-xla-downcast-bf16.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/mxnet-neuron/flex-eg.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/transformers-neuronx/generative-llm-inference-with-neuron.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/neuronx-cc/neuronx-cc-training-mixed-precision.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/neuron-cc/mixed-precision.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/neuronx-distributed/introducing-nxd-inference.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/neuronx-distributed/introducing-nxdt-training.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuron/index.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuron/bucketing-app-note.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuron/torch-neuron-dataparallel-app-note.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/torch-neuron/rcnn-app-note.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/neuron1x/introducing-libnrt.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/perf/neuron-cc/performance-tuning.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/appnotes/perf/neuron-cc/parallel-ncgs.html 2025-10-20
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/training/neuron-training.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/inference/neuron-faq.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/about-neuron/faq/inference/trouble-shooting-faq.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-setup.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/inference-torch-neuronx.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/guide-torch-neuron-vs-torch-neuronx-inference.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/pytorch-native-overview.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/training-torch-neuronx.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/setup/jax-setup.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/setup/jax-neuronx-known-issues.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/api-reference-guide/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/jax/api-reference-guide/neuron-envvars.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-disable-dynamic-batching.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/training-troubleshooting.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dynamic-batching.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/pytorch-neuron-supported-operators.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-default.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/misc-inference-torch-neuronx.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/additional-examples-training.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/additional-examples-inference-torch-neuronx.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-dim-neq-zero.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/torch-neuronx-dataparallel-example-specify-ncs.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/misc-training.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/about/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-al2.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-u20.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-u22.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-al2023.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-u24.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u24.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/note-setup-general.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u22.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install-prev-u20.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-al2-dlami.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-neuronx-install-cxx11.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-install-prev-al2023.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/pytorch-update-u20-dlami.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/torch-neuronx-profiling-api.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/torch-neuronx-profiling-dev-guide.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/analyze_for_training.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/tutorials-training-torch-neuronx.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/finetune_hftrainer.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/mlp.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/zero1_gpt2.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/training/bert.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/inference/tutorials-torch-neuronx.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/inference/tutorial-torchserve-neuronx.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/training/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-programming-guide.html 2026-04-08
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/training/pytorch-neuron-debug.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/core-placement.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/autobucketing-dev-guide.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/trace-vs-xla-lazytensor.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/training/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/training/pytorch-neuron-parallel-compile.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/training/torch-neuron-envvars.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-async-lazy-load.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-replace-weights.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-core-placement.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/inference-api-guide-torch-neuronx.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-trace.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-data-parallel.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/api-reference-guide/inference/api-torch-neuronx-analyze.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.8.0-pytorch-install.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.7.0-pytorch-install.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup/prev-releases/neuronx-2.9.0-pytorch-install.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/.git/logs/refs/remotes/origin/VRF004.html 2026-01-27
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/how-to/how-to-ultraserver.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/files/index-dra.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/tutorial-oci-hook.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-device-plugin.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-monitor.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/build-run-neuron-container.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-multiple-scheduler.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-scheduler.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-problem-detector-and-recovery.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/tutorial-docker-env-setup.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-problem-detector-and-recovery-irsa.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-default-scheduler.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-helm-chart.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-setup.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-prerequisite.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/k8s-neuron-scheduler-flow.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/index.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/get-started/quickstart-configure-deploy-dlc.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/get-started/quickstart-pytorch-inference-dlc.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/training/Dockerfile-trainium-dlc.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/training/mlp.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/config-properties.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/torchserve-neuron.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/dockerd-libmode-entrypoint.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/Dockerfile-tf-serving.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/Dockerfile-libmode.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/inference/Dockerfile-inference-dlc.html 2025-10-28
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/v1/inference/Dockerfile-torch-neuron.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/v1/inference/Dockerfile-app-rt-same.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/v1/inference/Dockerfile-app-rt-diff.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/v1/inference/dockerd-entrypoint-app-rt-same.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/docker-example/v1/inference/Dockerfile-neuron-rtd.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/training/index.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/training/tutorial-training.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/training/k8s_mlp_train_demo.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/inference/index.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/inference/k8s_rn50_demo.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/tutorials/inference/tutorial-infer.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/about/index.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/about/core-dump.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/about/collectives.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/compute-comm-overlap.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/index.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/work-with-neff-files.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/direct-hbm-tensor-alloc.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/core-dump-deep-dive.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/intranode-collective-comm.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/device-memory.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/runtime-performance-tips.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/explore/internode-collective-comm.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_async_sendrecv.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_status.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt-async-api-best-practices.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_async.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/ndebug_stream.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_sys_trace.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/debug-stream-api.html 2026-02-05
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt-async-api-examples.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/ndl.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt-async-api-overview.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nec.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/neuron_driver_shared.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_experimental.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/neuron_ds.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_profile.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/neuron_driver_shared_tensor_batch_op.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/api/nrt_version.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.1.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/content.html 2026-04-08
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.1.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.28.1.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.28.0.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/rn.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/mxnet-neuron.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/torch-neuron.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorboard-neuron.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/libneuronxla.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/index.html 2026-04-08
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/runtime.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/nxd-inference.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/nki-lib.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/dlamis.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/dev-tools.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/nki.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/containers.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/nxd-training.html 2026-04-08
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/compiler.html 2026-04-08
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/jax.html 2026-04-08
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/pytorch.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/components/nxd-core.html 2026-04-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/nemo/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/nemo/neuronx-nemo.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/customcxxps/gpsimd-tools.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/customcxxps/gpsimd-customop-lib.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron1/prev/content.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron1/prev/rn.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron1/neuronrelease/previous-content.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-neuron/tensorflow-neuron.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-neuron/tensorflow-neuron-v2.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuron.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuronx.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-modelserver-neuron/tensorflow-modelserver-neuron-v2.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/tensorflow/tensorflow-neuronx/tensorflow-neuronx.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc-ops/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-pytorch.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-xla.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-tensorflow.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/archive/neuron-cc/neuron-cc-ops/neuron-cc-ops-mxnet.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/index.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/nx-jax.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/runtime.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/dlami.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/nxd-inference.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/nx-pytorch.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/tools.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/docs-and-samples.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/containers.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/nxd-training.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/compiler.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.25.0/nxd-core.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/nx-jax.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/runtime.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/dlami.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/nxd-inference.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/nx-pytorch.html 2025-12-19
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/tools.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/nki.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/containers.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.26.0/nxd-core.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/index.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/runtime.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/dlami.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/nxd-inference.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/nki-lib.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/nx-pytorch.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/tools.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/nki.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/containers.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/prev/2.27.0/compiler.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/plugins/npd-ecs-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/dlc-then-ecs-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/aws-batch-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/sagemaker-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/ec2-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/setup/ecs-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/setup/eks-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/dlc-then-ecs-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/byoc-hosting-devflow-inf2.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/neo-then-hosting-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/dlc-then-k8s-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/dlc-then-ec2-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/aws-batch-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/dlc-then-eks-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/ec2-then-ec2-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/sagemaker-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/dev-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/byoc-hosting-devflow.html 2025-10-28
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/parallelcluster-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/env-setup-text.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/ec2-then-ec2-devflow-inf2.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/ec2-flows.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/inference/container-sm-hosting-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/sm-devflow/sm-training-devflow.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/batch/batch-training.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/parallelcluster/parallelcluster-training.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/devflows/training/ec2/ec2-training.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nemo-megatron/index.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/index.html 2025-10-28
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/index.html 2026-02-26
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/overview-index.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/nxdi-setup.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/neuron-inference-overview.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/api-reference-guide.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/index.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/overview.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/misc.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer-guide.html 2025-11-11
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api-reference-guide.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tp_developer_guide.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tensor_parallelism_overview.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/context_parallelism_overview.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pp_developer_guide.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/model_optimizer_wrapper_developer_guide.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction_developer_guide.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api-reference-guide-training.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/neuronx_distributed_inference_developer_guide.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/developer-guide-inference.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api-reference-guide-inference.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/model_builder_v2_api_reference.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/developer-guide-training.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/lora_finetune_developer_guide.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/app_notes.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/pipeline_parallelism_overview.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/standard_mixed_precision.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/ptl_developer_guide.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/index-training.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/developer-guide.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/activation_memory_reduction.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/save_load_developer_guide.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/api_guide.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/index-inference.html 2026-02-03
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/neuronx-distributed-misc.html 2026-02-25
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/setup/index.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/inference.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/index.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_tutorials.html 2025-10-09
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_pp.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training_llama_tp_zero1.html 2026-04-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/inference_tutorials.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/finetune_llama3_8b_ptl_lora.html 2025-12-01
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tutorials/training.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/general/config_overview.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/general/features.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/general/installation_guide.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/general/known_issues.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-tp-appnote.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-cp-appnote.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-amr-appnote.html 2025-10-07
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/app_notes/nxd-training-pp-appnote.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/index.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/hf_llama3_70B_pretraining.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/hf_llama3_8B_DPO_ORPO.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/hf_llama3_8B_SFT.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/hf_llama3_8B_SFT_LORA.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/checkpoint_conversion.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/tutorials/hf_llama3_8B_pretraining.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/index.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/migration_nnm_nxdt.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/cpu_mode_developer_guide.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/optimizer_lr_scheduler_flow.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/new_model_guide.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/new_dataloader_guide.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-training/developer_guides/migration_nemo_nxdt.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/misc/index.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/misc/nxdi-troubleshooting.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/app-notes/parallelism.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/app-notes/index.html 2025-11-12 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/vllm/index.html 2026-02-26 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/vllm/quickstart-vllm-offline-serving.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/vllm/quickstart-vllm-online-serving.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/models/index.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/disaggregated-inference-tutorial.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn3-gpt-oss-120b-tutorial.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-tutorial.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/sd-inference-tutorial.html 2026-02-03 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.1-405b-tutorial.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.3-70b-fp8.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/trn2-llama3.1-405b-speculative-tutorial.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/tutorials/disaggregated-inference-tutorial-1p1d.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/index.html 2026-02-26 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/nxd-examples-migration-guide.html 2025-11-12 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/model-reference.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/weights-sharding-guide.html 2025-11-12 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide-v1.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/performance-cli-params.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/moe-arch-deep-dive.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/onboarding-models.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/vllm-user-guide.html 2026-02-26 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/llm-inference-benchmarking-guide.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/disaggregated-inference.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/custom-quantization.html 2025-11-12 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/migrate-from-tnx-to-nxdi.html 2025-11-12 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/how-to-use-fpem.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/accuracy-eval-with-datasets.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/writing-tests.html 2025-11-12 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/developer_guides/feature-guide.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/api-guides/index.html 2025-11-12 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/api-guides/api-guide.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/models/llama3/llama_33_70b.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/nxd-inference/models/qwen3/qwen3_moe_235b.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/al2-python.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/launch-trn1-dlami.html 2025-10-09 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/legacy-inf1/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/legacy-inf1/pytorch.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/setup-jupyter-notebook-steps-troubleshooting.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/notebook/running-jupyter-notebook-as-script.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/dlami.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/update-dlc.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/manual.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/update-manual.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/update-dlami.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/pytorch/dlc.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/dlami.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/manual.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/jax/dlc.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf2/note-setup-libnrt-warning.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf2/launch-inf2-dlami.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf2/dlami-enable-neuron-pytorch.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/trn1/dlami-notes.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/neuron-pip-install.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/note-setup-cntr.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/launch-inf1-dlami-aws-cli.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/launch-inf1-dlami.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/launch-inf1-ami.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/note-setup-general.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/neuron-pip-setup.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/compile_mode.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/develop_mode.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/tensorboard-plugin-neuron-pip-install.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/dlami-enable-neuron-mxnet.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/note-setup-libnrt-warning.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/dlami-enable-neuron-pytorch.html 2025-10-09 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/setup/install-templates/inf1/deploy_mode.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-dge.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/index.html 2026-04-08 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-dynamic-loops.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-compiler.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/mxfp-matmul.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/use-neuron-profile.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki_block_dimension_migration_guide.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-dma-bandwidth-guide.html 2026-04-08 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-beta2-migration-guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-aps.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-hbm-crc-hashing.html 2026-04-08 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki-0-3-0-update-guide.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/deep-dives/nki_perf_guide.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/how-to-scheduling-apis.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/index.html 2026-04-08 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/nki_simulator.html 2026-04-08 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/framework_custom_op.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.isa.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.collectives.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.api.shared.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.language.html 2026-04-08 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.simulate.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/nki.language.tile_size.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/index.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/quickstart-implement-run-kernel.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/nki-language-guide.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/setup-env.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/index.html 2026-04-08 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/tiling-overview.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/indexing-overview.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/data-representation-overview.html 
2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/lnc.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/nki-dma-overview.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/get-started/about/memory-hierarchy-overview.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.set_rng_seed.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.memset.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_n_gather.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gelu_apprx_tanh.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gelu_apprx_sigmoid.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.bn_stats.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.erf_dx.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.ceil.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.is_hbm.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float32.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dge_mode.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.rms_norm.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.arctan.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dma_engine.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.greater.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.mish.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.store.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.maximum.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.activation.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sequential_range.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dropout.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.register_store.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.shared_identity_matrix.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_scalar_cumulative.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.trunc.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.rng.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dma_transpose.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.tan.html 2026-04-01 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.affine_select.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.matmul_perf_mode.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.VirtualRegister.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.exp.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dma_compute.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.less_equal.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_matmul_mx.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gelu.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.register_move.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gelu_apprx_sigmoid_dx.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.broadcast_to.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.sendrecv.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.max.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.logical_not.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.reduce_scatter.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.collective_permute.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.softplus.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.static_range.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.all_gather.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.subtract.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_transpose.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float8_e5m2.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.load.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.shared_constant.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.bitwise_or.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.quantize_mx.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float4_e2m1fn_x4.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.collective_permute_implicit_reduce.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.softmax.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.program_id.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.int8.html 2026-02-24 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.greater_equal.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_version.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_scalar.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.invert.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.uint32.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.core_barrier.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sign.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.shared_hbm.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.negative.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.affine_range.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nonzero_with_count.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.tfloat32.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.zeros.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.oob_mode.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.collective_permute_implicit.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.square.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.is_on_chip.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.ndarray.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.matmul.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.ones.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float8_e4m3.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.num_programs.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.rank_id.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.exponential.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.where.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.is_psum.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float8_e4m3fn.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.power.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_stream_shuffle.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.erf.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.register_alloc.html 2026-02-24 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.rand_set_state.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.rand.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.reciprocal.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.load_transpose2d.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.int32.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sbuf.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sum.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.log.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.get_nc_version.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.equal.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.select_reduce.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.dma_copy.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.register_load.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.int16.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.engine.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.less.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float8_e4m3fn_x4.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.scalar_tensor_tensor.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.bfloat16.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_copy_predicated.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.private_hbm.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.not_equal.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_copy.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.logical_and.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.logical_or.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.multiply.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.uint8.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_find_index8.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.max8.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.floor.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.right_shift.html 2026-04-01 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float8_e5m2_x4.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.prod.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.all_to_all.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.rsqrt.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.bool_.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.mean.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.expand_dims.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.min.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.left_shift.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.add.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.hbm.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.rand_get_state.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.jit.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.relu.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.range_select.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.ReplicaGroup.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_match_replace8.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_partition_reduce.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.all_to_all_v.html 2026-04-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.logical_xor.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.uint16.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.nc_matmul.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_tensor_scan.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_reduce.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.program_ndim.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.local_gather.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.tanh.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.bn_aggr.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sigmoid.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.bitwise_and.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_tensor.html 2026-02-24 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.is_sbuf.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.transpose.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.simulate.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.ds.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.reduce_cmd.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.all_reduce.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.no_reorder.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.tensor_scalar_reduce.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.cos.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.copy.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.activation_reduce.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.device_print.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.full.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.silu_dx.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.collectives.collective_permute_implicit_current_processing_rank_id.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.psum.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.tile_size.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.empty_like.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sin.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.silu.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.iota.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.sequence_bounds.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.minimum.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.var.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.abs.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.reciprocal.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gather_flattened.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.bitwise_xor.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.random_seed.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.isa.rand2.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.zeros_like.html 2026-04-01 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.sqrt.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.gelu_dx.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.dropout.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.dynamic_range.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.all.html 2026-04-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/api/generated/nki.language.float16.html 2026-02-24 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/index.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/trainium2_arch.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/trainium3_arch.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/architecture/trainium_inferentia2_arch.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/index.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/matrix_multiplication.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/spmd_multiple_nc_tensor_addition.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/fused_mamba.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/kernel-optimization.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/transpose2d.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/average_pool2d.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/guides/tutorials/spmd_tensor_addition.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/index.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/tiled-range.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/tensor-view.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/allocator.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/kernel-utils/stream-shuffle-broadcast.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/specs/index.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/specs/design-rmsnorm-quant.html 2026-02-17 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/about/index.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/transformer-tkg.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/index.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/cross-entropy.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/conv1d.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/find-nonzero-indices.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/output-projection-cte.html 2026-02-25 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/dynamic-elementwise-add.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/blockwise-mm-backward.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/mlp.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/fgcc.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/attention-cte.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/depthwise-conv1d.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/rmsnorm-quant.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/qkv.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/moe-tkg.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/output-projection-tkg.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/attention-block-tkg.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/fg-allgather.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/topk-reduce.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/sb2sb-allgather.html 2026-04-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/cumsum.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/moe-cte.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/attention-tkg.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/router-topk.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/nki/library/api/rope.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/api-reference-guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/mxnet-neuron-setup.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/misc-mxnet-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/neo-then-hosting-devflow.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/ec2-then-ec2-devflow.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/api-compilation-python-api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/developer-guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/troubleshooting-guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/inference-mxnet-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/index.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_terminology.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_benchmark_guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_troubleshooting.html 2025-12-01 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_overview.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_examples.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_model_index_guide.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_evaluate_guide.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_faq.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/rn.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_compile_guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_framework_notes.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/neuronperf/neuronperf_install.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/api-reference-guide.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/index.html 2025-10-28 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/transformers-neuronx-tutorials.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/transformers-neuronx-developer-guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/transformers-neuronx-api-reference.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/transformers-neuronx-misc.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/developer-guide.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/transformers-neuronx-developer-guide-for-continuous-batching.html 2025-10-28 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/torch-neuron-dataparallel-example-specify-ncs.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/torch-neuron-dataparallel-example-dynamic-batching.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/api-reference-guide-torch-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/torch-neuron-dataparallel-example-disable-dynamic-batching.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/inference-torch-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/api-torch-neuron-dataparallel-api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/developer-guide-torch-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/torch-neuron-dataparallel-example-dim-neq-zero.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/additional-examples-inference-torch-neuron.html 2026-04-07 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/api-compilation-python-api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/torch-neuron-dataparallel-example-default.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/troubleshooting-guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/api-core-placement.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/misc-inference-torch-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/helper-tools/index.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/helper-tools/tutorial-neuron-check-model.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/helper-tools/tutorial-neuron-gatherinfo.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/setup-legacy-inf1-tensorflow.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron-inference.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx-inference.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-setup.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorboard/getting-started-tensorboard-neuron-plugin.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/training-gpt-neox.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/gpt3_neuronx_nemo_megatron_pretraining.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/training_llama2_tp_pp_ptl.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/megatron_gpt_pretraining.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/training-gpt-neox-20b.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/finetuning_llama2_7b_ptl.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/finetune_t5.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/multinode-training-model-profiling.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/training_codegen25_7b.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tutorials/ssd300_demo/ssd300_demo.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/api-reference-guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/api-tracing-python-api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tensorflow2-accelerated-ops.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/additional-examples.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/dlc-then-ecs-devflow.html 2026-04-07 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/neo-then-hosting-devflow.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/dlc-then-ec2-devflow.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/api-auto-replication-api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/dlc-then-eks-devflow.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/ec2-then-ec2-devflow.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/misc-tensorflow-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tf2_faq.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/api-tfn-analyze-model-api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/api-compilation-python-api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/api-reference-guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tfnx-analyze-model-api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tf-neuronx-auto-replication-api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tfneuronx-python-tracing-api.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/misc-tensorflow-neuronx.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-u20.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-al2.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-u22.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-install-prev-al2023.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-neuronx-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u20-dlami.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u22.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-u20.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-al2.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/tensorflow-update-al2-dlami.html 2026-04-07 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tutorials/tutorial-tensorflowx-serving-NeuronRT-Visible-Cores.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/tutorials/tutorials-tensorflow-neuronx.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.9.0-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuronx/setup/prev-releases/neuronx-2.8.0-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-u20.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-u22.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-install-prev-al2023.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-update.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-update-u22.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/tensorflow-update-u20.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-nlp.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/tensorflow-tutorial-setup.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/tutorials-tensorflow-utilizing-neuron-capabilities.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/tutorials/bert_demo/bert_demo.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.0-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.2-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.16.3-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.17.1-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.2-tensorflow-install.html 2026-04-07 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.1-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.18.0-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.19.0-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.15.0-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/tensorflow/tensorflow-neuron/setup/prev-releases/neuron-1.14.2-tensorflow-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update-u20.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update-u22.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update-al2023.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-prev.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-prev-u22.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-prev-u20.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-prev-al2.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-cxx11.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update-al2-dlami.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-install-prev-al2023.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/pytorch-update-u20-dlami.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/guides/torch-lstm-support.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorials-torch-neuron-nlp.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/transformers-marianmt.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorials-utilizing-neuron-capabilities.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorials-torch-neuron-computervision.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/neuroncore_pipeline_pytorch.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorial-libtorch.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/pytorch-tutorial-setup.html 2026-04-07 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorial-torchserve.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/tutorials/tutorials-inference-torch-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/guides/core-placement/torch-core-placement.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.19.0-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.17.2-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-2.4.0-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.15.2-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.15.1-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.15.0-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.18.0-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-2.3.0-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.16.1-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-2.5.0-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.16.2-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.16.3-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/torch-neuron/setup/prev-releases/neuron-1.14.2-pytorch-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/transformers-neuronx/setup/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-update-u20.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-al2-base-dlami.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-al2.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-al2023.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-ubuntu22.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-install-prev-al2.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-install-prev-u20.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-install-prev-u22.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-ubuntu20.html 
2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-neuron-ubuntu20-base-dlami.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-install-prev-al2023.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/mxnet-update.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/tutorials-mxnet-utilizing-neuron-capabilities.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/tutorials-mxnet-computervision.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/tutorial-model-serving.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/tutorials-mxnet-nlp.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/tutorials-mxnet-neuron.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/tutorials/mxnet-tutorial-setup.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.17.2-mxnet-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.15.2-mxnet-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.14.2-mxnet-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.19.0-mxnet-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.16.3-mxnet-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.18.0-mxnet-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.15.0-mxnet-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/archive/mxnet-neuron/setup/prev-releases/neuron-1.15.1-mxnet-install.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/index.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/nccom-test.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-top-user-guide.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-sysfs-user-guide.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-monitor-user-guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-sys-tools/neuron-ls.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tensorboard/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tensorboard/getting-started-tensorboard-neuronx-plugin.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/torch-neuronx-profiling-with-tb.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/tutorial-neuron-monitor-mnist.html 2026-02-03 
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/tutorial-tensorboard-scalars-mnist.html 2025-10-09 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/tutorials/performance-profiling-vllm.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/profiler/neuron-profile-user-guide.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/profiler/neuron-profiler-2-0-beta-user-guide.html 2025-12-01 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/index.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-system-profiles.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-hierarchy-view.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-database-viewer.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/how-to-link-view-source-code.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/migration-faq.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-device-profiles.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/get-started.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/how-to-profile-workload.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-summary-page.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-memory-viewer.html 2026-04-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/view-perfetto.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-tensor-viewer.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/tools/neuron-explorer/overview-ai-recommendations.html 2026-02-25 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/api-reference-guide/api-reference-guide.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/api-reference-guide/custom-ops-ref-guide.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/programming-guide/custom-c++-operators-devguide.html 2026-02-03 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/programming-guide/programming-guide.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/tutorials/customop-mlp-perf-opt.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/tutorials/tutorials.html 2025-10-07 https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-customops/tutorials/customop-mlp-training.html 2025-10-07 ================================================ FILE: tools/index.rst ================================================ .. _neuron-tools: .. meta:: :description: Developer tools for profiling, monitoring, and analyzing machine learning workloads on AWS Neuron devices. 
   :keywords: AWS Neuron, developer tools, profiler, monitoring, analysis, TensorBoard, visualization, debugging, optimization
   :date-modified: 12/02/2025

Developer Tools
================

AWS Neuron provides a comprehensive suite of developer tools for optimizing, monitoring, and debugging machine learning workloads on AWS Inferentia and Trainium accelerators. These tools enable developers to gain deep insights into model performance, system utilization, and hardware behavior to maximize the efficiency of ML applications running on Neuron-enabled instances.

.. grid:: 1
   :gutter: 3

   .. grid-item-card:: Neuron Explorer
      :link: /tools/neuron-explorer/index
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      Neuron Explorer is a suite of tools designed to support ML engineers throughout their development journey on AWS Trainium, from model development through debugging, profiling, analysis, and optimization.

   .. grid-item-card:: Neuron Profiler 2.0
      :link: /tools/profiler/neuron-profiler-2-0-beta-user-guide
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      Neuron Profiler 2.0 offers a user-friendly experience for capturing and analyzing application performance through both high-level system profiles and detailed device-level profiles.

   .. grid-item-card:: Neuron Profiler
      :link: /tools/profiler/neuron-profile-user-guide
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      The Neuron Profiler is a tool to profile and analyze the performance of an ML model compiled with the Neuron compiler and run on NeuronDevices.

   .. grid-item-card:: System Tools
      :link: /tools/neuron-sys-tools/index
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      Command-line utilities for monitoring, debugging, and managing AWS Neuron devices, including neuron-monitor, neuron-top, neuron-ls, and more.

   .. grid-item-card:: Third Party Tools
      :link: /tools/third-party-solutions
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      Third-party tools and integrations that support the AWS Neuron development experience, including monitoring, visualization, and optimization solutions.

..
   .. grid-item-card:: AP Visualizer
      :link: ap-visualizer/ap-visualizer.html
      :link-type: url
      :class-header: sd-bg-primary sd-text-white

      Visualize access patterns of tensors on Neuron devices.

.. grid:: 1 1 2 2
   :gutter: 3

   .. grid-item-card:: Tutorials
      :link: /tools/tutorials/index
      :link-type: doc
      :class-header: sd-bg-secondary sd-text-white

      Tutorials on how to use the Neuron Tools.

   .. grid-item-card:: Release Notes
      :link: /release-notes/components/dev-tools
      :link-type: doc
      :class-header: sd-bg-secondary sd-text-white

      Latest updates, new features, and improvements to Neuron Tools and Neuron Explorer.

.. toctree::
   :maxdepth: 1
   :hidden:

   Neuron Profiler 2.0
   Neuron Profiler
   System Tools
   Third-party Tools
   Tutorials
   Release Notes

================================================
FILE: tools/neuron-explorer/get-started.rst
================================================

.. meta::
   :description: Set up and get started with Neuron Explorer, the Neuron SDK profiler
   :date-modified: 12/02/2025

.. _new-neuron-profiler-setup:

Get Started with Neuron Explorer
========================================

In this guide, you'll learn how to set up and launch Neuron Explorer, including the web-based UI for interactive analysis. By the end of this guide, you'll be able to visualize and analyze performance data for your models directly in your browser.
Overview
---------

In this guide, you'll launch an AWS Trainium or Inferentia EC2 instance using the AWS Deep Learning AMI (DLAMI) for Neuron, install and verify Neuron Explorer, start both the API and UI servers, and set up secure SSH tunneling to view the Neuron Explorer interface in your local browser. Use this tool when you want to collect, inspect, and visualize Neuron profiling data from model training or inference jobs running on Neuron-compatible instances.

At a high level, you will:

1. Launch a Neuron DLAMI instance
2. Verify Neuron Explorer installation
3. Start the Neuron Explorer servers
4. Configure SSH tunneling
5. Access the Neuron Explorer UI locally

Prerequisites
--------------

* An AWS account with permissions to launch EC2 instances.
* Access to an AWS Trainium or Inferentia instance type (such as trn1.2xlarge or inf2.xlarge).
* AWS Neuron DLAMI with the latest Neuron SDK preinstalled.
* SSH key pair (``.pem`` file) to securely connect to your EC2 instance.
* Local machine with an SSH client and web browser installed.

Before you begin
-----------------

Complete these steps before starting the task in this document:

1. Make sure you have an active AWS account and a default VPC available in your region.
2. Create or locate your SSH key pair (``.pem`` file) that allows access to your EC2 instance.

Instructions
-------------

1. Launch a Neuron-compatible EC2 instance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Launch an EC2 instance with either a Trainium or Inferentia instance type using the AWS Neuron DLAMI. You can do this from the AWS Management Console or CLI. For instructions on how to launch an instance with the Neuron DLAMI, refer to the Neuron DLAMI setup documentation.

**Expected outcome**

Your instance should start and appear in the EC2 dashboard as "Running."

2. Verify that Neuron Explorer is installed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once you've connected to your EC2 instance with SSH, verify that Neuron Explorer and the associated tools are installed:

.. code-block:: bash

   apt list --installed | grep neuronx-tools

**Expected outcome**

You should see ``neuronx-tools`` listed among the installed packages, confirming that Neuron Explorer is available on your instance.

3. Launch the API and UI SPA servers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Start the Neuron Explorer web servers using the following command:

.. code-block:: bash

   neuron-explorer view -v 2 --data-path ./parquet_files

This command starts:

* The UI SPA (Single Page Application) server (default port: 3001)
* The API server (default port: 3002)

**Expected outcome**

You'll see terminal logs confirming that both the UI and API servers are running.

4. Set up SSH tunneling
^^^^^^^^^^^^^^^^^^^^^^^^

By default, Neuron Explorer runs locally on the EC2 instance. To securely access it from your local computer, you must create SSH tunnels for ports 3001 and 3002. Run the following command from your local machine terminal (replace placeholders such as ``your-key`` and ``public_ip_address_of_your_instance``):

.. code-block:: bash

   ssh -i ~/your-key.pem -L 3001:localhost:3001 -L 3002:localhost:3002 ubuntu@[public_ip_address_of_your_instance] -fN

**Explanation:**

* ``-L 3001:localhost:3001`` forwards the UI server.
* ``-L 3002:localhost:3002`` forwards the API server.
* ``-fN`` keeps the tunnel open in the background.

**Expected outcome**

No error messages should appear, indicating that your SSH tunnels are active.

.. note:: Replace ``ubuntu`` with the appropriate username for your AMI (for example, ``ec2-user`` on Amazon Linux).
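One quick way to confirm that the tunnel is forwarding traffic before opening the browser is to request the UI server from your local machine. This check is a minimal sketch, assuming ``curl`` is installed locally:

.. code-block:: bash

   # Prints an HTTP status code (for example, 200) once the UI server is reachable
   curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3001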
5. Connect to the Neuron Explorer UI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Once your tunnel is active, open your preferred web browser and navigate to:

.. code-block:: text

   http://localhost:3001

**Expected outcome**

The Neuron Explorer UI loads in your browser, displaying an interactive dashboard for exploring profiling data.

Confirm your work
------------------

You've successfully set up Neuron Explorer! To confirm everything is working:

1. The browser should display the Neuron Explorer interface.
2. The terminal running the profiler command should show log activity when you interact with the UI.
3. You can explore profiling sessions from your ``./parquet_files`` directory.

If all these checks pass, you are ready to begin analyzing performance data using Neuron Explorer.

Common issues
---------------

If you encounter an error or other issue while working through this task, here are some commonly encountered issues and how to address them:

* **Neuron Explorer UI doesn't load**: Check that your SSH tunnel is configured correctly. Make sure ports 3001 and 3002 are forwarded using the ``-L`` flags in your SSH command, and verify the EC2 instance is running.
* **No profiling data displayed**: Double-check that the directory passed to ``--data-path`` contains valid ``.parquet`` profiling files generated by a prior Neuron profiling run.
* **neuron-explorer command not found**: Ensure that the Neuron SDK is installed. Make sure that you launched your instance with the Neuron DLAMI or that you set up your instance following the Neuron setup instructions.
* **Connection refused on port 3001 or 3002**: Confirm that your EC2 security group allows outbound traffic and that the SSH tunnel was created from your local machine, not from inside the instance.

================================================
FILE: tools/neuron-explorer/how-to-link-view-source-code.rst
================================================

.. meta::
   :description: Learn how to use source code linking in Neuron Explorer to understand code performance and optimize your applications
   :date-modified: 11/21/2025

.. _neuron-explorer-source-code:

Source Code Viewer
====================

In this guide, you'll learn how to use Neuron Explorer's source code linking feature to visualize connections between your application code and device performance. Discover how to navigate between source code and device instructions, highlight performance-critical sections, view framework stack traces, and leverage interactive code decorations to optimize your AWS Neuron applications for maximum efficiency.

Overview
--------

Source code linking helps you understand how your code changes affect device performance and identify ways to optimize it. This feature creates interactive connections between source code files and other Neuron Explorer widgets. You can zoom to device instructions from selected code lines, navigate between instructions and source code, and highlight instructions for specific loop iterations.

You can use source code linking in both the VS Code extension and the standalone web application, giving you flexibility for different developer workflows.
The Framework Stack Trace feature appears in the Event Details pane when an instruction in the device profile is clicked. It maps device instructions back to framework-level code in JAX or PyTorch, helping you understand which part of your application code produced a particular device instruction.

.. image:: /tools/profiler/images/view-link-1.gif

Instructions
-------------

To add the "NKI Source Location" field to a profile, set this environment variable: ``NEURON_FRAMEWORK_DEBUG=1``

To enable tracking of stack trace information, set these environment variables before compiling your NEFF:

.. code-block:: bash

   export XLA_IR_DEBUG=1
   export XLA_HLO_DEBUG=1

Once you have the NEFF, capture the profile as usual. To view your source code while viewing the profile, use the ``--framework-source-root`` flag to pass the path to your framework source files. This is optional and is only needed if you want to view your code alongside the displayed profile.

.. code-block:: bash

   neuron-explorer view -n file.neff -s profile.ntff --framework-source-root /path/to/framework/source/files

Code Viewer Widget
-------------------

Highlighting Instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~

Select source code lines to highlight their corresponding instructions in the profiler view. You can select individual lines or multiple lines through block selection or multiple cursors.

.. image:: /tools/profiler/images/view-link-2.png

Navigating to Source Code
~~~~~~~~~~~~~~~~~~~~~~~~~~

(Ctrl/Cmd)+Click any instruction to jump to its location in source code. If there are multiple matches, you will be prompted to select which file to navigate to.

.. image:: /tools/profiler/images/view-link-3.png

Source Code Decorations
~~~~~~~~~~~~~~~~~~~~~~~~

Performance metrics appear as decorations directly in your source code, updating automatically with the instruction profiler's time range. Configure which metrics to display in the settings panel. Currently, only instruction count and PE element count are supported.

.. image:: /tools/profiler/images/view-link-4.png

Navigating to Instructions
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Select lines in your source code and navigate to their corresponding instructions using Ctrl+Shift+G, the context menu, or the "Zoom into Instructions" command from the command palette. The Device Trace Viewer will then zoom to show all instructions associated with your selection.

.. image:: /tools/profiler/images/view-link-5.png

Dependency Annotations
~~~~~~~~~~~~~~~~~~~~~~~

When enabled, selecting an instruction will highlight its dependent source code lines. The selected instruction's line will be highlighted in one color, with its dependencies shown in a different color.

.. image:: /tools/profiler/images/view-link-6.png

================================================
FILE: tools/neuron-explorer/how-to-profile-workload.rst
================================================

.. meta::
   :description: Learn how to capture a profile, launch the Neuron Explorer UI, and use the Profile Manager to analyze your workload performance.
   :date-modified: 12/02/2025

Capture and View Profiles in Neuron Explorer
================================================

In this guide, you'll learn how to capture a profile, launch Neuron Explorer, use the Profile Manager, and view Neuron Explorer in your IDE.

Capturing Profiles
------------------
To get a better understanding of your workload's performance, you must collect the raw device traces and runtime metadata in the form of an NTFF (Neuron Trace File Format), which you can then correlate with the compiled NEFF (Neuron Executable File Format) to derive insights.

Set the following environment variables before compiling to capture more descriptive layer names and stack frame information.

.. code-block:: bash

   export XLA_IR_DEBUG=1
   export XLA_HLO_DEBUG=1

For NKI developers, set ``NEURON_FRAMEWORK_DEBUG`` in addition to the two above to enable kernel source code tracking:

.. code-block:: bash

   export NEURON_FRAMEWORK_DEBUG=1

If profiling was successful, you will see NEFF (``.neff``) and NTFF (``.ntff``) artifacts in the specified output directory, similar to the following:

.. code-block:: bash

   output
   └── i-0ade06f040a13f2bf_pid_210229
       ├── 395760075800974_instid_0_vnc_0.ntff
       └── neff_395760075800974.neff

Device profiles for the first execution of each NEFF per NeuronCore are captured, and NEFF/NTFF pairs with the same prefix (for PyTorch) or unique hash (for JAX or CLI) must be uploaded together. See the section on :ref:`uploading profiles <profile-manager-upload-profile>` for more details.

JAX Profiling API
~~~~~~~~~~~~~~~~~

When using the JAX context-managed profiling API, set two extra environment variables to signal the profile plugin to begin capturing device profile data when the profiling API is invoked.

.. code-block:: python

   os.environ["NEURON_RT_INSPECT_DEVICE_PROFILE"] = "1"
   os.environ["NEURON_RT_INSPECT_OUTPUT_DIR"] = "./output"

Then, profile a block of code:

.. code-block:: python

   with jax.profiler.trace(os.environ["NEURON_RT_INSPECT_OUTPUT_DIR"]):
       ...

Full code example:

.. code-block:: python

   from functools import partial
   import os
   from time import sleep

   import jax
   import jax.numpy as jnp
   from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
   from jax.experimental.shard_map import shard_map

   os.environ["NEURON_RT_INSPECT_DEVICE_PROFILE"] = "1"
   os.environ["NEURON_RT_INSPECT_OUTPUT_DIR"] = "./output"

   jax.config.update("jax_default_prng_impl", "rbg")

   mesh = Mesh(jax.devices(), ('i',))

   def device_put(x, pspec):
       return jax.device_put(x, NamedSharding(mesh, pspec))

   lhs_spec = P('i', None)
   lhs = device_put(jax.random.normal(jax.random.key(0), (128, 128)), lhs_spec)
   rhs_spec = P('i', None)
   rhs = device_put(jax.random.normal(jax.random.key(1), (128, 16)), rhs_spec)

   @jax.jit
   @partial(shard_map, mesh=mesh, in_specs=(lhs_spec, rhs_spec), out_specs=rhs_spec)
   def matmul_allgather(lhs_block, rhs_block):
       rhs = jax.lax.all_gather(rhs_block, 'i', tiled=True)
       return lhs_block @ rhs

   with jax.profiler.trace(os.environ["NEURON_RT_INSPECT_OUTPUT_DIR"]):
       out = matmul_allgather(lhs, rhs)
       for i in range(10):
           with jax.profiler.TraceAnnotation("my_label" + str(i)):
               out = matmul_allgather(lhs, rhs)
           sleep(0.001)

   expected = lhs @ rhs
   with jax.default_device(jax.devices('cpu')[0]):
       equal = jnp.allclose(jax.device_get(out), jax.device_get(expected), atol=1e-3, rtol=1e-3)
   print("Tensors are the same" if equal else "Tensors are different")

.. _neuron-explorer-capture-environment-variables:
.. _neuron-explorer-non-framework-user-experience:

Environment Variables
~~~~~~~~~~~~~~~~~~~~~

You can also control profiling with environment variables. This is useful when you can't easily change your application code, such as when running an executable which calls the Neuron Runtime, or in a containerized environment where the application code is built into the container image.
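As a minimal sketch of this approach, the two core variables described below can wrap any unmodified workload (``./my_app`` is a hypothetical stand-in for an executable that calls the Neuron Runtime):

.. code-block:: bash

   # Enable system profiling and choose where profile data is written
   export NEURON_RT_INSPECT_ENABLE=1
   export NEURON_RT_INSPECT_OUTPUT_DIR=./output

   # Run the application unchanged; profiling is controlled entirely by the environment
   ./my_app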
.. _neuron-explorer-core-control-variables:

Core Control Variables
^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left

   * - Variable
     - Description
     - Default behavior
   * - ``NEURON_RT_INSPECT_ENABLE``
     - Set to ``1`` to enable profiling
     - Enables system profiling and disables device profiling. To control which profile types are captured, see :ref:`Profile type selection <neuron-explorer-profile-type-selection>`
   * - ``NEURON_RT_INSPECT_OUTPUT_DIR``
     - Directory for profile data output
     - Default directory for captured profile data is ``./output``

.. _neuron-explorer-profile-type-selection:

Device or System Profile Type Selection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note:: When ``NEURON_RT_INSPECT_ENABLE`` is set to ``1``, ``NEURON_RT_INSPECT_SYSTEM_PROFILE`` is enabled by default (set to ``1``) and ``NEURON_RT_INSPECT_DEVICE_PROFILE`` is disabled by default (set to ``0``).

When ``NEURON_RT_INSPECT_ENABLE`` is set to ``1``, two different profile types are available:

.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left

   * - Variable
     - Profile type
     - Description
     - Enable capture
     - Disable capture
   * - ``NEURON_RT_INSPECT_SYSTEM_PROFILE``
     - System-level
     - Captures runtime system events and operations
     - Set to ``1``
     - Set to ``0``
   * - ``NEURON_RT_INSPECT_DEVICE_PROFILE``
     - Device-level
     - Captures detailed NeuronCore hardware metrics
     - Set to ``1``
     - Set to ``0``

.. note:: These variables have no effect if ``NEURON_RT_INSPECT_ENABLE`` is not set to ``1``.

.. _neuron-explorer-advanced-config-vars:

Advanced configuration for System Profiles
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :widths: auto
   :header-rows: 1
   :align: left

   * - Variable
     - Profile type
     - Description
     - Default behavior
   * - ``NEURON_RT_INSPECT_SYS_TRACE_MAX_EVENTS_PER_NC``
     - System-level
     - Maximum trace events per NeuronCore before the oldest events are overwritten
     - 1,000,000

.. note:: Increasing the event limit will consume more host memory.

Capture using nccom-test with Environment Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Profiling can be enabled using environment variables. For a simple way to generate a Neuron workload, use :ref:`nccom-test <nccom-test>`, a benchmarking tool that is already available with the Neuron AMI.

.. code-block:: shell

   export NEURON_RT_INSPECT_ENABLE=1
   export NEURON_RT_INSPECT_OUTPUT_DIR=./output
   nccom-test allr allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512

.. note:: If you have problems with nccom-test, add the ``--debug`` flag. If using a trn1.2xlarge instance, change ``-r 32`` to ``-r 2`` to use fewer NeuronCores.

To understand the profiling output, see :ref:`neuron-explorer inspect Output <neuron-explorer-inspect-output>`.

Capture with EKS
^^^^^^^^^^^^^^^^

Capturing a profile on EKS is most easily done by setting environment variables, as described in :ref:`Environment Variables <neuron-explorer-non-framework-user-experience>`. By using environment variables, users do not need to change application code in their container image or modify their run commands. Update the deployment YAML to include the ``NEURON_RT_INSPECT_ENABLE`` and ``NEURON_RT_INSPECT_OUTPUT_DIR`` environment variables. For distributed workloads, it's important that ``NEURON_RT_INSPECT_OUTPUT_DIR`` points to a directory on a shared volume which all workers have access to.
.. code-block:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: trn1-mlp
   spec:
     restartPolicy: Never
     schedulerName: default-scheduler
     nodeSelector:
       beta.kubernetes.io/instance-type: trn1.32xlarge
     containers:
       - name: trn1-mlp
         env:
           - name: NEURON_RT_INSPECT_ENABLE
             value: "1"
           - name: NEURON_RT_INSPECT_OUTPUT_DIR
             value: "/shared/output"
         command: ['torchrun']
         args:
           - '--nnodes=1'
           - '--nproc_per_node=32'
           - 'train_torchrun.py'
         image: ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:mlp
         imagePullPolicy: IfNotPresent
         resources:
           limits:
             aws.amazon.com/neuron: 16

.. note:: EKS users running PyTorch and JAX applications are still free to change their application code and use the PyTorch or JAX Python profiling APIs if they want finer-grained control over profiling. However, using the environment variables conveniently allows profiling without modifying the container image or application code.

CLI
~~~

In certain cases, you may want to profile the application without requiring code modifications, such as when deploying a containerized application through EKS. Note that when capturing with the CLI, profiling will be enabled for the entire lifetime of the application. If more granular control is required for profiling specific sections of the model, it is recommended to use the PyTorch or JAX APIs.

To enable profiling without code changes, run your workload with the following environment variables set:

.. code-block:: bash

   export NEURON_RT_INSPECT_ENABLE=1
   export NEURON_RT_INSPECT_DEVICE_PROFILE=1
   export NEURON_RT_INSPECT_OUTPUT_DIR=./output
   python train.py

CLI reference for System Profiles
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In addition to controlling profiling with environment variables, you can use the ``neuron-explorer inspect`` command line interface for profiling applications. This provides the same functionality as the environment variables, but helps you avoid typos and invalid arguments, and provides a useful ``--help`` command to explain available options.

.. code-block:: shell

   Usage:
     neuron-explorer [OPTIONS] inspect [inspect-OPTIONS] [userscript...]

   Application Options:
     -v, --version            Show version and exit

   Help Options:
     -h, --help               Show this help message

   [inspect command options]
     -o, --output-dir=        Output directory for the inspection results (default: .)
     -n, --num-trace-events=  Maximum number of trace events to capture when profiling. Once hitting this limit, old events are dropped

   [inspect command arguments]
     userscript:              Run command/script that launches a Neuron workload. E.g. 'python app.py' or './runscript.sh'

Example of using System Profiles CLI
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can provide any script that generates a Neuron workload (for example, a PyTorch training script) to the System Profiles CLI. For a simple way to generate a Neuron workload, use ``nccom-test``, a benchmarking tool that is already available with the Neuron AMI as part of the ``aws-neuronx-tools`` package.

.. code-block:: shell

   ubuntu@ip-172-31-63-210:~$ neuron-explorer inspect -o inspect-output-nccom-test nccom-test allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512
   INFO[0000] Running command "nccom-test allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512" with profiling enabled

       size(B)    count(elems)    type    time:avg(us)    algbw(GB/s)    busbw(GB/s)
        524288          131072    fp32           24.15          21.71          21.03
   Avg bus bandwidth:    21.0339GB/s

.. note:: If you have problems with nccom-test, add the ``--debug`` flag. If using a trn1.2xlarge instance, change ``-r 32`` to ``-r 2`` to use fewer NeuronCores.
.. _neuron-explorer-inspect-output:

``neuron-explorer inspect`` Output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The above command traces a Neuron workload execution and saves the output to the ``inspect-output-nccom-test`` directory. The output directory contains a single NEFF file and a device profile (NTFF) for each NeuronCore that executed that NEFF. You will also see ``ntrace.pb`` and ``trace_info.pb`` files storing the system profile data. The output will look similar to the following:

.. code-block:: shell

   ubuntu@ip-172-31-63-210:~$ tree inspect-output-nccom-test
   inspect-output-nccom-test
   ├── i-012590440bb9fd263_pid_98399
   │   ├── 14382885777943380728_instid_0_vnc_0.ntff
   │   ├── 14382885777943380728_instid_0_vnc_1.ntff
   │   ├── 14382885777943380728_instid_0_vnc_10.ntff
   │   ├── 14382885777943380728_instid_0_vnc_11.ntff
   ...
   │   ├── 14382885777943380728_instid_0_vnc_8.ntff
   │   ├── 14382885777943380728_instid_0_vnc_9.ntff
   │   ├── cpu_util.pb
   │   ├── host_mem.pb
   │   ├── neff_14382885777943380728.neff
   │   ├── ntrace.pb
   │   └── trace_info.pb

   2 directories, 74 files

To view a summary of the captured profile data, run the command:

.. code-block:: shell

   neuron-explorer view -d inspect-output-nccom-test --output-format summary-text

.. _neuron-explorer-filtering-system-profiles:

Capture-time Filtering
----------------------

**Capture-time filtering** reduces memory usage and trace file size by only collecting specific events, but filtered data cannot be recovered later. Configure filters before trace capture using environment variables or API functions.

You can use NeuronCore filters to only capture events for specific NeuronCores (for example, only events associated with NeuronCore 0, or all the NeuronCores on a specific NeuronDevice). You can use event type filters to only capture specific events (for example, model execute or collectives events). It is possible to combine both NeuronCore and event type filters.

NeuronCore
~~~~~~~~~~

If capture is enabled for a NeuronCore, then a ring buffer will be allocated in host memory for storing that core's events. Thus, filtering by NeuronCore decreases host memory usage during capture.

Default Behavior
^^^^^^^^^^^^^^^^

By default, all visible NeuronCores are enabled for capture.

Using Environment Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: shell

   # Filter to capture events only from NeuronCore 0
   export NEURON_RT_INSPECT_EVENT_FILTER_NC=0

   # Filter to capture events from NeuronCores 0, 2, and 4
   export NEURON_RT_INSPECT_EVENT_FILTER_NC=0,2,4

   # Filter to capture events from a range of NeuronCores (0 through 3)
   export NEURON_RT_INSPECT_EVENT_FILTER_NC=0-3

   # Reset to default behavior
   unset NEURON_RT_INSPECT_EVENT_FILTER_NC  # Back to capturing all visible cores

Using API Functions
^^^^^^^^^^^^^^^^^^^
.. code-block:: c

   #include

   // Allocate and configure trace options
   nrt_sys_trace_config_t *config;
   nrt_sys_trace_config_allocate(&config);
   nrt_sys_trace_config_set_defaults(config);

   // Enable capture only for specific NeuronCores
   // Disable all cores since by default they are all enabled
   int num_cores = 128;
   for (int i = 0; i < num_cores; i++) {
       // disable capture for NeuronCore i here
   }

Event Type
~~~~~~~~~~

Use ``nrt_sys_trace_get_event_types`` to list the event type names that can be used for filtering:

.. code-block:: c

   #include

   // Get all available event types
   const char **event_types = nullptr;
   size_t count = 0;
   NRT_STATUS status = nrt_sys_trace_get_event_types(&event_types, &count);
   if (status == NRT_SUCCESS) {
       printf("Available event types:\n");
       for (size_t i = 0; i < count; ++i) {
           printf("  %s\n", event_types[i]);
       }

       // Free the event types array
       for (size_t i = 0; i < count; ++i) {
           free((void*)event_types[i]);
       }
       free((void*)event_types);
   }

Using Environment Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``NEURON_RT_INSPECT_EVENT_FILTER_TYPE`` environment variable supports:

* **Default**: If not set, all event types are captured
* **Specific event types**: Use exact event names from ``nrt_sys_trace_get_event_types()``
* **Event categories**: Use ``hardware`` or ``software`` to filter by category
* **Exclusion**: Use a ``^`` prefix to exclude specific events from a category

.. code-block:: shell

   # Filter to capture only specific event types
   export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=model_load,nrt_execute,runtime_execute

   # Filter to capture all hardware events
   export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware

   # Filter to capture all software events
   export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=software

   # Filter to capture all hardware events EXCEPT cc_exec
   export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware,^cc_exec

   # Filter to capture all software events EXCEPT model_load
   export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=software,^model_load

   # Mix categories and specific events
   export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware,tensor_read,tensor_write

   # Reset to default behavior
   unset NEURON_RT_INSPECT_EVENT_FILTER_TYPE  # Back to capturing all event types

The ``hardware`` group contains events that are executed on the NeuronCore: ``nc_exec_running``, ``cc_running``, ``cc_exec_barrier``, ``numerical_err``, ``nrt_model_switch``, ``timestamp_sync_point``, and ``hw_notify``. The ``software`` group contains all other events.

Using API Functions
^^^^^^^^^^^^^^^^^^^

Use the ``nrt_sys_trace_config_set_capture_enabled_for_event_type`` API to filter by event type.
.. code-block:: c

   #include

   // Configure trace options
   nrt_sys_trace_config_t *config;
   nrt_sys_trace_config_allocate(&config);
   nrt_sys_trace_config_set_defaults(config);

   // By default, all event types are enabled

   // Disable specific event types (others remain enabled)
   nrt_sys_trace_config_set_capture_enabled_for_event_type(config, "device_exec", false);

   // Or disable all first, then enable only specific ones
   const char **all_event_types = nullptr;
   size_t all_count = 0;
   nrt_sys_trace_get_event_types(&all_event_types, &all_count);

   // Disable all event types first
   for (size_t i = 0; i < all_count; ++i) {
       nrt_sys_trace_config_set_capture_enabled_for_event_type(config, all_event_types[i], false);
   }

   // Enable only specific event types
   nrt_sys_trace_config_set_capture_enabled_for_event_type(config, "model_load", true);
   nrt_sys_trace_config_set_capture_enabled_for_event_type(config, "nrt_execute", true);

   // Verify which event types are enabled
   const char **enabled_types = nullptr;
   size_t enabled_count = 0;
   nrt_sys_trace_config_get_enabled_event_types(config, &enabled_types, &enabled_count);
   printf("Enabled event types: %zu\n", enabled_count);
   for (size_t i = 0; i < enabled_count; ++i) {
       printf("  %s\n", enabled_types[i]);
   }

   // Clean up memory (caller is responsible)
   for (size_t i = 0; i < enabled_count; ++i) {
       free((void*)enabled_types[i]);
   }
   free((void*)enabled_types);
   for (size_t i = 0; i < all_count; ++i) {
       free((void*)all_event_types[i]);
   }
   free((void*)all_event_types);

   // Start tracing
   nrt_sys_trace_start(config);

   // Your application code here...

   // Cleanup
   nrt_sys_trace_stop();
   nrt_sys_trace_config_free(config);

Processing-time Filtering
--------------------------

**Processing-time filtering** preserves the complete trace and allows flexible analysis with different filters, but requires more memory and storage during capture. Apply filters when viewing or processing already captured profiles. This approach allows you to analyze the same trace data in different ways without recapturing. The filters can be used with any ``neuron-explorer`` output format, including ``--output-format json`` and ``--output-format perfetto``.

NeuronCore
~~~~~~~~~~

Use the ``--system-trace-filter-neuron-core`` option to process events only for specific NeuronCores. The IDs are local to the instance, not global IDs. If the ``--system-trace-filter-neuron-core`` argument is not set, then events from all NeuronCores will be included in the processed trace.

**Single NeuronCore**

.. code-block:: shell

   neuron-explorer view -d ./output --system-trace-filter-neuron-core "0"

**Multiple NeuronCores**

.. code-block:: shell

   neuron-explorer view -d ./output --system-trace-filter-neuron-core "0,1,2,3"

Event Type
~~~~~~~~~~

Use the ``--system-trace-filter-event-type`` option to process only specific trace event types. If the ``--system-trace-filter-event-type`` argument is not set, then all event types will be included in the processed trace.

**Single event type**

.. code-block:: shell

   neuron-explorer view -d ./output --system-trace-filter-event-type "nrt_execute"

**Multiple event types**

.. code-block:: shell

   neuron-explorer view -d ./output --system-trace-filter-event-type "nrt_execute,nrt_load"
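The processing-time filters can also be combined in a single invocation. As a sketch, the following keeps only ``nrt_execute`` events from NeuronCore 0 and writes Perfetto output (the flag values are illustrative):

.. code-block:: shell

   neuron-explorer view -d ./output \
       --system-trace-filter-neuron-core "0" \
       --system-trace-filter-event-type "nrt_execute" \
       --output-format perfetto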
Instance ID
~~~~~~~~~~~

Use the ``--system-trace-filter-instance-id`` option to process events only for specific EC2 instances. If the ``--system-trace-filter-instance-id`` argument is not set, then events from all instances will be included in the processed trace.

**Single instance**

.. code-block:: shell

   neuron-explorer view -d ./output --system-trace-filter-instance-id "i-abc123"

**Multiple instances**

.. code-block:: shell

   neuron-explorer view -d ./output --system-trace-filter-instance-id "i-abc123,i-def456,i-ghi789"

Processing only system or device profiles
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can reduce processing times by skipping the processing of system or device profiles. Choose this when you are interested in only a specific profile, or when you want to start with a limited set of profiling data before exploring the full profile.

To skip processing of device profiles, use the ``--ignore-device-profile`` option. To skip processing of system profiles, use the ``--ignore-system-profile`` option. These options can be used with the ``--output-format`` values ``parquet`` (default), ``perfetto``, or ``json``. For example:

.. code-block:: shell

   neuron-explorer view -d ./output --ignore-device-profile --output-format perfetto

View Profiles
-------------

To view a profile in Neuron Explorer, follow these steps:

1. **Start the Neuron Explorer UI and API servers** using the ``neuron-explorer`` tool from ``aws-neuronx-tools``:

   .. code-block:: bash

      neuron-explorer view --data-path /absolute/path/to/db

   By default, the UI will be launched on port 3001 and the API server will be launched on port 3002.

2. **Set up port-forwarding** (if running on a remote EC2 instance) to enable local viewing:

   .. code-block:: bash

      ssh -i <your-key.pem> <user>@<instance-address> -L 3001:localhost:3001 -L 3002:localhost:3002

   .. note:: It is necessary to forward both 3001 (for the UI server) and 3002 (for the data server).

3. **Open the UI** by navigating to ``localhost:3001`` in your browser.

4. **Upload your profile** by clicking the **"Upload Profile"** button in the Profile Manager page. You can either:

   * Upload the NEFF (``.neff``) and NTFF (``.ntff``) files individually using the "Individual Files" upload mode, or
   * Upload the folder containing the NEFF and NTFF files using the "Directory Upload" mode.

Neuron Explorer Browser UI
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _neuron-explorer-profile-manager:

Profile Manager
^^^^^^^^^^^^^^^

Profile Manager is a page for uploading artifacts (NEFF, NTFF, and source code) and selecting profiles to access.

.. image:: /tools/profiler/images/profile-workload-3.png

.. _profile-manager-upload-profile:

Click on the "Upload Profile" button to open the Upload Profile modal.

**Device Profile Upload**

Select "Individual Files" upload mode to upload the NEFF, NTFF, and source code individually. Select "Directory Upload" to upload profile files from a directory.

.. note::

   * "Profile name" is a required field. You cannot upload a profile with an existing name unless the "Force Upload" option is checked at the bottom. Force Upload currently overwrites the existing profile with the same name.
   * For uploading source code, the UI only supports the upload of folders, individual files, or compressed files in the gzipped tar ``.tar.gz`` archive format.

.. image:: /tools/neuron-explorer/images/device-profile-upload-ui.png

.. _profile-manager-system-profile-upload:

**System Profile Upload**

Select "Directory Upload", then in the Profile Directory drag-and-drop area, select the directory containing the system profile files. The directory should contain instance sub-directories with the following: ``ntrace.pb``, ``trace_info.pb``, ``cpu_util.pb``, and ``host_mem.pb``. For an example, see the output in :ref:`neuron-explorer inspect <neuron-explorer-inspect-output>`.

.. note:: System Profile uploads only support "Directory Upload".
.. image:: /tools/neuron-explorer/images/system-profile-upload-ui.png

**Processing Status**

After uploading a profile, the processing task is shown in the "User Uploaded" table. Use the "Refresh" button in the top-right to fetch the latest processing status and verify completion.

**Listing profiles**

All uploaded profiles are listed in the Profile Manager page with details such as the processing status and upload time, along with various quick-access actions.

.. image:: /tools/profiler/images/profile-workload-5.png

* **Pencil button**: Rename a profile.
* **Star button**: Mark this profile as a favorite. Favorite profiles are shown in the user's favorites list.
* **Bulb button**: Navigate to the summary page of this profile. For more details, see :doc:`the overview of the Neuron Explorer Summary Page <overview-summary-page>`.

Clicking on the name of a profile takes you to its corresponding profile page.

Neuron Explorer for Visual Studio Code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The UI is also available as a VSCode extension, enabling better native integration for features such as code linking. Install the Neuron Explorer extension from the Visual Studio Code Marketplace. Open the Extensions view in VSCode by pressing **Ctrl+Shift+X** (Windows/Linux) or **CMD+Shift+X** (MacOS), and search for ``AWS Neuron Explorer`` or ``amazonwebservices.neuron-explorer``. Select the extension published by **Amazon Web Services** in the sidebar, then click the blue **Install** button.

.. image:: /tools/profiler/images/profile-workload-1.png

Ensure the SSH tunnel is established by following the steps above, then point the extension at the API server: select the extension in the left activity bar, navigate to the "Endpoint" action on the bottom bar of your VSCode session, select "Custom endpoint", and enter ``localhost:3002``.

.. image:: /tools/profiler/images/profile-workload-2.png

From there, navigate to the **Profile Manager** page through the extension UI in the left activity bar.

JSON Output
~~~~~~~~~~~

The ``--output-format json`` option writes processed profile data to human-readable JSON that can be used for scripting and manual inspection.

.. code-block:: shell

   neuron-explorer view -d ./output --output-format json

This will generate a ``system_profile.json`` file containing the system profile data and a ``device_profile_model_.json`` file for each unique compiled model that was executed on a NeuronDevice. The ``system_profile.json`` file contains the following data types:

* ``trace_events``: Neuron Runtime API trace events and Framework/Application trace events containing timestamps, durations, names, and the EC2 instance ID to differentiate between events from different compute nodes in a distributed workload.

  .. code-block:: json

     {
       "Neuron_Runtime_API_Event": {
         "duration": 27094,
         "group": "nrt-nc-000",
         "id": 1,
         "instance_id": "i-0f207fb2a99bd2d08",
         "lnc_idx": "0",
         "name": "nrt_tensor_write",
         "parent_id": 0,
         "process_id": "1627711",
         "size": "4",
         "tensor_id": "4900392441224765051",
         "tensor_name": "_unknown_",
         "thread_id": 1627711,
         "timestamp": 1729888371056597613,
         "type": 11
       },
       "Framework_Event": {
         "duration": 3758079,
         "group": "framework-80375131",
         "instance_id": "i-0f207fb2a99bd2d08",
         "name": "PjitFunction(matmul_allgather)",
         "process_id": "701",
         "thread_id": 80375131,
         "timestamp": 1729888382798557372,
         "type": 99999
       }
     }

* ``mem_usage``: sampled host memory usage

  .. code-block:: json

     {
       "duration": 1,
       "instance_id": "i-0f207fb2a99bd2d08",
       "percent_usage": 9.728179797845964,
       "timestamp": 1729888369286687792,
       "usage": 51805806592
     }

* ``cpu_util``: sampled CPU utilization. Results are provided per core and per EC2 instance involved in a distributed workload.

  .. code-block:: json

     {
       "cpu_id": "47",
       "duration": 1,
       "instance_id": "i-0f207fb2a99bd2d08",
       "timestamp": 1729888371287337243,
       "util": 2.3255813
     }
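Because the output is plain JSON, it lends itself to quick scripting. The following is a minimal Python sketch that totals event durations by name; it assumes a top-level ``trace_events`` list whose entries look like the samples above, which may differ between neuron-explorer versions:

.. code-block:: python

   import json
   from collections import Counter

   # Hypothetical post-processing of system_profile.json; adjust the key
   # names to match the actual layout produced by your neuron-explorer version.
   with open("system_profile.json") as f:
       profile = json.load(f)

   totals = Counter()
   for event in profile.get("trace_events", []):
       # Each event sample above carries a "name" and a "duration" field
       totals[event.get("name", "unknown")] += event.get("duration", 0)

   # Print the ten event names with the largest accumulated duration
   for name, duration in totals.most_common(10):
       print(f"{name}: {duration}")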
View in Perfetto
~~~~~~~~~~~~~~~~

You can view your Neuron Explorer profiles in Perfetto. See :doc:`view-perfetto` for more information.

.. note:: New Neuron Explorer features released in 2.27 and onwards may not be supported in Perfetto. For the full user experience and feature set, use the Neuron Explorer UI or the VSCode integration.

Troubleshooting
---------------

Incomplete JAX Profiles
~~~~~~~~~~~~~~~~~~~~~~~

If your JAX profile has fewer events than expected or lacks the Runtime API trace, check whether ``jax.profiler.stop_trace`` is being called inside a ``with jax.profiler.trace`` context block. This can prematurely stop tracing. Use ``jax.profiler.stop_trace`` only when profiling was started with ``jax.profiler.start_trace``, not when using the context-managed ``with jax.profiler.trace`` API.

Also, when using ``jax.profiler`` within your script, ensure that the environment variable ``NEURON_RT_INSPECT_ENABLE`` is not set to ``1``. Additionally, ensure that ``NEURON_RT_INSPECT_OUTPUT_DIR`` is set to the correct output directory and that this is the output directory passed to ``with jax.profiler.trace``.

Dropped Events in System Profile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When processing a system profile, you may see a warning indicating that some trace events were dropped during capture.

.. code-block:: shell

   WARN[0000] Warning: 1001 trace events were dropped during capture (stored 530560 out of 531561 total events). Consider increasing buffer size, reducing trace duration, or filtering events.

This means that during capture the trace event buffers filled and the oldest events were overwritten. If you need to avoid dropping events for the full duration of your workload, consider the following adjustments:

* Increase the buffer size by setting ``NEURON_RT_INSPECT_SYS_TRACE_MAX_EVENTS_PER_NC``, as sketched below (see :ref:`Profile Capture Environment Variables <neuron-explorer-capture-environment-variables>`). This will increase host memory usage.
* Apply capture-time filters (NeuronCores / event types); see :ref:`Filtering System Profiles <neuron-explorer-filtering-system-profiles>`.
* Shorten the profiled region: limit the code span under the profiling context / runtime.
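For example, to raise the per-NeuronCore buffer above the 1,000,000-event default before the next capture (the value shown is illustrative; larger buffers consume more host memory):

.. code-block:: shell

   export NEURON_RT_INSPECT_SYS_TRACE_MAX_EVENTS_PER_NC=5000000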
================================================
FILE: tools/neuron-explorer/index.rst
================================================

.. meta::
   :description: Neuron Explorer documentation for performance profiling, debugging, and optimization of ML workloads on AWS Trainium and Inferentia.
   :date-modified: 12/02/2025

.. _neuron-explorer-home:

Neuron Explorer
=================

.. important::

   Neuron Explorer is the recommended profiling tool for AWS Neuron workloads. It provides end-to-end profiling support along with the latest features and an improved user experience.

   **Note:** Neuron will end support for :ref:`Neuron Profiler 2.0 ` and :ref:`Neuron Profiler ` in the Neuron 2.29 release. Users are encouraged to migrate to Neuron Explorer. Please see :doc:`migration-faq` and :ref:`neuron-explorer-faq` for more details.

Neuron Explorer is a suite of tools designed to support ML engineers throughout their development journey on AWS Trainium. Neuron Explorer helps developers maintain context, iterate efficiently, and focus on building and optimizing high-performance models. Developers can access Neuron Explorer from the CLI, the UI, or inside their IDE through the VSCode integration.

Profiling Viewers
--------------------

Neuron Explorer lets ML performance engineers trace execution from source code down to hardware operations, enabling detailed analysis of model behavior at every layer of the stack. The suite of tools supports both single-node and distributed applications, allowing developers to analyze workloads at scale.

Getting Started
---------------

.. grid:: 1 2 2 2
   :gutter: 3

   .. grid-item-card:: Get Started
      :link: get-started
      :link-type: doc
      :class-card: sd-border-1

      Set up Neuron Explorer, launch the web UI, and configure SSH tunneling for secure access to profiling data.

   .. grid-item-card:: Capture and View Profiles
      :link: how-to-profile-workload
      :link-type: doc
      :class-card: sd-border-1

      Learn how to capture and view profiles in the Neuron Explorer UI or directly in your IDE via VSCode Integration.

Visualization and Analysis
---------------------------

.. grid:: 1
   :gutter: 3

   .. grid-item-card:: Device Trace Viewer
      :link: overview-device-profiles
      :link-type: doc
      :class-card: sd-border-1

      Explore hardware-level execution with timeline view, operator table, event details, annotations, dependency highlighting, search, and more analysis features.

   .. grid-item-card:: System Trace Viewer
      :link: overview-system-profiles
      :link-type: doc
      :class-card: sd-border-1

      Explore system-level execution with timeline view and more analysis features.

.. grid:: 1 2 2 2
   :gutter: 3

   .. grid-item-card:: Hierarchy Viewer
      :link: overview-hierarchy-view
      :link-type: doc
      :class-card: sd-border-1

      Visualize the entire execution from model layers down to hardware execution, supporting interactivity with device viewer and source code linking.

   .. grid-item-card:: Source Code Viewer
      :link: how-to-link-view-source-code
      :link-type: doc
      :class-card: sd-border-1

      Navigate between NKI and PyTorch source code and profile data with bidirectional linking and highlighting.

   .. grid-item-card:: Summary Viewer
      :link: overview-summary-page
      :link-type: doc
      :class-card: sd-border-1

      Get streamlined performance insights and optimization recommendations with high-level metrics and visualizations.

   .. grid-item-card:: Database Viewer
      :link: overview-database-viewer
      :link-type: doc
      :class-card: sd-border-1

      Develop your own analyses, examine profiling data stored in database tables, or run ad-hoc queries during performance analysis.

   .. grid-item-card:: Tensor Viewer
      :link: overview-tensor-viewer
      :link-type: doc
      :class-card: sd-border-1

      View tensor information including names, sizes, shapes, and memory usage details.

   .. grid-item-card:: Memory Viewer
      :link: overview-memory-viewer
      :link-type: doc
      :class-card: sd-border-1

      Analyze memory allocation, usage patterns, and potential inefficiencies across SBUF partitions.

   .. grid-item-card:: AI Recommendation Viewer
      :link: overview-ai-recommendations
      :link-type: doc
      :class-card: sd-border-1

      Get AI-powered bottleneck analysis and optimization recommendations for NKI profiles.

Tutorials
----------

.. grid:: 1
   :gutter: 3

   .. grid-item-card:: Profile a NKI Kernel
      :link: /nki/guides/use-neuron-profile
      :link-type: doc
      :class-card: sd-border-1

      Learn how to profile a NKI kernel with Neuron Explorer.

.. grid:: 1 2 2 2
   :gutter: 3
   .. grid-item-card:: vLLM Performance
      :link: /tools/tutorials/performance-profiling-vllm
      :link-type: doc
      :class-card: sd-border-1

      Capture and analyze system-level and device-level profiles for vLLM inference workloads on Trainium.

Additional Resources
--------------------

.. grid:: 1
   :gutter: 3

   .. grid-item-card:: Viewing Profiles with Perfetto
      :link: view-perfetto
      :link-type: doc
      :class-card: sd-border-1

      Learn how to view Neuron Explorer profiles using the Perfetto UI for trace analysis.

.. _download-neuron-explorer-vscode:

Neuron Explorer for Visual Studio Code
------------------------------------------------

The Neuron Explorer VSCode extension is available on the Visual Studio Code Extension Marketplace. To install the extension, open the Extensions view in VSCode by pressing **Ctrl+Shift+X** (Windows/Linux) or **CMD+Shift+X** (MacOS), and search for ``AWS Neuron Explorer`` or ``amazonwebservices.neuron-explorer``. Select the extension published by **Amazon Web Services** in the sidebar, then click the blue **Install** button.

You can also install the extension directly from the `Visual Studio Code Marketplace `_.

.. _neuron-explorer-faq:

Neuron Explorer FAQ
-------------------

What can I expect from Neuron Explorer?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Neuron Explorer provides a comprehensive profiling experience with both device-level and system-level profiling support. Neuron Explorer features an enhanced profiling experience with hierarchical profiling, bidirectional code linking, AI-powered recommendations, IDE integration, and more. In future releases, Neuron Explorer will continue to expand with additional profiling viewers and features, debugging capabilities, and enhanced recommendation and analysis tools to support the entire ML development journey on Trainium.

What is the difference between device-level and system-level profiling?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Device-level profiling captures hardware execution data from NeuronCores, including compute engine instructions, DMA operations, and hardware utilization. Use device-level profiling to analyze hardware performance, identify compute or memory bottlenecks, and optimize kernel implementations.

System-level profiling captures software execution data, including framework operations, Neuron Runtime API calls, CPU utilization, and memory usage. Use system-level profiling to analyze framework overhead, identify CPU bottlenecks, and debug runtime issues.

Is Neuron Explorer going to replace Neuron Profiler and Neuron Profiler 2.0?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes. Neuron Explorer is the recommended profiling tool and replaces both Neuron Profiler and Profiler 2.0. Neuron Profiler and Profiler 2.0 are supported for one final release. In the Neuron 2.29 release, they will enter end-of-support and will no longer receive updates or technical support, though they will remain accessible through the ``neuron-profile`` package in previous releases. Users should migrate to Neuron Explorer now.

Are my existing profiles compatible with Neuron Explorer?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes. Neuron Explorer is backwards compatible with profile data captured using Neuron Profiler or Profiler 2.0. Existing profile files must be reprocessed before viewing in Neuron Explorer, but you do not need to recapture them. See :ref:`new-neuron-profiler-setup`.
For detailed migration guidance, including CLI command mappings and feature comparisons, see the :doc:`migration-faq`.

.. toctree::
   :hidden:
   :maxdepth: 1

   Get Started
   Neuron Profiler to Neuron Explorer Migration Guide
   Capture and View Profiles
   Device Trace Viewer
   System Trace Viewer
   Hierarchy Viewer
   Source Code Viewer
   Summary Viewer
   Database Viewer
   Tensor Viewer
   Memory Viewer
   AI Recommendation Viewer
   View Profiles with Perfetto

================================================
FILE: tools/neuron-explorer/migration-faq.rst
================================================

.. _neuron-profiler-migration-guide:

Migration Guide from Neuron Profiler to Neuron Explorer
========================================================

This guide provides detailed information for migrating from Neuron Profiler or Neuron Profiler 2.0 to Neuron Explorer.

.. contents:: Table of Contents
   :local:
   :depth: 2

Overview
--------

Neuron Explorer is the recommended profiling tool for AWS Neuron workloads, replacing both Neuron Profiler and Neuron Profiler 2.0. This guide helps you transition your profiling workflows to Neuron Explorer.

Key Differences
---------------

The following table summarizes the key differences between Neuron Profiler/Profiler 2.0 and Neuron Explorer:

.. list-table::
   :widths: 30 35 35
   :header-rows: 1
   :align: left

   * - Feature
     - Neuron Profiler / Profiler 2.0
     - Neuron Explorer
   * - CLI tool
     - ``neuron-profile``
     - ``neuron-explorer``
   * - Device Profiling
     - Yes
     - Yes (enhanced)
   * - System Profiling
     - Yes (Profiler 2.0 only)
     - Yes
   * - Hierarchy Viewer
     - No
     - Yes
   * - Source Code Viewer
     - Yes (Device profiles)
     - Yes (Device profiles)
   * - AI Recommendation Viewer
     - No
     - Yes (for NKI profiles)
   * - IDE Integration
     - No
     - Yes (VSCode Extension)
   * - Database Viewer
     - No
     - Yes
   * - Tensor Viewer
     - No
     - Yes
   * - Additional Installation Requirements
     - InfluxDB installation required
     - None

Update CLI Commands
--------------------

Replace ``neuron-profile`` with ``neuron-explorer`` in your scripts and workflows. The following commands are subject to change before GA:

.. list-table::
   :widths: 50 50
   :header-rows: 1
   :align: left

   * - Neuron Profiler Command
     - Neuron Explorer Command
   * - ``neuron-profile view -d ./output``
     - ``neuron-explorer view -d ./output``
   * - ``neuron-profile view -n file.neff -s profile.ntff``
     - ``neuron-explorer view -n file.neff -s profile.ntff``
   * - ``neuron-profile capture -n file.neff -s profile.ntff``
     - ``neuron-explorer capture -n file.neff -s profile.ntff``

Frequently Asked Questions
--------------------------

Do I need to install InfluxDB for Neuron Explorer?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

No. Unlike Neuron Profiler, Neuron Explorer requires no external installation or setup.

How do I view existing profiles captured with Neuron Profiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Existing NEFF and NTFF files captured with Neuron Profiler are fully compatible with Neuron Explorer. To view them:

.. code-block:: bash

   # View a single device profile
   neuron-explorer view -n file.neff -s profile.ntff

The profiles will be reprocessed using Neuron Explorer's processing pipeline, which may provide additional insights not available in the original Neuron Profiler view.

How do I capture profiles with Neuron Explorer?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Neuron Explorer provides the ``neuron-explorer capture`` command for standalone NEFF profiling, similar to ``neuron-profile capture``:

.. code-block:: bash

   # Capture a device profile
   neuron-explorer capture -n file.neff -s profile.ntff

You can also use the framework profiling APIs or environment variables to capture profiles during your actual workload execution, as shown in the sketch below. For NKI kernel profiling, continue using the ``nki.benchmark`` or ``nki.profile`` APIs as documented in the :ref:`NKI profiling guide `.
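A minimal environment-variable sketch follows. It assumes the ``NEURON_RT_INSPECT_*`` variables referenced in the troubleshooting section of this documentation apply to your workload; ``your_workload.py`` is a hypothetical script name, and the full set of capture variables is described in the capture documentation:

.. code-block:: shell

   # Sketch: enable profile capture for a workload run via environment
   # variables. NEURON_RT_INSPECT_ENABLE and NEURON_RT_INSPECT_OUTPUT_DIR
   # are the variables referenced in the troubleshooting section of this
   # documentation; your_workload.py is a hypothetical script name.
   export NEURON_RT_INSPECT_ENABLE=1
   export NEURON_RT_INSPECT_OUTPUT_DIR=./output
   python your_workload.py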
What new features does Neuron Explorer provide?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Neuron Explorer introduces several new capabilities:

- **Hierarchy Viewer**: Visualize execution from model layers down to hardware operations. See :doc:`overview-hierarchy-view`.
- **Source Code Viewer**: Navigate between source code and profile data. See :doc:`how-to-link-view-source-code`.
- **AI Recommendation Viewer**: Get AI-powered optimization suggestions for NKI profiles. See :doc:`overview-ai-recommendations`.
- **Database Viewer**: Run custom queries on profiling data. See :doc:`overview-database-viewer`.
- **Memory Viewer**: Get insight into memory allocation, usage patterns, and potential memory usage inefficiencies.
- **Tensor Viewer**: Examine tensor information including shapes and memory usage. See :doc:`overview-tensor-viewer`.
- **VSCode Extension**: View profiles directly in your IDE with native code linking support.
- **System Trace Viewer**: Enhanced system-level profiling visualization. See :doc:`overview-system-profiles`.

How do I get help during migration?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Review the :doc:`get-started` guide for initial setup
- See :doc:`how-to-profile-workload` for detailed capture and viewing instructions
- Check submitted issues and file new issues via the `AWS Neuron GitHub issues `_

================================================
FILE: tools/neuron-explorer/overview-ai-recommendations.rst
================================================

.. meta::
   :description: AI Recommendation feature helps identify and understand bottlenecks and optimization opportunities for NKI kernels through AI-powered analysis
   :date-modified: 11/21/2025

AI Recommendation Viewer
=========================

In this guide, you'll learn how to use the AI Recommendation Viewer to identify and understand bottlenecks and optimization opportunities for NKI kernels through AI-powered analysis of the user's profile and source code. Users receive actionable recommendations through the Neuron Explorer UI, CLI, or via their IDE. Each report provides the top 2-3 optimization opportunities ranked by effort and impact, including the symptom with quantified metrics, the optimization with implementation guidance, expected speedup estimates, and implementation tradeoffs. The feature is entirely opt-in and only enabled for profiles that the user explicitly requests a recommendation for.

.. warning::

   * Responses in this Amazon Bedrock-powered feature are AI-generated. Verify accuracy and appropriateness before use.
   * This feature is available in US Regions only. Neuron may securely transmit data across Regions within your geography for processing.
   * Your AWS account will be billed for Bedrock usage. Each time you generate an AI Recommendation for a profile, a single Bedrock request is made with up to 30,000 input tokens and 10,000 output tokens.
   * At the moment, this feature may only be used with Claude Sonnet 4.5.

.. _local_setup_directions:

Local setup directions
----------------------------------------------------

AI Recommendations use Amazon Bedrock. To enable this feature, you must configure AWS credentials on the system where you run ``neuron-explorer``. The AWS credentials should have ``bedrock:InvokeModel`` permissions and access to Claude Sonnet 4.5. For information on configuring Bedrock access, refer to the `AWS Bedrock model access documentation `_.
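For example, if you keep your Bedrock-enabled credentials in a named AWS profile, you can point ``neuron-explorer`` at them through the standard AWS environment variables. The profile name and Region below are illustrative; use a US Region where Claude Sonnet 4.5 access is enabled:

.. code-block:: shell

   # Illustrative: select a named AWS profile and a US Region with
   # Bedrock model access before running neuron-explorer.
   export AWS_PROFILE=bedrock-user
   export AWS_DEFAULT_REGION=us-west-2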
Getting an AI Recommendation From the UI
----------------------------------------------------

To generate an AI Recommendation from the UI, open your profile, click the "Add Widget" dropdown, and select **AI Recommendation**.

.. image:: /tools/profiler/images/recommendation-button.png

Go to the **AI Recommendation** widget box and click the **Get AI Recommendation** button. This performs additional analysis and sends the recommendation request to Amazon Bedrock, which can take up to a minute. Avoid refreshing the page during this time.

.. image:: /tools/profiler/images/recommendation-widget.png

Once the recommendation has been generated, it will be displayed in the widget box. For each recommendation you will see the performance inefficiency symptoms that were observed, the suggested optimization to make, and potential tradeoffs to look out for when implementing the optimizations.

.. image:: /tools/profiler/images/recommendation-view.png

Getting an AI Recommendation from the CLI
----------------------------------------------------

Users may also get AI recommendations with the ``neuron-explorer recommend`` CLI command.

Before you start, ensure that you have followed the :ref:`local setup directions <local_setup_directions>` to enable Bedrock access on your configured AWS account. ``neuron-explorer`` uses the default AWS credentials you have configured. If you want to use different credentials, you can specify an AWS profile by setting environment variables: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html.

To generate a recommendation, provide the following to the ``neuron-explorer recommend`` command:

* A NEFF file for your compiled NKI kernel
* An NTFF file for your captured profile
* The location where your NKI source files can be found

Example:

.. code-block:: shell

   neuron-explorer recommend -n <neff-file> -s <ntff-file> --nki-source-root <nki-source-dir>

Running this command processes the profile and prints the AI-generated recommendation to the console in Markdown format. You can save this output to a file and view it in any text editor or Markdown viewer.
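For instance, to keep the report for later review (the file names below are illustrative):

.. code-block:: shell

   # Illustrative file names: write the Markdown report to a file.
   neuron-explorer recommend -n kernel.neff -s profile.ntff \
       --nki-source-root ./src > recommendation.md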
================================================
FILE: tools/neuron-explorer/overview-database-viewer.rst
================================================

.. meta::
   :description: Learn about the Database Viewer tool in Neuron Explorer for querying and exploring profiling data using SQL or natural language queries.
   :date-modified: 01/27/2026

.. _database-viewer-overview:

Database Viewer
=====================

The Database Viewer offers an interactive interface providing visibility into all the underlying data that Neuron Explorer processes from a :doc:`NEFF ` and NTFF. Use this tool to develop your own analyses, examine profiling data stored in database tables, or run ad-hoc queries during performance analysis. You can access this data through natural language queries or raw SQL.

.. image:: /tools/profiler/images/database-viewer.png

Table Selection and Schema Inspection
-------------------------------------

When the tool loads, it fetches the list of available database tables. Select a table from the dropdown to view its schema. The schema table displays:

* **Field Name** - Column name (hover for description tooltip).
* **Data Type** - The data type of the field.
* **Required** - Whether the field is required.
* **Unit** - Measurement unit (if applicable).
* **Example** - Example value for the field.

Querying Data
-------------

The query input supports two modes:

1. **SQL queries** - Write standard SQL starting with ``SELECT``.
2. **Natural language queries** - Describe what you want in plain English.

Examples:

Natural language query to get the first 5 rows::

   Get the first 5 rows

SQL query to filter with conditions::

   SELECT field_name FROM table_name WHERE condition

Press **Enter** or click **Execute Query** to run. Use **Shift+Enter** for multi-line input.

Query Results
-------------

Results appear below the query input in reverse chronological order (newest first). Each result shows:

* The original query text.
* The generated SQL (for natural language queries).
* A scrollable results table.

Click **Export CSV** to download any result set as a CSV file.

.. image:: /tools/profiler/images/database-viewer-query-result.png

================================================
FILE: tools/neuron-explorer/overview-device-profiles.rst
================================================

.. meta::
   :description: Learn about Neuron Explorer widgets for device profiling including timeline views, event details, annotations, and performance analysis tools.
   :date-modified: 12/02/2025

Device Trace Viewer
===================

The Neuron Device Trace Viewer displays execution on a NeuronCore at hardware-instruction granularity. Neuron Explorer collects the timestamped start and end events that occur on the device into a NTFF. As a post-processing step, the profiler correlates these events with information in the compiled NEFF to generate a detailed report of the hardware performance. The Neuron Explorer UI provides several different tools for an extensible and customizable workflow.

.. image:: /tools/profiler/images/device-profile-1.png

Tools
------

Device Trace Viewer
~~~~~~~~~~~~~~~~~~~~~

The Device Trace Viewer presents a timeline view of the device execution, including activity on the DMA and compute engines, Hardware FLOPs Utilization (HFU) and device memory utilization over time, and more.

.. image:: /tools/profiler/images/device-profile-2.png

Hover
^^^^^

.. image:: /tools/profiler/images/device-profile-3.png

Hover over events in the timeline to see important identifying information at a glance, such as the time window, the hierarchy, and the hardware instruction that was executed. For more details, clicking the event will display the full details in the Event Details widget.

Color Scheme
^^^^^^^^^^^^

.. list-table::
   :header-rows: 0
   :widths: 50 50

   * - .. image:: /tools/profiler/images/device-profile-4.png
          :width: 100%
     - .. image:: /tools/profiler/images/device-profile-5.png
          :width: 100%

Instructions are color-coded according to their associated PyTorch operator. All instructions derived from the same PyTorch operator share an identical color.

.. note::

   In future releases, we will introduce more customizable options for color-coding.

Panning
^^^^^^^

.. image:: /tools/profiler/images/device-profile-6.gif

Panning is supported in a couple of ways:

* Left-clicking the x-axis and dragging it
* Spinning the scroll wheel while holding down Shift
* With the keyboard:

  * A/D keys for left/right movement
  * Left/right arrow keys for left/right movement

The amount panned depends on the current zoom level.

Event Details
~~~~~~~~~~~~~

Upon clicking an event in the Device Trace Viewer, all details related to the event will appear in the Event Details.
The information shown is a superset of the information available on hover, allowing you to dive deeper into what is happening on the hardware.

* The Event Details table will populate with field data from clicked events from the instruction widget.
* When filtering by fields through Search, all matching events will be rendered as pages in the Event Details. Users can navigate through each page to analyze data for each matching event.

.. image:: /tools/profiler/images/device-profile-7.png

Annotations
~~~~~~~~~~~

Users can create annotations by right-clicking in the Device Trace Viewer. These annotations can be moved by clicking and dragging the vertical line, and will snap to the closest events when applicable. The annotations tab shows more details on all available annotations in the profile, such as the time difference and summary metrics between two markers. The choice of which two annotations to compare is configurable in the "diff vs" column. You can also quickly zoom in to the region between two annotations by selecting the checkbox on the left. Users can rename, delete, save, and load annotations for better readability and collaboration.

.. image:: /tools/profiler/images/device-profile-8.png

Operator Table
~~~~~~~~~~~~~~

The Operator Table aggregates the hardware-level metrics into framework layers and operations, such as the MFU and the amount of data being moved. Users can progressively expand each row to get a further breakdown of each nested operator. Filters can be applied and columns can be sorted for more streamlined viewing.

.. image:: /tools/profiler/images/device-profile-9.png

Overall Summary
~~~~~~~~~~~~~~~

The Overall Summary displays performance metrics across the entire profile run, with metrics broken down into different categories such as by the NeuronCore engines. These can be used for quick insights into how well the model performed.

.. image:: /tools/profiler/images/device-profile-10.png

Current Selection Summary
~~~~~~~~~~~~~~~~~~~~~~~~~

The Current Selection Summary provides metrics for the current time window. Zooming in and out in the Device Trace Viewer will update the summary. This can be used in conjunction with the zoom feature of Annotations for easy access to a region of interest.

.. image:: /tools/profiler/images/device-profile-11.png

.. _box-selection-summary:

Box Selection Summary
~~~~~~~~~~~~~~~~~~~~~

The Box Selection Summary provides metrics within a bounding box region. Select and drag regions within the timeline widget to update the summary.

.. image:: /tools/profiler/images/box-select.gif

To use box selection, toggle the box selection button within the timeline widget, then select and drag a region; clear the selection with the ``esc`` key. Corresponding summary information for the selected region is displayed within the box selection widget.

Code Viewer
~~~~~~~~~~~

Profiles that are uploaded with source code files enable users to quickly navigate between NKI and application-level source code and the corresponding hardware-level instructions. In the Device Trace Viewer, we can click on an event to highlight the source code line in the Code Viewer. A (Ctrl/Cmd) + click on the event will scroll to the corresponding source code line. In the Code Viewer, clicking on a line in the source code will automatically highlight all associated events in the Device Trace Viewer. Similarly, highlighting multiple lines of the source code will also highlight all events in the timeline.
.. image:: /tools/profiler/images/device-profile-12.png

See :ref:`neuron-explorer-source-code` for instructions on how to enable source code viewing.

Layout Customization
~~~~~~~~~~~~~~~~~~~~

Understanding and optimizing performance with the profiler can be overwhelming given the amount of information being processed and displayed. As part of preparing for optimization work, you can cross-reference different information, such as the Device Trace Viewer with the application source code. With the widget-based UI, you can customize the layout to best fit a specific workflow. Each widget can be added, removed, dragged around, and resized. Once you are happy with the layout, you can save it through the Layout dropdown at the top right. The layouts are not tied to a specific profile, so they can be loaded and re-used for future profiles as well.

.. image:: /tools/profiler/images/device-profile-13.png

================================================
FILE: tools/neuron-explorer/overview-hierarchy-view.rst
================================================

.. meta::
   :description: Learn about the Hierarchy View in Neuron Explorer for analyzing framework layers and HLO operations with zooming, highlighting, and display options.
   :date-modified: 12/02/2025

Hierarchy Viewer
===================

The Hierarchy Viewer shows an up-leveled representation of the hardware execution organized by the framework layers and HLO operations. It enables you to progressively drill down into nested layers or operators and map the execution of application-level constructs to the Neuron device. This view interacts with other tools such as the Device Trace Viewer.

.. image:: /tools/profiler/images/hierarchy-view-1.gif

Zooming
-------

.. image:: /tools/profiler/images/hierarchy-view-2.png

You can zoom in on the Hierarchy Viewer in a couple of ways:

* Click-drag your mouse across the graph (supported in both directions)
* Scroll down using your mouse wheel, with the mouse cursor on the x-axis
* Zoom in and out buttons in the top-right corner
* With the keyboard:

  * W and S for zooming in and out, respectively
  * Up and down arrow keys for zooming in and out, respectively

To zoom out, simply scroll up with your mouse wheel while the mouse cursor is on the x-axis.

Change Displayed Layers
-----------------------

.. image:: /tools/profiler/images/hierarchy-view-3.png

The display options menu, accessed with the button in the top-right corner, allows you to selectively show or hide different layers. For instance, in the example shown above, the framework layer is hidden while displaying the hierarchy starting from HLO.

Highlighting
------------

.. image:: /tools/profiler/images/hierarchy-view-4.png

Right-clicking on an operator in the Hierarchy Viewer will highlight all the corresponding instructions in the Device Trace Viewer for the operator using the same color. Multiple operators can be highlighted at once.

.. image:: /tools/profiler/images/hierarchy-view-5.png

================================================
FILE: tools/neuron-explorer/overview-memory-viewer.rst
================================================

.. meta::
   :description: Learn about the Memory View in Neuron Explorer for analyzing all the memory allocations on SBUF.
   :date-modified: 03/24/2026

Memory Viewer
===================

The Memory Viewer in Neuron Explorer offers deep, low-level insight into memory allocation, usage patterns, and potential inefficiencies — going well beyond surface-level metrics.
With comprehensive visibility into how memory is consumed across the device, it enables kernel and performance engineers to make informed optimization decisions, reduce debugging time, and improve overall system performance.

.. image:: /tools/neuron-explorer/images/memory_viewer_overview.png
   :alt: Memory Viewer overview showing memory allocation patterns across SBUF partitions

Enable Memory Viewer during Profile Upload
--------------------------------------------

To enable the Memory Viewer feature, check the option 'Enable Memory Viewer' when you upload your profile:

.. image:: /tools/neuron-explorer/images/memory_viewer_enable.png

View the Memory Viewer Widget
------------------------------

Once your profile finishes processing and is ready to view, click the Add Widget button and select 'Memory Viewer':

.. image:: /tools/neuron-explorer/images/memory_viewer_add_widget.png

By hovering your mouse over each allocation, you can see detailed information about that allocation. For allocations triggered by instructions, the hover information includes:

* Start time and end time
* Duration
* Start address and end address
* Opcode
* Operands

For allocations triggered by DMAs, the hover information includes:

* Partition number
* Start time and end time
* Duration
* Start address and end address
* DMA queue name
* Block ID

By analyzing memory allocations, you can address memory fragmentation by identifying sparse allocation patterns and potentially rescheduling instructions or DMAs to different addresses to maintain memory compactness. Additionally, you can perform spill/reload analysis to identify opportunities for reducing spills by relocating allocations to available space at alternative addresses.

You can also use the dropdown menu to inspect the memory allocations on different partitions and NeuronCores:

.. image:: /tools/neuron-explorer/images/memory_viewer_hover.png

================================================
FILE: tools/neuron-explorer/overview-summary-page.rst
================================================

.. meta::
   :description: Learn how to use the Neuron Explorer summary page to quickly identify performance issues, view key metrics, and get actionable optimization recommendations for your profiles.
   :date-modified: 03/20/2026

Summary Viewer
================

The Neuron Explorer summary viewer provides a streamlined view of your profile's most critical performance insights, enabling quick identification of issues and optimization opportunities without navigating through detailed data.

.. image:: /tools/profiler/images/explorer-summary-page.png

Benefits
--------

Both new and experienced users benefit from this streamlined view of profiling data.

* Identify performance issues quickly
* Understand your profile's most critical metrics at a glance
* Get actionable recommendations for optimization

How to use
-------------

1. **Open your profile** - The Summary Viewer is accessible via the Profile Manager or Neuron Explorer UI.
2. **Examine key metrics** - Review the metrics and graphs to understand your profile's performance characteristics.
3. **Review recommendations** - Start with the **Performance Insights & Recommendations** section. This section highlights the most important performance issues.
4. **Select specific time regions** - Use the "Region Selection" menu to view specific timeslices corresponding to network layers. This helps you drill down into specific sections of your profile. You can generate custom time regions using the "Add Region" button.
5. **Take action** - Apply the recommended optimizations to your model or workload.

Understanding region-level insights
-----------------------------------

When you work with profiles from entire networks or network subgraphs, different regions will have different performance characteristics. The landing page enables performance analysis on a per-layer basis and provides:

* Layer-specific recommendations
* Time-range indication of where problems occur
* More accurate insights for complex profiles

Use the 'Region Selection' menu to navigate between different layers and view their individual performance data.

What the landing page displays
------------------------------

Performance Insights and Recommendations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section shows 2-4 recommendations to help you improve performance. The profiler analyzes your data, identifies the most important issues to address, prioritizes them by criticality, and shows you the most critical ones first.

Example recommendations
^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Condition
     - Root Cause
     - Recommended Action
   * - Low Model FLOPS relative to Active FLOPS (< 50%)
     - Tensor engine is active but not performing useful matrix operations
     - Ensure instructions use the entire tensor engine and are pipelined correctly
   * - NKI instruction coverage < 50% on tensor, vector, or scalar engine
     - Compiler-generated instructions dominate the engine
     - Write NKI kernel code for the network operations present in that profile section
   * - Active FLOPS throttling detected
     - FLOPS lost due to throttling during active tensor engine periods
     - Investigate the root cause of throttling to recover tensor engine utilization
   * - Transpose FLOPS > 10% of total hardware FLOPS
     - Excessive data movement within the tensor engine
     - Improve memory layout to reduce transpose operations
   * - Collective operation outliers detected
     - Significantly underperforming collective operations relative to their group median
     - Check for overlapping instructions that might be causing delays
   * - Spill reload bytes > 25% of total HBM reads
     - Excessive spill/reload operations consuming memory bandwidth
     - Check for data dependencies causing excessive spill/reload operations

Key Metrics
~~~~~~~~~~~

This section displays tables and graphs that summarize your profile's performance metrics.

Compute Performance Statistics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* **total_time** - Total duration of on-device time for the run in seconds. This doesn't include host-device data movement overhead or host runtime/framework overhead.
* **mm_arithmetic_intensity** - The ratio of regular Matrix Multiplication (MATMUL) Floating Point Operations (FLOPs) to total Dynamic Random Access Memory (DRAM) transfer size. This metric helps you determine if your workload is memory-bound or compute-bound.
* **hfu_estimated_percent** - Hardware FLOPs Utilization reflects the Tensor Engine utilization calculated from all Tensor Engine instructions.
* **mfu_estimated_percent** - Model FLOPs Utilization reflects the Tensor Engine utilization for useful compute (matrix multiplications from your model definition).

Memory Bandwidth Utilization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* **total_bandwidth_available** - The total bytes possible to be transferred within the given time region for the current Neuron hardware specification.
* **mbu_estimated_percent** - Memory Bandwidth Utilization (MBU) shows the achieved (as running on the current Neuron hardware) High Bandwidth Memory (HBM) bandwidth utilization.
* **average_dma_size** - The average DMA transfer size (higher is better).
* **useful_read_percent** - The fraction of HBM reads that are useful: (``hbm_read_bytes`` - ``hbm_reload_bytes``) / ``hbm_read_bytes``. Note that "useful" is not an inherent property of the memory itself, but a measurement of how efficiently the memory is being utilized by a specific workload or application. Low numbers may indicate inefficient memory access patterns and suboptimal layouts.

FLOPs Utilization
^^^^^^^^^^^^^^^^^

For each compute engine (tensor, vector, scalar, gpsimd), displays how well utilized the engine is. You can view all cores simultaneously or select a specific Neuron Core from the dropdown.

Tensor Engine
"""""""""""""

The Tensor engine has a detailed breakdown of how the FLOPs are being used:

* **model_flops** - The percentage of tensor FLOPs spent performing useful matrix operations, contributing to model progress.
* **transpose_flops** - The percentage of tensor FLOPs spent performing transpose operations / data movement.
* **active_flops** - The percentage of tensor FLOPs that correspond to the active time of the tensor engine, but where the engine was not effectively utilized.
* **throttled_flops (active and inactive)** - The percentage of FLOPs wasted due to throttling, either during active or inactive tensor engine periods.

There are a few key things to look for in this graph:

1. **model_flops relative to active_flops**. Large differences could indicate that the tensor engine is being poorly utilized with small tensor sizes, or that operations are not being pipelined effectively.
2. **model_flops relative to transpose_flops**. It is desirable to have little-to-no ``transpose_flops`` consuming tensor engine utilization. Ideally the ``model_flops`` amount is much larger than the amount of transposes.
3. **active_throttled_flops**. Losing FLOPs to throttling during active periods is undesirable. It is worth identifying the root cause of the throttling if there is indication of this happening.

Other Engines (Scalar, Vector, GpSimd)
"""""""""""""""""""""""""""""""""""""""

These engines do not yet have detailed FLOP utilization breakdowns; they only show the active period of operation for the engine.

* **active_flops** - Percentage of FLOPs when the engine processes at least one instruction (excluding semaphore waits).

NKI Engine Statistics
^^^^^^^^^^^^^^^^^^^^^

This chart shows the instruction count breakdown between NKI-generated instructions and compiler-generated instructions for each compute engine (tensor, vector, scalar). The stacked bar chart helps you understand how much of your workload is running NKI kernel code versus compiler-generated code. Hovering over a bar displays a detailed breakdown of instruction counts by opcode for that engine and source type. When NKI instruction coverage is below 50% for a given engine, the summary page generates a recommendation to write NKI kernel code for the network operations in that profile section.

DMA Utilization
^^^^^^^^^^^^^^^

This chart shows how the DMA engines are being utilized, displayed as a percentage of the total available bandwidth. Two dropdown menus control the chart's aggregation:

* **Outer aggregation** - Choose between viewing data per DMA engine ("All Engines") or per Neuron Core ("Neuron Cores").
* **Inner aggregation** - Choose between grouping by data type or source type:

  * **Data Type** groups transfers into Instruction, IO, Weights, and Dynamic categories.
  * **Source Type** groups transfers into Static (compiler-generated), Software Dynamic (GpSimd-generated), and Hardware Dynamic (DGE hardware-generated) categories.

Each category shows two bar segments: a solid bar representing bandwidth utilization and a striped bar representing active time utilization beyond the bandwidth portion. This helps distinguish between time spent transferring data and time the DMA engine is active but not fully utilizing bandwidth.

Memory Bandwidth Breakdown
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Shows how the available HBM memory bandwidth was used as a doughnut chart:

* HBM Read — effective read bytes (excluding spill reloads)
* HBM Write — effective write bytes (excluding spill saves)
* SBUF Spill Reload — bytes reloaded from HBM due to state buffer spills
* SBUF Spill Save — bytes saved to HBM due to state buffer spills
* Unused — remaining available bandwidth

Collective Operations Duration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Displays the duration of each collective operation in the profile, grouped by operation type and size. Two visualization modes are available via a dropdown:

* **Scatter** - Shows individual operation durations as scatter points, with each operation type on a separate row. Hovering over a point displays detailed information including algorithm, operation, duration, start/end timestamps, element count, input/output sizes, and trigger engine. Clicking a point pins the tooltip for easy text selection.
* **Box Plot** - Shows the statistical distribution (min, Q1, median, mean, Q3, max, variance, count) of operation durations per operation type. This is useful for quickly identifying the spread and central tendency of each operation group.

Both modes are useful for identifying outliers in collective runtime, which can be used to investigate specific sections of the profile more deeply. You can filter out a dataset by clicking its entry in the graph legend.

System Information
^^^^^^^^^^^^^^^^^^

Displays metadata about the system and software versions used during profiling:

* Instance Type
* Compiler Version
* Explorer Version
* Driver Version
* Runtime Version
* Collectives Version

System Profile Summary
======================

When a system profile is loaded, the Summary Viewer automatically switches to the System Profile Summary view. System profiles capture data across multiple devices, processes, and instances, providing a holistic view of distributed workload performance.

Overview
--------

The System Profile Summary provides:

* A high-level overview of the entire system's profiling session
* HBM memory usage trends across logical NeuronCores
* A detailed table of all device profiles with key performance metrics
* The ability to drill down into individual device profiles for detailed analysis

System Overview Card
--------------------

Displays aggregate information about the profiling session:

* **Instances** - Number of unique instances captured in the profile
* **Processes** - Number of unique processes captured
* **System Profile Time** - Total wall-clock duration of the system profiling session
* **Total Device Runtime** - Cumulative on-device execution time across all device profiles
* **Total Device Profiles** - Number of individual device profiles in the system profile

HBM Memory Usage Chart
-----------------------

A line chart showing HBM memory usage over time.
When per-NeuronCore data is available, the chart displays a separate line for each logical NeuronCore (HBM index), color-coded for easy identification. When only aggregate data is available, a single filled area chart shows total HBM usage. The x-axis shows time (in the profiling session's time domain) and the y-axis shows memory usage in bytes. Hovering over the chart displays the exact timestamp and memory usage for each NeuronCore.

Device Profiles Table
---------------------

A table listing all device profiles captured in the system profile. The table supports:

* **Process filtering** - Use the dropdown to filter profiles by process ID, or select "All Processes" to view everything.
* **Expandable rows** - Click the expand arrow on any row to see additional per-profile metrics including tensor/vector/scalar engine active time percentages, DMA active time, and HBM read/write bytes.
* **Column tooltips** - Hover over column headers to see descriptions of each metric from the profile schema.

Table columns:

* **Profile Name** - Clickable link that navigates to the detailed device profile view
* **LNC** - Logical NeuronCore ID
* **Neuron Cores** - Number of physical NeuronCores used by this profile
* **Total Duration** - Total on-device execution time for this profile's events
* **Calls** - Number of execution events for this profile
* **Duration** - Total profiled time for this device profile
* **MFU** - Model FLOPs Utilization
* **HFU** - Hardware FLOPs Utilization
* **MBU** - Memory Bandwidth Utilization
* **CC Active** - Collective communication active time percentage

Device Profile Detail View
--------------------------

Clicking a device profile name in the table navigates to a detail view that embeds the standard Summary Viewer for that specific device profile. This provides the full set of per-device metrics, charts, and recommendations described in the sections above. A "Back to System Overview" button at the top returns you to the system-level summary.

================================================
FILE: tools/neuron-explorer/overview-system-profiles.rst
================================================

.. meta::
   :description: Learn about the System Profile in Neuron Explorer for analyzing system-level execution across instances and workers with runtime and hardware events.
   :date-modified: 01/30/2026

System Profile
================

The Neuron System Profile shows a system-level view of execution across instances and workers in your workload. This provides visibility into Neuron Runtime API calls and ML framework function calls (PyTorch or JAX) to help identify bottlenecks in distributed workloads. The Neuron Explorer UI provides system-level widgets for an extensible and customizable workflow.

.. image:: /tools/neuron-explorer/images/neuron-explorer-system-viewer.png

System Trace Viewer
---------------------

The System Trace Viewer provides an interactive timeline interface with time range selection, configurable event grouping, system event details on hover, and linking of hardware events to Device Trace Viewer widgets. You can see events in the Neuron Runtime and correlate them with hardware execution events on the Neuron Devices.

.. image:: /tools/neuron-explorer/images/system-timeline-widget.png

You can also see the device memory (HBM) allocations for each Neuron device over time. Hovering over these memory usage events shows a breakdown by usage category.
.. image:: /tools/neuron-explorer/images/system-timeline-widget-hbm-usage.png

Adding Widgets
---------------

The System Profile supports both System and Device widgets, enabling multi-profile analysis, for example comparing annotated device events across different devices. To add a widget:

1. Click the **Add Widget** button to open the Add Widget modal.
2. Select a Device or System widget.
3. Click a widget tile to load it with the selected profile. Each tile is tagged with its supported profile type (system, device, or both).

To load multiple instances of the same widget type for different profiles, repeat the steps above and select a different profile each time.

.. image:: /tools/neuron-explorer/images/system-timeline-add-widget.gif

After adding a widget, you can switch to a different profile by using the profile dropdown at the top of the widget.

.. image:: /tools/neuron-explorer/images/widget_switch_profiles.png

.. note::

   Adding duplicate widgets for the same profile is not currently supported.

Settings
----------

The System Trace Viewer supports multiple grouping modes to organize events for different analysis perspectives. You can switch between the following grouping modes in the settings to focus your analysis on different aspects of system performance:

.. list-table:: Grouping Options
   :widths: auto
   :header-rows: 1
   :align: left

   * - Grouping Option
     - Description
     - Example
   * - CPU vs Device Grouping (Default)
     - Groups events by event source (CPU or Neuron device events)
     - Runtime events: ``i-0b1ea78ca2865fd32/PID:1765325/TID:0/neuron_rt``, Hardware events: ``i-0b1ea78ca2865fd32/PID:1765325/Worker:0/neuron_hw``
   * - NeuronCore Grouping
     - Groups events by individual NeuronCore
     - ``i-0b1ea78ca2865fd32/NC:0``, ``i-0b1ea78ca2865fd32/NC:1``
   * - Thread Grouping
     - Groups events by thread identifier
     - ``i-0b1ea78ca2865fd32/PID:1765325/TID:0``
   * - Process Grouping
     - Groups events by process identifier
     - ``i-0b1ea78ca2865fd32/PID:1765325``
   * - Instance Grouping
     - Groups all events by instance only
     - ``i-0b1ea78ca2865fd32``

.. image:: /tools/neuron-explorer/images/system-timeline-settings.png

Event Details
--------------

Clicking on trace events in the timeline populates the Event Details widget with a list of properties for the system trace event.

.. image:: /tools/neuron-explorer/images/system-event-details.png

Device Profile Linking
------------------------

The System Trace Viewer links hardware events to the Device Trace Viewer, which renders the corresponding device traces. Navigating from the System Trace Viewer to a Device Trace Viewer can be accomplished in two ways:

Open the Device Profile List Modal
------------------------------------

To see a list of all device profiles captured during your workload:

1. **Click the "Device Profiles List" button** in the top right action bar of the System Trace Viewer to open a modal containing a list of device profiles
2. **Select a Device Profile and click Submit** to open the Device Trace Viewer with the selected device profile

.. image:: /tools/neuron-explorer/images/system-timeline-device-profiles-list-modal.png

Drill-down from Hardware Events
---------------------------------

To drill down from a hardware event to the Device Trace Viewer:

1. Find a hardware event such as ``nc_exec_running``
2. Click on the hardware event
3. Wait for the Device Trace Viewer to open

This will open a new Device Trace Viewer with the selected device profile showing detailed hardware events. To learn about device profiles, see :doc:`Device Profiles in Neuron Explorer `.
.. image:: /tools/neuron-explorer/images/system-timeline-hardware-event-linking.gif

================================================
FILE: tools/neuron-explorer/overview-tensor-viewer.rst
================================================

.. meta::
   :description: Learn about the Tensor Viewer in Neuron Explorer for viewing tensor information including names, sizes, shapes, and memory usage details.
   :date-modified: 01/27/2026

.. _tensor-viewer-overview:

Tensor Viewer
=================

The Tensor Viewer contains the following information about all tensors in the NEFF file:

* **variable_name** - The tensor name.
* **type** - How the system uses the tensor. Examples include input tensor, output tensor, or weight tensor.
* **format** - How the tensor is arranged in memory. For example, "NHWC" shows a specific dimension arrangement. Letters include N (batch size), H (height), W (width), C (channel).
* **shape** - The tensor's multi-dimensional shape.
* **size** - The tensor's total size in bytes.
* **node** - NEFF node.
* **pcore_idx** - Index of the physical NeuronCore within a Logical NeuronCore (LNC). A Logical NeuronCore groups physical NeuronCores. For LNC2, this field shows either 0 or 1.
* **load_to_sbuf_avg_size_bytes** - The average size in bytes of each DMA transfer when the system loads this tensor into the State Buffer.
* **load_to_sbuf_total_size_bytes** - The total size in bytes of all DMA transfers when the system loads this tensor into the State Buffer.
* **load_to_sbuf_dma_count** - The total number of DMAs that loaded this tensor into the State Buffer.
* **load_to_sbuf_repeat_factor** - How many times the system loaded this tensor into the State Buffer. A value of 1 means one load, 2 means two loads, and so on.

.. image:: /tools/profiler/images/tensor-viewer-table.png

You can use this data to match with framework-level instructions or for kernel development. You can also use it to search for instructions in the Device Timeline Viewer. The SBUF loading information in the table can help you verify that tensors are loaded efficiently.

Searching
---------

You can use the Tensor Viewer with the Device Timeline Viewer and Search tool to match tensor information in the table with instructions that run on the device. Enter the variable_name from the table into the DMA search field to see all DMA instructions that relate to that tensor. The example below shows a complete search for the tensor ``token_position_to_id``:

.. image:: /tools/profiler/images/tensor-viewer-search-example.png

================================================
FILE: tools/neuron-explorer/view-perfetto.rst
================================================

.. meta::
   :description: Learn about using Neuron Explorer with Perfetto
   :date-modified: 02/05/2026

Viewing Profiles with Perfetto
==============================

.. note::

   New Neuron Explorer features released in 2.27 and onwards may not be supported in Perfetto. For the full user experience and feature set, please use the Neuron Explorer UI or VSCode Integration.

Perfetto is an open-source trace analysis toolkit with a powerful UI for visualizing and analyzing trace data. Users of Neuron Explorer have the option of viewing their profiles in the Perfetto UI. The ``--output-format perfetto`` option writes processed data to Perfetto's native protobuf-based tracing format, which can be visualized in the Perfetto UI at https://ui.perfetto.dev/.

Example:
.. code-block:: shell

   neuron-explorer view -d ./output --output-format perfetto

This will generate a ``system_profile.pftrace`` file for the system profile and a ``device_profile_model_<model>.pftrace`` file for each unique compiled model that was executed on a Neuron Device.

To view the system profile, go to https://ui.perfetto.dev/ and open the ``system_profile.pftrace`` file.

.. note::

   When loading trace files in the Perfetto UI, your data is processed locally and not uploaded to Perfetto's servers.

|neuron-explorer-perfetto-timeline|

To view a device profile, go to https://ui.perfetto.dev/ and open the ``device_profile_model_<model>.pftrace`` file. This will show a detailed view of hardware activity on the NeuronCore during execution of this graph.

|neuron-explorer-perfetto-device-timeline|

.. note::

   Your browser may run out of memory when viewing ``*.pftrace`` (Perfetto trace) files that are more than a few hundred MB. See the section :ref:`Viewing Large Profiles in Perfetto ` for directions on how to view large traces using the trace processor.

Perfetto Output View Options
----------------------------

When outputting to Perfetto, it is possible to group your traces by different attributes. This is useful for larger profiles involving many NeuronCores and instances. The following options are available:

.. list-table:: Perfetto output view options
   :header-rows: 1
   :widths: 30 70

   * - CLI option
     - Description
   * - ``--system-trace-primary-group``
     - First-order grouping of trace events (maps to a Perfetto process / process group of rows). Provide a comma-delimited list of field names. Allowed fields: ``instance_id``, ``thread_id``, ``lnc_idx``, ``process_id``. Default: ``instance_id,process_id``.
   * - ``--system-trace-secondary-group``
     - Second-order grouping of trace events (maps to a Perfetto thread / single row). Provide a comma-delimited list of field names. Allowed fields: ``instance_id``, ``worker_gid``, ``thread_id``, ``lnc_idx``, ``process_id``. Default: ``worker_gid,lnc_idx,thread_id``.

For example, the following profile uses ``neuron-explorer view --output-format=perfetto --system-trace-primary-group=instance_id,process_id --system-trace-secondary-group=lnc_idx,thread_id`` to group the system profile first by unique combinations of instance_id and process_id; within each of those groups there are rows of events with unique combinations of lnc_idx and thread_id.

|neuron-explorer-perfetto-grouping|

Grouping By Global Worker ID
----------------------------

By default, Perfetto traces are grouped by ``worker_gid``, which is a unique global identifier for each NeuronCore across all instances in a distributed workload. When clicking on an event in the trace you will see fields for both ``lnc_idx`` (local NeuronCore index on that process) and ``worker_gid`` (global NeuronCore index across all instances). It is possible for ``lnc_idx`` to be the same for different processes on the same instance or across different instances in a distributed workload. However, ``worker_gid`` is unique for each NeuronCore across all instances. The image below shows how to correlate the naming of tracks (rows) in the Perfetto UI to both ``lnc_idx`` and ``worker_gid``.

|neuron-explorer-perfetto-gid|

.. |neuron-explorer-perfetto-timeline| image:: /images/neuron-profiler2-perfetto-timeline.png
.. |neuron-explorer-perfetto-device-timeline| image:: /images/neuron-profiler2-perfetto-device-timeline.png
.. |neuron-explorer-perfetto-grouping| image:: /images/neuron-profiler2-perfetto-grouping.png
.. |neuron-explorer-perfetto-gid| image:: /images/neuron-profiler2-perfetto-gid.png

================================================
FILE: tools/neuron-sys-tools/index.rst
================================================

System Tools
============

Neuron system tools provide essential utilities for monitoring, debugging, and managing AWS Neuron devices and workloads. These command-line tools offer real-time insights into device utilization, process management, hardware health, and performance metrics across Neuron instances.

.. toctree::
   :maxdepth: 1
   :hidden:

   Neuron-Monitor User Guide
   Neuron-Top User Guide
   Neuron-LS User Guide
   Neuron-Sysfs User Guide
   NCCOM-TEST User Guide
   TensorBoard

.. grid:: 1 1 2 2
   :gutter: 3

   .. grid-item-card:: Neuron-Monitor User Guide
      :link: /tools/neuron-sys-tools/neuron-monitor-user-guide
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      Real-time monitoring tool for tracking NeuronCore utilization, memory usage, and thermal metrics across Neuron devices with customizable output formats.

   .. grid-item-card:: Neuron-Top User Guide
      :link: /tools/neuron-sys-tools/neuron-top-user-guide
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      Interactive process viewer similar to htop that displays running processes on Neuron devices with real-time resource consumption metrics.

   .. grid-item-card:: Neuron-LS User Guide
      :link: /tools/neuron-sys-tools/neuron-ls
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      Device discovery and listing tool that provides detailed information about available Neuron devices, their capabilities, and current status.

   .. grid-item-card:: Neuron-Sysfs User Guide
      :link: /tools/neuron-sys-tools/neuron-sysfs-user-guide
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      Low-level system interface tool for accessing Neuron device information through the Linux sysfs filesystem interface.

   .. grid-item-card:: NCCOM-TEST User Guide
      :link: /tools/neuron-sys-tools/nccom-test
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      Collective communication testing and benchmarking tool for validating and measuring performance of multi-device communication patterns.

   .. grid-item-card:: TensorBoard
      :link: /tools/tensorboard/index
      :link-type: doc
      :class-header: sd-bg-primary sd-text-white

      TensorBoard Neuron plugin for Trn1 instances, including installation, configuration, and advanced visualization features.

   .. grid-item-card:: Tutorials
      :link: /tools/tutorials/index
      :link-type: doc
      :class-header: sd-bg-secondary sd-text-white

      Tutorials on how to use the Neuron system tools suite.

   .. grid-item-card:: What's New
      :link: /release-notes/prev/2.27.0/index
      :link-type: doc
      :class-header: sd-bg-secondary sd-text-white

      Latest updates, new features, and improvements to the Neuron system tools suite.

================================================
FILE: tools/neuron-sys-tools/nccom-test.rst
================================================

.. _nccom-test:

======================
NCCOM-TEST User Guide
======================

.. contents:: Table of contents
   :local:
   :depth: 2

Overview
--------

**nccom-test** is a benchmarking tool for evaluating Collective Communication operations on AWS Trainium and Inferentia instances. It supports Trn1, Trn2, Trn3, and Inf2 instance types. The tool can assess performance across multiple instances or perform quick environment sanity checks before running more complex workloads.
While single-instance benchmarking is supported for all compatible instance types, multi-instance benchmarking is limited to Trainium instances (Trn1, Trn2, and Trn3). To execute collective operations, **nccom-test** will generate, and then execute, NEFFs (Neuron Executable File Format) containing several collective operation instructions. .. note:: On Inf2 instances, only single-instance benchmarking is supported. Running a multi-node nccom-test benchmark will result in an error. Using nccom-test ---------------- Here is a simple example which runs a 2-worker (2 ranks) all-reduce with a total size of 32MB: .. code-block:: nccom-test -r 2 allr size(B) count(elems) type time(us) algbw(GB/s) busbw(GB/s) 33554432 33554432 uint8 768 40.69 40.69 Avg bus bandwidth: 40.6901GB/s Output description ^^^^^^^^^^^^^^^^^^ The command will output a table with several columns of performance metrics. There will be a line for every requested data size (by default the data size is 32MB, as seen in the previous example). .. list-table:: :widths: 40 260 :header-rows: 1 * - Column name - Description * - size(B) - Size in bytes for the data involved in this collective operation * - count(elems) - Number of elements in the data involved in this collective operation. For example, if **size(B)** is 4 and **type** is fp32, then **count** will be 1 since one single fp32 element has been processed. * - type - Data type for the processed data. Can be: **uint8**, **int8**, **uint16**, **int16**, **fp16**, **bf16**, **int32**, **uint32**, **fp32** * - time(us) - Time in microseconds representing the average of all durations for the Collective Communication operations executed during the benchmark. * - algbw(GB/s) - Algorithm bandwidth in gibibytes (1GiB = 1,073,741,824 bytes) per second, which is calculated as **size(B)** / **time(us)** * - busbw(GB/s) - Bus bandwidth - bandwidth per data line in gibibytes per second - it provides a bandwidth number that is independent of the number of ranks (unlike **algbw**). For a more in-depth explanation of bus bandwidth, please refer to `Bus Bandwidth Calculation`_ * - algorithm (optional) - Algorithm used to execute this collective operation (e.g. Ring, Mesh, RDH) * - Avg bus bandwidth - Average of the values in the busbw column .. _Bus Bandwidth Calculation: **Bus Bandwidth Calculation:** The purpose of bus bandwidth is to provide a number reflecting how optimally hardware is used, normalizing for different rank counts. Given the following: - ``r`` as the number of ranks participating in a collective operation - ``s`` as the size of the collective operation - ``B`` as the bus bandwidth of a single rank - ``t`` as the latency of the operation Let's take an AllGather operation as an example. To complete an AllGather operation with ``r`` ranks, each rank must transfer ``r-1`` data chunks of size ``s/r``. Therefore, with a bandwidth of ``B``, the latency (``t``) of the operation would be: .. code-block:: t = ((number of chunks to transfer) * (size of each chunk)) / (bandwidth of rank) t = ((r-1) * (s/r)) / B However, for a given collective operation result, we have the latency, but not the bandwidth of each rank. Rearranging to solve for bus bandwidth, we get: .. code-block:: B = ((r-1) * (s/r)) / t which, given ``algbw = s / t``, can also be rewritten as: .. code-block:: B = ((r-1) / r) * algbw Using this formula, we can calculate the bus bandwidth, ``B``, for an AllGather collective operation among ``r`` ranks with size ``s`` that took ``t`` seconds.
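To make this concrete, here is a short, illustrative Python sketch that recomputes ``algbw`` and ``busbw`` for the 2-worker all-reduce example shown earlier; the all-reduce factor ``(2 * (r-1)) / r`` is taken from the table that follows.

.. code-block:: python

   # Recompute algbw and busbw for the earlier example:
   # 33554432 bytes transferred in 768 us across 2 ranks (all-reduce).
   GIB = 1024 ** 3  # bandwidths are reported in GiB/s

   def bandwidths(size_bytes: int, time_us: float, ranks: int):
       algbw = size_bytes / (time_us * 1e-6) / GIB
       busbw = (2 * (ranks - 1) / ranks) * algbw  # all-reduce factor
       return algbw, busbw

   print(bandwidths(33554432, 768, 2))  # ~(40.69, 40.69), matching the output above

Note that for 2 ranks the all-reduce factor equals 1, which is why ``algbw`` and ``busbw`` are identical in that example.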
We can now directly compare the calculated bus bandwidth to the actual hardware bandwidth to see how well the hardware is being utilized. For different operations that transfer a different number of chunks, the bandwidth calculation changes slightly, with our algbw factor ``(r-1) / r`` changing depending on the collective operation: .. list-table:: :widths: 40 40 :header-rows: 1 * - Collective Operation - Bus Bandwidth Factor * - All-Reduce - ``(2 * (r-1)) / r`` * - All-Gather - ``(r-1) / r`` * - Reduce-Scatter - ``(r-1) / r`` * - Send-Receive - 1 * - All-to-All - ``(r-1) / r`` * - Permute - 1 * - All-to-Allv - ``(r-1) / r`` CLI arguments ^^^^^^^^^^^^^ Required Arguments: ~~~~~~~~~~~~~~~~~~~ .. list-table:: :widths: 40 80 260 :header-rows: 1 * - Argument - Default value - Description * - - N/A, required argument - The type of Collective Communication operation to execute for this benchmark. Supported types: - ``all_reduce`` / ``allr``: All-Reduce - ``all_gather`` / ``allg``: All-Gather - ``reduce_scatter`` / ``redsct``: Reduce-Scatter - ``sendrecv``: Send-Receive - ``alltoall``: All-to-All - ``permute``: Permute - ``alltoallv``: All-to-Allv (Currently only supported for inter-node configurations) * - ``-r, --nworkers`` - N/A, required argument - Total number of workers (ranks) to use Benchmark Configuration: ~~~~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :widths: 40 80 260 :header-rows: 1 * - Argument - Default value - Description * - ``-N, --nnodes`` - 1 - Total number of nodes (instances) to use. The number of workers will be divided equally across all nodes. If this argument is greater than 1, `MPI Execution`_ or `Slurm Execution`_ will need to be used. * - ``-b, --minbytes`` - 32M - The starting size for the benchmark * - ``-e, --maxbytes`` - 32M - The end size for the benchmark. **nccom-test** will run benchmarks for all sizes between ``-b, --minbytes`` and ``-e, --maxbytes``, increasing the size by either ``-i, --stepbytes`` or ``-f, --stepfactor`` with every run. * - ``-i, --stepbytes`` - (``--maxbytes`` - ``--minbytes``) / 10 - Number of bytes by which to increase the benchmark's size on every subsequent run. For example, for this combination of arguments: ``-b 8 -e 16 -i 4``, the benchmark will be run for the following sizes: 8 bytes, 12 bytes, 16 bytes. * - ``-f, --stepfactor`` - N/A - Factor by which to increase the benchmark's size on every subsequent run. For example, for this combination of argument values: ``-b 8 -e 32 -f 2``, the benchmark will be run for the following sizes: 8 bytes, 16 bytes, 32 bytes. .. note:: All arguments that take a size in bytes will also accept larger size units, for example: ``-f 2048`` can be written as ``-f 2kb`` or ``-f 1048576`` can be written as ``-f 1MB``. Iteration Configuration: ~~~~~~~~~~~~~~~~~~~~~~~~ .. list-table:: :widths: 40 80 260 :header-rows: 1 * - Argument - Default value - Description * - ``-n, --iters`` - 20 - Number of Collective Communication operations to execute during the benchmark. * - ``-w, --warmup_iters`` - 5 - Number of Collective Communication operations to execute as warmup during the benchmark. The warmup operations will execute prior to any of the measured operations and their performance will not be used to calculate the reported statistics. * - ``-I, --neff_iters`` - N/A - Number of times to execute the NEFF with Collective Communication operations during the benchmark.
* - ``-W, --neff_warmup_iters`` - N/A - Number of times to execute the NEFF with Collective Communication operations as warmup during the benchmark. All collective operations in a warmup NEFF execution will be ignored when calculating statistics. To execute collective operations, ``nccom-test`` will generate, and then execute, NEFFs (Neuron Executable File Format) containing several collective operation instructions. The above flags control how many collective operations are generated, run, and measured. There are two primary modes for controlling the number of collective operations run: 1. If neither the ``neff_iters`` nor the ``neff_warmup_iters`` flag is supplied, ``iters + warmup_iters`` will be treated as the desired total number of operations to be run. If necessary, ``nccom-test`` will spread this total number of operations out across several NEFFs. 2. If the user desires more control over how collective operation execution should be organized, they should use the ``neff_iters`` and ``neff_warmup_iters`` flags. When these flags are used, the ``iters`` and ``warmup_iters`` flags instead represent the number of operations in a single NEFF. The NEFF itself will be repeatedly run ``neff_iters + neff_warmup_iters`` times. Examples: - ``-n 15``, ``-w 5``, ``-I 10``, would result in 200 Collective Communication operations being run with 150 being measured: The generated NEFF will have 20 (15 measured, 5 warmup) ops and the NEFF will be run 10 times. - ``-n 15``, ``-w 5``, ``-I 10``, ``-W 5``, would result in 300 Collective Communication operations being run with 150 being measured: The generated NEFF will have 20 (15 measured, 5 warmup) ops and the NEFF will be run 15 (10 measured, 5 warmup) times.
* - ``--shared-output-buff`` - false - For the CC operation, use a single, shared, HBM output buffer between 2 NeuronCores in the same HBM domain. * - ``--alltoallv-metadata`` - N/A - For ``alltoallv`` collective operation, a ``json`` file containing send counts, send displacements, receive counts, and receive displacements for the collective operation. Counts specify the number of elements to send/receive between ranks; displacements specify where in the buffer to send/receive data. The length of the count and displacement arrays should equal the size of the replica group over which the ``alltoallv`` collective operation is performed. If one metadata entry is provided, it applies to all ranks; otherwise, specify one entry per rank. `AlltoAllV Example`_. .. _Data Integrity: Data Integrity: If the ``--check`` flag is provided when running ``nccom-test``, the correctness of the CC operations will be verified. There are currently two modes for verification: ``random`` (the default used when only ``--check`` is provided) and ``all_ones``. 1. The ``random`` mode will fill each input tensor with pseudo-random data and then, on the CPU, calculate an expected golden output. After collective operation execution, the output tensor of the operation will be compared against the calculated golden tensor. For non-integral types (e.g. ``fp16``, ``fp32``), golden comparison will use tolerances. For operations in which all participating ranks should finish with identical outputs (e.g. ``allr``, ``allg``), there will also be a check between ranks to ensure this. If the ``random`` check fails, input, output, and golden tensors will be saved to disk for further investigation. The ``--seed`` flag can be used to set the seed for the pseudo-random input tensor generation. Otherwise, the seed value will be based on the current time and logged. 2. The ``all_ones`` mode will fill each input tensor with the value ``1``. A single golden value, ``G``, will be calculated based on the operation. For example, the golden value ``G`` for an All-Reduce with 16 ranks will be ``16``. After operation execution, ``nccom-test`` will verify each output tensor is filled with ``G``. Prefer ``random`` mode for more rigorous verification; prefer ``all_ones`` for quicker, more easily interpreted verification. .. _MPI Execution: MPI Execution: ~~~~~~~~~~~~~~~ .. list-table:: :widths: 40 80 260 :header-rows: 1 * - Argument - Default value - Description * - ``-s, --hosts`` - N/A - Hosts on which to run execution. * - ``--hosts-file`` - N/A - File containing hosts on which to run execution. One host specified per line. * - ``--mpi-log-dir`` - N/A - If specified, logs from each node in an ``mpi`` multi-node benchmark will be saved to a unique file within the specified directory. To use ``mpi`` mode, provide all hosts for your invocation, either with the ``--hosts`` flag or a ``~/hosts`` file, and set the ``NEURON_RT_ROOT_COMM_ID`` environment variable to the IP address of the first host listed and any free port. Depending on your environment, ``mpi`` may require passwordless SSH access to each host in your invocation. See the `Open MPI SSH documentation `_ for details. Example: ``NEURON_RT_ROOT_COMM_ID=10.1.4.145:45654 nccom-test -r 64 -N 2 -d fp32 allr --hosts 10.1.4.145 10.1.4.138`` The above command will invoke a ``neuron-bench`` process on both listed hosts to execute the collective operations, using 32 ranks from each host.
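For scripted or repeated runs, the same multi-node invocation can be driven from Python. The sketch below simply wraps the example above with ``subprocess``; the host IPs and the port are the placeholders from that example, not values to copy verbatim.

.. code-block:: python

   # Minimal sketch: drive the multi-node MPI-mode example from a script.
   # The host IPs and port below are the example's placeholders.
   import os
   import subprocess

   env = dict(os.environ, NEURON_RT_ROOT_COMM_ID="10.1.4.145:45654")
   subprocess.run(
       ["nccom-test", "-r", "64", "-N", "2", "-d", "fp32", "allr",
        "--hosts", "10.1.4.145", "10.1.4.138"],
       env=env, check=True,
   )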
Latency data will be reported back from each host and collected on the host on which the ``nccom-test`` command was invoked. The host on which the ``nccom-test`` command is invoked should usually be one of the provided hosts, but it can be another unrelated host, as long as it can invoke MPI processes on the provided hosts. .. _Slurm Execution: Slurm Execution: ~~~~~~~~~~~~~~~~ .. list-table:: :widths: 40 80 260 :header-rows: 1 * - Argument - Default value - Description * - ``-S, --slurm-mode`` - false - Use ``srun`` to run the benchmark on a ``slurm``-based cluster * - ``-u, --slurm-vcpus-per-node`` - Minimum CPU count amongst all nodes - Number of vCPUs available per node in the ``slurm`` allocation * - ``--slurm-setup-script`` - N/A - Script to run on each node in the ``slurm`` allocation before executing the benchmark. Can use ``default`` to run a default script installing the latest Neuron software. * - ``--slurm-job-id`` - alloc - Specify the jobId of the ``slurm`` allocation on which to execute the benchmark. By default, a new allocation will be created to execute the benchmark. * - ``--slurm-use-head-node-neuron-bench`` - false - Copy the ``neuron-bench`` binary from the head node to all nodes in the allocation To use ``slurm`` mode, specify the ``--slurm-mode`` flag. When using slurm mode, ``nccom-test`` invocations should be run from the head node of the slurm cluster. Users can either use an existing slurm job by providing a job id, or have ``nccom-test`` allocate one for them. Additionally, users can provide a path to a setup script to run on each slurm node before execution. Users can alternatively specify ``default`` to use a supplied default setup script. Examples: ``nccom-test -r 64 -N 2 allr --slurm-mode --slurm-setup-script path/to/my/custom-setup-script.sh`` The above command will execute collective operations across two nodes using slurm. Slurm will allocate a job with two nodes before beginning execution and will run the ``custom-setup-script.sh`` on each node before executing any collective operations. ``nccom-test -r 64 -N 2 allr --slurm-mode --slurm-job-id 12345`` The above command will use an existing slurm allocation (``jobId: 12345``) with no setup. Output: ~~~~~~~ .. list-table:: :widths: 40 80 260 :header-rows: 1 * - Argument - Default value - Description * - ``--non-interactive`` - false - Do not display any animation or progress indicator. * - ``--report-to-json-file`` - N/A - Persist config and results to the specified JSON file if a filepath is provided. * - ``-t, --stats`` - avg - Latency (time) statistics to display in the final output. Currently supports ``avg`` and any percentile (e.g. ``p15``, ``p50``, ``p90``). * - ``--show-algorithm`` - false - Show which algorithm (e.g. Ring, Mesh, RDH) was used to execute the collective operation in the ``nccom-test`` output. Currently, any hierarchical algorithms used will be displayed as ``hier``, and will not include any sub-algorithms. * - ``--show-input-output-size`` - false - Print or save to JSON the per-rank input and output sizes in bytes. * - ``--debug`` - false - Show debug logs from the execution of ``nccom-test`` and ``neuron-bench`` in real time. Enables ``non-interactive`` mode implicitly. SBUF Collectives: ~~~~~~~~~~~~~~~~~ .. list-table:: :widths: 40 80 260 :header-rows: 1 * - Argument - Default value - Description * - ``--sb2sb`` - false - Indicates whether to allocate input, output, and scratch-buffer on SBUF (rather than HBM). This may result in improved performance.
* - ``--input-shape`` - N/A - Provide input tensor dimensions in format: ``[step0,step1][num_elem0,num_elem1]``. ``step0/num_elem0`` correspond to the free dimension of the SBUF, while ``step1/num_elem1`` correspond to the partition dimension of the SBUF. * - ``--output-shape`` - N/A - Provide output tensor dimensions in format: ``[step0,step1][num_elem0,num_elem1]``. ``step0/num_elem0`` correspond to the free dimension of the SBUF, while ``step1/num_elem1`` correspond to the partition dimension of the SBUF. * - ``--cc-dim`` - 1 - Control the dimensions of tensor concatenation. Either concatenate the tensor in the free dimension (``cc-dim = 0``) or concatenate in the partition dimension first and wrap around in the free dimension second (``cc-dim = 1``) Replica Group: ~~~~~~~~~~~~~~ Flags to control which subset of ranks a collective operation will be executed on. .. list-table:: :widths: 40 80 260 :header-rows: 1 * - Argument - Default value - Description * - ``--data-parallel-dimension`` - N/A - Run the given collective operation in parallel across multiple sub-groups of size ``data-parallel-dimension``. For 128 ranks and a data parallel dimension of 2, there would be 64 parallel collective operations happening at the same time, each with 2 ranks. Primarily intended for multi-node executions with one-rank-per-node replica groups. * - ``--custom-replica-group`` - N/A - Provide the JSON file for custom-defined replica groups. * - ``--custom-src-target-pairs`` - N/A - Provide the JSON file for custom-defined source_target_pairs for the collective permute operation. Additional Flags: ~~~~~~~~~~~~~~~~~ .. list-table:: :widths: 40 80 260 :header-rows: 1 * - Argument - Default value - Description * - ``--vcpu-pin-mode`` - false - Pin the CPU thread for each rank to a given CPU. * - ``--data-collector-port`` - 60006 - If running ``nccom-test`` in multi-node mode or on another node, a data collector is used to gather latencies from all nodes in the benchmark. Port to use for the data collector. * - ``--data-collector-host`` - current host - Hostname or IP address of the node to use as the data collector; all latencies from other nodes will be sent to this host Environment Variables ^^^^^^^^^^^^^^^^^^^^^ In addition to CLI arguments, there are also several environment variables which can be used to alter how collectives run inside ``nccom-test``. .. list-table:: :widths: 40 80 260 :header-rows: 1 * - Environment Variable - Default value - Description * - ``NEURON_LOGICAL_NC_CONFIG`` - 2 for ``trn2`` and ``trn3``; 1 for ``inf2`` and ``trn1`` - Controls how many physical NeuronCores are grouped to make up a logical NeuronCore. Users may also find certain Neuron Runtime environment variables useful with ``nccom-test`` executions. See :ref:`nrt-configuration`. Examples ^^^^^^^^ .. note:: Performance data shown in these examples should not be considered up-to-date. For the latest performance data, please refer to the performance section. Single Instance Examples ~~~~~~~~~~~~~~~~~~~~~~~~ - Quick environment validation .. code-block:: nccom-test -r 2 allr size(B) count(elems) type time(us) algbw(GB/s) busbw(GB/s) 33554432 33554432 uint8 768 40.69 40.69 Avg bus bandwidth: 40.6901GB/s If a problem is found, it can be reported in two possible ways: - Immediately: .. code-block:: nccom-test -r 2 allr Neuron DKMS Driver is not running! Read the troubleshooting guide at: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-runtime/nrt-troubleshoot.html#neuron-driver-installation-fails - After a benchmark attempt: ..
code-block:: nccom-test -r 2 allr size(B) count(elems) type time(us) algbw(GB/s) busbw(GB/s) 33554432 Failure running neuron-bench - log file /tmp/nccom_test_log_7pqpdfjf.log 1 errors found - test failed In this case, further information about the error can be found in the ``neuron-bench`` log file. - 2 rank all-reduce on a single instance for sizes ranging from 1KiB to 1GiB with a step of 4x .. code-block:: nccom-test -r 2 --minbytes 1kb --maxbytes 1gb --stepfactor 4 --datatype fp32 allr size(B) count(elems) type time(us) algbw(GB/s) busbw(GB/s) 1024 256 fp32 58 0.02 0.02 4096 1024 fp32 58 0.07 0.07 16384 4096 fp32 58 0.26 0.26 65536 16384 fp32 58 1.05 1.05 262144 65536 fp32 60 4.07 4.07 1048576 262144 fp32 68 14.36 14.36 4194304 1048576 fp32 107 36.51 36.51 16777216 4194304 fp32 332 47.06 47.06 67108864 16777216 fp32 1214 51.48 51.48 268435456 67108864 fp32 4750 52.63 52.63 1073741824 268435456 fp32 18930 52.83 52.83 Avg bus bandwidth: 23.6671GB/s - 32 rank all-gather on a single instance for sizes ranging from 1KiB to 1MiB with a step of 8x, with correctness checking .. code-block:: nccom-test -r 32 --minbytes 1kb --maxbytes 1mb --stepfactor 8 --datatype fp32 --check allg size(B) count(elems) type time(us) algbw(GB/s) busbw(GB/s) 1024 256 fp32 151 0.01 0.01 8192 2048 fp32 149 0.05 0.05 65536 16384 fp32 150 0.41 0.39 524288 131072 fp32 179 2.73 2.64 Avg bus bandwidth: 0.7731GB/s - Specify custom source-target pairs as a JSON file for the collective permute operator with ``--custom-src-target-pairs``. .. code-block:: nccom-test -r 8 --custom-src-target-pairs pairs.json permute size(B) count(elems) type time:avg(us) algbw(GB/s) busbw(GB/s) 33554432 33554432 uint8 894.24 37.52 37.52 Avg bus bandwidth: 37.5230GB/s cat pairs.json { "src_target_pairs": [ [ [0, 1], [1, 0], [2, 3], [3, 2], [4, 4], [5, 5], [6, 6], [7, 7] ] ] } - Reporting the input and output size explicitly with ``--show-input-output-size``. .. code-block:: nccom-test -r 32 --minbytes 1kb --maxbytes 1mb --stepfactor 8 --datatype fp32 --check allg --show-input-output-size size(B) count(elems) total_input_size(B) total_output_size(B) type time:avg(us) algbw(GB/s) busbw(GB/s) 1024 256 32 1024 fp32 6.16 0.17 0.16 8192 2048 256 8192 fp32 6.48 1.26 1.23 65536 16384 2048 65536 fp32 8.17 8.02 7.77 524288 131072 16384 524288 fp32 23.16 22.64 21.93 Avg bus bandwidth: 7.7715GB/s - Getting percentile latency results with ``--stats`` .. code-block:: nccom-test -r 8 --minbytes 1kb --maxbytes 1mb --stepfactor 8 --datatype fp32 --stats avg p25 p50 p90 p99 --iters 1000 allg size(B) count(elems) type time:avg(us) time:p25(us) time:p50(us) time:p90(us) time:p99(us) algbw(GB/s) busbw(GB/s) 1024 256 fp32 10.0 10 10 11 12 0.10 0.09 8192 2048 fp32 10.22 10 10 11 12 0.80 0.70 65536 16384 fp32 11.31 11 11 13 13 5.80 5.07 524288 131072 fp32 14.83 14 15 16 17 35.34 30.92 Avg bus bandwidth: 9.1966GB/s - Example results as JSON with ``--report-to-json-file`` ..
code-block:: nccom-test -r 32 --minbytes 1kb --maxbytes 1mb --stepfactor 8 --datatype fp32 --check allg --report-to-json-file nccom-results.json size(B) count(elems) type time:avg(us) algbw(GB/s) busbw(GB/s) 1024 256 fp32 6.19 0.17 0.16 8192 2048 fp32 6.55 1.25 1.21 65536 16384 fp32 8.18 8.01 7.76 524288 131072 fp32 23.11 22.69 21.98 Avg bus bandwidth: 7.7775GB/s python3 -m json.tool nccom-results.json { "results": [ { "size(B)": 1024, "count(elems)": 256, "type": "fp32", "algbw(GB/s)": 0.16553675170497603, "busbw(GB/s)": 0.16036372821419553, "time:avg(us)": 6.19 }, { "size(B)": 8192, "count(elems)": 2048, "type": "fp32", "algbw(GB/s)": 1.2500906056270864, "busbw(GB/s)": 1.21102527420124, "time:avg(us)": 6.55 }, { "size(B)": 65536, "count(elems)": 16384, "type": "fp32", "algbw(GB/s)": 8.008982241741455, "busbw(GB/s)": 7.758701546687035, "time:avg(us)": 8.18 }, { "size(B)": 524288, "count(elems)": 131072, "type": "fp32", "algbw(GB/s)": 22.688776793562784, "busbw(GB/s)": 21.97975251876395, "time:avg(us)": 23.11 } ] } - Example results with ``--show-algorithm`` flag .. code-block:: nccom-test -r 16 allr -b 4 -e 1gb -f 16 -d fp32 --show-algorithm size(B) count(elems) type time:avg(us) algbw(GB/s) busbw(GB/s) algorithm 4 1 fp32 299.91 0.00 0.00 mesh 32 8 fp32 299.69 0.00 0.00 mesh 512 128 fp32 299.82 0.00 0.00 mesh 8192 2048 fp32 299.74 0.03 0.05 mesh 131072 32768 fp32 574.15 0.23 0.43 mesh 2097152 524288 fp32 686.32 3.06 5.73 rdh 33554432 8388608 fp32 2754.15 12.18 22.84 kangaring 536870912 134217728 fp32 9689.51 55.41 103.89 kangaring Avg bus bandwidth: 16.6181GB/s Multiple Instances Example ~~~~~~~~~~~~~~~~~~~~~~~~~~ - 64 rank all-reduce on two instances for sizes ranging from 8 bytes to 1GiB with a step of 2x, running 50 ops .. code-block:: NEURON_RT_ROOT_COMM_ID=10.1.4.145:45654 nccom-test -r 64 -N 2 -b 8 -e 1GB -f 2 -n 50 -w 5 -d fp32 allr --hosts 127.0.0.1 10.1.4.138 size(B) count(elems) type time(us) algbw(GB/s) busbw(GB/s) 8 2 fp32 520 0.00 0.00 16 4 fp32 520 0.00 0.00 32 8 fp32 523 0.00 0.00 64 16 fp32 525 0.00 0.00 128 32 fp32 553 0.00 0.00 256 64 fp32 709 0.00 0.00 512 128 fp32 782 0.00 0.00 1024 256 fp32 840 0.00 0.00 2048 512 fp32 881 0.00 0.00 4096 1024 fp32 916 0.00 0.01 8192 2048 fp32 1013 0.01 0.01 16384 4096 fp32 1031 0.01 0.03 32768 8192 fp32 1174 0.03 0.05 65536 16384 fp32 1315 0.05 0.09 131072 32768 fp32 1315 0.09 0.18 262144 65536 fp32 1311 0.19 0.37 524288 131072 fp32 1312 0.37 0.73 1048576 262144 fp32 1328 0.74 1.45 2097152 524288 fp32 1329 1.47 2.89 4194304 1048576 fp32 1378 2.83 5.58 8388608 2097152 fp32 1419 5.51 10.84 16777216 4194304 fp32 2138 7.31 14.39 33554432 8388608 fp32 2711 11.53 22.69 67108864 16777216 fp32 3963 15.77 31.05 134217728 33554432 fp32 6279 19.91 39.19 268435456 67108864 fp32 11954 20.91 41.17 536870912 134217728 fp32 21803 22.93 45.15 1073741824 268435456 fp32 41806 23.92 47.09 Avg bus bandwidth: 9.3924GB/s .. _AlltoAllV Example: - Specify alltoallv-metadata as JSON for ``alltoallv`` operation ``--alltoallv-metadata``. .. 
code-block:: NEURON_RT_ROOT_COMM_ID=172.32.137.79:44444 nccom-test -r 2 -N 2 -d fp32 alltoallv -b 1MB -e 1MB --hosts 127.0.0.1 172.32.253.16 --alltoallv-metadata alltoallv_metadata.json size(B) count(elems) type time:avg(us) algbw(GB/s) busbw(GB/s) 1048608 262152 fp32 955.05 1.10 0.55 Avg bus bandwidth: 0.5490GB/s cat alltoallv_metadata.json { "alltoallv_metadata": [ { "send_counts": [512, 1024], "send_displs": [0, 512], "recv_counts": [256, 768], "recv_displs": [0, 256] } ] } ================================================ FILE: tools/neuron-sys-tools/neuron-ls.rst ================================================ .. _neuron-ls-ug: Neuron LS User Guide --------------------- The neuron-ls command is a tool for managing Neuron devices in your instance. This command serves two key purposes: it identifies all Neuron devices present in the current instance and provides information about the processes running on each device along with the command that launched that process. To use this command, simply type ``neuron-ls`` in your terminal. .. rubric:: neuron-ls CLI .. code-block:: text neuron-ls [options] **Options** ``--wide, -w`` Displays the table in a wider format. ``--show-all-procs, -a`` Show all processes using the Neuron Devices, including processes that aren't using Neuron Runtime 2.x such as ``neuron-monitor`` or ``neuron-ls`` itself. ``--topology, -t`` Display topology information about the system's Neuron Devices. ``--json-output, -j`` Output in JSON format. .. note:: ``neuron-ls`` fully supports the newly launched Trn2 instances. Examples ^^^^^^^^ ``neuron-ls`` is compatible with all Neuron instance types: inf1, inf2, trn1 and trn2. These are a few examples on running the tool on a trn2n.48xlarge: :: $ neuron-ls instance-type: trn2n.48xlarge instance-id: i-aabbccdd123456789 logical-neuroncore-config: 2 +--------+--------+----------+--------+---------------+--------------+---------------+------+ | NEURON | NEURON | NEURON | NEURON | CONNECTED | PCI | CPU | NUMA | | DEVICE | CORES | CORE IDS | MEMORY | DEVICES | BDF | AFFINITY | NODE | +--------+--------+----------+--------+---------------+--------------+---------------+------+ | 0 | 4 | 0-3 | 96 GB | 12, 3, 4, 1 | 0000:cc:00.0 | 48-95,144-191 | 1 | | 1 | 4 | 4-7 | 96 GB | 13, 0, 5, 2 | 0000:b5:00.0 | 48-95,144-191 | 1 | | 2 | 4 | 8-11 | 96 GB | 14, 1, 6, 3 | 0000:b6:00.0 | 48-95,144-191 | 1 | | 3 | 4 | 12-15 | 96 GB | 15, 2, 7, 0 | 0000:cb:00.0 | 48-95,144-191 | 1 | | 4 | 4 | 16-19 | 96 GB | 0, 7, 8, 5 | 0000:6f:00.0 | 0-47,96-143 | 0 | | 5 | 4 | 20-23 | 96 GB | 1, 4, 9, 6 | 0000:58:00.0 | 0-47,96-143 | 0 | | 6 | 4 | 24-27 | 96 GB | 2, 5, 10, 7 | 0000:59:00.0 | 0-47,96-143 | 0 | | 7 | 4 | 28-31 | 96 GB | 3, 6, 11, 4 | 0000:6e:00.0 | 0-47,96-143 | 0 | | 8 | 4 | 32-35 | 96 GB | 4, 11, 12, 9 | 0000:9b:00.0 | 0-47,96-143 | 0 | | 9 | 4 | 36-39 | 96 GB | 5, 8, 13, 10 | 0000:84:00.0 | 0-47,96-143 | 0 | | 10 | 4 | 40-43 | 96 GB | 6, 9, 14, 11 | 0000:85:00.0 | 0-47,96-143 | 0 | | 11 | 4 | 44-47 | 96 GB | 7, 10, 15, 8 | 0000:9a:00.0 | 0-47,96-143 | 0 | | 12 | 4 | 48-51 | 96 GB | 8, 15, 0, 13 | 0000:f8:00.0 | 48-95,144-191 | 1 | | 13 | 4 | 52-55 | 96 GB | 9, 12, 1, 14 | 0000:e1:00.0 | 48-95,144-191 | 1 | | 14 | 4 | 56-59 | 96 GB | 10, 13, 2, 15 | 0000:e2:00.0 | 48-95,144-191 | 1 | | 15 | 4 | 60-63 | 96 GB | 11, 14, 3, 12 | 0000:f7:00.0 | 48-95,144-191 | 1 | +--------+--------+----------+--------+---------------+--------------+---------------+------+ :: $ neuron-ls --wide instance-type: trn2n.48xlarge instance-id: i-aabbccdd123456789 
logical-neuroncore-config: 2 +--------+--------+--------+---------------+---------+--------+----------------------------------------------------------------------------------+---------+ | NEURON | NEURON | NEURON | CONNECTED | PCI | PID | COMMAND | RUNTIME | | DEVICE | CORES | MEMORY | DEVICES | BDF | | | VERSION | +--------+--------+--------+---------------+---------+--------+----------------------------------------------------------------------------------+---------+ | 0 | 4 | 96 GB | 12, 3, 4, 1 | cc:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 1 | 4 | 96 GB | 13, 0, 5, 2 | b5:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 2 | 4 | 96 GB | 14, 1, 6, 3 | b6:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 3 | 4 | 96 GB | 15, 2, 7, 0 | cb:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 4 | 4 | 96 GB | 0, 7, 8, 5 | 6f:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 5 | 4 | 96 GB | 1, 4, 9, 6 | 58:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 6 | 4 | 96 GB | 2, 5, 10, 7 | 59:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 7 | 4 | 96 GB | 3, 6, 11, 4 | 6e:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 8 | 4 | 96 GB | 4, 11, 12, 9 | 9b:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 9 | 4 | 96 GB | 5, 8, 13, 10 | 84:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 10 | 4 | 96 GB | 6, 9, 14, 11 | 85:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 11 | 4 | 96 GB | 7, 10, 15, 8 | 9a:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 12 | 4 | 96 GB | 8, 15, 0, 13 | f8:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 13 | 4 | 96 GB | 9, 12, 1, 14 | e1:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 14 | 4 | 96 GB | 10, 13, 2, 15 | e2:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | | 15 | 4 | 96 GB | 11, 14, 3, 12 | f7:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --warmup none --fixed-instance-count 64 --... | 2.0.0 | +--------+--------+--------+---------------+---------+--------+----------------------------------------------------------------------------------+---------+ :: $ neuron-ls --show-all-procs instance-type: trn2n.48xlarge instance-id: i-aabbccdd123456789 logical-neuroncore-config: 2 +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | NEURON | NEURON | NEURON | CONNECTED | PCI | PID | COMMAND | RUNTIME | | DEVICE | CORES | MEMORY | DEVICES | BDF | | | VERSION | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 0 | 4 | 96 GB | 12, 3, 4, 1 | cc:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... 
| 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 1 | 4 | 96 GB | 13, 0, 5, 2 | b5:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 2 | 4 | 96 GB | 14, 1, 6, 3 | b6:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 3 | 4 | 96 GB | 15, 2, 7, 0 | cb:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 4 | 4 | 96 GB | 0, 7, 8, 5 | 6f:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 5 | 4 | 96 GB | 1, 4, 9, 6 | 58:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 6 | 4 | 96 GB | 2, 5, 10, 7 | 59:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 7 | 4 | 96 GB | 3, 6, 11, 4 | 6e:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 8 | 4 | 96 GB | 4, 11, 12, 9 | 9b:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 9 | 4 | 96 GB | 5, 8, 13, 10 | 84:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 10 | 4 | 96 GB | 6, 9, 14, 11 | 85:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 11 | 4 | 96 GB | 7, 10, 15, 8 | 9a:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 12 | 4 | 96 GB | 8, 15, 0, 13 | f8:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... 
| 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 13 | 4 | 96 GB | 9, 12, 1, 14 | e1:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 14 | 4 | 96 GB | 10, 13, 2, 15 | e2:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ | 15 | 4 | 96 GB | 11, 14, 3, 12 | f7:00.0 | 268911 | neuron-bench exec --run-as-cc-neff --... | 2.0.0 | | | | | | | 269192 | neuron-ls --show-all-procs | NA | +--------+--------+--------+---------------+---------+--------+------------------------------------------+---------+ :: $ neuron-ls --topology instance-type: trn2n.48xlarge instance-id: i-aabbccdd123456789 logical-neuroncore-config: 2 +--------+--------+--------+---------------+---------+ | NEURON | NEURON | NEURON | CONNECTED | PCI | | DEVICE | CORES | MEMORY | DEVICES | BDF | +--------+--------+--------+---------------+---------+ | 0 | 4 | 96 GB | 12, 3, 4, 1 | cc:00.0 | | 1 | 4 | 96 GB | 13, 0, 5, 2 | b5:00.0 | | 2 | 4 | 96 GB | 14, 1, 6, 3 | b6:00.0 | | 3 | 4 | 96 GB | 15, 2, 7, 0 | cb:00.0 | | 4 | 4 | 96 GB | 0, 7, 8, 5 | 6f:00.0 | | 5 | 4 | 96 GB | 1, 4, 9, 6 | 58:00.0 | | 6 | 4 | 96 GB | 2, 5, 10, 7 | 59:00.0 | | 7 | 4 | 96 GB | 3, 6, 11, 4 | 6e:00.0 | | 8 | 4 | 96 GB | 4, 11, 12, 9 | 9b:00.0 | | 9 | 4 | 96 GB | 5, 8, 13, 10 | 84:00.0 | | 10 | 4 | 96 GB | 6, 9, 14, 11 | 85:00.0 | | 11 | 4 | 96 GB | 7, 10, 15, 8 | 9a:00.0 | | 12 | 4 | 96 GB | 8, 15, 0, 13 | f8:00.0 | | 13 | 4 | 96 GB | 9, 12, 1, 14 | e1:00.0 | | 14 | 4 | 96 GB | 10, 13, 2, 15 | e2:00.0 | | 15 | 4 | 96 GB | 11, 14, 3, 12 | f7:00.0 | +--------+--------+--------+---------------+---------+ Neuron Device Topology * * * * │ │ │ │ ▼ ▼ ▼ ▼ *––►[ 0 ]◄––►[ 1 ]◄––►[ 2 ]◄––►[ 3 ]◄––* ▲ ▲ ▲ ▲ │ │ │ │ ▼ ▼ ▼ ▼ *––►[ 4 ]◄––►[ 5 ]◄––►[ 6 ]◄––►[ 7 ]◄––* ▲ ▲ ▲ ▲ │ │ │ │ ▼ ▼ ▼ ▼ *––►[ 8 ]◄––►[ 9 ]◄––►[10 ]◄––►[11 ]◄––* ▲ ▲ ▲ ▲ │ │ │ │ ▼ ▼ ▼ ▼ *––►[12 ]◄––►[13 ]◄––►[14 ]◄––►[15 ]◄––* ▲ ▲ ▲ ▲ │ │ │ │ * * * * Legend: *––► = Wrap-around link :: $ neuron-ls -j [ { "neuron_device": 0, "bdf": "cc:00.0", "cpu_affinity": "48-95,144-191", "numa_node": "1", "connected_to": [ 12, 3, 4, 1 ], "nc_count": 4, "logical_neuroncore_config": 2, "memory_size": 103079215104, "neuroncore_ids": [ 0, 1, 2, 3 ], "neuron_processes": [ { "pid": 113985, "command": "neuron-bench exec --run-as-cc-neff --...", "neuron_runtime_version": "2.0.0" } ] }, ... { "neuron_device": 15, "bdf": "f7:00.0", "cpu_affinity": "48-95,144-191", "numa_node": "1", "connected_to": [ 11, 14, 3, 12 ], "nc_count": 4, "logical_neuroncore_config": 2, "memory_size": 103079215104, "neuroncore_ids": [ 60, 61, 62, 63 ], "neuron_processes": [ { "pid": 113985, "command": "neuron-bench exec --run-as-cc-neff --...", "neuron_runtime_version": "2.0.0" } ] } ] Field Definitions ^^^^^^^^^^^^^^^^^ - instance-type: Type of instance on which neuron-ls is running. - instance-id: EC2 ID of the instance on which neuron-ls is running. 
- logical-neuroncore-config: (only available on trn2 instances) the current logical NeuronCore configuration; for more information refer to :ref:`logical-neuroncore-config` - NEURON DEVICE / neuron_device: Logical ID assigned to the Neuron Device. - NEURON CORES / nc_count: Number of NeuronCores present in the Neuron Device. - NEURON CORE IDS / neuroncore_ids: Range or list of individual NeuronCore IDs belonging to the device, used with ``NEURON_RT_VISIBLE_CORES`` for selective core usage. - NEURON MEMORY / memory_size: Amount of DRAM memory in the Neuron Device. - CONNECTED DEVICES / connected_to: Logical IDs of the Neuron Devices connected to this Neuron Device. - PCI BDF / bdf: PCI Bus Device Function (BDF) ID of the device. - CPU AFFINITY / cpu_affinity: CPU cores to which the per-NeuronCore proxy threads are pinned. - NUMA NODE / numa_node: NUMA (Non-Uniform Memory Access) node associated with the Neuron Device. - PID / pid: ID of the process using this Neuron Device. - COMMAND / command: Command used to launch the process using this Neuron Device. - RUNTIME VERSION / neuron_runtime_version: Version of the Neuron Runtime (if applicable) for the application using this Neuron Device. ================================================ FILE: tools/neuron-sys-tools/neuron-monitor-user-guide.rst ================================================ .. _neuron-monitor-ug: Neuron Monitor User Guide ========================= .. contents:: Table of contents :local: :depth: 2 Overview -------- **neuron-monitor** collects metrics and stats from the Neuron applications running on the system and streams the collected data to ``stdout`` in ``JSON`` format. It is provided as part of the ``aws-neuron-tools`` package. These metrics and stats are organized into **metric groups** which can be configured by providing a configuration file as described in :ref:`using-neuron-monitor`. When running, **neuron-monitor** will: - Collect the data for the metric groups which, based on the elapsed time since their last update, need to be updated - Take the newly collected data and consolidate it into a large report - Serialize that report to JSON and stream it to stdout from where it can be consumed by other tools - such as the sample :ref:`neuron-monitor-cloudwatch.py ` and :ref:`neuron-monitor-prometheus.py ` scripts. - Wait until at least one **metric group** needs to be collected and repeat this flow .. note:: ``neuron-monitor`` fully supports the newly launched Trn2 instances. .. _using-neuron-monitor: Using neuron-monitor -------------------- .. _monitor_cli: .. rubric:: neuron-monitor CLI .. program:: neuron-monitor .. option:: neuron-monitor [parameters] neuron-monitor accepts the following optional parameters: - ``--verbose`` (int) default=0: Can be 0 to 4, and controls the amount of debugging and verbose information sent to stderr; **0: no output**, **4: maximum verbosity** - ``-c, --config-file`` (string): Allows specifying a valid path to a neuron-monitor JSON configuration file **Example:** .. code-block:: neuron-monitor -c monitor.conf Not specifying any configuration file will enable collecting all the metric groups with a period of 5 seconds for all currently running Neuron applications.
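Because neuron-monitor streams one JSON report per line to ``stdout``, it is straightforward to build your own consumer in the style of the companion scripts described later in this guide. Below is a minimal, illustrative sketch (it assumes newline-delimited JSON, which is how the companion scripts consume the stream):

.. code-block:: python

   # consume_monitor.py - minimal sketch of a custom neuron-monitor consumer.
   # Usage: neuron-monitor | python3 consume_monitor.py
   import json
   import sys

   for line in sys.stdin:
       report = json.loads(line)
       for runtime in report.get("neuron_runtime_data", []):
           tag = runtime.get("neuron_runtime_tag")
           error = runtime.get("error", "")
           print(f"app tag={tag} error={error!r}")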
Configuration file example ~~~~~~~~~~~~~~~~~~~~~~~~~~ Example of a configuration file which enables all available **metric groups** for every running Neuron application, with a global update period of 1 second, and sets an update period of 2 seconds for the ``"neuron_hw_counters"`` metric group: :: { "period": "1s", "neuron_runtimes": [ { "tag_filter": ".*", "metrics": [ { "type": "neuroncore_counters" }, { "type": "memory_used" }, { "type": "neuron_runtime_vcpu_usage" }, { "type": "execution_stats" } ] } ], "system_metrics": [ { "type": "vcpu_usage" }, { "type": "memory_info" }, { "period": "2s", "type": "neuron_hw_counters" } ] } Neuron applications tagging ~~~~~~~~~~~~~~~~~~~~~~~~~~~ In order to make application monitoring easier, Neuron applications can be tagged with a 255-character string which identifies that app. Tagging is done using the ``NEURON_PROCESS_TAG`` environment variable. For example: ``NEURON_PROCESS_TAG=my_app_1 python training.py`` will associate the ``my_app_1`` tag with that Python application. If ``NEURON_PROCESS_TAG`` is not specified, the application's PID will be used as its tag. This tag will be used by neuron-monitor to filter Neuron applications. JSON objects and fields in the configuration file ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - ``"neuron_runtimes"`` - array of objects specifying which Neuron applications to monitor and what metric groups are enabled for each of them - ``"tag_filter"`` - a regex which will be used to filter Neuron application tags in order to determine if they will be monitored (optional) - ``"metrics"`` - array of objects specifying which metric groups to capture for this Neuron application - ``"type"`` - type of metric group - ``"period"`` - this field applies to **metric group** objects and sets the amount of time between two updates for that metric group - it can be specified as part of the **root** and/or **neuron_runtime** objects, where it applies to all their children, and/or as part of a **metric group** object - if there's no period specified, a default value of **5 seconds** will be used - ``"system_metrics"`` - array of objects specifying which system-level metric groups are enabled Neuron Runtime-level metric groups ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - :ref:`neuron-monitor-nc-counters` - NeuronCore-related metrics - :ref:`neuron-monitor-memory-used` - data on the amount of memory used by the Neuron application - :ref:`neuron-monitor-vcpu-usage` - Neuron application vCPU utilization data - :ref:`neuron-monitor-execution-stats` - Neuron application execution stats, including error count and latency System-wide metric groups ~~~~~~~~~~~~~~~~~~~~~~~~~ - :ref:`neuron-monitor-vcpu-usage` - system-wide vCPU usage - :ref:`neuron-monitor-memory-info` - system-wide memory usage - :ref:`neuron-monitor-hw-counters` - counters for correctable and uncorrectable memory ECC events Execution model --------------- |image| neuron-monitor waits for one or more **metric groups** to become due for an update, then collects the corresponding data, consolidates it into a report which is streamed to stdout as JSON, and goes back to waiting. The JSON output format ---------------------- Whenever the report gets updated, a complete JSON is written to stdout. This is its structure: :: { "neuron_runtime_data": [ { "pid": 0, "address": "", "neuron_runtime_tag": "my_app_1", "error": "", "report": { "neuroncore_counters": { [...] }, "execution_stats": { [...] }, "memory_used": { [...] }, "neuron_runtime_vcpu_usage": { [...]
} } } ], "system_data": { "neuron_hw_counters": { [...] }, "vcpu_usage": { [...] }, "memory_info": { [...] } }, "instance_info": { [...] }, "neuron_hardware_info": { [...] }, "neuron_k8s_info": { [...] } } - ``"neuron_runtime_data"`` is an array containing one entry for each Neuron application which passes the filter specified in the settings file - ``"pid"`` is the PID of this Neuron application - ``"neuron_runtime_tag"`` is the configured tag for the Neuron application - ``"error"`` specifies any error that occurred when collecting data from this Neuron application - ``"report"`` will contain the results for the Neuron application-level metric groups; their formats are described below - ``"system_data"`` has a similar structure to ``"neuron_runtime_data"``'s ``"report"`` but only contains system-level metric groups (not associated with any Neuron application) Regardless of the configuration, the following two JSON objects are always present in the output: .. _neuron-monitor-instance-info: instance_info ~~~~~~~~~~~~~ Contains information about the instance on which neuron-monitor is running. :: "instance_info": { "instance_name": "My_Instance", "instance_id": "i-0011223344556677a", "instance_type": "trn2n.48xlarge", "instance_availability_zone": "us-west-2b", "instance_availability_zone_id": "usw2-az2", "instance_region": "us-west-2", "ami_id": "ami-0011223344556677b", "subnet_id": "subnet-112233ee", "error": "" } Depending on when the instance was launched, the following fields might not be available: - ``instance_availability_zone_id``: available only for instances launched on 2020-08-24 and later - ``instance_region``: available only for instances launched on 2020-08-24 and later - ``instance_name``: available only if ``instance_region`` is set and aws-cli tools are installed ``error`` will contain an error string if getting one of the fields, **except those mentioned above**, resulted in an error. .. _neuron-monitor-hardware-info: neuron_hardware_info ~~~~~~~~~~~~~~~~~~~~ Contains basic information about the Neuron hardware. :: "neuron_hardware_info": { "neuron_device_type": "trainium2", "neuron_device_version": "v4", "neuroncore_version": "v3d", "neuron_device_count": 16, "neuron_device_memory_size": 103079215104, "neuroncore_per_device_count": 4, "logical_neuroncore_config": 2, "error": "" } - ``neuron_device_type``: type of the Neuron Devices on the instance - ``neuron_device_version``: version of the Neuron Devices on the instance - ``neuroncore_version``: version of the NeuronCores on the instance - ``neuron_device_count``: number of available Neuron Devices - ``neuron_device_memory_size``: total memory available on each Neuron Device - ``neuroncore_per_device_count``: number of NeuronCores present on each Neuron Device - ``logical_neuroncore_config``: the current Logical NeuronCore configuration - ``error``: will contain an error string if any occurred when getting this information (usually due to the Neuron Driver not being installed or not running). The following JSON object is disabled by default, but can be made available if "k8s_info" is enabled: .. _neuron-monitor-k8s-info: neuron_k8s_info ~~~~~~~~~~~~~~~ Contains information about which Kubernetes pods/containers are using Neuron resources. :: "neuron_k8s_info": { "period": 15.030359284, "neuroncores_k8s_info": { "0": { "pod_name": "p0", "namespace": "n0", "container_name": ["c0"] }, "1": { "pod_name": "p0", "namespace": "n0", "container_name": ["c0"] }, ... "neurondevices_k8s_info": { "0": { "pod_name": "p0", "namespace": "n0", "container_name": ["c0"] }, ...
} "error": "" }, - ``"neuroncores_k8s_info"`` - object containing information on which Neuron cores are being used by Kubernetes pod/containers, indexed by Neuron core index: ``"neuroncore_index": { neuroncore_k8s_data }`` - ``"pod_name"`` - name of pod using Neuron core - ``"namespace"`` - namespace of pod using Neuron core - ``"container_name"`` - names of containers using Neuron core - ``"neurondevices_k8s_info"`` - object containing information on which Neuron devices are being used by Kubernetes pod/containers, indexed by Neuron device index: ``"neurondevice_index": { neurondevice_k8s_data }`` - ``"pod_name"`` - name of pod using Neuron device - ``"namespace"`` - namespace of pod using Neuron device - ``"container_name"`` - names of containers using Neuron device - ``"error"`` - will contain an error string if any occurred when getting this information For more information on how to enable K8s information, see :ref:`neuron-monitor-k8s-infopy`. .. _neuron-metric-groups: Metric Groups ~~~~~~~~~~~~~ Each **metric group** requested in the settings file will get an entry in the resulting output. The general format for such an entry is: :: "metric_group": { "period": 1.015, // Actual captured period, in seconds "error": "", // Error, if any occurred, otherwise an empty string [...] // Metric group specific data } .. _runtime-level-metric-groups-1: Neuron application level metric groups -------------------------------------- .. _neuron-monitor-nc-counters: neuroncore_counters ~~~~~~~~~~~~~~~~~~~ :: "neuroncore_counters": { "period": 1.000113182, "neuroncores_in_use": { "0": { "neuroncore_utilization": 42.01, "flops": 1234567891011, "v3d": { "nc_v3.0": { "neuroncore_utilization": 21.01 }, "nc_v3.1": { "neuroncore_utilization": 63.01 } } }, "1": { "neuroncore_utilization": 42.02, "flops": 1234567891021, "v3d": { "nc_v3.2": { "neuroncore_utilization": 21.02 }, "nc_v3.3": { "neuroncore_utilization": 63.02 } } }, [...] }, "error": "" } - ``"neuroncores_in_use"`` is an object containing data for all the NeuronCores that were active when the data was captured, indexed by NeuronCore index: ``"neuroncore_index": { neuroncore_data }`` - ``"neuroncore_utilization"`` - NeuronCore utilization, in percent, during the captured period - ``"flops"`` - number of floating point operations per second during the captured period - ``"v3d"`` - only available on Trn2 - contains the utilization for every physical NeuronCore that makes up the current NeuronCore - ``"error"`` - string containing any error that occurred when collecting the data .. 
_neuron-monitor-execution-stats: execution_stats ~~~~~~~~~~~~~~~ :: "execution_stats": { "period": 1.030613214, "error_summary": { "generic": 0, "numerical": 0, "transient": 0, "model": 0, "runtime": 0, "hardware": 0 }, "execution_summary": { "completed": 123, "completed_with_err": 0, "completed_with_num_err": 0, "timed_out": 0, "incorrect_input": 0, "failed_to_queue": 0 }, "latency_stats": { "total_latency": { "p0": 0.01100001, "p1": 0.01100002, "p25": 0.01100004, "p50": 0.01100008, "p75": 0.01100010, "p99": 0.01100012, "p100": 0.01100013 }, "device_latency": { "p0": 0.01000001, "p1": 0.01000002, "p25": 0.01000004, "p50": 0.01000008, "p75": 0.01000010, "p99": 0.01000012, "p100": 0.01000013 } }, "error": "" }, - ``"error_summary"`` is an object containing the error counts for the captured period indexed by their type - ``"generic"`` - generic execution errors - ``"numerical"`` - NaN errors encountered during execution - ``"transient"`` - recoverable errors, such as ECC corrections - ``"model"`` - model-related errors - ``"runtime"`` - Neuron Runtime errors - ``"hardware"`` - hardware errors such as uncorrectable ECC issues - ``"execution_summary"`` is an object containing all execution outcome counts for the captured period indexed by their type - ``"completed"`` - executions completed successfully - ``"completed_with_err"`` - executions that ended in an error other than a numerical error - ``"completed_with_num_err"`` - executions that ended in a numerical error - ``"timed_out"`` - executions that took longer than the Neuron Runtime's configured timeout value - ``"incorrect_input"`` - executions that failed to start due to incorrect input being provided - ``"failed_to_queue"`` - execution requests that were rejected due to the Neuron Runtime not being able to queue them - ``"latency_stats"`` contains two objects containing latency percentiles, in seconds, for the data captured for the model executed during the captured period. If there are no models being executed during this time, the two objects will be ``null`` (i.e. ``"total_latency": null``) - ``"total_latency"`` - percentiles, in seconds, representing latency for an execution as measured by the Neuron Runtime - ``"device_latency"`` - percentiles, in seconds, representing execution time exclusively on the Neuron Device - ``"error"`` - string containing any error that occurred when collecting the data .. _neuron-monitor-memory-used: memory_used ~~~~~~~~~~~ :: "memory_used": { "period": 1.00001, "neuron_runtime_used_bytes": { "host": 6997643264, "neuron_device": 12519788544, "usage_breakdown": { "host": { "application_memory": 6996594688, "constants": 0, "dma_buffers": 1048576, "tensors": 0 }, "neuroncore_memory_usage": { "0": { "constants": 193986816, "model_code": 176285056, "model_shared_scratchpad": 0, "runtime_memory": 0, "tensors": 20971520 }, "1": { "constants": 193986816, "model_code": 176285056, "model_shared_scratchpad": 0, "runtime_memory": 0, "tensors": 20971520 }, ... } }, "loaded_models": [ { "name": "neff", "uuid": "91f2f66e83ea419dace1da07617ad39f", "model_id": 10005, "is_running": false, "subgraphs": { "sg_00": { "memory_used_bytes": { "host": 20480, "neuron_device": 21001024, "usage_breakdown": { "host": { "application_memory": 20480, "constants": 0, "dma_buffers": 0, "tensors": 0 }, "neuron_device": { "constants": 20971520, "model_code": 29504, "runtime_memory": 0, "tensors": 0 } } }, "neuroncore_index": 0, "neuron_device_index": 12 } } }, ...
], "error": "" } - ``"memory_used"`` summarizes the amount of memory used by the Neuron application - ``"neuron_runtime_used_bytes"`` - current amount of memory used by the Neuron application - ``"host"`` - total host DRAM usage in bytes - ``"neuron_device"`` - total Neuron device memory usage in bytes - ``"usage_breakdown"`` - a breakdown of the total memory usage in the other two fields - ``"host"`` - breakdown of the host memory usage - ``"application_memory"`` - amount of host memory used by the application - this includes all allocations that are not included in the next categories - ``"constants"`` - amount of host memory used for constants during training (or weights during inference) - ``"dma_buffers"`` - amount of host memory used for DMA transfers - ``"tensors"`` - amount of host memory used for tensors - ``"neuroncore_memory_usage"`` - a breakdown of memory allocated on the Neuron Devices and the NeuronCores for which it was allocated - ``"0"`` - ``"64"`` (for trn2-48xlarge) - NeuronCores for which the memory was allocated - ``"constants"`` - amount of device memory used for constants during training (or weights during inference) - ``"model_code"`` - amount of device memory used for models' executable code - ``"model_shared_scratchpad"`` - amount of device memory used for the scratchpad shared by the models - a memory region reserved for the models' internal variables and auxiliary buffers - ``"runtime_memory"`` - amount of device memory used by the Neuron Runtime - ``"tensors"`` - amount of device memory used for tensors - ``"loaded_models"`` - array containing objects representing loaded models - ``"name"`` - name of the model - ``"uuid"`` - unique id for the model - ``"model_id"`` - Neuron application-assigned ID for this model - ``"is_running"`` - true if this model is currently started, false otherwise - "``subgraphs"`` - object containing all the subgraphs for the model, indexed by their name: ``"subgraph_name": { subgraph_data }`` - ``"memory_used_bytes"`` - memory usage for this subgraph - ``"host"`` - total host DRAM usage in bytes - ``"neuron_device"`` - total Neuron device DRAM usage in bytes - ``"usage_breakdown"`` - a breakdown of memory allocated at load time for this model - ``"host"`` - breakdown of host memory allocated for this model - ``"application_memory"`` - amount of host memory allocated for this model by the Neuron Runtime which doesn't fall in any of the next categories - ``"constants"`` - amount of host memory used for constants during training (or weights during inference) - ``"dma_buffers"`` - host memory allocated for DMA transfers for this model - ``"tensors"`` - amount of device memory used for tensors at model load time - ``"neuron_device"`` - a breakdown of device memory allocated for this model - ``"constants"`` - amount of device memory used for constants during training (or weights during inference) - ``"model_code"`` - amount of device memory used for the model's executable code - ``"runtime_memory"`` - amount of device memory used by the Neuron Runtime for this model - ``"tensors"`` - amount of device memory allocated for tensors at this model's load time - ``"neuroncore_index"`` - NeuronCore index on which the subgraph is loaded - ``"neuron_device_index"`` - Neuron device index on which the subgraph is loaded - ``"error"`` - string containing any error that occurred when collecting the data neuron_runtime_vcpu_usage ~~~~~~~~~~~~~~~~~~~~~~~~~~~ :: "neuron_runtime_vcpu_usage": { "period": 1.030604818, "vcpu_usage": { "user": 42.01, "system": 
neuron_runtime_vcpu_usage
~~~~~~~~~~~~~~~~~~~~~~~~~

::

   "neuron_runtime_vcpu_usage": {
       "period": 1.030604818,
       "vcpu_usage": {
           "user": 42.01,
           "system": 12.34
       },
       "error": ""
   }

- ``"vcpu_usage"`` - object showing vCPU usage in percentages for the Neuron application during the captured period

  - ``"user"`` - percentage of time spent in user code by this Neuron application
  - ``"system"`` - percentage of time spent in kernel code by this Neuron application

- ``"error"`` - string containing any error that occurred when collecting the data

System level metric groups
--------------------------

.. _neuron-monitor-hw-counters:

neuron_hw_counters
~~~~~~~~~~~~~~~~~~

::

   "neuron_hw_counters": {
       "period": 1.030359284,
       "neuron_devices": [
           {
               "neuron_device_index": 0,
               "mem_ecc_corrected": 0,
               "mem_ecc_uncorrected": 0,
               "sram_ecc_uncorrected": 0,
               "sram_ecc_corrected": 0
           }
       ],
       "error": ""
   },

- ``"neuron_devices"`` - array containing ECC data for all Neuron devices

  - ``"neuron_device_index"`` - Neuron device index
  - ``"mem_ecc_corrected"`` - number of corrected ECC events in the Neuron device's DRAM
  - ``"mem_ecc_uncorrected"`` - number of uncorrected ECC events in the Neuron device's DRAM
  - ``"sram_ecc_uncorrected"`` - number of uncorrected ECC events in the Neuron device's SRAM
  - ``"sram_ecc_corrected"`` - number of corrected ECC events in the Neuron device's SRAM

- ``"error"`` - string containing any error that occurred when collecting the data

.. _neuron-monitor-vcpu-usage:

vcpu_usage
~~~~~~~~~~

::

   "vcpu_usage": {
       "period": 0.999974868,
       "average_usage": {
           "user": 32.77,
           "nice": 0,
           "system": 22.87,
           "idle": 39.36,
           "io_wait": 0,
           "irq": 0,
           "soft_irq": 0
       },
       "usage_data": {
           "0": {
               "user": 34.41,
               "nice": 0,
               "system": 27.96,
               "idle": 37.63,
               "io_wait": 0,
               "irq": 0,
               "soft_irq": 0
           },
           "1": {
               "user": 56.84,
               "nice": 0,
               "system": 28.42,
               "idle": 14.74,
               "io_wait": 0,
               "irq": 0,
               "soft_irq": 0
           },
           [...]
       },
       "context_switch_count": 123456,
       "error": ""
   }

- each vCPU usage object contains the following fields:

  - ``"user"`` - percentage of time spent in user code
  - ``"nice"`` - percentage of time spent executing niced user code
  - ``"system"`` - percentage of time spent executing kernel code
  - ``"idle"`` - percentage of time spent idle
  - ``"io_wait"`` - percentage of time spent waiting for IO operations
  - ``"irq"`` - percentage of time spent servicing hardware interrupts
  - ``"soft_irq"`` - percentage of time spent servicing software interrupts

- ``"average_usage"`` - contains the average usage across all vCPUs during the captured period
- ``"usage_data"`` - contains per-vCPU usage during the captured period
- ``"context_switch_count"`` - contains the number of vCPU context switches during the captured period
- ``"error"`` - string containing any error that occurred when collecting the data

.. _neuron-monitor-memory-info:

memory_info
~~~~~~~~~~~

::

   "memory_info": {
       "period": 5.346411129,
       "memory_total_bytes": 49345835008,
       "memory_used_bytes": 16042344448,
       "swap_total_bytes": 0,
       "swap_used_bytes": 0,
       "error": ""
   }

- ``"memory_total_bytes"`` - total size of the host memory, in bytes
- ``"memory_used_bytes"`` - amount of host memory in use, in bytes
- ``"swap_total_bytes"`` - total size of the host swap file, in bytes
- ``"swap_used_bytes"`` - amount of swap memory in use, in bytes
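The system-level metric groups can be inspected the same way. A minimal sketch, assuming ``jq`` is installed and that these groups are enabled and reported under the top-level ``system_data`` object described earlier in this guide:

.. code-block:: bash

   # Watch average vCPU usage and host memory consumption for each captured period
   neuron-monitor | jq --unbuffered '{cpu_avg: .system_data.vcpu_usage.average_usage,
                                      mem_used_bytes: .system_data.memory_info.memory_used_bytes}'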
.. _neuron-monitor-companion-scripts:

Companion scripts
-----------------

neuron-monitor is installed with three Python companion scripts: :ref:`neuron-monitor-cloudwatchpy`, :ref:`neuron-monitor-prometheuspy`, and :ref:`neuron-monitor-k8s-infopy`.

.. _neuron-monitor-cloudwatchpy:

neuron-monitor-cloudwatch.py
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It requires Python3 and the `boto3 Python module `__. It is installed to: ``/opt/aws/neuron/bin/neuron-monitor-cloudwatch.py``.

.. _using-neuron-monitor-cloudwatchpy:

Using neuron-monitor-cloudwatch.py
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

   neuron-monitor | neuron-monitor-cloudwatch.py --namespace <namespace> --region <region>

For example:

::

   neuron-monitor | neuron-monitor-cloudwatch.py --namespace neuron_monitor_test --region us-west-2

.. _neuron-monitor-prometheuspy:

neuron-monitor-prometheus.py
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It requires Python3 and the `Prometheus client Python module `__. It is installed to: ``/opt/aws/neuron/bin/neuron-monitor-prometheus.py``.

.. _using-neuron-monitor-prometheuspy:

Using neuron-monitor-prometheus.py
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

   neuron-monitor | neuron-monitor-prometheus.py --port <port>

For example:

::

   neuron-monitor | neuron-monitor-prometheus.py --port 8008

The default value for ``--port`` is ``8000``. If your data visualization framework is Grafana, we provide a :download:`Grafana dashboard ` which integrates with Prometheus and this script.

.. |image| image:: ../../images/nm-img2.png

.. _neuron-monitor-k8s-infopy:

neuron-monitor-k8s-info.py (Beta)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It requires Python3 and the `gRPC Python package `__. It is installed to: ``/opt/aws/neuron/bin/neuron-monitor-k8s-info.py``.

.. important::

   This companion script is in Beta and is disabled by default. It only works on EKS, and is currently not supported with EKS auto mode.

.. _using-neuron-monitor-k8s-infopy:

Using neuron-monitor-k8s-info.py
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

::

   neuron-monitor | neuron-monitor-prometheus.py --port <port> --enable-k8s-info | neuron-monitor-k8s-info.py --period <period>

For example:

::

   neuron-monitor | neuron-monitor-prometheus.py --port 8008 --enable-k8s-info | neuron-monitor-k8s-info.py --period 30

The default value for ``--period`` is ``15``.

Running neuron-monitor in a Kubernetes environment
--------------------------------------------------

To run neuron-monitor in a Kubernetes environment, please refer to the instructions `here `_.

================================================
FILE: tools/neuron-sys-tools/neuron-sysfs-user-guide.rst
================================================

.. _neuron-sysfs-ug:

Neuron Sysfs User Guide
=======================

.. contents:: Table of contents
   :local:
   :depth: 3

Introduction
------------

The kernel provides a few ways in which userspace programs can get system information from the kernel space. Sysfs is one common way to do so. It is a virtual filesystem, typically mounted on the ``/sys`` directory, that contains information about hardware devices attached to the system and about the drivers handling those devices. By navigating the hierarchical structure of the sysfs filesystem and viewing the information provided by its files and directories, you can gather valuable information that can help diagnose and resolve a wide range of hardware and system issues.

Thus, a sysfs filesystem is set up per Neuron Device under ``/sys/devices/virtual/neuron_device`` to give you insight into the Neuron Driver and Runtime at the system level. By running a few simple commands that read or write sysfs files, you can get information such as Runtime status, memory usage, and Driver info. You can even create your own shell scripts to query Runtime and Driver statistics from sysfs and generate customized reports. This user guide will first explain the Neuron sysfs structure and then introduce several ways you can perform diagnostics with Neuron sysfs.
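As a first orientation before the full layout is described in the next section, the following commands list the per-device directories and read a couple of the informational files (the paths match the structure shown below):

.. code-block:: bash

   # List all Neuron Device entries
   ls /sys/devices/virtual/neuron_device/

   # Read the NeuronCore count and the connected devices for Neuron Device 0
   cat /sys/devices/virtual/neuron_device/neuron0/core_count
   cat /sys/devices/virtual/neuron_device/neuron0/connected_devices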
Neuron Sysfs Filesystem Structure
---------------------------------

High Level Overview
^^^^^^^^^^^^^^^^^^^

Here is the high level structure of the Neuron sysfs filesystem, where the total and present counters are not shown:

.. code-block:: bash

   /sys/devices/virtual/neuron_device/
   ├── neuron0/
   │   ├── subsystem
   │   ├── uevent
   │   ├── connected_devices
   │   ├── core_count
   │   ├── reset
   │   ├── power/
   │   │   ├── async
   │   │   ├── control
   │   │   ├── runtime_active_time
   │   │   ├── runtime_active_kids
   │   │   └── ...
   │   ├── info/
   │   │   ├── notify_delay
   │   │   ├── serial_number
   │   │   └── architecture/
   │   │       ├── arch_type
   │   │       ├── device_name
   │   │       └── instance_type
   │   ├── stats/
   │   │   ├── hardware/
   │   │   │   ├── mem_ecc_uncorrected
   │   │   │   ├── mem_ecc_repairable_uncorrected
   │   │   │   └── sram_ecc_uncorrected
   │   │   ├── memory_usage/
   │   │   │   └── host_mem/
   │   │   │       ├── application_memory
   │   │   │       ├── constants
   │   │   │       ├── dma_buffers
   │   │   │       ├── dma_rings
   │   │   │       ├── driver_memory
   │   │   │       ├── notifications
   │   │   │       ├── tensors
   │   │   │       └── uncategorized
   │   │   └── power/
   │   │       └── utilization
   │   ├── neuron_core0/
   │   │   ├── info/
   │   │   │   └── architecture/
   │   │   │       └── arch_type
   │   │   ├── stats/
   │   │   │   ├── status/
   │   │   │   │   ├── exec_bad_input
   │   │   │   │   ├── hw_error
   │   │   │   │   ├── infer_failed_to_queue
   │   │   │   │   ├── resource_nc_error
   │   │   │   │   ├── unsupported_neff_version
   │   │   │   │   ├── failure
   │   │   │   │   ├── infer_completed_with_error
   │   │   │   │   ├── invalid_error
   │   │   │   │   ├── oob_error
   │   │   │   │   ├── success
   │   │   │   │   ├── generic_error
   │   │   │   │   ├── infer_completed_with_num_error
   │   │   │   │   ├── resource_error
   │   │   │   │   └── timeout
   │   │   │   ├── memory_usage/
   │   │   │   │   ├── device_mem/
   │   │   │   │   │   ├── collectives
   │   │   │   │   │   ├── constants
   │   │   │   │   │   ├── dma_rings
   │   │   │   │   │   ├── driver_memory
   │   │   │   │   │   ├── model_code
   │   │   │   │   │   ├── model_shared_scratchpad
   │   │   │   │   │   ├── nonshared_scratchpad
   │   │   │   │   │   ├── notifications
   │   │   │   │   │   ├── runtime_memory
   │   │   │   │   │   ├── tensors
   │   │   │   │   │   └── uncategorized
   │   │   │   │   └── host_mem/
   │   │   │   └── other_info/
   │   │   │       ├── flop_count
   │   │   │       ├── inference_count
   │   │   │       ├── model_load_count
   │   │   │       ├── reset_fail_count
   │   │   │       ├── reset_req_count
   │   │   │       └── nc_time_in_use
   │   │   └── ...
   │   ├── neuron_core1/
   │   │   ├── info/
   │   │   │   └── ...
   │   │   └── stats/
   │   │       └── ...
   │   └── ...
   ├── neuron1
   ├── neuron2
   ├── neuron3
   └── ...

Each Neuron Device is represented as a directory under ``/sys/devices/virtual/neuron_device/``, where ``neuron0/`` represents Neuron Device 0, ``neuron1/`` represents Neuron Device 1, and so on. Each NeuronCore is represented as a directory under a Neuron Device directory, named ``neuron_core{0,1,2,...}``. Runtime and Driver info and statistics are collected per NeuronCore in two directories under the NeuronCore directory: ``info/`` and ``stats/``.

Most of the metrics belong to a category called "counter". Each counter is represented as a directory which holds two numerical values as two files: total and present. Each memory usage counter has an additional value called peak. The total value starts accumulating when the Driver is loaded, the present value records the last change of the metric, and the peak value records the maximum value observed so far. Each counter has the same filesystem structure:

.. code-block:: bash

   /sys/devices/virtual/neuron_device/neuron0/neuron_core0/status/
   ├── exec_bad_input/
   │   ├── total
   │   └── present
   ├── hw_error/
   │   ├── total
   │   └── present
   ├── infer_failed_to_queue/
   │   ├── total
   │   └── present
   └── ...

Description for Each Field
^^^^^^^^^^^^^^^^^^^^^^^^^^

``info/``: This directory stores general information about hardware and software. None of them are counter types.

* ``notify_delay``: The delay between notifications from the Neuron Device.
  Current settings are on (``0``) or off (``-1``). Off by default.

* ``serial_number``: The unique device identifier.
* ``architecture/``: This directory stores hardware architecture information.

  * ``arch_type``: The architecture type of the Neuron Device. Sample architecture types are v1, v2, and v3. The value is read-only.
  * ``instance_type``: The instance type of the Neuron Device. Sample instance types are Inf1, Inf2, and Trn1. The value is read-only.
  * ``device_type``: The Neuron Device type. Sample Neuron Device types are Inferentia, Inferentia2, and Trainium1. The value is read-only.

``stats/``: This directory stores Neuron Runtime and Driver statistics. It contains three subdirectories: ``status/``, ``memory_usage/``, and ``other_info/``.

* ``status/``: This directory stores the number of occurrences of each return status of API calls. As explained in :ref:`The LIBNRT API Return Codes `, every API call returns an NRT_STATUS value, which represents the return status of that API call. Our sysfs filesystem stores all ``NRT_STATUS`` values as subdirectories under the ``status/`` directory. They all have the counter structure, so each ``NRT_STATUS`` subdirectory holds two values (total and present) and records the number of times you receive a certain ``NRT_STATUS``. The descriptions of the ``NRT_STATUS`` subdirectories align with :ref:`The LIBNRT API Return Codes `.
* ``memory_usage/``: This directory contains memory usage statistics for both device and host, represented as counters. In this directory, the total counters indicate the current memory usage, the present counters represent the amount of memory allocated or deallocated by the previous operation, and the peak counters indicate the maximum memory usage observed. Additionally, this directory provides detailed breakdown statistics for device and host memory usage. These memory breakdown details correspond to the :ref:`Memory Usage Summary ` section displayed in Neuron Monitor.

  * ``device_mem/``: The amount of memory that the Neuron Runtime uses for weights, instructions, and DMA rings. This device memory per NeuronCore is further categorized into the following types: ``collectives/``, ``constants/``, ``dma_rings/``, ``driver_memory/``, ``model_code/``, ``model_shared_scratchpad/``, ``nonshared_scratchpad/``, ``notifications/``, ``runtime_memory/``, ``tensors/``, and ``uncategorized/``. Each of these categories has total, present, and peak.
    * ``collectives`` - amount of device memory used for collective communication between workers
    * ``constants`` - amount of device memory used for constants (for applications running training) or weights (for applications running inference)
    * ``dma_rings`` - amount of device memory used for storing model executable code used for data movement
    * ``driver_memory`` - amount of device memory used by the Neuron Driver
    * ``model_code`` - amount of device memory used for storing model executable code
    * ``model_shared_scratchpad`` - amount of device memory used for the shared model scratchpad, a buffer shared between models on the same NeuronCore used for internal model variables and other auxiliary buffers
    * ``nonshared_scratchpad`` - amount of device memory used for the non-shared model scratchpad, a buffer used by a single model for internal model variables and other auxiliary buffers
    * ``notifications`` - amount of device memory used to store instruction-level trace information used to profile workloads run on the device
    * ``runtime_memory`` - amount of device memory used by the Neuron Runtime (outside of the previous categories)
    * ``tensors`` - amount of device memory used for tensors
    * ``uncategorized`` - amount of device memory that does not belong in any other category in this list

  * ``host_mem/``: The amount of memory that the Neuron Runtime uses for input and output tensors. The host memory per Neuron Device is further categorized into the following types: ``application_memory/``, ``constants/``, ``dma_buffers/``, ``dma_rings/``, ``driver_memory/``, ``notifications/``, ``tensors/``, and ``uncategorized/``. These categories provide a more granular host memory classification compared to the :ref:`Host Used Memory ` section. Each of these categories has total, present, and peak.

* ``hardware/``: Hardware statistics.

  * ``mem_ecc_uncorrected``: The number of unrepairable uncorrected ECC events in the Neuron device's DRAM.
  * ``mem_ecc_repairable_uncorrected``: The number of repairable uncorrected ECC events in the Neuron device's DRAM.
  * ``sram_ecc_uncorrected``: The number of uncorrected ECC events in the Neuron device's SRAM.

* ``power/``: Power statistics.

  * ``utilization``: Reports per-minute power usage statistics as a percentage of max power in the following format: ``<status>,<timestamp>,<min_power>,<max_power>,<avg_power>``

    **Field descriptions:**

    status
      Indicates the sampling state in a string. Valid values are:

      ``POWER_STATUS_VALID`` - Sampling successful

      ``POWER_STATUS_NO_DATA`` - No samples available

      ``POWER_STATUS_INVALID`` - An internal sampling error occurred

    timestamp
      Time when the sample was collected, in Unix epoch seconds (integer)

    min_power
      Minimum power utilization during the sampling period (0.00-100.00%)

    max_power
      Maximum power utilization during the sampling period (0.00-100.00%)

    avg_power
      Average power utilization during the sampling period (0.00-100.00%)

    The interface updates these statistics every minute based on continuous power sampling.
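For example, reading the ``utilization`` file returns a line in the format above; the values shown here are illustrative only:

.. code-block:: bash

   $ cat /sys/devices/virtual/neuron_device/neuron0/stats/power/utilization
   POWER_STATUS_VALID,1718000000,12.50,85.00,47.25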
* ``other_info/``: This directory contains statistics that are not covered by ``status/`` and ``memory_usage/``. None of them are counter types.

  * ``flop_count``: The number of flops. You can calculate TFLOP/s by dividing ``flop_count`` by the time interval.
  * ``inference_count``: The number of successful inferences.
  * ``model_load_count``: The number of successful model loads.
  * ``reset_fail_count``: The number of failed device resets.
  * ``reset_req_count``: The number of device reset requests.
  * ``nc_time_in_use``: The time interval in microseconds between the start and the end of the current execution on hardware.

Other fields:

* ``connected_devices``: The list of connected devices' ids. You should see the same values as neuron-ls's CONNECTED DEVICES column.
* ``reset``: Writing to this file resets the corresponding Neuron Device.

Read and Write to Sysfs
^^^^^^^^^^^^^^^^^^^^^^^

Reading a sysfs file gives the value for the corresponding metric. You can use the ``cat`` command to view the contents of the sysfs files:

.. code-block:: bash

   ubuntu@ip-xxx-xx-xx-xxx:~$ sudo cat /sys/devices/virtual/neuron_device/neuron0/neuron_core0/stats/status/failure/total
   0
   ubuntu@ip-xxx-xx-xx-xxx:~$ sudo cat /sys/devices/virtual/neuron_device/neuron0/neuron_core0/info/architecture/arch_type
   NCv2

Sysfs metrics of counter type are write-to-clear. You can write any value to the file, and the metric will be set to 0:

.. code-block:: bash

   ubuntu@ip-xxx-xx-xx-xxx:~$ echo 1 | sudo tee /sys/devices/virtual/neuron_device/neuron0/neuron_core0/stats/status/failure/total
   1

Writing to ``reset`` resets the corresponding Neuron Device. For example, the following resets Neuron Device 0:

.. code-block:: bash

   ubuntu@ip-xxx-xx-xx-xxx:~$ echo 1 | sudo tee /sys/devices/virtual/neuron_device/neuron0/reset
   1

Note
^^^^

All files under ``/sys/devices/virtual/neuron_device/neuron0/power`` such as ``runtime_active_kids`` or ``runtime_status`` are related to generic device power management. They are not created or controlled by our sysfs metrics. The word ``runtime`` in these files does not refer to the Neuron Runtime.

.. _troubleshoot_via_sysfs:

How to Troubleshoot via Sysfs
-----------------------------

You can troubleshoot your ML jobs with one or a few simple commands that read or write the sysfs filesystem. You can also aggregate metrics across all the NeuronCores and all the Neuron Devices to get a summarized view using your own scripts, as shown in the sketch below.
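For instance, here is a minimal sketch of such a script; it sums the total ``success`` status counters across every NeuronCore of every Neuron Device, using the counter layout described above:

.. code-block:: bash

   #!/bin/bash
   # Sum the 'success' status counters across all NeuronCores of all Neuron Devices
   total=0
   for counter in /sys/devices/virtual/neuron_device/neuron*/neuron_core*/stats/status/success/total; do
       value=$(sudo cat "$counter")
       total=$((total + value))
   done
   echo "Total successful executions across all NeuronCores: $total"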
You can also use the Sysfs notification feature to wait passively (without wasting CPU cycles) for changes to the values of Sysfs files. To use this feature, you need to implement a user-space program that calls the ``poll()`` function on the Sysfs file that you want to wait on. The ``poll()`` function has the following signature: ``unsigned int (*poll) (struct file *, struct poll_table_struct *)``. By default, the Sysfs notification feature is turned off when the driver is loaded. To enable notifications, set the value of ``/sys/devices/virtual/neuron_device/neuron0/info/notify_delay`` to 0. To disable notifications, set it to -1. Please note that enabling this feature can impact performance. Here is a sample user-space program using ``poll()``:

.. code-block:: c

   #include <stdio.h>
   #include <stdlib.h>
   #include <fcntl.h>
   #include <unistd.h>
   #include <poll.h>

   int main(int argc, char *argv[])
   {
       char readbuf[128];
       int attr_fd = -1;
       struct pollfd pfd;
       int retval = 0;
       ssize_t read_bytes;

       if (argc < 2) {
           fprintf(stderr, "Error: Please specify sysfs file path\n");
           exit(1);
       }

       attr_fd = open(argv[1], O_RDONLY, 0);
       if (attr_fd < 0) {
           perror(argv[1]);
           exit(2);
       }

       /* Read and print the initial value of the attribute */
       read_bytes = read(attr_fd, readbuf, sizeof(readbuf));
       if (read_bytes < 0) {
           perror(argv[1]);
           exit(3);
       }
       printf("%.*s", (int)read_bytes, readbuf);

       pfd.fd = attr_fd;
       pfd.events = POLLERR | POLLPRI;
       pfd.revents = 0;

       /* Wait for the driver to signal a value change, then re-read and print */
       while ((retval = poll(&pfd, 1, 100)) >= 0) {
           if (pfd.revents & (POLLERR | POLLPRI)) {
               pfd.revents = 0;
               lseek(attr_fd, 0, SEEK_SET);
               read_bytes = read(attr_fd, readbuf, sizeof(readbuf));
               if (read_bytes < 0) {
                   perror(argv[1]);
                   exit(4);
               }
               printf("%.*s", (int)read_bytes, readbuf);
           }
       }
       return 0;
   }
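To try the program out, a minimal sketch follows (assuming the source above is saved as ``sysfs_poll.c``, a hypothetical file name): compile it, enable notifications on the device, and point it at the counter you want to watch:

.. code-block:: bash

   gcc -o sysfs_poll sysfs_poll.c

   # Enable sysfs notifications on Neuron Device 0 (write -1 to disable again)
   echo 0 | sudo tee /sys/devices/virtual/neuron_device/neuron0/info/notify_delay

   # Block until the counter changes, printing each new value
   sudo ./sysfs_poll /sys/devices/virtual/neuron_device/neuron0/neuron_core0/stats/status/success/total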
================================================
FILE: tools/neuron-sys-tools/neuron-top-user-guide.rst
================================================

.. _neuron-top-ug:

Neuron Top User Guide
=====================

.. contents:: Table of contents
   :local:
   :depth: 2

Overview
--------

``neuron-top`` provides useful information about NeuronCore and vCPU utilization, memory usage, loaded models, and Neuron applications.

.. note::

   ``neuron-top`` fully supports the newly launched trn2 instances.

.. note::

   If you are parsing ``neuron-top`` output in your automation environment, you can now replace it with ``neuron-monitor`` (:ref:`neuron-monitor-ug`), which outputs data in a standardized, easier-to-parse JSON format.

Using neuron-top
----------------

Command line arguments
~~~~~~~~~~~~~~~~~~~~~~

Launch ``neuron-top`` by simply typing its name in the shell: ``neuron-top``.

User interface
~~~~~~~~~~~~~~

The title section of the user interface shows the application's version number, the EC2 instance ID, and the instance type on which it is running:

|titleimg|

The rest of the user interface is divided into 4 sections. The data shown in these sections applies to the currently selected tab - which can be the 'all' tab, which aggregates data from all running Neuron processes, or a tab representing a single Neuron process:

|overview|

* The ``NeuronCore Utilization`` section shows the NeuronCore utilization for the currently selected tab. The section title includes the version of the NeuronCores on the instance (for example, ``v2`` for trn1 and inf2 instances, ``v3`` for trn2 instances with ``LNC=1``, and ``v3d`` for trn2 instances with ``LNC=2``). Pressing the 'F' key toggles between displaying utilization percentages - as seen in the previous image - and teraflops (trillion floating point operations per second), as seen in the image below:

  |flops|

* The ``VCPU Utilization`` section shows:

  * ``System vCPU usage`` - the two percentages are user% and system%
  * ``Runtime vCPU usage`` - same breakdown

.. _neuron_top_mem_usage:

* The ``Memory Usage Summary`` section provides a breakdown of the total memory usage on the Neuron Device as well as on the host:

  .. _neuron_top_host_mem_usage:

  * ``Host Used Memory`` - amount of host memory used by the selected application (or an aggregate of all applications if 'All' is selected)

    * ``Total`` - total amount of host memory used
    * ``Tensors`` - amount of host memory used for tensors
    * ``Constants`` - amount of host memory used for constants (for applications running training) or weights (for applications running inferences)
    * ``DMA Buffers`` - amount of host memory used for DMA transfers
    * ``App. Memory`` - amount of host memory used by the application that doesn't fall in any of the previous categories

  .. _neuron_top_device_mem_usage:

  * ``Device Used Memory`` - amount of device memory used by the selected application (or an aggregate of all applications if 'All' is selected)

    * ``Total`` - total amount of device memory used
    * ``Tensors`` - amount of device memory used for tensors
    * ``Constants`` - amount of device memory used for constants (for applications running training) or weights (for applications running inferences)
    * ``Model Code`` - amount of device memory used for storing model executable code
    * ``Runtime Memory`` - amount of device memory used by the Neuron Runtime (outside of the previous categories)
    * ``Model Scratchpad`` - amount of device memory used for the shared model scratchpad, a shared buffer used for internal model variables and other auxiliary buffers

* ``Memory Usage Details`` contains memory usage data organized as a tree which can be expanded/collapsed. The columns are:

  * ``Model ID`` - the Neuron Runtime identifier for this model instance
  * ``Host Memory`` - amount of host memory used
  * ``Device Memory`` - amount of device memory used

  The tree view shows the amount of memory used for the same categories shown in the ``Memory Usage Summary``, but in this section they are attached either to a model (if the memory was allocated at model load time for that model) or to a NeuronCore (if the memory can't be associated with a model but was allocated for that NeuronCore). The 'parent' shows the total amount of memory used - the sum of its children.

.. note::

   The up/down/left/right keys can be used to navigate the tree view. The 'x' key expands/collapses the entire tree.

The bottom bar shows which Neuron process's data is currently displayed by highlighting its tag using a green font and marking it with a pair of '>', '<' characters. The 'all' tab shows an aggregated view of all the Neuron processes currently running on the instance.

|tabbar|

.. note::

   The '1'-'9' keys select the current tab. 'a'/'d' selects the previous/next tab on the bar.

.. |titleimg| image:: ../../images/trn2-neuron-top-header.png
.. |overview| image:: ../../images/trn2-neuron-top.png
.. |flops| image:: ../../images/trn2-neuron-top-nc.png
.. |tabbar| image:: ../../images/nt-2.png

================================================
FILE: tools/profiler/neuron-profile-user-guide.rst
================================================

.. _neuron-profile-ug:

Neuron Profiler User Guide
==========================

The Neuron Profiler, ``neuron-profile``, is a tool to profile and analyze the performance of an ML model compiled with the Neuron compiler and run on NeuronDevices.

.. important::

   The Neuron Profiler will be replaced by the new Neuron Explorer in a future release. For more details and migration guidance, see :ref:`neuron-explorer-faq`.

``neuron-profile`` helps developers identify performance bottlenecks and optimize their workloads for NeuronDevices. neuron-profile provides insights into NeuronDevice activity, including the instructions executed on each compute engine (e.g. Tensor engine, Vector engine, etc.), DMA data movement activity, and performance metrics such as engine utilization, DMA throughput, memory usage, and more.

NeuronDevice activity is collected by the ``neuron-profile capture`` command, which runs the model with tracing enabled. Profiling typically has near zero overhead because NeuronDevices have dedicated on-chip hardware profiling.
Additionally, ``neuron-profile`` supports Neuron Kernel Interface (NKI) developers in profiling their kernels. For more information, please refer to :ref:`use-neuron-profile`.

.. _neuron-profiler-installation:

Installation
------------

``neuron-profile`` comes as part of the ``aws-neuronx-tools`` package, and will be installed to ``/opt/aws/neuron/bin``.

.. note::

   ``neuron-profile`` requires Ubuntu 22.04 or newer, or Amazon Linux 2023 or newer. Capturing profiles requires an Inferentia or Trainium instance, but processing profiles can be done on any instance type.

The Neuron web profile viewer utilizes InfluxDB OSS 2.x to store time series data for the profiled workloads after post-processing. Please follow the instructions provided at https://portal.influxdata.com/downloads/ for the correct OS. A sample installation of Neuron Profile and InfluxDB is provided below.

Ubuntu
~~~~~~

.. code-block:: bash

   # Install Neuron Profile
   . /etc/os-release
   sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
   deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
   EOF
   wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -
   sudo apt-get update -y
   sudo apt-get install aws-neuronx-tools -y

   # Install InfluxDB
   wget -q https://repos.influxdata.com/influxdata-archive_compat.key
   cat influxdata-archive_compat.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
   echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | sudo tee /etc/apt/sources.list.d/influxdata.list
   sudo apt-get update && sudo apt-get install influxdb2 influxdb2-cli -y
   sudo systemctl start influxdb
   influx setup   # Fill in the information to finish the setup

Capturing a profile
-------------------

The ``neuron-profile`` tool can both capture and post-process profiling information. ``neuron-profile`` takes a compiled model (a NEFF), executes it, and saves the profile results to a NTFF (``profile.ntff`` by default). For this example, we assume a NEFF is already available as ``file.neff``.

::

   $ neuron-profile capture -n file.neff -s profile.ntff

Capturing profiles for multi-worker jobs
----------------------------------------

``neuron-profile`` can capture profiles for collectives-enabled NEFFs running across multiple NeuronCores, NeuronDevices, or even nodes. This is useful for understanding performance and communication overheads when deploying larger distributed models.

The following example performs a distributed run across all NeuronDevices and NeuronCores on an inf2.24xlarge instance, capturing profiles for all 12 workers (one for each NeuronCore).

::

   $ neuron-profile capture -n file.neff --collectives-workers-per-node 12 -s output/profile.ntff

A profile is saved for each worker in the output directory.

::

   $ ls output
   profile_rank_0.ntff   profile_rank_2.ntff  profile_rank_6.ntff
   profile_rank_1.ntff   profile_rank_3.ntff  profile_rank_7.ntff
   profile_rank_10.ntff  profile_rank_4.ntff  profile_rank_8.ntff
   profile_rank_11.ntff  profile_rank_5.ntff  profile_rank_9.ntff

It is also possible to run a distributed job while only capturing a profile for a specific worker instead of all workers. To do that, use the ``--collectives-profile-id`` option.

::

   $ neuron-profile capture -n file.neff --collectives-profile-id 5 --collectives-workers-per-node 12 -s output/profile.ntff
   $ ls output
   profile_rank_5.ntff

Providing per-worker inputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, ``neuron-profile capture`` uses all-zero inputs or a single set of inputs specified via positional arguments. For multi-worker jobs where each worker needs different inputs, use the ``--multi-input`` (``-m``) option to specify a file that maps inputs to each worker. Each line in the multi-input file corresponds to one worker and follows the same format as the positional ``inputs`` argument (``<input name> <input file>`` pairs separated by spaces).
For example, for a 2-worker job:

::

   # inputs.txt
   IN1 worker0_x.npy IN2 worker0_y.npy
   IN1 worker1_x.npy IN2 worker1_y.npy

Then capture the profile with:

::

   $ neuron-profile capture -n file.neff -m inputs.txt --collectives-workers-per-node 2 -s output/profile.ntff

.. note::

   The ``--multi-input`` option cannot be used together with the positional ``inputs`` argument.

Capturing profiles for multi-node jobs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For multi-node jobs, ``neuron-profile`` must be invoked on each node, using ``--collectives-worker-start-id`` to specify the global index of the first worker on the given node. For example, for a two-node job with a total of four workers and two workers per node, the following commands are run on each node.

::

   # on node 0
   $ neuron-profile capture -n file.neff --collectives-worker-start-id 0 --collectives-workers-per-node 2 --collectives-worker-count 4

   # on node 1
   $ neuron-profile capture -n file.neff --collectives-worker-start-id 2 --collectives-workers-per-node 2 --collectives-worker-count 4

``neuron-profile`` saves the profile for a worker on the node where that worker was launched. So in the case above, ``profile_rank_0.ntff`` and ``profile_rank_1.ntff`` are saved to node 0, and ``profile_rank_2.ntff`` and ``profile_rank_3.ntff`` are saved to node 1.

Processing and viewing the profile results
------------------------------------------

To analyze and view the collected profiling data, use the ``view`` subcommand of ``neuron-profile``. This command performs two main functions: it post-processes the profiling data and starts an HTTP server. Once the server is running, you can access the profiling results through your web browser. Please note: Chrome is the officially supported browser for viewing profiling results.

.. note::

   Profiles can be processed and viewed on another machine without Neuron devices. The ``aws-neuronx-tools`` package needs to be installed so that you can run ``neuron-profile view``. To process the profile on another instance, you need to copy the NEFF and NTFF files from your Inf or Trn instance to that instance.

Viewing a single profile
~~~~~~~~~~~~~~~~~~~~~~~~

The first way to invoke ``neuron-profile view`` is to pass both the NEFF and the NTFF to this command. It will post-process these artifacts and print out a direct link to the profile view.

::

   $ neuron-profile view -n file.neff -s profile.ntff
   View profile at http://localhost:3001/profile/n_fdc71a0b582ee3009711a96e59958af921243921
   ctrl-c to exit

Viewing profiles for multi-worker jobs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Profiles from multi-worker jobs (i.e. more than one NeuronCore) can either be viewed individually or in a combined collectives view. Since profile data is often similar between workers and processing profile data for all workers can be time-consuming, it is recommended to first explore the profile for a single worker or a small subset of workers.

Viewing the profile for a specific worker is the same as for single-worker profiles.

::

   $ neuron-profile view -n file.neff -s output/profile_rank_5.ntff
   View profile at http://localhost:3001/profile/n_fdc71a0b582ee3009711a96e59958af921243921

To view the profile for multiple workers, pass the directory containing all worker profiles to ``neuron-profile``.
::

   $ neuron-profile view -n file.neff -d output
   View profile at http://localhost:3001/profile_cc/p_9a69d907e1350100c9b03745eaa67aa7422842ed

|neuron-profile-multiworker-timeline|

When viewing profiles with the combined collectives view, you can easily switch between the timelines of different workers by clicking the per-worker "Rank" tabs. Note: the "CC Aggregated View" currently shows no data. This will be populated in an upcoming release.

Viewing multiple profiles
~~~~~~~~~~~~~~~~~~~~~~~~~

Alternatively, when post-processing multiple profiles, it may be desirable to have a persistent server running while processing results in the background. In this case, we can skip passing arguments to the command, which will direct users to the main page listing all available profiles.

::

   $ neuron-profile view
   View a list of profiles at http://localhost:3001/

In a separate window, we can kick off the post-processing without launching another server by passing the ``--ingest-only`` flag.

::

   $ neuron-profile view -n file.neff -s profile.ntff --ingest-only
   Profile "n_47cf9972d42798d236caa68952d0d29a76d8bd66" is ready to view

``n_47cf9972d42798d236caa68952d0d29a76d8bd66`` is the bucket where the data is stored. We can find this profile at ``localhost:3001/profile/<bucket>``.

Accessing the profiles
~~~~~~~~~~~~~~~~~~~~~~

If ``neuron-profile view`` is run on a remote instance, you may need to use port forwarding to access the profiles. From the local machine, SSH to the remote instance and forward ports 3001 (the default ``neuron-profile`` HTTP server port) and 8086 (the default InfluxDB port). Then, in the browser, go to ``localhost:3001`` to view the profiles.

::

   $ ssh <user>@<instance> -L 3001:localhost:3001 -L 8086:localhost:8086

.. _neuron-profile-ug-alternative-outputs:

Alternative output formats
~~~~~~~~~~~~~~~~~~~~~~~~~~

Besides the web view mentioned above, ``neuron-profile`` also supports other output formats, such as ``summary-text`` and ``summary-json`` for viewing overall metrics of the profile, as well as ``json`` for a parsable alternative.

Profile summary
^^^^^^^^^^^^^^^

You can see a summary of each profile using the command ``neuron-profile view --output-format summary-text -n file.neff -s output/profile_rank_<worker id>.ntff``. This output includes summary metrics and fields for the NeuronCore (``nc_idx``) and NeuronDevice (``nd_idx``) on which the worker was run.

For example, the following shows worker 5 used core 1 on device 2 and took 0.017 seconds (17 ms) to run the model.

::

   $ neuron-profile view --output-format summary-text -n file.neff -s output/profile_rank_5.ntff | grep -e "nd_idx" -e "nc_idx" -e "total_time"
   nc_idx 1
   nd_idx 2
   total_time 0.017

This summary is also available as JSON using ``--output-format summary-json``.

JSON
^^^^

You can also view the profile summary and all post-processed profiler events together as a single JSON. To do that, use the ``--output-format json`` option.

::

   $ neuron-profile view --output-format json --output-file profile.json -n file.neff -s output/profile_rank_5.ntff
   $ cat profile.json
   {
     "summary": [
       {
         "total_time": 0.017,
         "event_count": 11215
         [...]
       }
     ],
     "instruction": [
       {
         "timestamp": 10261883214,
         "duration": 148,
         "label": "TensorMatrix",
         "hlo_name": "%add.1 = add(%dot, %custom-call.44)",
         "opcode": "MATMUL",
         "operands": "S[5] (Tensor)++@complete acc_flags=3 row_grp=q0 src=fp16@0x5600[1,0,0][3,1,1] dst=0x2000000[1,0,0][3,1,1] 3*128 "
       },
       [...]
     ]
   }
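Because the JSON output can be large, it lends itself to scripted analysis. As a minimal sketch, assuming ``jq`` is installed and using the ``instruction`` fields shown above, the following lists the five longest-running instructions:

.. code-block:: bash

   # Show the five instructions with the longest duration
   jq '.instruction | sort_by(-.duration) | .[:5] | map({label, opcode, duration})' profile.json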
Understanding a Neuron profile
------------------------------

This section provides a quick overview of the features and information available through the Neuron web profile viewer. For more information on the terms used, please check out the :ref:`neuron_hw_glossary`.

Timeline
~~~~~~~~

|neuron-profile-web-timeline|

The execution timeline is plotted based on the elapsed nanoseconds since the start of execution.

Starting from the bottom, the ``TensorMatrix Utilization`` row shows the efficiency of the TensorEngine, and the ``Pending DMA Count`` and ``DMA Throughput`` rows show the DMA activity. In general, we want these to be as high as possible; in some cases they may give clues as to whether the workload is memory or compute bound.

Next are the individual NeuronCore engine executions. These rows show the start and end times for instructions executed by each engine, and clicking on one of these bars will show more detailed information, as well as any dependencies that were found. For models involving collective compute operations, you will additionally see rows labeled with ``CC-core``, which are used to synchronize the CC operations.

Towards the top is the DMA activity. This can include the transfers of input and output tensors, intermediate tensors, and any additional spilling or loading to and from the on-chip SRAM memory.

.. _neuron-profile-ug-features:

Features
~~~~~~~~

The following are some useful features that may help with navigating a profile:

- Dragging your cursor across a portion of the timeline will zoom in to the selected window, providing a more in-depth view of the execution during that time period.
- Hovering over a point will reveal a subset of the information associated with it.
- Clicking a point will open a text box below the timeline with all the information associated with it.
- Right-clicking a point will drop a marker at that location. This marker will persist when zooming in and out.

  - All marker information can be found by clicking the ``Annotations`` button.
  - Markers can be saved and loaded by using a provided name for the marker set.
  - Individual markers can be renamed or deleted in this menu as well.
  - The time span between markers is automatically shown, and users can change the marker name next to ``diff vs`` to calculate the time between other markers.

  |neuron-profile-annotation-menu|

- The "Search" tab can be used to find and highlight specific points in the profile related to the queried field(s).
- Click on the "Box Select" button in the top-right corner of the timeline and then click and drag on any region of the plot to select all events in that region and get summary statistics such as total duration and breakdowns of opcodes, transfer sizes, and more.

View Settings
^^^^^^^^^^^^^

Options within the ``View Settings`` tab can be used to further customize the timeline view. Editing any settings will update the URL accordingly, which can be used to re-visit the current view at a later time. To speed up initial load times, the default will be a ``Minimal View`` which only shows the instructions executed and the model FLOPs utilization (MFU) over time. Changing between the minimal and full views can also be done through the ``Reset to Full View`` or ``Reset to Minimal View`` buttons.

- ``DMA color group`` will recolor DMAs based on the selected grouping. For example, "Engine" will re-color the DMAs based on the associated engine.
- ``Instruction color group`` will recolor instructions based on the selected grouping. For example, "Layer" will re-color the timeline based on the associated framework layer name.
For example, "Layer" will re-color the timeline based on the associated framework layer name. - ``Layer group depth`` will group and color instructions at the selected layer depth. It will apply when ``Instruction color group`` is set to "Layer". **Example:** When ``Layer group depth`` is 2, instructions with layers `model/layer1/op1` and `model/layer1/op2` will be set to the same color. - ``Semaphore IDs`` allows for the selection of multiple semaphore values to show at once within the timeline |neuron-profile-view-settings| Additionally, there are various summary tabs that can be clicked to provide more information on the model/NEFFs. - ``Layer Summary`` shows timing information, FLOPs and instructions counts per layer. - ``Selection Summary`` shows summarized information for all data points in the selected window when using the "Box Select" mode. - ``NEFF Header`` shows details on the profiled NEFF, such as the number of NeuronCores required to execute. - ``NEFF Nodes`` shows input, output, and weight tensor information, including name, size, and shape. - ``Model Info`` shows a summary of the NTFF, such as the NeuronCore the model was executed on, number of notifications, and hardware execution time. - ``DMA Queues Info`` shows more information on the queues used for data movement. - ``NC Memory Usage Info`` shows a snapshot of the device memory usage breakdown before profiling was started. - ``Terminology`` shows a description of metrics provided in the summary table. |neuron-profile-web-summaries| Performance Warnings ~~~~~~~~~~~~~~~~~~~~ Furthermore, ``neuron-profile`` will automatically highlight some potential performance issues with warning annotations. For example if a tensor has been loaded more than 2 times a warning annotation (seen below as an orange box) will be drawn, encircling the dma instructions where the tensor was loaded many times. Hover on the annotation to see more details about loading the tensor. Another kind of warning annotation will highlight areas of high throttling. This provides the user a potential reason for slow down (thermal protection). Specific throttling details are shown when hovering the annotation. |neuron-profile-tensor-reload-annotation| .. _neuron-profile-collectives-barrier: Collectives ~~~~~~~~~~~ For models involving collective operations, the timeline will show a box around all data points related to each operation. Hovering the top left of the box will reveal more information associated with the operation. .. note:: this feature requires profiles to be captured with Neuron Runtime 2.20 or higher. |neuron-profile-cc-op-annotation| Additionally, for any on-device collectives synchronization barrier, a similar box will be display indicating a barrier instead of an actual collectives operation. |neuron-profile-cc-op-barrier| Event Details ~~~~~~~~~~~~~ The information when a point is clicked is grouped by categories such as `Timing` or `IDs` for convenience. Each row will also include a tool tip on the right side, which can be hovered for an explanation on what the field represents. For instruction `Operands` specifically, clicking on the tooltip will reveal a breakdown of fields that compose an operand, as well as a generic example for reference. The examples may not apply directly to the currently viewed profile. |neuron-profile-click-tooltip| .. 
.. _neuron-profile-framework-stack-trace:

Framework Stack Trace
---------------------

The Framework Stack Trace feature shows up in the Event Details when an instruction in the device profile is clicked. It can be used to map device instructions back to framework-level code in JAX or PyTorch, to better understand what part of the application code resulted in a particular device instruction.

|neuron-profile-stack-trace-event-details|

To enable tracking of the stack trace information, you need to set environment variables before compiling your NEFF:

::

   export XLA_IR_DEBUG=1
   export XLA_HLO_DEBUG=1

Once you have the NEFF, you can simply capture the profile as usual. While viewing the profile, use the ``--framework-source-root`` option to pass the path to the framework source files. This is optional and is only needed if you want to view your code alongside the profile.

::

   $ neuron-profile view -n file.neff -s profile.ntff --framework-source-root /path/to/framework/source/files

|neuron-profile-stack-trace-viewer|

Searching Profiles
~~~~~~~~~~~~~~~~~~

Searching helps identify specific data points that may be worth investigating, such as all instructions related to a specific layer or operation. In the "Search" tab, select the corresponding field of interest and enter the value to search for. Multiple fields can be searched together. Please refer to the tooltip within the tab for more help on the query syntax. The search results will also include a summary of all data points found within the current time range.

|neuron-profile-search-summary|

Hardware Errors
~~~~~~~~~~~~~~~

Invalid code can lead to errors on Neuron hardware. These errors will be displayed in Neuron Profile's Custom Notification timeline, as shown below. For example, an Out of Bounds (OOB) error is displayed as:

|neuron-profile-oob-error|

Users can correlate the error to the time it occurred and view nearby events to help debug.

.. _neuron-profile-scratchpad-mem-usage:

View Scratchpad Usage With Memory Tracker
-----------------------------------------

The Memory Tracker feature in Neuron Profiler provides detailed insights into scratchpad memory usage over time, showing how memory is allocated and utilized by different tensors during model execution. This is particularly useful for understanding memory bottlenecks and optimizing memory usage patterns.

To enable Memory Tracker, you need to set environment variables before compiling your NEFF:

::

   export XLA_IR_DEBUG=1
   export XLA_HLO_DEBUG=1

Then compile your model with these debug flags enabled. After compilation, capture the profile with the ``--enable-dge-notifs`` flag or set ``NEURON_RT_ENABLE_DGE_NOTIFICATIONS=1``:

::

   $ neuron-profile capture -n file.neff --enable-dge-notifs

Finally, view the profile with Memory Tracker enabled:

::

   $ neuron-profile view -n file.neff -s profile.ntff --enable-memory-tracker

The Memory Tracker displays a timeline showing scratchpad memory usage over time, with a detailed breakdown of which tensors are consuming memory at any given point. This visualization helps identify:

- Peak scratchpad memory usage
- Memory allocation patterns
- Tensor-specific memory consumption
- Potential memory optimization opportunities

|neuron-profiler-memory-tracker|

You can interact with the Memory Tracker timeline similar to other profile views - clicking on memory usage bars will show detailed information about the tensors using memory at that time, and you can zoom in to specific time ranges to get a more detailed view of memory allocation patterns.
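Putting the steps above together, here is an end-to-end sketch of the Memory Tracker workflow (the NEFF name is illustrative, and the compilation step happens in your framework of choice):

.. code-block:: bash

   # 1. Enable debug metadata before compiling the NEFF
   export XLA_IR_DEBUG=1
   export XLA_HLO_DEBUG=1
   # ... compile your model here to produce file.neff ...

   # 2. Capture the profile with DGE notifications enabled
   neuron-profile capture -n file.neff --enable-dge-notifs -s profile.ntff

   # 3. View the profile with Memory Tracker enabled
   neuron-profile view -n file.neff -s profile.ntff --enable-memory-tracker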
Viewing Profiles with Perfetto
------------------------------

Perfetto is an open-source trace analysis toolkit with a powerful UI for visualizing and analyzing trace data. Users of Neuron Profiler have the option of viewing their profiles in the Perfetto UI.

To process your profile and generate a Perfetto trace file that can be viewed in the Perfetto UI, run the following command:

::

   $ neuron-profile view -n file.neff -s profile.ntff --output-format perfetto

This will generate an ``ntff.pftrace`` file. Go to https://ui.perfetto.dev/ in your browser and open the ``ntff.pftrace`` file to view your profile in Perfetto.

.. note::

   When loading trace files in the Perfetto UI, your data is processed locally and not uploaded to Perfetto's servers.

|neuron-profile-perfetto-device|

.. _neuron-profile-large-perfetto-profiles:

Viewing Large Profiles In Perfetto
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Your browser may run out of memory when viewing ``ntff.pftrace`` (Perfetto trace) files that are more than a few hundred MB. To get around this problem, you can use the trace processor script by running the following commands on the local system where you wish to view the profile:

::

   curl -LO https://get.perfetto.dev/trace_processor
   chmod +x ./trace_processor
   ./trace_processor --httpd ntff.pftrace

Now go to https://ui.perfetto.dev/ in your browser, and in the dialog box that pops up, click the "YES, use loaded trace" button.

For more information on using the trace processor script and viewing large traces, please refer to the Perfetto documentation at https://perfetto.dev/docs/visualization/large-traces.

Showing Dependencies In Perfetto
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, Neuron Profiler does not process dependencies for profiles to be viewed in Perfetto, because Perfetto renders the full dependency chain, which can be visually overwhelming. To include dependencies that can be viewed when clicking instructions and DMAs in the Perfetto UI, use the ``--show-perfetto-flows`` flag when processing your profile.

::

   $ neuron-profile view -n file.neff -s profile.ntff --output-format perfetto --show-perfetto-flows

CLI reference
-------------

.. rubric:: neuron-profile capture

.. code-block:: text

   neuron-profile capture [parameters] [inputs...]

Takes a given compiled NEFF, executes it, and collects the profile results. When no inputs are provided, all-zero inputs are used, which may result in inf or NaNs. It is recommended to use ``--ignore-exec-errors``.

**Parameters**

``-n, --neff`` (string) The compiled NEFF to profile.

``-s, --session-file`` (string) The file to store profile session information in.

``--ignore-exec-errors`` Ignore errors during execution.

``inputs`` (positional args) List of inputs in the form of ``<input name> <input file>`` pairs separated by spaces. For example: ``IN1 x.npy IN2 y.npy``.

The following ``neuron-profile capture`` arguments are only relevant for multi-worker jobs:

``-m, --multi-input`` (string) Path to a file that describes the input list for each requested worker. Each line in the file should correspond to one worker and follow the same format as the ``inputs`` positional argument (i.e. ``<input name> <input file>`` pairs separated by spaces). Cannot be used together with the ``inputs`` positional argument. If ``inputs`` is used instead, all workers will use the same inputs.

``--collectives-profile-id`` (string) Worker id which will be profiled. Passing ``all`` profiles all workers. (default: ``all``)

``-r, --collectives-workers-per-node`` (int) The number of workers on the current node.
The global worker id (rank) of worker n on the current node is ``collectives-worker-start-id+n``.

``--collectives-worker-count`` (int) Total number of Neuron workers across all nodes for this collectives run.

``--collectives-worker-start-id`` (int) The rank offset for the first worker on the current node. For example, if node 0 has workers 0,1 and node 1 has workers 2,3, then ``collectives-worker-start-id`` for node 0 and 1 will be 0 and 2, respectively. (default: ``0``)

.. rubric:: neuron-profile view

.. code-block:: text

   neuron-profile view [parameters]

**Parameters**

``-n, --neff-path`` (string) The compiled NEFF file location.

``-s, --session-file`` (string) The profile results NTFF file location.

``-d, --session-dir`` (string) Directory containing profile files for multi-worker runs.

``--output-format`` (string) How the processed profile should be presented. The default ``db`` writes processed data to the database. ``summary-text`` and ``summary-json`` print the summary data as a table or json, respectively, without writing to the database. The ``perfetto`` option writes processed data to Perfetto's native protobuf-based tracing format, which can be visualized in the Perfetto UI. The ``json`` option writes processed data to human-readable JSON. (default: ``db``)

``--output-file`` (string) File path to write results to, if applicable for the given output format.

``--db-endpoint`` (string) The endpoint of InfluxDB. (default: ``http://localhost:8086``)

``--db-org`` (string) The org name of InfluxDB.

``--db-bucket`` (string) Name of the InfluxDB bucket where ingested profile data is stored. Also used in the URL for viewing the profile. (Optional)

``--port`` (int) The port number of the HTTP server. (default: ``3001``)

``--force`` Force overwrite an existing profile in the database.

``--terminology`` Print a helpful table of terminology used by the profiler.

``--enable-memory-tracker`` Enable Memory Tracker to view scratchpad usage over time with a breakdown of usage per tensor. This requires having set ``XLA_IR_DEBUG=1`` and ``XLA_HLO_DEBUG=1`` before NEFF compilation and passing ``--enable-dge-notifs`` when capturing the profile.

FAQ
---

Difference between TensorE and TensorMatrixE Rows in Timeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- TensorE includes the instruction trace for LoadStationary (LoadWeight)
- TensorMatrixE includes the instruction trace for MultiplyMoving (Matmul)
- Both instruction traces happen on the same TensorE engine, but we separate them into two rows to de-clutter the timeline, due to the background load stationary feature (loading the stationary matrix for the next matmul in parallel to the current matmul). See more info in the :ref:`NKI architecture guide `.

Out of memory (OOM) when capturing a profile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If ``neuron-profile capture`` fails due to device out-of-memory (OOM), you can increase the available memory using the single-IO mode.

Single-IO creates one shared I/O buffer on the device equal to the size of the largest I/O tensor. All inputs and outputs then point to slices of this shared buffer instead of allocating separate tensors. This significantly lowers the device memory needed during capture, at the cost of producing incorrect outputs.

Example usage:

::

   neuron-profile capture --single-io -n file.neff -s profile.ntff

Important: with ``--single-io``, the profiled performance characteristics (e.g., timing, utilization, bandwidth) are representative, but the model outputs are intentionally not correct.
Use this option only to get accurate performance measurements when device memory is tight; do not use it for correctness/accuracy validation.

If you are able to make changes to your model itself to reduce memory usage, consider the following:

- Reduce the batch size
- Lower the numerical precision
- Reduce the number of layers

In some cases, a full device profile isn't necessary to understand performance at a high level. You can instead capture a system profile, which shows the overall model execution time and a runtime API trace across all workers, and does not require extra device memory. See :ref:`System Profiles overview `.

Troubleshooting
---------------

Outputting to Unsupported NumPy Data Type
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When running ``neuron-profile capture --save-output-npy``, you may encounter an error if the output tensor uses a data type that NumPy doesn't natively support:

::

   failed to save output output_hbm to file: unsupported type for npy output: bfloat16

To work around this, use ``--save-output`` instead to save the output as raw binary, then convert it to the desired data type using NumPy and the ``ml_dtypes`` library. This preserves the precision of the output, since it is written to binary instead of being cast to a supported data type.

::

   # Capture with raw binary output
   neuron-profile capture --save-output -n file.neff

   # Convert from raw binary to bfloat16
   import numpy as np
   import ml_dtypes
   output = np.fromfile('output0.npy', dtype=np.uint16)
   output = output.view(ml_dtypes.bfloat16)

InfluxDB not installed
~~~~~~~~~~~~~~~~~~~~~~

::

   $ neuron-profile view -n file.neff -s profile.ntff
   ERRO[0001] To install influxdb, go to https://portal.influxdata.com/downloads/ and follow the instructions there
   influxdb not setup correctly: exec: "influx": executable file not found in $PATH

::

   $ neuron-profile view -n file.neff -s profile.ntff
   ERRO[0000] influxdb token not setup correctly: exit status 1
   Try executing "systemctl start influxdb" and "influx setup"

Running ``neuron-profile view`` without InfluxDB installed will result in an error and a pointer to the InfluxDB installation instructions. Please follow the provided instructions and retry.

Too many open files
~~~~~~~~~~~~~~~~~~~

::

   influxdb2client E! Write error: internal error: unexpected error writing points to database: [shard 10677] open /home/ubuntu/.influxdbv2/engine/data/7caae65aaa48380d/autogen/10677/index/0/MANIFEST: too many open files

InfluxDB will encounter "too many open files" and out-of-memory errors after a few hundred buckets have been created. Two ways to solve this are to delete unused buckets or to increase the system file descriptor limit.

To increase the file descriptor limit, add the following lines to ``/etc/security/limits.d/efa.conf`` and ``/etc/security/limits.conf``:

::

   * soft nofile 1048576
   * hard nofile 1048576

Add the following lines to ``/etc/sysctl.conf``:

::

   fs.file-max = 197341270
   vm.max_map_count=1048576

Commit the changes by running ``sudo sysctl -p``.

.. |neuron-profile-web-timeline| image:: /images/neuron-profile-web-timeline_2-11.png
.. |neuron-profile-annotation-menu| image:: /images/neuron-profile-annotation-menu_2-21.png
.. |neuron-profile-view-settings| image:: /images/neuron-profile-view-settings_2-26.png
.. |neuron-profile-web-summaries| image:: /images/neuron-profile-web-summaries_2-21.png
.. |neuron-profile-tensor-reload-annotation| image:: /images/neuron-profile-tensor-reload-annotation.png
.. |neuron-profile-multiworker-timeline| image:: /images/neuron-profile-multiworker-timelime_2-16.png
.. |neuron-profile-web-timeline| image:: /images/neuron-profile-web-timeline_2-11.png
.. |neuron-profile-annotation-menu| image:: /images/neuron-profile-annotation-menu_2-21.png
.. |neuron-profile-view-settings| image:: /images/neuron-profile-view-settings_2-26.png
.. |neuron-profile-web-summaries| image:: /images/neuron-profile-web-summaries_2-21.png
.. |neuron-profile-tensor-reload-annotation| image:: /images/neuron-profile-tensor-reload-annotation.png
.. |neuron-profile-multiworker-timeline| image:: /images/neuron-profile-multiworker-timelime_2-16.png
.. |neuron-profile-cc-op-annotation| image:: /images/neuron-profile-cc-op-annotation.png
.. |neuron-profile-cc-op-barrier| image:: /images/neuron-profile-cc-op-barrier.png
.. |neuron-profile-click-tooltip| image:: /images/neuron-profile-click-tooltip.png
.. |neuron-profile-oob-error| image:: /images/neuron-profile-oob-error.png
.. |neuron-profile-search-summary| image:: /images/neuron-profile-search-summary.png
.. |neuron-profiler-memory-tracker| image:: /images/neuron-profiler-memory-tracker.png
.. |neuron-profile-stack-trace-event-details| image:: /images/neuron-profile-stack-trace-event-details.png
.. |neuron-profile-stack-trace-viewer| image:: /images/neuron-profile-stack-trace-viewer.png
.. |neuron-profile-perfetto-device| image:: /images/neuron-profiler2-perfetto-device.png

"FATAL - Failed metadata query" when viewing the UI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you are SSH port forwarding the web UI from a remote machine to your local desktop,
you will need to forward both the web UI (3001) and the database (8086), like so:

::

    ssh -L 3001:localhost:3001 -L 8086:localhost:8086 remote_machine

Visual Artifacts when viewing profiles
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some users have reported visual artifacts when viewing certain profiles in browsers
other than Chrome. If you encounter this issue, please try using Chrome. For more
details, refer to the GitHub issue:
https://github.com/aws-neuron/aws-neuron-sdk/issues/1033

================================================
FILE: tools/profiler/neuron-profiler-2-0-beta-user-guide.rst
================================================

.. _neuron-profiler-2-0-guide:

Neuron Profiler 2.0 (Beta) User Guide
=====================================

Overview
--------

Neuron Profiler 2.0 offers a user-friendly experience for capturing and analyzing
application performance through both high-level system profiles and detailed
device-level profiles. Users can profile their workloads using framework-specific APIs
within their application code or by setting an environment variable before execution.
This tool supports profiling for both single-node and distributed workloads,
integrating with environments such as ParallelCluster and EKS.

Once captured, profile results can be explored through multiple interfaces: the Neuron
Profiler UI, the open-source trace viewer `Perfetto `_, or by exporting to a
human-readable JSON format. This flexibility in data capture and visualization enables
users to gain comprehensive insights into their application's performance across
various scenarios and scales.

.. important::

    The Neuron Profiler will be replaced by the new Neuron Explorer in a future
    release. For more details and migration guidance, see :ref:`neuron-explorer-faq`.

.. note::

    Neuron Profiler 2.0 is a set of new features currently in beta that enhance and
    simplify the experience of capturing and viewing profiles. It is not a replacement
    of :ref:`Neuron Profiler `, which is the existing feature set specifically for
    capturing and viewing device profiles.

.. _system-profiles-overview:

Key benefits
~~~~~~~~~~~~

- End-to-end timing of model execution and a Neuron Runtime API trace across all
  workers, helping identify scheduling gaps, synchronization, and host/runtime
  overheads.
- No extra device memory usage by default, making system profiles ideal when device
  memory is limited or when only high-level insights are needed.
- Option to capture device profiles for individual models during your workload.
- Flexible capture and viewing: enable via environment variables or framework APIs;
  view in the Neuron Profiler UI, in Perfetto, or export as JSON.

Capturing profiles
------------------

Neuron Profiler 2.0 offers several flexible options for capturing profiles. Users can
either set the environment variable ``NEURON_RT_INSPECT_ENABLE`` or use the PyTorch or
JAX profiling APIs from their application code for fine-grained control over which
sections of their code are profiled. PyTorch and JAX users who prefer not to modify
their application code can still enable profiling by setting the environment variable
before running their application.

JAX User Experience
-------------------

JAX Setup
~~~~~~~~~

Follow the :ref:`JAX Setup ` instructions to install the required JAX Neuron Plugin
and the latest Neuron Driver, Runtime, and Tools packages.

JAX Profiler
~~~~~~~~~~~~

The JAX context-managed profiling API allows you to profile blocks of code. This will
capture a system profile including a Neuron Runtime API trace and a Python trace for
your application code in the captured block. It will also capture device profiles for
any compiled graphs (NEFFs) executed on NeuronCores within this block.

To use the profiler, import the ``jax`` package.

.. code-block:: python

    import jax

Profiling is enabled for all code enclosed in the context when using
``with jax.profiler.trace(os.environ["NEURON_RT_INSPECT_OUTPUT_DIR"]):``

.. note::

    It is important to pass the output directory
    ``os.environ["NEURON_RT_INSPECT_OUTPUT_DIR"]`` to ``with jax.profiler.trace`` and
    to run ``export NEURON_RT_INSPECT_OUTPUT_DIR=`` before enabling profiling. This
    ensures all captured profile data is saved to the correct output directory.

Custom Annotations in JAX
~~~~~~~~~~~~~~~~~~~~~~~~~

To add custom annotations to blocks of code in your profile, you can use
``jax.profiler.TraceAnnotation``. Annotation names can be created at runtime, such as
in the :ref:`example here ` using
``with jax.profiler.TraceAnnotation("my_label"+str(i)):``. For more information on
TraceAnnotations, see the official `JAX documentation `_.

JAX Profiling using environment variable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instead of using the ``jax.profiler`` context manager, you can enable profiling for
your entire application using an environment variable. This is desirable if you want
to capture a profile without modifying your application code. To enable profiling, set
the environment variables ``NEURON_RT_INSPECT_ENABLE=1`` and
``NEURON_RT_INSPECT_OUTPUT_DIR=./output`` before running your application. For
example:

.. code-block:: shell

    # make sure to remove the call to with jax.profiler.trace from the python script
    NEURON_RT_INSPECT_ENABLE=1 NEURON_RT_INSPECT_OUTPUT_DIR=./output python jax_script.py

When using the ``NEURON_RT_INSPECT_ENABLE`` environment variable instead of
``jax.profiler``, system profiles will not contain a framework and application code
trace, only the Neuron Runtime API trace.

Do not set the ``NEURON_RT_INSPECT_ENABLE`` environment variable and use
``jax.profiler`` within your application code at the same time. Use one or the other.

For more profiling options that can be set through environment variables, see the
section :ref:`Profile Capture Environment Variables `.

.. _neuron-profile-full-jax-example:

Full JAX Example
~~~~~~~~~~~~~~~~

Create a file ``jax_script.py`` which performs repeated matrix multiplications
distributed across Neuron devices.
.. code-block:: python

    from functools import partial
    import os
    from time import sleep

    import jax
    import jax.numpy as jnp
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
    from jax.experimental.shard_map import shard_map

    os.environ["XLA_FLAGS"] = "--xla_dump_hlo_snapshots --xla_dump_to=./dump"
    jax.config.update("jax_default_prng_impl", "rbg")

    mesh = Mesh(jax.devices(), ('i',))

    def device_put(x, pspec):
        return jax.device_put(x, NamedSharding(mesh, pspec))

    lhs_spec = P('i', None)
    lhs = device_put(jax.random.normal(jax.random.key(0), (128, 128)), lhs_spec)

    rhs_spec = P('i', None)
    rhs = device_put(jax.random.normal(jax.random.key(1), (128, 16)), rhs_spec)

    @jax.jit
    @partial(shard_map, mesh=mesh, in_specs=(lhs_spec, rhs_spec), out_specs=rhs_spec)
    def matmul_allgather(lhs_block, rhs_block):
        rhs = jax.lax.all_gather(rhs_block, 'i', tiled=True)
        return lhs_block @ rhs

    with jax.profiler.trace(os.environ["NEURON_RT_INSPECT_OUTPUT_DIR"]):
        out = matmul_allgather(lhs, rhs)
        for i in range(10):
            with jax.profiler.TraceAnnotation("my_label" + str(i)):
                out = matmul_allgather(lhs, rhs)
                sleep(0.001)

    expected = lhs @ rhs
    with jax.default_device(jax.devices('cpu')[0]):
        equal = jnp.allclose(jax.device_get(out), jax.device_get(expected), atol=1e-3, rtol=1e-3)
    print("Tensors are the same") if equal else print("Tensors are different")

Set your profile output directory and run the script:

.. code-block:: shell

    export NEURON_RT_INSPECT_OUTPUT_DIR=./output
    python jax_script.py

PyTorch User Experience
-----------------------

PyTorch Setup
~~~~~~~~~~~~~

Follow the :ref:`PyTorch Setup ` instructions to install the required PyTorch Neuron
packages as well as the latest Neuron Driver, Runtime, and Tools.

PyTorch Profiler
~~~~~~~~~~~~~~~~

The PyTorch context-managed profiling API allows you to profile blocks of code. This
will capture a system profile including a Neuron Runtime API trace and a Python trace
for your application code in the captured block. It will also capture device profiles
for any compiled graphs executed on NeuronCores within this block.

To use the profiler, import it in your application:

.. code-block:: python

    from torch_neuronx.experimental import profiler

Then profile a block of code using:

.. code-block:: python

    with torch_neuronx.experimental.profiler.profile(
            port=9012,
            profile_type='system',
            target='neuron_profile_perfetto',
            output_dir=os.environ['NEURON_RT_INSPECT_OUTPUT_DIR'],
            ms_duration=30000) as profiler:

After modifying your code to call the profiler, run your application as you normally
would, but set the environment variable ``NEURON_RT_INSPECT_OUTPUT_DIR`` to specify
the output directory.

.. code-block:: shell

    NEURON_RT_INSPECT_OUTPUT_DIR=./output python application.py

.. note::

    It is essential to set ``output_dir=os.environ['NEURON_RT_INSPECT_OUTPUT_DIR']``
    when starting the profiler from your application code. This ensures that all
    profile data sources dump to the same output directory.

PyTorch Profiling using Environment Variable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instead of using the ``torch_neuronx.experimental.profiler.profile`` context manager,
you can enable profiling for your entire application using an environment variable.
This is desirable if you want to capture a profile without modifying your application
code. To enable profiling, set the environment variables
``NEURON_RT_INSPECT_ENABLE=1`` and ``NEURON_RT_INSPECT_OUTPUT_DIR=./output`` before
running your application. For example:
.. code-block:: shell

    # make sure to remove the call to with torch_neuronx.experimental.profiler.profile from the python script
    NEURON_RT_INSPECT_ENABLE=1 NEURON_RT_INSPECT_OUTPUT_DIR=./output python pytorch_script.py

When using the ``NEURON_RT_INSPECT_ENABLE`` environment variable instead of
``torch_neuronx.experimental.profiler.profile``, system profiles will not contain a
framework and application code trace, only the Neuron Runtime API trace.

Do not set the ``NEURON_RT_INSPECT_ENABLE`` environment variable and use
``torch_neuronx.experimental.profiler.profile`` within your application code at the
same time. Use one or the other.

For more profiling options that can be set through environment variables, see the
section :ref:`Profile Capture Environment Variables `.

Full PyTorch Example
~~~~~~~~~~~~~~~~~~~~

Create a file ``train_torchrun_context.py`` with the following contents:

.. code-block:: python

    import os
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # XLA imports
    import torch_xla
    import torch_xla.core.xla_model as xm
    import torch_xla.debug.profiler as xp
    import torch_neuronx
    from torch_neuronx.experimental import profiler

    os.environ["NEURON_CC_FLAGS"] = "--cache_dir=./compiler_cache"

    # Global constants
    EPOCHS = 2

    # Declare 3-layer MLP Model
    class MLP(nn.Module):
        def __init__(self, input_size=10, output_size=2, layers=[5, 5]):
            super(MLP, self).__init__()
            self.fc1 = nn.Linear(input_size, layers[0])
            self.fc2 = nn.Linear(layers[0], layers[1])
            self.fc3 = nn.Linear(layers[1], output_size)

        def forward(self, x):
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return F.log_softmax(x, dim=1)

    def main():
        # Fix the random number generator seeds for reproducibility
        torch.manual_seed(0)

        # XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance)
        device = xm.xla_device()

        # Start the profiler context-manager
        with torch_neuronx.experimental.profiler.profile(
                port=9012,
                profile_type='system',
                target='neuron_profile_perfetto',
                output_dir=os.environ['NEURON_RT_INSPECT_OUTPUT_DIR'],
                ms_duration=30000) as profiler:

            # IMPORTANT: the model has to be transferred to XLA within
            # the context manager, otherwise profiling won't work
            model = MLP().to(device)
            optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
            loss_fn = torch.nn.NLLLoss()

            # start training loop
            print('----------Training ---------------')
            model.train()
            for epoch in range(EPOCHS):
                optimizer.zero_grad()
                train_x = torch.randn(1, 10).to(device)
                train_label = torch.tensor([1]).to(device)

                # forward
                loss = loss_fn(model(train_x), train_label)
                # back
                loss.backward()
                optimizer.step()

                # XLA: collect ops and run them in XLA runtime
                xm.mark_step()

        print('----------End Training ---------------')

    if __name__ == '__main__':
        main()

Run this workload with the following command:

.. code-block:: shell

    NEURON_RT_INSPECT_OUTPUT_DIR="output" python train_torchrun_context.py

.. _neuron-profiler-non-framework-user-experience:

Non-framework Specific User Experience
--------------------------------------

You can also control profiling with environment variables. This is useful when you
can't easily change your application code, such as when running an executable which
calls the Neuron Runtime, or in a containerized environment where the application code
is built into the container image.
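For instance, the following sketch enables both profile types for an arbitrary
launcher, using only the variables documented in the next section;
``./run_my_workload.sh`` is a hypothetical placeholder for whatever starts your Neuron
workload.

.. code-block:: shell

    export NEURON_RT_INSPECT_ENABLE=1          # turn profiling on
    export NEURON_RT_INSPECT_SYSTEM_PROFILE=1  # system trace (on by default)
    export NEURON_RT_INSPECT_DEVICE_PROFILE=1  # also capture device profiles
    export NEURON_RT_INSPECT_OUTPUT_DIR=./output
    ./run_my_workload.sh                       # placeholder launcher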
.. _neuron-profiler-capture-environment-variables:

Profile Capture Environment Variables
-------------------------------------

.. _core-control-variables:

Core control variables
~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Variable
      - Description
      - Default behavior
    * - ``NEURON_RT_INSPECT_ENABLE``
      - Set to ``1`` to enable profiling
      - Enables system profiling and disables device profiling. To control which
        profile types are captured, see :ref:`Profile type selection `
    * - ``NEURON_RT_INSPECT_OUTPUT_DIR``
      - Directory for profile data output
      - The default directory for captured profile data is ``./output``

.. _profile-type-selection:

Profile type selection
~~~~~~~~~~~~~~~~~~~~~~

.. note::

    When ``NEURON_RT_INSPECT_ENABLE`` is set to ``1``,
    ``NEURON_RT_INSPECT_SYSTEM_PROFILE`` is enabled by default (set to ``1``) and
    ``NEURON_RT_INSPECT_DEVICE_PROFILE`` is disabled by default (set to ``0``).

When ``NEURON_RT_INSPECT_ENABLE`` is set to ``1``, two different profile types are
available:

.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Variable
      - Profile type
      - Description
      - Enable capture
      - Disable capture
    * - ``NEURON_RT_INSPECT_SYSTEM_PROFILE``
      - System-level
      - Captures runtime system events and operations
      - Set to ``1``
      - Set to ``0``
    * - ``NEURON_RT_INSPECT_DEVICE_PROFILE``
      - Device-level
      - Captures detailed NeuronCore hardware metrics
      - Set to ``1``
      - Set to ``0``

.. note::

    These variables have no effect if ``NEURON_RT_INSPECT_ENABLE`` is not set to ``1``.

.. _advanced-config-vars:

Advanced configuration
~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Variable
      - Profile type
      - Description
      - Default behavior
    * - ``NEURON_RT_INSPECT_SYS_TRACE_MAX_EVENTS_PER_NC``
      - System-level
      - Maximum trace events per NeuronCore before the oldest events are overwritten
      - 1,000,000

.. note::

    Increasing the event limit will consume more host memory.

Example Capturing Profile of Application Using Environment Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instead of using the PyTorch or JAX profilers, you can profile your Python application
(or any application calling the Neuron Runtime API) using environment variables.

.. code-block:: shell

    NEURON_RT_INSPECT_ENABLE=1 NEURON_RT_INSPECT_OUTPUT_DIR=./output python app.py

See :ref:`Profile Capture Environment Variables ` for other profiling options that can
be set via environment variable.

Example Capturing Profile of nccom-test Using Environment Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Profiling can be enabled using environment variables. For simplicity, a quick way to
generate a Neuron workload is to use :ref:`nccom-test `, a benchmarking tool which is
already available with the Neuron AMI.

.. code-block:: shell

    export NEURON_RT_INSPECT_ENABLE=1
    export NEURON_RT_INSPECT_OUTPUT_DIR=./output
    nccom-test allr allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512

.. note::

    If you have problems with nccom-test, add the ``--debug`` flag. If using a
    trn1.2xlarge instance, change ``-r 32`` to ``-r 2`` to use fewer NeuronCores.

To understand the profiling output, see the section :ref:`Inspect Output `.

CLI reference for System Profiles
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to controlling profiling with environment variables, you can use the
``neuron-profile inspect`` command line interface for profiling applications. This
provides the same functionality as the environment variables, but helps you avoid
typos and invalid arguments, and provides a useful ``--help`` option to explain the
available options.

.. code-block:: shell

    Usage: neuron-profile [OPTIONS] inspect [inspect-OPTIONS] [userscript...]

    Application Options:
      -v, --version                 Show version and exit

    Help Options:
      -h, --help                    Show this help message

    [inspect command options]
      -o, --output-dir=             Output directory for the captured profile data, including system and device profiles (default: ./output)
      -n, --num-trace-events=       Maximum number of trace events to capture when profiling. Once hitting this limit, no new events are recorded
      --capture-system-profiles    Disable capture of system profile data. Can reduce output size.
      --capture-device-profiles    Disable capture of device profile data. Can reduce output size.

    [inspect command arguments]
      userscript:                   Run command/script that launches a Neuron workload. E.g. 'python app.py' or './runscript.sh'

Example of using System Profiles CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can provide any script that generates a Neuron workload (for example, a PyTorch
script) to the System Profiles CLI. For simplicity, a quick way to generate a Neuron
workload is to use ``nccom-test``, a benchmarking tool which is already available with
the Neuron AMI and the ``aws-neuronx-tools`` package.

.. code-block:: shell

    ubuntu@ip-172-31-63-210:~$ neuron-profile inspect -o inspect-output-nccom-test nccom-test allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512
    INFO[0000] Running command "nccom-test allg -b 512kb -e 512kb -r 32 -n 10 -d fp32 -w 1 -f 512" with profiling enabled
        size(B)    count(elems)    type    time:avg(us)    algbw(GB/s)    busbw(GB/s)
         524288          131072    fp32           24.15          21.71          21.03
    Avg bus bandwidth:    21.0339GB/s

.. note::

    If you have problems with nccom-test, add the ``--debug`` flag. If using a
    trn1.2xlarge instance, change ``-r 32`` to ``-r 2`` to use fewer NeuronCores.

.. _neuron-profiler-inspect-output:

``neuron-profile inspect`` Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The above command traces the Neuron workload execution and writes the results to the
``inspect-output-nccom-test`` directory. The output directory contains a single NEFF
file and a device profile (NTFF) for each NeuronCore which executed that NEFF. You
will also see ``ntrace.pb`` and ``trace_info.pb`` files storing the system profile
data. The output will look like the following:

.. code-block:: shell

    ubuntu@ip-172-31-63-210:~$ tree inspect-output-nccom-test
    inspect-output-nccom-test
    └── i-012590440bb9fd263_pid_98399
        ├── 14382885777943380728_instid_0_vnc_0.ntff
        ├── 14382885777943380728_instid_0_vnc_1.ntff
        ├── 14382885777943380728_instid_0_vnc_10.ntff
        ├── 14382885777943380728_instid_0_vnc_11.ntff
        ...
        ├── 14382885777943380728_instid_0_vnc_8.ntff
        ├── 14382885777943380728_instid_0_vnc_9.ntff
        ├── cpu_util.pb
        ├── host_mem.pb
        ├── neff_14382885777943380728.neff
        ├── ntrace.pb
        └── trace_info.pb

    2 directories, 74 files

To view a summary of the captured profile data, run the command:

.. code-block:: shell

    neuron-profile view -d inspect-output-nccom-test --output-format summary-text

EKS User Experience
-------------------

Capturing a profile on EKS is most easily done by setting environment variables, as
described in the section :ref:`Non-framework specific User Experience `. By using
environment variables, users do not need to change application code in their container
image or modify their run commands.

Update the deployment YAML to include the ``NEURON_RT_INSPECT_ENABLE`` and
``NEURON_RT_INSPECT_OUTPUT_DIR`` environment variables. For distributed workloads,
it's important that ``NEURON_RT_INSPECT_OUTPUT_DIR`` points to a directory on a shared
volume which all workers have access to.
.. code-block:: yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: trn1-mlp
    spec:
      restartPolicy: Never
      schedulerName: default-scheduler
      nodeSelector:
        beta.kubernetes.io/instance-type: trn1.32xlarge
      containers:
        - name: trn1-mlp
          env:
            - name: NEURON_RT_INSPECT_ENABLE
              value: "1"
            - name: NEURON_RT_INSPECT_OUTPUT_DIR
              value: "/shared/output"
          command: ['torchrun']
          args:
            - '--nnodes=1'
            - '--nproc_per_node=32'
            - 'train_torchrun.py'
          image: ${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:mlp
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              aws.amazon.com/neuron: 16

.. note::

    EKS users running PyTorch and JAX applications are still free to change their
    application code and use the PyTorch or JAX Python profiling APIs if they want
    finer-grained control over profiling. However, using the environment variables
    conveniently allows profiling without modifying the container image or application
    code.

Processing and Viewing Profiles
-------------------------------

Users have three output options for interacting with their captured profiles:

* Neuron Profiler UI - Neuron's custom UI, which allows easily drilling down to
  detailed device profiles from high-level system profiles
* Perfetto - allows sharing profiles as a single file and viewing your profiles in the
  Perfetto UI at https://ui.perfetto.dev/
* JSON - human-readable text output that enables simple scripting

Neuron Profiler UI
~~~~~~~~~~~~~~~~~~

To view a profile in the Neuron Profiler UI, run the following command to process the
profile and launch the UI:

.. code-block:: shell

    neuron-profile view -d ./output

To view profiles with the Neuron Profiler UI running locally, you will need to have
InfluxDB installed on your system. To install and set up InfluxDB, follow the
:ref:`directions in the official Neuron Profile documentation `.

Neuron Profiler System Profile UI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The system profile timeline shows a trace of Neuron Runtime API calls, ML framework
function calls, CPU utilization, and memory usage on each of the instances in your
workload.

The Neuron Runtime API trace is grouped by NeuronCore index and EC2 instance ID. For
example, all events in the row labeled ``nrt-nc-003-i-0f207fb2a99bd2d08`` are
associated with NeuronCore 3 and instance i-0f207fb2a99bd2d08. Framework function
traces are grouped by thread ID and EC2 instance ID. For example, all events in the
row ``framework-3266405268-i-0f207fb2a99bd2d08`` are framework or application function
calls made on thread 3266405268 running on instance i-0f207fb2a99bd2d08.

|neuron-profiler2-annotate-system-ui|

Clicking on a trace event in the timeline shows an "Event attributes" view with a list
of attributes associated with that event. For example, clicking on an ``nrt_execute``
event (the Neuron Runtime API call for executing a compiled model on a NeuronCore)
will show attributes such as the flop count (the number of floating point operations
for a single execution of the model), the model name, and the NeuronCore index and EC2
instance ID associated with the function call.

|neuron-profiler2-attributes-window|

Neuron Profiler 2.0 allows users to drill down from a system timeline to a device
profile timeline in order to see a detailed view of hardware activity during the
execution of a graph. To do this, select an ``nrt_execute`` event in the timeline,
and in the "Event attributes" view select the "Open device profile" button under the
Model Name attribute. This will open a new window with a device profile.
For help understanding a device profile, see the documentation section "Understanding
a Neuron Profile".

|neuron-profiler2-drilldown-device|

To see a list of all device profiles that were captured during your workload, press
the "Device Profiles" button at the bottom of the timeline. From this list you can see
all unique compiled graphs (NEFFs) that were executed on NeuronCores during your
workload. For each graph there is a link to a device profile that will show a detailed
view of hardware activity on the NeuronCore during execution of this graph.

|neuron-profiler2-device-profile-list|

Viewing Profiles with Perfetto
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Perfetto is an open-source trace analysis toolkit with a powerful UI for visualizing
and analyzing trace data. Users of the Neuron Profiler have the option of viewing
their profiles in the Perfetto UI. The ``--output-format perfetto`` option writes
processed data to Perfetto's native protobuf-based tracing format, which can be
visualized in the Perfetto UI at https://ui.perfetto.dev/. Example:

.. code-block:: shell

    neuron-profile view -d ./output --output-format perfetto

This will generate a ``system_profile.pftrace`` file for the system profile and a
``device_profile_model_.pftrace`` file for each unique compiled model that was
executed on a Neuron Device. To view the system profile, go to
https://ui.perfetto.dev/ and open the ``system_profile.pftrace`` file.

.. note::

    When loading trace files in the Perfetto UI, your data is processed locally and
    not uploaded to Perfetto's servers.

|neuron-profiler2-perfetto-timeline|

To view a device profile, go to https://ui.perfetto.dev/ and open the
``device_profile_model_.pftrace`` file. This will show a detailed view of hardware
activity on the NeuronCore during execution of this graph.

|neuron-profiler2-perfetto-device-timeline|

.. note::

    Your browser may run out of memory when viewing ``*.pftrace`` (Perfetto trace)
    files that are more than a few hundred MB. See the section :ref:`Viewing Large
    Profiles in Perfetto ` for directions on how to view large traces using the trace
    processor.
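As a quick local sketch of that approach (this assumes Perfetto's ``trace_processor``
binary is installed on your machine; it is a Perfetto tool, not part of the Neuron
SDK), you can serve the trace to the UI instead of loading the whole file into the
browser:

.. code-block:: shell

    # Serve the trace over a local HTTP endpoint; ui.perfetto.dev can then
    # attach to the local trace_processor instance instead of loading the
    # .pftrace file into browser memory.
    trace_processor --httpd system_profile.pftrace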
Perfetto Output View Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When outputting to Perfetto, it is possible to group your traces by different
attributes. This is useful for larger profiles involving many NeuronCores and
instances. The following options are available:

.. list-table:: Perfetto output view options
    :header-rows: 1
    :widths: 30 70

    * - CLI option
      - Description
    * - ``--system-trace-primary-group``
      - First-order grouping of trace events (maps to a Perfetto process / process
        group of rows). Provide a comma-delimited list of field names. Allowed fields:
        ``instance_id``, ``thread_id``, ``lnc_idx``, ``process_id``. Default:
        ``instance_id,process_id``.
    * - ``--system-trace-secondary-group``
      - Second-order grouping of trace events (maps to a Perfetto thread / single
        row). Provide a comma-delimited list of field names. Allowed fields:
        ``instance_id``, ``worker_gid``, ``thread_id``, ``lnc_idx``, ``process_id``.
        Default: ``worker_gid,lnc_idx,thread_id``.

For example, the following profile uses ``neuron-profile view
--output-format=perfetto --system-trace-primary-group=instance_id,process_id
--system-trace-secondary-group=lnc_idx,thread_id`` to group the system profile first
by unique combinations of ``instance_id`` and ``process_id``; within each of those
groups there are rows of events with unique combinations of ``lnc_idx`` and
``thread_id``.

|neuron-profiler2-perfetto-grouping|

Grouping By Global Worker ID
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, Perfetto traces are grouped by ``worker_gid``, which is a unique global
identifier for each NeuronCore across all instances in a distributed workload. When
clicking on an event in the trace, you will see fields for both ``lnc_idx`` (the local
NeuronCore index in that process) and ``worker_gid`` (the global NeuronCore index
across all instances). It is possible for ``lnc_idx`` to be the same for different
processes on the same instance or across different instances in a distributed
workload; ``worker_gid``, however, is unique for each NeuronCore across all instances.
The image below shows how the naming of tracks (rows) in the Perfetto UI correlates to
both ``lnc_idx`` and ``worker_gid``.

|neuron-profiler2-perfetto-gid|

Generating JSON Output From Profiles
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``--output-format json`` option writes processed profile data to human-readable
JSON that can be used for scripting and manual inspection.

.. code-block:: shell

    neuron-profile view -d ./output --output-format json

This will generate a ``system_profile.json`` file containing the system profile data
and a ``device_profile_model_.json`` file for each unique compiled model that was
executed on a Neuron Device.

The ``system_profile.json`` file contains the following data types:

* ``trace_events``: Neuron Runtime API trace events and framework/application trace
  events containing timestamps, durations, names, and the EC2 instance ID to
  differentiate between events from different compute nodes in a distributed workload.

.. code-block:: json

    {
        "Neuron_Runtime_API_Event": {
            "duration": 27094,
            "group": "nrt-nc-000",
            "id": 1,
            "instance_id": "i-0f207fb2a99bd2d08",
            "lnc_idx": "0",
            "name": "nrt_tensor_write",
            "parent_id": 0,
            "process_id": "1627711",
            "size": "4",
            "tensor_id": "4900392441224765051",
            "tensor_name": "_unknown_",
            "thread_id": 1627711,
            "timestamp": 1729888371056597613,
            "type": 11
        },
        "Framework_Event": {
            "duration": 3758079,
            "group": "framework-80375131",
            "instance_id": "i-0f207fb2a99bd2d08",
            "name": "PjitFunction(matmul_allgather)",
            "process_id": "701",
            "thread_id": 80375131,
            "timestamp": 1729888382798557372,
            "type": 99999
        }
    }

* ``mem_usage``: sampled host memory usage

.. code-block:: json

    {
        "duration": 1,
        "instance_id": "i-0f207fb2a99bd2d08",
        "percent_usage": 9.728179797845964,
        "timestamp": 1729888369286687792,
        "usage": 51805806592
    }

* ``cpu_util``: sampled CPU utilization. Results are provided per core and per EC2
  instance involved in a distributed workload.

.. code-block:: json

    {
        "cpu_id": "47",
        "duration": 1,
        "instance_id": "i-0f207fb2a99bd2d08",
        "timestamp": 1729888371287337243,
        "util": 2.3255813
    }

Processing only system or device profiles
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To reduce processing times, it is possible to skip processing of system or device
profiles. Sometimes users may only be interested in one, or want to start with a
limited set of profiling data before exploring the full profile.

To skip processing of device profiles, use the ``--ignore-device-profile`` option. To
skip processing of system profiles, use the ``--ignore-system-profile`` option. These
options can be used with the ``--output-format`` values ``db`` (default),
``perfetto``, or ``json``. For example:

.. code-block:: shell

    neuron-profile view -d ./output --ignore-device-profile --output-format perfetto
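Because the JSON output described above is plain text, it lends itself to quick ad-hoc
analysis. The following is a minimal sketch, not a supported tool; it assumes a
top-level ``trace_events`` collection and the event fields shown in the excerpts above
(the exact layout may vary between releases):

.. code-block:: python

    import json

    # Sum the reported durations of all nrt_execute events in a system profile.
    # Assumes system_profile.json exposes a "trace_events" list whose entries
    # carry "name" and "duration" fields, as in the excerpts above.
    with open("system_profile.json") as f:
        profile = json.load(f)

    total = sum(
        event.get("duration", 0)
        for event in profile.get("trace_events", [])
        if event.get("name") == "nrt_execute"
    )
    print("Total nrt_execute duration (profile time units):", total)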
.. _neuron-profiler-filtering-system-profiles:

Filtering System Profiles
-------------------------

This guide explains how to filter system trace events to optimize memory usage, reduce
output size, and speed up trace processing.

**Capture-time filtering** reduces memory usage and trace file size by only collecting
specific events, but filtered data cannot be recovered later. **Processing-time
filtering** preserves the complete trace and allows flexible analysis with different
filters, but requires more memory and storage during capture.

Capture-Time Filtering
~~~~~~~~~~~~~~~~~~~~~~

Configure filters before trace capture using environment variables or API functions.
You can use NeuronCore filters to only capture events for specific NeuronCores (for
example, only events associated with NeuronCore 0, or all the NeuronCores on a
specific NeuronDevice). You can use event type filters to only capture specific events
(for example, model execute or collectives events). It is possible to combine both
NeuronCore and event type filters.

Filtering by NeuronCore
^^^^^^^^^^^^^^^^^^^^^^^

If capture is enabled for a NeuronCore, then a ring buffer will be allocated in host
memory for storing that core's events. Thus, filtering by NeuronCore decreases host
memory usage during capture.

Default Behavior
""""""""""""""""

By default, all visible NeuronCores are enabled for capture.

Using Environment Variables
"""""""""""""""""""""""""""

.. code-block:: shell

    # Filter to capture events only from NeuronCore 0
    export NEURON_RT_INSPECT_EVENT_FILTER_NC=0

    # Filter to capture events from NeuronCores 0, 2, and 4
    export NEURON_RT_INSPECT_EVENT_FILTER_NC=0,2,4

    # Filter to capture events from a range of NeuronCores (0 through 3)
    export NEURON_RT_INSPECT_EVENT_FILTER_NC=0-3

    # Reset to default behavior
    unset NEURON_RT_INSPECT_EVENT_FILTER_NC  # Back to capturing all visible cores

Using API Functions
"""""""""""""""""""

.. code-block:: c

    #include <nrt/nrt.h>

    // Allocate and configure trace options
    nrt_sys_trace_config_t *config;
    nrt_sys_trace_config_allocate(&config);
    nrt_sys_trace_config_set_defaults(config);

    // Enable capture only for specific NeuronCores
    // Disable all cores since by default they are all enabled
    int num_cores = 128;
    for (int i = 0; i < num_cores; i++) {
        // NOTE: this per-core toggle name is assumed by analogy with the
        // event-type API shown later in this section
        nrt_sys_trace_config_set_capture_enabled_for_nc(config, i, false);
    }
    // Re-enable the cores you want to capture (core 0 here; same assumed API)
    nrt_sys_trace_config_set_capture_enabled_for_nc(config, 0, true);

Filtering by Event Type
^^^^^^^^^^^^^^^^^^^^^^^

You can discover the available event types programmatically with
``nrt_sys_trace_get_event_types``:

.. code-block:: c

    // Get all available event types
    const char **event_types = NULL;
    size_t count = 0;
    NRT_STATUS status = nrt_sys_trace_get_event_types(&event_types, &count);
    if (status == NRT_SUCCESS) {
        printf("Available event types:\n");
        for (size_t i = 0; i < count; ++i) {
            printf("  %s\n", event_types[i]);
        }
        // Free the event types array
        for (size_t i = 0; i < count; ++i) {
            free((void*)event_types[i]);
        }
        free((void*)event_types);
    }

Using Environment Variables
"""""""""""""""""""""""""""

The ``NEURON_RT_INSPECT_EVENT_FILTER_TYPE`` environment variable supports:

* **Default**: If not set, all event types are captured
* **Specific event types**: Use exact event names from ``nrt_sys_trace_get_event_types()``
* **Event categories**: Use ``hardware`` or ``software`` to filter by category
* **Exclusion**: Use the ``^`` prefix to exclude specific events from a category
.. code-block:: shell

    # Filter to capture only specific event types
    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=nrt_load,nrt_execute,nc_exec_running

    # Filter to capture all hardware events
    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware

    # Filter to capture all software events
    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=software

    # Filter to capture all hardware events EXCEPT cc_running
    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware,^cc_running

    # Filter to capture all software events EXCEPT nrt_load
    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=software,^nrt_load

    # Mix categories and specific events
    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware,nrt_tensor_write,nrt_tensor_read

    # Reset to default behavior
    unset NEURON_RT_INSPECT_EVENT_FILTER_TYPE  # Back to capturing all event types

The ``hardware`` group contains events that are executed on the NeuronCore. These are
``nc_exec_running``, ``cc_running``, ``cc_exec_barrier``, ``numerical_err``,
``nrt_model_switch``, ``timestamp_sync_point``, and ``hw_notify``. The ``software``
group contains all other events.

Using API Functions
"""""""""""""""""""

Use the ``nrt_sys_trace_config_set_capture_enabled_for_event_type`` API to filter by
event type.

.. code-block:: c

    #include <nrt/nrt.h>

    // Configure trace options
    nrt_sys_trace_config_t *config;
    nrt_sys_trace_config_allocate(&config);
    nrt_sys_trace_config_set_defaults(config);

    // By default, all event types are enabled
    // Disable specific event types (others remain enabled)
    nrt_sys_trace_config_set_capture_enabled_for_event_type(config, "device_exec", false);

    // Or disable all first, then enable only specific ones
    const char **all_event_types = NULL;
    size_t all_count = 0;
    nrt_sys_trace_get_event_types(&all_event_types, &all_count);

    // Disable all event types first
    for (size_t i = 0; i < all_count; ++i) {
        nrt_sys_trace_config_set_capture_enabled_for_event_type(config, all_event_types[i], false);
    }

    // Enable only specific event types
    nrt_sys_trace_config_set_capture_enabled_for_event_type(config, "model_load", true);
    nrt_sys_trace_config_set_capture_enabled_for_event_type(config, "nrt_execute", true);

    // Verify which event types are enabled
    const char **enabled_types = NULL;
    size_t enabled_count = 0;
    nrt_sys_trace_config_get_enabled_event_types(config, &enabled_types, &enabled_count);
    printf("Enabled event types: %zu\n", enabled_count);
    for (size_t i = 0; i < enabled_count; ++i) {
        printf("  %s\n", enabled_types[i]);
    }

    // Clean up memory (caller is responsible)
    for (size_t i = 0; i < enabled_count; ++i) {
        free((void*)enabled_types[i]);
    }
    free((void*)enabled_types);
    for (size_t i = 0; i < all_count; ++i) {
        free((void*)all_event_types[i]);
    }
    free((void*)all_event_types);

    // Start tracing
    nrt_sys_trace_start(config);

    // Your application code here...

    // Cleanup
    nrt_sys_trace_stop();
    nrt_sys_trace_config_free(config);
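As noted above, NeuronCore and event-type filters can be combined. A minimal sketch
using only the environment variables documented in this section (``app.py`` stands in
for your own workload):

.. code-block:: shell

    # Keep only hardware events (minus cc_running) from NeuronCores 0-3
    export NEURON_RT_INSPECT_ENABLE=1
    export NEURON_RT_INSPECT_OUTPUT_DIR=./output
    export NEURON_RT_INSPECT_EVENT_FILTER_NC=0-3
    export NEURON_RT_INSPECT_EVENT_FILTER_TYPE=hardware,^cc_running
    python app.py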
.. _neuron-profile-system-timestamp-adjustment:

Adjusting Hardware Timestamps
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Hardware events executed on the NeuronCore use device-specific timestamps that are in
a different time domain than CPU timestamps. To enable accurate correlation between
hardware and software events in the JSON system trace output, the runtime
automatically adjusts hardware event timestamps to the CPU time domain using
synchronization point events.

How Timestamp Adjustment Works
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

System trace events are generated from multiple independent time domains: the CPU host
and each ML accelerator device, each operating with its own clock. To align events
from different domains, the runtime performs software-based time synchronization after
event collection.

**Sync Point Events**: After each execution, a special ``timestamp_sync_point`` event
captures nearly simultaneous timestamps from both the host CPU (``cpu_timestamp_ns``)
and the device (``nc_timestamp_ns``). These sync events are used to adjust the
timestamps of hardware events to the CPU domain. The synchronization events are
included in the returned event trace and serve as reference points for timestamp
adjustment, so users can see the sync point used for aligning hardware events in the
timeline.

**Adjustment Algorithm**: For each hardware event, the runtime:

- Uses the sync point with the matching ``exec_id`` for that NeuronCore
- Calculates the time difference between the hardware event and the sync point (in
  device time)
- Applies that same time difference to the sync point's CPU timestamp
- Formula: ``adjusted_timestamp = sync_cpu_timestamp + (event_device_timestamp - sync_device_timestamp)``

Illustration::

                 Sync_Point          HW_Event
                     │                   │
                     ▼                   ▼
    Device Time ─────●───────────────────●───>
                     |--------Δt-------->|
    CPU Time    ─────●───────────────────●───>
                     |--------Δt-------->|

    - sync_device_timestamp and sync_cpu_timestamp occur ~simultaneously,
      though their clocks differ
    - Calc Δt = event_device_timestamp - sync_device_timestamp
      (elapsed time since sync point on device)
    - Add Δt to sync_cpu_timestamp to get adjusted_timestamp

|neuron-profiler2-syncpoint-timeline|

**Hardware Events**: Hardware events that require timestamp adjustment include:

- ``nc_exec_running`` (NeuronCore execution start/stop)
- ``cc_running`` (collective communication execution)
- ``cc_exec_barrier`` (collective communication barriers)
- ``numerical_err`` (numerical errors)
- ``nc_model_switch`` (NeuronCore model switching)

Tips
^^^^

1. **Memory Optimization**: Use NeuronCore filtering to avoid allocating ring buffers
   for unused cores and decrease host memory usage. Use both event-type and NeuronCore
   filters to decrease output trace sizes.
2. **Event Type Discovery**: Use ``nrt_sys_trace_get_event_types()`` to discover
   available event types.
3. **Category Filtering**: Use the ``hardware``/``software`` categories for broad
   filtering.
4. **Exclusion Filtering**: Use the ``^`` prefix to exclude specific events from
   categories.
5. **Combine Filters**: Use both NeuronCore and event type filters together for
   maximum optimization.

Processing-Time Filtering
~~~~~~~~~~~~~~~~~~~~~~~~~

Apply filters when viewing or processing already captured profiles. This approach
allows you to analyze the same trace data in different ways without recapturing. The
filters can be used with any ``neuron-profile`` output format, including
``--output-format json`` and ``--output-format perfetto``.

Filtering by NeuronCore
^^^^^^^^^^^^^^^^^^^^^^^

Use the ``--system-trace-filter-neuron-core`` option to process only events for
specific NeuronCores. The IDs are local to the instance, not global IDs. If the
``--system-trace-filter-neuron-core`` argument is not set, then events from all
NeuronCores will be included in the processed trace.

.. code-block:: shell

    # Filter by single neuron core
    neuron-profile view -d ./output --system-trace-filter-neuron-core "0" --output-format perfetto

    # Filter by multiple neuron cores
    neuron-profile view -d ./output --system-trace-filter-neuron-core "0,1,2,3" --output-format perfetto

Filtering by Event Type
^^^^^^^^^^^^^^^^^^^^^^^

Use the ``--system-trace-filter-event-type`` option to process only specific trace
event types.
If the ``--system-trace-filter-event-type`` argument is not set, then all event types
will be included in the processed trace.

.. code-block:: shell

    # Filter by single event type
    neuron-profile view -d ./output --system-trace-filter-event-type "nrt_execute" --output-format perfetto

    # Filter by multiple event types
    neuron-profile view -d ./output --system-trace-filter-event-type "nrt_execute,nrt_load" --output-format perfetto

Filtering by Instance ID
^^^^^^^^^^^^^^^^^^^^^^^^

Use the ``--system-trace-filter-instance-id`` option to process only events for
specific EC2 instances. If the ``--system-trace-filter-instance-id`` argument is not
set, then events from all instances will be included in the processed trace.

.. code-block:: shell

    # Filter by single instance
    neuron-profile view -d ./output --system-trace-filter-instance-id "i-abc123" --output-format perfetto

    # Filter by multiple instances (comma-separated)
    neuron-profile view -d ./output --system-trace-filter-instance-id "i-abc123,i-def456,i-ghi789" --output-format perfetto

Troubleshooting
---------------

Incomplete JAX Profiles
~~~~~~~~~~~~~~~~~~~~~~~

If your JAX profile has fewer events than expected or lacks the Runtime API trace,
check whether ``jax.profiler.stop_trace`` is being called inside a
``with jax.profiler.trace`` context block. This can prematurely stop tracing. Use
``jax.profiler.stop_trace`` only when profiling was started with
``jax.profiler.start_trace``, not when using the context-managed
``with jax.profiler.trace`` API.

Also, when using ``jax.profiler`` within your script, ensure that the environment
variable ``NEURON_RT_INSPECT_ENABLE`` is not set to ``1``. Additionally, ensure that
``NEURON_RT_INSPECT_OUTPUT_DIR`` is set to the correct output directory and that this
is the output directory passed to ``with jax.profiler.trace``.

Dropped Events in System Profile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When processing a system profile, you may see a warning indicating that some trace
events were dropped during capture.

.. code-block:: shell

    WARN[0000] Warning: 1001 trace events were dropped during capture (stored 530560 out of 531561 total events). Consider increasing buffer size, reducing trace duration, or filtering events.

This means that during capture the trace event buffers filled up and the oldest events
were overwritten. If you need to avoid dropping events for the full duration of your
workload, consider the following adjustments:

* Increase the buffer size by setting ``NEURON_RT_INSPECT_SYS_TRACE_MAX_EVENTS_PER_NC``
  (see :ref:`Profile Capture Environment Variables `, and the sketch below). This will
  increase host memory usage.
* Apply capture-time filters (NeuronCores / event types); see
  :ref:`Filtering System Profiles `.
* Shorten the profiled region: limit the code span under the profiling context /
  runtime.
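A minimal sketch of the first adjustment (the value is illustrative; the variable and
its default of 1,000,000 events are documented above):

.. code-block:: shell

    # Double the default per-NeuronCore event buffer before re-running the workload
    export NEURON_RT_INSPECT_SYS_TRACE_MAX_EVENTS_PER_NC=2000000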
.. |neuron-profiler2-annotate-system-ui| image:: /images/neuron-profiler2-annotate-system-ui.png
.. |neuron-profiler2-attributes-window| image:: /images/neuron-profiler2-attributes-window.png
.. |neuron-profiler2-device-profile-list| image:: /images/neuron-profiler2-device-profile-list.png
.. |neuron-profiler2-drilldown-device| image:: /images/neuron-profiler2-drilldown-device.png
.. |neuron-profiler2-perfetto-timeline| image:: /images/neuron-profiler2-perfetto-timeline.png
.. |neuron-profiler2-perfetto-device-timeline| image:: /images/neuron-profiler2-perfetto-device-timeline.png
.. |neuron-profiler2-perfetto-grouping| image:: /images/neuron-profiler2-perfetto-grouping.png
.. |neuron-profiler2-syncpoint-timeline| image:: /images/neuron-profiler2-syncpoint-timeline.png
.. |neuron-profiler2-perfetto-gid| image:: /images/neuron-profiler2-perfetto-gid.png

================================================
FILE: tools/tensorboard/getting-started-tensorboard-neuronx-plugin.rst
================================================

.. _neuronx-plugin-tensorboard:

NeuronX Plugin for TensorBoard (Trn1)
=====================================

.. contents:: Table of Contents
    :local:
    :depth: 2

Overview
--------

This guide is for developers who want to better understand how their model is executed
using the Neuron SDK, through TensorBoard.

The Neuron plugin for TensorBoard provides metrics on the performance of machine
learning tasks accelerated using the Neuron SDK. It is compatible with TensorBoard
versions 1.15 and higher, and provides visualizations and profiling results for graphs
executed on NeuronCores.

.. note::

    The following information is compatible with the Neuron SDK for Trn1. For a
    walkthrough on Inf1, please check out the guide :ref:`neuron-plugin-tensorboard`.

Enable profiling on Trn1
------------------------

.. note::

    Profiling is currently only supported with PyTorch Neuron (``torch-neuronx``).

Please refer to the following guides:

- PyTorch-Neuron - :ref:`torch-neuronx-profiling-with-tb`

Launch TensorBoard
------------------

In this step, we will process the Neuron profile data and launch TensorBoard.

1. Install the Neuron plugin for TensorBoard on your EC2 instance.

   .. code:: bash

      pip install tensorboard-plugin-neuronx --extra-index-url https://pip.repos.neuron.amazonaws.com

   .. note::

      If using TensorBoard >= 2.5, please use the ``--load_fast=false`` option when
      launching: ``tensorboard --logdir results --load_fast=false``

2. After you see the following message, TensorBoard is ready to use. By default,
   TensorBoard is launched at ``localhost:6006``.

   ::

      ...
      Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
      TensorBoard 2.4.1 at http://localhost:6006/ (Press CTRL+C to quit)

View results in TensorBoard
---------------------------

In this step, we will view the Neuron plugin for TensorBoard from a browser on your
local development machine.

1. Connect to the EC2 instance where TensorBoard is running while enabling port
   forwarding. In this example, we assume TensorBoard has been launched using the
   default address ``localhost:6006``.

   .. code:: bash

      # if Ubuntu-based AMI
      ssh -i <key file> ubuntu@<instance address> -L 6006:localhost:6006

      # if AL2-based AMI
      ssh -i <key file> ec2-user@<instance address> -L 6006:localhost:6006

2. In a browser, visit |tensorboard_address|.

3. In the top navigation bar, switch from ``Graphs`` to ``Neuron``. If it does not
   show up, please wait a while and refresh the page while the plugin loads. If the
   issue persists, check the ``Inactive`` dropdown list on the right and look for
   ``Neuron``.

   |image1|

4. If TensorBoard failed to find the generated logs, you will see the following
   message:

   |image2|

   In this case, please make sure the versions of the ``aws-neuronx-tools`` package
   and the Neuron framework package are from Neuron release 2.6 or newer.

Neuron Trace View
-----------------

|image3|

The trace view gives a high-level timeline of execution by aligning Neuron events,
such as Neuron Device execution, data transfers, and Collective Compute
synchronization (if applicable), with other events from the XLA profiler.
Use this view to better understand bottlenecks during the run, and potentially
experiment with how execution changes by moving the ``mark_step()`` call, which
executes the graph.

Neuron Operator View
--------------------

|image4|

The operator view can show timing information for both the framework operators and
the HLO operators, by selecting the ``operator-framework`` and ``operator-hlo`` tools
respectively. The pie charts show breakdowns of the time taken by device, as well as
per operator on a single device. The table below lists the operators and can be sorted
by clicking on the column headers. For fused operations, hover over the ``?`` to see
which operators are being executed.

For a quick glance at the most time-consuming operators, click the ``Time %`` column
in the table to sort by the relative time spent on this type of operation compared to
the rest of the model.

Neuron Operator Timeline View
-----------------------------

|image5|

The operator timeline view is a detailed look into a single execution with Neuron. A
high-level overview at the top breaks down the execution into categories, including
Neuron Runtime setup time, as well as NeuronCore compute engine and DMA engine
busyness. Activity on the compute and DMA engines is further categorized into compute,
control, and data transfer intervals, which are shown as separate processes, with each
showing a hierarchical view of the framework operators and their corresponding HLO
operation. Fused operations can be a result of compiler optimizations, or of
operations running in parallel on the device. Each bar can be clicked to show
information regarding which operators are overlapped.

This view can give better insight into how operators translate to Neuron, as well as
how certain Neuron compiler options may improve performance.

Troubleshooting
---------------

TensorBoard launch fails
~~~~~~~~~~~~~~~~~~~~~~~~

::

    ImportError: cannot import name 'Mapping' from 'collections'

This is an issue with Python 3.10 and a dependency of an old TensorBoard version. To
work around this error, please run ``pip install --upgrade tensorboard``. For more
information, see https://github.com/tensorflow/tensorboard/pull/5490.

.. |image1| image:: /images/Neuron_Profiler_Tensorboard_Dropdown.jpg
.. |image2| image:: /images/tb-plugin-img12.png
    :height: 2914
    :width: 5344
    :scale: 10%
.. |image3| image:: /images/Neuron_Profiler_Runtime_Trace_Original.jpg
.. |image4| image:: /images/Neuron_Profiler_T1_Op_Framework_View.png
.. |image5| image:: /images/TB_Operator_Timeline_2-10.png
.. |tensorboard_address| raw:: html

    <a href="http://localhost:6006">localhost:6006</a>

================================================
FILE: tools/tensorboard/index.rst
================================================

.. _tensorboard-neuron:

TensorBoard
===========

TensorBoard integration with AWS Neuron provides powerful visualization and debugging
capabilities for machine learning workloads. The Neuron TensorBoard plugins enable
developers to monitor training progress, analyze model performance, and debug
compilation issues through familiar TensorBoard interfaces.

.. toctree::
    :maxdepth: 1
    :hidden:

    TensorBoard for NeuronX </tools/tensorboard/getting-started-tensorboard-neuronx-plugin>

TensorBoard for Trn1
--------------------

.. grid:: 1
    :gutter: 3
    .. grid-item-card:: TensorBoard Plugin for NeuronX (Trn1)
        :link: /tools/tensorboard/getting-started-tensorboard-neuronx-plugin
        :link-type: doc
        :class-header: sd-bg-primary sd-text-white

        Comprehensive guide for using the TensorBoard Neuron plugin on Trn1 instances,
        including installation, configuration, and advanced visualization features.

    .. grid-item-card:: Profiling PyTorch NeuronX (``torch-neuronx``) with TensorBoard
        :link: /tools/tutorials/torch-neuronx-profiling-with-tb
        :link-type: doc
        :class-header: sd-bg-primary sd-text-white

        Step-by-step tutorial for monitoring PyTorch training progress on Trn1
        instances using TensorBoard scalars, metrics visualization, and performance
        tracking.

================================================
FILE: tools/third-party-solutions.rst
================================================

.. _third-party-tool-solutions:

Third-party solutions
=====================

AWS Neuron integrates with multiple third-party partner solutions that allow you to
run deep learning workloads on Amazon EC2 instances powered by AWS Trainium and AWS
Inferentia chips. The following list gives an overview of third-party solutions that
work with AWS Neuron.

Datadog
"""""""

Datadog, an observability and security platform, provides real-time monitoring for
cloud infrastructure and ML operations. Datadog's AWS Neuron integration pulls metrics
collected by the Neuron SDK's Neuron Monitor tool into Datadog, enabling users to
track the performance of their Trainium- and Inferentia-based instances. By providing
real-time visibility into model performance and hardware usage, Datadog helps
customers ensure efficient training and inference, optimized resource utilization, and
the prevention of service slowdowns.

`Datadog documentation `_

================================================
FILE: tools/tutorials/index.rst
================================================

.. _neuron-tools-tutorials:

Tutorials
=========

.. toctree::
    :hidden:
    :maxdepth: 1

    performance-profiling-vllm
    torch-neuronx-profiling-with-tb
    tutorial-tensorboard-scalars-mnist
    tutorial-neuron-monitor-mnist

.. grid:: 1 2 2 2
    :gutter: 3

    .. grid-item-card:: Profiling a vLLM Inference Workload
        :link: /tools/tutorials/performance-profiling-vllm
        :link-type: doc
        :class-card: sd-border-1

        Learn how to capture and analyze device-level and system-level profiles for
        vLLM inference workloads on AWS Trainium.

    .. grid-item-card:: Profiling a NKI Kernel
        :link: /nki/guides/use-neuron-profile
        :link-type: doc
        :class-card: sd-border-1

        Learn how to profile a NKI kernel with Neuron Explorer.

    .. grid-item-card:: Profiling PyTorch Neuron with TensorBoard
        :link: tutorial-tensorboard-scalars-mnist
        :link-type: doc
        :class-card: sd-border-1

        Learn how to use Neuron's plugin for TensorBoard that allows users to measure
        and visualize performance at the torch runtime level or the operator level.

    .. grid-item-card:: Track System Resource Utilization during Training with Neuron Monitor
        :link: tutorial-neuron-monitor-mnist
        :link-type: doc
        :class-card: sd-border-1

        Learn how to monitor resource utilization using neuron-monitor, Prometheus,
        and Grafana while running a multi-layer perceptron MNIST model on Trainium
        using PyTorch Neuron.

    .. grid-item-card:: Track Training Progress in TensorBoard using PyTorch Neuron
        :link: torch-neuronx-profiling-with-tb
        :link-type: doc
        :class-card: sd-border-1

        Learn how to track training progress in TensorBoard while running a
        multi-layer perceptron MNIST model on Trainium using PyTorch Neuron.
================================================
FILE: tools/tutorials/performance-profiling-vllm.rst
================================================

.. meta::
    :description: Learn how to use Neuron Explorer to capture and analyze system-level and device-level profiles for vLLM inference workloads on AWS Trainium
    :date-modified: 12/02/2025

Profiling a vLLM Inference Workload on AWS Trainium
===================================================

This tutorial outlines the steps involved in using Neuron Explorer to capture and view
system-level and device-level profiles for a vLLM-hosted inference workload on AWS
Trainium.

Overview
--------

By following this tutorial you will learn how to:

* Launch a vLLM-hosted inference workload on AWS Trainium with system- and
  device-level profiling enabled
* View the system-level profile using Perfetto
* Identify regions within the system profile that show LLM context encoding (prefill)
  and token generation (decode) running on the NeuronDevices, along with the names of
  the associated compute graphs
* View the device-level profiles for the context-encoding and token-generation compute
  graphs in the Neuron Explorer UI

Prepare your environment
------------------------

The following steps show how to launch a Trainium EC2 instance using the latest Neuron
Deep Learning AMI (DLAMI) and then install vLLM so that an example vLLM-hosted model
can be profiled using Neuron Explorer. If you would prefer to use a containerized
environment (Docker, EKS), please refer to the Neuron documentation to get started
with a Neuron Deep Learning Container (DLC) image that has vLLM pre-installed.

1. Launch a Trainium instance (trn1.32xlarge, trn2.3xlarge, trn2.48xlarge).

   * Option 1: Launch the instance using the latest AWS Deep Learning AMI (DLAMI),
     which includes the Neuron SDK preinstalled. Once the instance is launched, SSH
     into it and activate the virtual environment for neuronx-distributed-inference
     with the following command:
     ``source /opt/aws_neuronx_venv_pytorch_2_9_nxd_inference/bin/activate``
   * Option 2: If using a fresh Linux instance, manually install the latest Neuron
     packages by following the AWS Neuron installation guide.

2. Install vLLM.

   * Refer to the Neuron documentation, which outlines how to install the Neuron vLLM
     fork from source.

Step 1: Save a smaller version of your model
--------------------------------------------

When profiling LLMs, it is usually desirable to use only a subset of the model's
layers in order to understand model performance and to identify possible bottlenecks.
Capturing traces for the entire model could lead to an excessive volume of profiling
data, making analysis cumbersome. To address this, the following script takes the
Qwen3-8B-Base model, truncates it to the first 4 layers, and saves the resulting
smaller model for profiling purposes.

.. code-block:: python

    import transformers

    model_id = "Qwen/Qwen3-8B-Base"

    config = transformers.AutoConfig.from_pretrained(model_id)
    config.num_hidden_layers = 4
    config.layer_types = ["full_attention"] * 4

    tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)

    output_dir = "4layer_qwen3"
    model = transformers.AutoModelForCausalLM.from_pretrained(model_id, config=config)
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)

Save the above Python script as ``save_4layer_qwen.py`` and then run it using the
Python interpreter:
Step 2: Run a vLLM offline inference workload with profiling enabled
--------------------------------------------------------------------

In this step, you will run a small vLLM offline inference script that compiles, runs, and profiles your 4-layer Qwen3 model on the Trainium chips. Begin by saving the following Python script as ``qwen3_offline_inference.py``:

.. code-block:: python

   import os
   os.environ['VLLM_NEURON_FRAMEWORK'] = "neuronx-distributed-inference"

   # Enable Neuron profiling via environment variables
   os.environ['XLA_IR_DEBUG'] = "1"
   os.environ['XLA_HLO_DEBUG'] = "1"
   os.environ['NEURON_FRAMEWORK_DEBUG'] = "1"
   os.environ['NEURON_RT_INSPECT_ENABLE'] = "1"
   os.environ['NEURON_RT_INSPECT_SYSTEM_PROFILE'] = "1"
   os.environ['NEURON_RT_INSPECT_DEVICE_PROFILE'] = "1"
   os.environ['NEURON_RT_INSPECT_OUTPUT_DIR'] = "./neuron_profiles"

   from vllm import LLM, SamplingParams

   # Sample prompts.
   prompts = [
       "The president of the United States is",
       "The capital of France is",
       "The future of AI is",
   ]

   # Create a sampling params object.
   sampling_params = SamplingParams(top_k=1)

   # Create an LLM instance using the 4-layer Qwen3 model
   llm = LLM(
       model="4layer_qwen3",
       max_num_seqs=4,
       max_model_len=128,
       additional_config={
           "override_neuron_config": {
               "enable_bucketing": False,
           },
       },
       enable_prefix_caching=False,
       tensor_parallel_size=8)

   # Run inference using the sample prompts
   outputs = llm.generate(prompts, sampling_params)

Next, run the offline inference script with a Python interpreter:

.. code-block:: bash

   python3 ./qwen3_offline_inference.py

After roughly 60 seconds the script should complete, and you will see a new ``neuron_profiles`` directory which contains both system-level and device-level profile traces for this example inference workload.
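Note that the profiling behavior above is controlled entirely by environment variables, so the same settings can be applied to other entry points without code changes. The following is a sketch of exporting them at the shell level before launching a workload; the variable names are the ones set in the script above, while the ``vllm serve`` invocation is just an illustrative placeholder for whatever command starts your workload:

.. code-block:: bash

   # Same profiling switches as in qwen3_offline_inference.py, set at the shell level
   export NEURON_RT_INSPECT_ENABLE=1
   export NEURON_RT_INSPECT_SYSTEM_PROFILE=1
   export NEURON_RT_INSPECT_DEVICE_PROFILE=1
   export NEURON_RT_INSPECT_OUTPUT_DIR=./neuron_profiles

   # Launch your workload as usual, e.g. an online serving process
   vllm serve 4layer_qwen3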
Step 3: Visualize the system profile for your model
---------------------------------------------------

.. note::

   System profiles are currently viewed using the open-source Perfetto tool. Viewing of system profiles will be natively supported by the Neuron Explorer UI in an upcoming release.

Run the following command to generate a Perfetto-compatible file from the system profile traces that you previously captured:

.. code-block:: bash

   neuron-explorer view -d ./neuron_profiles --ignore-device-profile \
       --output-format perfetto

The above command generates a file called ``system_profile.pftrace`` in your working directory. Copy the ``system_profile.pftrace`` file to your local machine and open the Perfetto UI in your local web browser. In the left-hand menu, choose "Open trace file" and select your ``system_profile.pftrace`` file to view the system profile. Expand the first row under Default Workspace and you will see a timeline view similar to the following:

.. image:: /tools/profiler/images/perf-profiling-1.png

The system profile shows a high-level chronological view of the various Neuron Runtime API calls that took place during your example inference workload. If you hover the mouse cursor over the various pink/green bars you can see which specific API call occurred at each time point, such as ``nrt_tensor_read``, ``nrt_tensor_write``, ``nrt_execute``, and ``nrt_load_collectives``.

Look for the **nrt_execute** bar identified below and select it. This will open an information dialog providing details of the specific ``nrt_execute`` call:

.. image:: /tools/profiler/images/perf-profiling-2.png

.. image:: /tools/profiler/images/perf-profiling-3.png

In the Arguments pane you will find useful information such as the following:

* device_profile - the unique name of the device profile associated with this event
* nc_idx - the index of the NeuronCore that is associated with this API call
* model_name - path to the compiled Neuron Executable File Format (NEFF) compute graph associated with this event

In the above screenshot, notice that the model_name field provides additional information about what is happening during this part of the model execution:

.. code-block:: text

   tmp/nxd_model/context_encoding_model/_tp0_bk0/model.MODULE_6d1668c2294e2409dd72+ad9e832d.neff

* ``context_encoding_model`` - indicates that this graph handles context encoding (prefill) during vLLM inference (other model names will instead include ``token_generation_model`` to indicate the token-generation / decode phase of inference).
* ``tp0`` - indicates that this profile is associated with rank 0 of the tensor-parallel (TP) replica group
* ``bk0`` - indicates that this profile is associated with the first sequence bucket, as configured in the NeuronX Distributed Inference (NxDI) NeuronConfig.

Step 4: Visualize device profiles in Neuron Explorer
----------------------------------------------------

In this step, you will view a device profile for your model in the Neuron Explorer UI.

If you look inside the ``neuron_profiles`` directory that was created during Step 2, you will see many Neuron Executable File Format (NEFF) files and their associated Neuron Trace File Format (NTFF) files. For each pair of NEFF/NTFF files, the NEFF represents the Neuron-compiled compute graph for a portion of your model, and the NTFF represents the device-level profile trace for that specific compute graph.

While you are free to view any of the device-level profiles using the Neuron Explorer UI, it is often more useful to start from the system-level profile and identify a specific device-level profile of interest. Let's refer back to the ``nrt_execute`` region of the system-level profile that was covered in the previous section. Find and left-click this region to bring up the information dialog at the bottom of Perfetto:

.. image:: /tools/profiler/images/perf-profiling-4.png

.. image:: /tools/profiler/images/perf-profiling-5.png

In the device_profile field, note the numerical ID that is included at the end of the device profile name, in this case 2120860766. This ID is what you will use to locate the NEFF/NTFF pair associated with this specific ``nrt_execute`` API call.

Use the following find command (substituting in your device profile ID) to locate the NEFF/NTFF files associated with your identified ID:

.. code-block:: bash

   find ./neuron_profiles -name \*2120860766\* | sort

.. image:: /tools/profiler/images/perf-profiling-6.png

In the above output you can see that there is a single NEFF file ``neff_2120860766.neff`` and multiple NTFF files, ``2120860766_instid_0_vnc_0.ntff`` ... ``2120860766_instid_0_vnc_7.ntff``, each representing the profile trace for one of the 8 NeuronCores that participated in this inference request. These are the files you will open in the Neuron Explorer UI to inspect the device-level execution.

Copy the NEFF and one of the NTFF files to your local machine, as you will need to upload the files to the Neuron Explorer UI using your web browser.
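If you are working over SSH, one way to pull the files down is ``scp`` from your local machine; the following is a sketch, assuming an Ubuntu DLAMI instance, your own key and address, and that ``neuron_profiles`` sits under the remote home directory (adjust the paths to match the ``find`` output above):

.. code-block:: bash

   # Run from your local machine; substitute your key, user, and instance address
   scp -i ~/my-ec2.pem \
       "ubuntu@[PUBLIC_IP_ADDRESS]:~/neuron_profiles/*2120860766*" .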
To view the device profiles, execute the ``view`` command to start the Neuron Explorer web UI:

.. code-block:: bash

   $ neuron-explorer view --data-path ./ --output-format parquet

The path passed to ``--data-path`` is a directory that neuron-explorer will use for storing and managing profiles. The above command also prints a URL that you can click to open the web UI:

.. code-block:: text

   View a list of profiles at http://localhost:3001/

If ``neuron-explorer view`` is run on a remote instance, you may need to use port forwarding to access the web UI. By default, ``neuron-explorer`` creates a web server on port 3001 and an API server on port 3002. To enable connection from the browser on your local computer, you must establish an SSH tunnel to both ports 3001 and 3002. For example:

.. code-block:: bash

   ssh -L 3001:localhost:3001 -L 3002:localhost:3002 @ -fN

If you created an EC2 instance with PEM credentials, include them in the SSH tunnel as seen below:

.. code-block:: bash

   ssh -i ~/my-ec2.pem -L 3001:localhost:3001 -L 3002:localhost:3002 ubuntu@[PUBLIC_IP_ADDRESS] -fN

Once the SSH tunnel is set up, you can open a browser and navigate to http://localhost:3001.

With the Neuron Explorer UI open, go to "Profile Manager", and click "Upload Profile" at the top-right of the screen. Give your profile an appropriate name, and upload the NEFF and NTFF files that you previously identified:

.. image:: /tools/profiler/images/perf-profiling-7.png

After a few seconds, you should receive a message indicating that the NEFF/NTFF files were uploaded successfully:

.. image:: /tools/profiler/images/perf-profiling-8.png

Within the Neuron Explorer UI, go to the Profile Manager screen and look for your newly uploaded profile.

.. image:: /tools/profiler/images/perf-profiling-9.png

Depending on the size of your profile, it could take a few minutes before the Status field shows "PROCESSED". Once processing is complete, click the profile name to open the profile:

.. image:: /tools/profiler/images/perf-profiling-10.png

Confirmation
------------

Congratulations, you have now successfully generated both system-level and device-level profiles for a vLLM inference workload using Neuron Explorer and learned how to visualize them. This knowledge will enable you to effectively analyze the performance characteristics of your workload and identify potential optimization opportunities.

Clean up
--------

After completing your profiling experiments, remember to terminate the instance you launched to avoid unnecessary costs.

Next steps
----------

Now that you've completed this tutorial, try profiling your own model to analyze its workload. Identify performance gaps, apply optimizations, and profile again to measure the improvements. For a deeper dive into performance analysis, check out Neuron's blog series on profiling.


================================================
FILE: tools/tutorials/torch-neuronx-profiling-with-tb.rst
================================================
.. _torch-neuronx-profiling-with-tb:

Profiling PyTorch NeuronX with TensorBoard
==============================================================

.. contents:: Table of Contents
   :local:
   :depth: 2

Introduction
------------

Neuron provides a plugin for TensorBoard that allows users to measure and visualize performance at the torch runtime level or the operator level. With this information, it becomes quicker to identify performance bottlenecks and address them. For more information on the Neuron plugin for TensorBoard, see :ref:`neuronx-plugin-tensorboard`.

Setup
-----

Prerequisites
~~~~~~~~~~~~~
1. Initial `Trn1 setup for PyTorch (torch-neuronx) `__ has been done

Environment
~~~~~~~~~~~

::

   # activate the Python virtual environment and install tensorboard_plugin_neuronx
   source ~/aws_neuron_venv_pytorch_p38/bin/activate
   pip install tensorboard_plugin_neuronx

   # create a work directory for the Neuron profiling tutorials
   mkdir -p ~/neuron_profiling_tensorboard_examples
   cd ~/neuron_profiling_tensorboard_examples

Part 1: Operator Level Trace for ``xm.mark_step()`` workflow
-------------------------------------------------------------

Goal
~~~~

After completing this tutorial, you should be able to understand the features of the Operator Level Trace, and to form a narrative or surface-level analysis from what is being presented in it.

Set Up
~~~~~~

Let's set up a directory containing the material for this demo:

::

   cd ~/neuron_profiling_tensorboard_examples
   mkdir tutorial_1
   cd tutorial_1

   # this is where our code will be written
   touch run.py

Here is the code for ``run.py``:

::

   import os

   import torch
   import torch_neuronx
   from torch_neuronx.experimental import profiler
   import torch_xla.core.xla_model as xm

   os.environ["NEURON_CC_FLAGS"] = "--cache_dir=./compiler_cache"

   device = xm.xla_device()

   class NN(torch.nn.Module):
       def __init__(self):
           super().__init__()

           self.layer1 = torch.nn.Linear(4, 4)
           self.nl1 = torch.nn.ReLU()
           self.layer2 = torch.nn.Linear(4, 2)
           self.nl2 = torch.nn.Tanh()

       def forward(self, x):
           x = self.nl1(self.layer1(x))
           return self.nl2(self.layer2(x))

   with torch.no_grad():
       model = NN()
       inp = torch.rand(4, 4)

       output = model(inp)

       with torch_neuronx.experimental.profiler.profile(
               port=9012,
               profile_type='operator',
               ms_duration=10000):
           # IMPORTANT: the model has to be transferred to the XLA device within
           # the context manager, otherwise profiling won't work
           neuron_model = model.to(device)
           neuron_inp = inp.to(device)

           output_neuron = neuron_model(neuron_inp)
           xm.mark_step()

   print("==CPU OUTPUT==")
   print(output)
   print()
   print("==TRN1 OUTPUT==")
   print(output_neuron)

Understanding the Code
~~~~~~~~~~~~~~~~~~~~~~

For this first tutorial, we use a simple feed-forward NN model; even so, once the TensorBoard dashboard is up, we will see some interesting and unexpected things. A simple model is helpful since it is easy to reference back to. Another important part is the "operator" profiling type we specified in the context manager.

**Low Level:** The "operator" profile type produces the dashboard that contains the Operator Level Trace. This view zooms in on the NeuronDevice only, while the "trace" dashboard shows processes from all devices. The Operator Level Trace view is organized by levels of abstraction, with the top level showing the model class. The next lower tier shows model components, and the lowest tier shows the specific operators that occur for a specific model component. This view is useful for identifying model bottlenecks at the operator level.

We also print out the outputs from the CPU model and the Trn1 model to note the small differences in output.
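If you want a quantitative view of that difference rather than reading the tensors side by side, you could append a quick check to ``run.py``. This is a minimal sketch; it simply moves the XLA output back to the CPU and compares::

   # Quantify the CPU-vs-Trn1 numerical difference
   diff = (output - output_neuron.cpu()).abs().max()
   print(f"max abs difference: {diff.item():.6f}")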
Running The Profiler
~~~~~~~~~~~~~~~~~~~~

::

   python run.py

**Output:**

Initial Output & Compilation Success

::

   0%   10   20   30   40   50   60   70   80   90   100%
   |----|----|----|----|----|----|----|----|----|----|
   ***************************************************
   Analyzing dependencies of Block1
   0%   10   20   30   40   50   60   70   80   90   100%
   |----|----|----|----|----|----|----|----|----|----|
   ***************************************************
   Analyzing dependencies of Block1
   0%   10   20   30   40   50   60   70   80   90   100%
   |----|----|----|----|----|----|----|----|----|----|
   ***************************************************
   Dependency reduction of sg0000
   0%   10   20   30   40   50   60   70   80   90   100%
   |----|----|----|----|----|----|----|----|----|----|
   ***************************************************

Processing the Neuron Profiler Traces

::

   torch_neuron: Waiting for XLA profile completion ...
   torch_neuron: translate_xplane: Processing plane: '/host:CPU'
   torch_neuron: XLA decode - Read filename 2023_04_28_00_54_04
   torch_neuron: XLA decode - Read date parts ['2023', '04', '28', '00', '54', '04']
   torch_neuron: XLA decode - Read start date 2023-04-28 00:54:04 from directory stamp
   torch_neuron: translate_xplane: Processing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_op_timeline_split.json'
   torch_neuron: translate_xplane: Writing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_op_timeline_split.json' to 'temp_profiler_logs/c1a992f0ea378f7a_1/neuron_op_timeline_split.json'
   torch_neuron: translate_xplane: Processing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_op_timeline.json'
   torch_neuron: translate_xplane: Writing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_op_timeline.json' to 'temp_profiler_logs/c1a992f0ea378f7a_1/neuron_op_timeline.json'
   torch_neuron: translate_xplane: Processing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_hlo_op.json'
   torch_neuron: translate_xplane: Writing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_hlo_op.json' to 'temp_profiler_logs/c1a992f0ea378f7a_1/neuron_hlo_op.json'
   torch_neuron: translate_xplane: Processing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_framework_op.json'
   torch_neuron: translate_xplane: Writing plane: '/host:Neuron-runtime:profile//c1a992f0ea378f7a_1/model10001/node5/plugins/neuron/1682643254/neuron_framework_op.json' to 'temp_profiler_logs/c1a992f0ea378f7a_1/neuron_framework_op.json'

Printing output from the CPU model and the Trn1 model:

::

   ==CPU OUTPUT==
   tensor([[-0.1396, -0.3266],
           [-0.0327, -0.3105],
           [-0.0073, -0.3268],
           [-0.1683, -0.3230]])

   ==TRN1 OUTPUT==
   tensor([[-0.1396, -0.3266],
           [-0.0328, -0.3106],
           [-0.0067, -0.3270],
           [-0.1684, -0.3229]], device='xla:1')

Loading the Operator Level Trace in TensorBoard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run ``tensorboard --load_fast=false --logdir logs/``

Take note of the port (usually 6006) and enter ``localhost:`` into the local browser (assuming port forwarding is set up properly).

.. note::

   Check :ref:`Tensorboard Interface Overview` to understand the TensorBoard interface.
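If TensorBoard is running on the remote Trn1 instance, one way to set up that port forwarding is an SSH tunnel from your local machine; a sketch, substituting your own key, user, and instance address::

   ssh -i ~/my-ec2.pem -L 6006:localhost:6006 ubuntu@[PUBLIC_IP_ADDRESS] -fN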
Run names for the Operator Level Trace follow the same format, plus an id at the end: ``year_month_day_hour_minute_second_millisecond_id``. The Tool dropdown will have 3 options: operator-framework, operator-hlo, and operator-timeline.

Operator Framework View
~~~~~~~~~~~~~~~~~~~~~~~

|tensorboard-operator-framework-view|

This view contains a pie chart displaying the proportional execution time for each of the model operators at the framework level for a Neuron device. The list of operators is shown at the bottom, along with other details such as number of occurrences, execution time, and Neuron device and core.

Operator HLO View
~~~~~~~~~~~~~~~~~

|tensorboard-operator-hlo-view|

This view contains a pie chart displaying the proportional execution time for each of the model operators at the HLO level for a Neuron device. The list of operators is shown at the bottom, along with other details such as number of occurrences, execution time, and Neuron device and core.

.. note::

   For this simple model, the pie chart will be the same as the framework view. This won't be the case for larger and more complex models.

Operator Trace View
~~~~~~~~~~~~~~~~~~~

|tensorboard-operator-trace-view|

.. _trace_view_sections:

Trace View Sections
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Notice there are four sections: Process Overview, Control, Execution, and Data Transfer. In each section there are further subdivisions, with each layer representing a certain level of abstraction. Note also that the timescale axis is aligned between the sections. This matters because sometimes there are gaps in the process execution; most of the time, data transfer operations are happening in those gaps.

Fusion Operators
^^^^^^^^^^^^^^^^

**Simple Case:** Zooming in on the operations, we can recognize some operations for a neural network, such as a dot product and transpose, but sometimes there will be fused operators (fusion operators). To understand one of these operators, click on it, and some information will appear at the bottom of the dashboard.

|tensorboard-operator-trace-fusion-simple|

Notice in the above example that the fusion operator fuses the operators before and after itself on the timeline. More specifically, ``fused_3`` is a fusion of ``NN[model]/input`` and ``NN[model]/ReLU[nl1]/Tensor_1/aten__relu_maximum``. These kinds of fusions occur when the ``neuronx-cc`` compiler has found an optimization relating to the two operators. Most often this is the execution of the operators on separate compute engines or another form of parallelism.

**Complex Case:** The arrangement of fusion operators can get a little complicated, or contain "hidden" information. For the first example, let's zoom into the data transfer section such that we see the timescale range from 6000 ns to 6600 ns. It should look similar to the following:

|tensorboard-operator-trace-fusion-complex|

Looking at ``fused_16`` (11452 ns), we see it is surrounded by other fused operators. Furthermore, the ``fused_16`` operator fuses more than two operators: ``NN[model]/Linear[layer1]/aten__addmm_add``, ``NN[model]/input``, and ``NN[model]/Linear[layer1]/aten__addmm_dot``. These operators can be found in the timeline, but sometimes a fused operator's constituents may not appear in the timeline because they occur within another operation. We go over an example of this case in Part 2.
Understanding the Low Level Timeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Looking at the trace, we can look behind the scenes at how the model is executed on Neuron hardware. Before proceeding with the analysis, it is worth recalling the way we defined the model for this tutorial:

.. code:: python

   class NN(torch.nn.Module):
       def __init__(self):
           super().__init__()

           self.layer1 = torch.nn.Linear(4, 4)
           self.nl1 = torch.nn.ReLU()
           self.layer2 = torch.nn.Linear(4, 2)
           self.nl2 = torch.nn.Tanh()

       def forward(self, x):
           x = self.nl1(self.layer1(x))
           return self.nl2(self.layer2(x))

Analysis
^^^^^^^^^

**Input Operators:** We see input operators here because, in a mark-step flow, we need to transfer the inputs to the XLA device. This is represented by the ``SyncTensorsGraph.53`` call.

**ReLU at the beginning:** The first couple of blocks in the Process Data Transfer section initially appear to be confusing. There is an ``Input`` (0 ns) block followed by a ``ReLU`` (100 ns) operator. Under the hood, ``ReLU`` is rewritten as an ``elementwise_max(arr, 0)`` (0 here meaning an array of zeros), but to create this operation, the zeros have to be set in memory, which is a data operation. A general rule is that if an operator appears this early in the data transfer section, it most likely means there is an operation lowering that involves setting some values in memory for use later on.

**Memory allocation for Linear[layer1]:** We resume with the data transfer operations. Here, memory is getting allocated for specific operators, and sometimes the allocated inputs get loaded onto operators while the rest of the input gets allocated. This can be seen at ``fused_18`` (11811 ns) and ``fused_23`` (12181 ns). Eventually the input gets fully allocated, and other allocations occur for the dot product, transpose, and broadcast operators for ``Linear[layer1]`` and ``Linear[layer2]``.

Conclusion
^^^^^^^^^^^

There are a few conclusions that can be drawn from analyzing the timeline. We can see that we saved a bit of time due to parallelism with fusion operations, and some compute time with preloading operations (e.g., ``ReLU``). A clear trend is that a majority of the time is spent on data transfer operations. It is also evident that even a simple feed-forward NN becomes complicated when put under a microscope in the profiler. Facts such as the implementation of ``ReLU`` in the runtime/architecture aren't explicitly stated in the profiler, but make themselves known through the unusual ordering and placement of the trace blocks and the unusual fusion operators.

In terms of action items that can be taken based on our narrative, there really aren't any. This is a very simple model that finishes in about 8 microseconds, and we chose it because it is simple to understand. In more realistic examples, we will aim to do more compute than data transfer on the hardware, and, where possible, to overlap data transfer and compute between sequential operations. The profiler revealed a lot of optimizations that were done via fusion operators and parallelism. However, the end goal of this tool is to improve performance by revealing the bottlenecks of the model.

.. note::

   While we did explain some of the quirks visible in the profiler at a microscopic level, it isn't necessary to do so for normal use. This tutorial introduced the microscopic explanation for these occurrences to show the user that this is *indeed* what happens in the hardware when executing a simple FFNN.
Part 2: Operator Level Trace with ``torch_neuronx.trace()`` workflow
----------------------------------------------------------------------

Set Up
~~~~~~

The setup will be similar to Part 1.

::

   cd ~/neuron_profiling_tensorboard_examples
   mkdir tutorial_2
   cd tutorial_2

   # this is where our code will be written
   touch run.py

Here is the code for ``run.py``:

::

   import os
   import time

   import torch
   import torch_neuronx
   from torch_neuronx.experimental import profiler

   class NN(torch.nn.Module):
       def __init__(self):
           super().__init__()

           self.layer1 = torch.nn.Linear(4, 4)
           self.nl1 = torch.nn.ReLU()
           self.layer2 = torch.nn.Linear(4, 2)
           self.nl2 = torch.nn.Tanh()

       def forward(self, x):
           x = self.nl1(self.layer1(x))
           return self.nl2(self.layer2(x))

   model = NN()
   model.eval()

   inp = torch.rand(4, 4)

   output = model(inp)

   with torch_neuronx.experimental.profiler.profile(
           port=9012,
           profile_type='operator',
           ms_duration=10000,
           traced_only=True):
       neuron_model = torch_neuronx.trace(model, inp, compiler_workdir="./compiler_cache")
       output_neuron = neuron_model(inp)

   print("==CPU OUTPUT==")
   print(output)
   print()
   print("==INF2 OUTPUT==")
   print(output_neuron)

Important code differences from Part 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. ``import torch_xla.core.xla_model as xm`` is no longer necessary.
2. ``traced_only=True`` is set in ``torch_neuronx.experimental.profiler.profile()``. This option is necessary for traced models; otherwise the generated profile will be inaccurate or will not work.
3. The model is traced with ``torch_neuronx.trace()`` and ``xm.mark_step()`` is removed.

Otherwise, the code is the same as Part 1.

Running Part 2
~~~~~~~~~~~~~~~~~

To run:

::

   python run.py

The output will look almost identical to that of Part 1.

Loading the Operator Level Trace in TensorBoard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run ``tensorboard --load_fast=false --logdir logs/``, just like in Part 1.

.. note::

   Check :ref:`Tensorboard Interface Overview` to understand the TensorBoard interface.

Timeline View:

|tensorboard-operator-trace-view-traced|

Notable Differences in Timeline View from Part 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**No Input Operators:** For a traced model, we do not transfer the input to an XLA device, so these operations are not seen on the timeline. This also affects scheduling, which is why the profiled time is less than in the mark-step flow.

**Combined Loading of Linear[layer1] and Tanh:** ``fused_19`` (5824 ns) contains a fusion between ``Linear[layer1]`` and ``Tanh[nl2]``. This might seem a bit odd, but such data loading parallelism can be understood by looking at how tanh is implemented. Typically, functions like tanh are implemented via lookup tables that must be preloaded into memory, which is a data transfer operation. A bulk of the data transfer operations are done at the beginning to optimize computation.

.. note::

   Despite these differences, the big-picture conclusion drawn from Part 1 still holds, as the two timelines are more similar than different. One new insight is that the traced model performs better than the mark-step flow when profiling a single forward pass.
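If you want a rough end-to-end latency number alongside the profile, note that ``time`` is already imported in ``run.py`` above, so you could time an additional forward pass after the profiling context. This is a minimal sketch, not a substitute for the profiler's measurements; also note that the very first call after tracing can include one-time loading costs, so a later call gives a steadier number::

   start = time.perf_counter()
   _ = neuron_model(inp)
   elapsed_us = (time.perf_counter() - start) * 1e6
   print(f"traced forward pass took {elapsed_us:.0f} us")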
.. |tensorboard-url-image| image:: /images/Neuron_Profiler_Tensorboard_Url.jpg
.. |tensorboard-NEURON-header| image:: /images/Neuron_Profiler_Tensorboard_Header.jpg
.. |tensorboard-NEURON-dropdown| image:: /images/Neuron_Profiler_Tensorboard_Dropdown.jpg
.. |tensorboard-run-tool-dropdowns| image:: /images/Neuron_Profiler_Tensorboard_Run_Tool_Dropdowns.jpg
.. |tensorboard-run-trace-original| image:: /images/Neuron_Profiler_Runtime_Trace_Original.jpg
.. |tensorboard-run-trace-selected-section| image:: /images/Neuron_Profiler_Runtime_Trace_Section_Selection.jpg
.. |tensorboard-run-trace-selected-section-zoomed| image:: /images/Neuron_Profiler_Runtime_Trace_Section_Selection_Zoomed.jpg
.. |tensorboard-run-trace-selected-section-zoomed-named-traces| image:: /images/Neuron_Profiler_Runtime_Trace_Section_Selection_Zoomed_Named_Traces.jpg
.. |tensorboard-operator-framework-view| image:: /images/Neuron_Profiler_T1_Op_Framework_View.png
.. |tensorboard-operator-hlo-view| image:: /images/Neuron_Profiler_T1_Op_HLO_View.png
.. |tensorboard-operator-trace-view| image:: /images/Neuron_Profiler_T1_Op_Trace_View.png
.. |tensorboard-operator-trace-view-traced| image:: /images/Neuron_Profiler_T1_Op_Trace_View_Traced.png
.. |tensorboard-operator-trace-fusion-simple| image:: /images/Neuron_Profiler_T1_Op_Trace_Fusion_Simple.png
.. |tensorboard-operator-trace-fusion-complex| image:: /images/Neuron_Profiler_T1_Op_Trace_Fusion_Complex.png


================================================
FILE: tools/tutorials/tutorial-neuron-monitor-mnist.rst
================================================
.. _track-system-monitor:

Track System Resource Utilization during Training with neuron-monitor using PyTorch Neuron
==========================================================================================

.. contents:: Table of Contents
   :local:
   :depth: 2

This tutorial explains how to monitor resource utilization using **neuron-monitor**, **Prometheus**, and **Grafana** while running a multi-layer perceptron MNIST model on Trainium using PyTorch Neuron.

Multi-layer Perceptron MNIST Model
----------------------------------

This tutorial is based on the MNIST example for PyTorch Neuron on Trainium. For the full tutorial, please see :ref:`Multi-Layer Perceptron Training Tutorial `.

The Training Job
----------------

For this tutorial, we will make the original script do more work, thus giving us more system utilization data to observe. The training loop is simply repeated 1000 times:

.. code:: python

   for run in range(0, 1000):
       print(f'Run {run}')

       model.train()
       ...

Save the following code as :download:`train_monitor.py ` and run it as ``python3 train_monitor.py`` on a Trn1 instance.

.. literalinclude:: /src/examples/pytorch/mnist_mlp/train_monitor.py
   :language: python

Setting up **Prometheus** and **Grafana**
-----------------------------------------

.. note::

   The setup presented in the following paragraphs can be extended to monitor any number of instances running training jobs or inference workloads. For this tutorial, we will set everything up on a single Trn1 instance running Amazon Linux 2.

Setting up **Prometheus**
~~~~~~~~~~~~~~~~~~~~~~~~~

For a more detailed guide on how to install **Prometheus**, visit the official guide at https://prometheus.io/docs/prometheus/latest/getting_started/.

Download and unzip a prebuilt **Prometheus** binary on your Trn1 instance:

.. code:: bash

   wget https://github.com/prometheus/prometheus/releases/download/v2.38.0/prometheus-2.38.0.linux-amd64.tar.gz
   tar -xzvf prometheus-2.38.0.linux-amd64.tar.gz
   cd prometheus-2.38.0.linux-amd64/

Create a config and add a scrape target:

.. code:: bash

   vim prometheus.yml

.. code:: yaml

   scrape_configs:
     - job_name: 'neuron'
       # Scrape the target every 5 seconds.
       scrape_interval: 5s
       static_configs:
         - targets: ['localhost:8000']

Finally, start **Prometheus**:

.. code:: bash

   ./prometheus --config.file=prometheus.yml
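You can optionally confirm that Prometheus came up before moving on; a quick check, assuming the default Prometheus port of 9090:

.. code:: bash

   # Prometheus exposes a readiness endpoint on its default port
   curl -s localhost:9090/-/ready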
Setting up **Grafana**
~~~~~~~~~~~~~~~~~~~~~~

For a more detailed guide on how to install **Grafana**, visit the official guide at https://grafana.com/grafana/download.

Add the Grafana repo to dnf:

.. code:: bash

   sudo vim /etc/yum.repos.d/grafana.repo

   [grafana]
   name=grafana
   baseurl=https://packages.grafana.com/oss/rpm
   repo_gpgcheck=1
   enabled=1
   gpgcheck=1
   gpgkey=https://packages.grafana.com/gpg.key
   sslverify=1
   sslcacert=/etc/pki/tls/certs/ca-bundle.crt

Install and start **Grafana**:

.. code:: bash

   sudo dnf install -y grafana
   sudo /bin/systemctl start grafana-server.service

By default, **Grafana** will run an HTTP server on port 3000. If you need to change that, update its config and restart the service:

.. code:: bash

   sudo vim /etc/grafana/grafana.ini
   ...
   sudo /bin/systemctl restart grafana-server.service

Using your favorite web browser, access the Grafana webpage and add a new dashboard. The default user and password are both 'admin':

.. image:: tutorial_grafana_login.png
   :alt: Image: image.png

Next, you'll add a Prometheus data source by going to ``Configuration`` -> ``Data Sources``:

.. image:: tutorial_grafana_data_sources.png
   :alt: Image: image.png

... and adding the local **Prometheus** server as a data source:

.. image:: tutorial_grafana_add_prometheus.png
   :alt: Image: image.png

Finally, upload the sample dashboard :download:`neuron-monitor-grafana.json ` to **Grafana**:

.. image:: tutorial_grafana_upload_dash.png
   :alt: Image: image.png

Monitoring the Training Workload
--------------------------------

Start the training job, which, due to the artificially added complexity, will take more than 15 minutes:

.. code:: bash

   python train_monitor.py

On the same instance, start ``neuron-monitor`` and its companion script, ``neuron-monitor-prometheus.py``:

.. code:: bash

   neuron-monitor | neuron-monitor-prometheus.py
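Before opening Grafana, you can optionally verify that metrics are being published; this assumes the companion script serves on port 8000, matching the ``localhost:8000`` scrape target configured in ``prometheus.yml`` earlier:

.. code:: bash

   curl -s localhost:8000 | head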
Once they are running, you can use your web browser to access the **Grafana** server running on your Trn1 instance and view a timeline of the system utilization.

The upper part of the dashboard contains:

- a list of the currently monitored instances (for this tutorial there is a single Trn1 instance)
- aggregated metrics for stats such as NeuronCore utilization, NeuronCores in use, iteration success rates, error rates, etc.
- a timeline of execution status rates and execution latencies

.. image:: tutorial_grafana_dash_1.png
   :alt: Image: image.png

The lower part of the dashboard contains:

- one line of charts containing a timeline of Neuron resource utilization (NeuronCore, vCPU, and memory utilization)
- one line of charts containing a timeline of host resource utilization (vCPU and memory utilization)

.. image:: tutorial_grafana_dash_2.png
   :alt: Image: image.png


================================================
FILE: tools/tutorials/tutorial-tensorboard-scalars-mnist.rst
================================================
.. _tb_track_training_minst:

Track Training Progress in TensorBoard using PyTorch Neuron
============================================================

.. contents:: Table of Contents
   :local:
   :depth: 2

This tutorial explains how to track training progress in TensorBoard while running a multi-layer perceptron MNIST model on Trainium using PyTorch Neuron.

Multi-layer perceptron MNIST model
----------------------------------

This tutorial is based on the MNIST example for PyTorch Neuron on Trainium. For the full tutorial, please see :ref:`Multi-Layer Perceptron Training Tutorial `.

Output TensorBoard logs
-----------------------

To generate TensorBoard logs, we first modify the training script to use the ``SummaryWriter``:

.. code:: python

   from torch.utils.tensorboard import SummaryWriter
   writer = SummaryWriter('./output')

In the training loop, we can then use the ``add_scalar`` API to log the loss per step.

.. code:: python

   writer.add_scalar("step loss", loss, idx)

At the end of the script, add ``writer.flush()`` to ensure all logs are written.

Save the following code as :download:`train_tb.py ` and run it as ``python3 train_tb.py`` on a Trn1 instance. The generated logs can be found in the ``./output`` directory that was passed to ``SummaryWriter``.

.. literalinclude:: /src/examples/pytorch/mnist_mlp/train_tb.py
   :language: python

View loss in TensorBoard
------------------------

In order to view your training metrics, install TensorBoard in your Python environment:

.. code:: bash

   pip install tensorboard

Then, launch TensorBoard with the ``./output`` directory:

.. code:: bash

   tensorboard --logdir ./output

Once running, open a new SSH connection to the instance and port-forward TCP port 6006 (e.g., ``-L 6006:127.0.0.1:6006``). Once the tunnel is established, TensorBoard can then be accessed via web browser at the following URL: `http://localhost:6006 `__. Please note that you will not be able to access TensorBoard if you disconnect your port-forwarding SSH session to the Trainium instance.

.. image:: tb-scalars.png
   :alt: Image: image.png

In TensorBoard, you can now see the loss per step plotted. When capturing loss for multiple runs, you can plot them together on the same graph to compare runs. Be sure to change the output directory for different runs, for example ``./output/run1`` for the first, ``./output/run2`` for the second, etc.
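One simple way to do that is to parameterize the log directory when creating the writer. The following is a minimal sketch, where ``run_name`` is any label you choose (it is not part of the original script):

.. code:: python

   import os
   from torch.utils.tensorboard import SummaryWriter

   run_name = "run1"  # change this per run, e.g. via a CLI argument
   writer = SummaryWriter(os.path.join("./output", run_name))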