Full Code of axolotl-ai-cloud/axolotl for AI

1.5M tokens

3886 symbols

Repository: axolotl-ai-cloud/axolotl
Branch: main
Commit: b0294b3427da
Files: 1070
Total size: 5.4 MB

Directory structure:
gitextract_1sp7sr39/
├── .axolotl-complete.bash
├── .bandit
├── .coderabbit.yaml
├── .coveragerc
├── .editorconfig
├── .gitattributes
├── .github/
│   ├── CODE_OF_CONDUCT.md
│   ├── CONTRIBUTING.md
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug-report.yaml
│   │   ├── config.yml
│   │   ├── docs.yml
│   │   └── feature-request.yaml
│   ├── PULL_REQUEST_TEMPLATE.md
│   ├── SECURITY.md
│   ├── SUPPORT.md
│   ├── release-drafter.yml
│   └── workflows/
│       ├── base.yml
│       ├── docs.yml
│       ├── lint.yml
│       ├── main.yml
│       ├── multi-gpu-e2e.yml
│       ├── nightlies.yml
│       ├── precommit-autoupdate.yml
│       ├── preview-docs.yml
│       ├── pypi.yml
│       ├── tests-nightly.yml
│       └── tests.yml
├── .gitignore
├── .mypy.ini
├── .pre-commit-config.yaml
├── .runpod/
│   ├── .gitignore
│   ├── Dockerfile
│   ├── README.md
│   ├── hub.json
│   ├── requirements.txt
│   ├── src/
│   │   ├── config/
│   │   │   └── config.yaml
│   │   ├── handler.py
│   │   ├── test_input.json
│   │   ├── train.py
│   │   └── utils.py
│   ├── test-input.json
│   └── tests.json
├── CITATION.cff
├── CNAME
├── FAQS.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── VERSION
├── _quarto.yml
├── benchmarks/
│   ├── bench_entropy.py
│   ├── bench_scattermoe_lora.py
│   └── bench_selective_logsoftmax.py
├── cicd/
│   ├── Dockerfile-uv.jinja
│   ├── Dockerfile.jinja
│   ├── __init__.py
│   ├── cicd.sh
│   ├── cleanup.py
│   ├── cleanup.sh
│   ├── e2e_tests.py
│   ├── multigpu.py
│   ├── multigpu.sh
│   └── single_gpu.py
├── codecov.yml
├── deepspeed_configs/
│   ├── zero1.json
│   ├── zero1_torch_compile.json
│   ├── zero2.json
│   ├── zero2_torch_compile.json
│   ├── zero3.json
│   ├── zero3_bf16.json
│   ├── zero3_bf16_cpuoffload_all.json
│   └── zero3_bf16_cpuoffload_params.json
├── devtools/
│   ├── README.md
│   └── dev_chat_template.yml
├── docker/
│   ├── Dockerfile
│   ├── Dockerfile-base
│   ├── Dockerfile-base-next
│   ├── Dockerfile-base-nightly
│   ├── Dockerfile-cloud
│   ├── Dockerfile-cloud-no-tmux
│   ├── Dockerfile-cloud-uv
│   ├── Dockerfile-tests
│   ├── Dockerfile-uv
│   └── Dockerfile-uv-base
├── docker-compose.yaml
├── docs/
│   ├── .gitignore
│   ├── amd_hpc.qmd
│   ├── attention.qmd
│   ├── batch_vs_grad.qmd
│   ├── checkpoint_saving.qmd
│   ├── cli.qmd
│   ├── custom_integrations.qmd
│   ├── dataset-formats/
│   │   ├── conversation.qmd
│   │   ├── index.qmd
│   │   ├── inst_tune.qmd
│   │   ├── pretraining.qmd
│   │   ├── stepwise_supervised.qmd
│   │   ├── template_free.qmd
│   │   └── tokenized.qmd
│   ├── dataset_loading.qmd
│   ├── dataset_preprocessing.qmd
│   ├── debugging.qmd
│   ├── docker.qmd
│   ├── expert_quantization.qmd
│   ├── faq.qmd
│   ├── fsdp_qlora.qmd
│   ├── getting-started.qmd
│   ├── gradient_checkpointing.qmd
│   ├── inference.qmd
│   ├── input_output.qmd
│   ├── installation.qmd
│   ├── lora_optims.qmd
│   ├── lr_groups.qmd
│   ├── mac.qmd
│   ├── mixed_precision.qmd
│   ├── multi-gpu.qmd
│   ├── multi-node.qmd
│   ├── multimodal.qmd
│   ├── multipack.qmd
│   ├── nccl.qmd
│   ├── nd_parallelism.qmd
│   ├── optimizations.qmd
│   ├── optimizers.qmd
│   ├── qat.qmd
│   ├── quantize.qmd
│   ├── ray-integration.qmd
│   ├── reward_modelling.qmd
│   ├── rlhf.qmd
│   ├── scripts/
│   │   ├── examples-allowlist.yml
│   │   ├── generate_config_docs.py
│   │   └── generate_examples_docs.py
│   ├── sequence_parallelism.qmd
│   ├── streaming.qmd
│   ├── telemetry.qmd
│   ├── torchao.qmd
│   └── unsloth.qmd
├── examples/
│   ├── LiquidAI/
│   │   ├── README.md
│   │   ├── lfm2-350m-fft.yaml
│   │   ├── lfm2-8b-a1b-lora.yaml
│   │   └── lfm2-vl-lora.yaml
│   ├── alst/
│   │   ├── README.md
│   │   ├── llama3-8b-deepspeed-alst.yaml
│   │   └── llama3-8b-fsdp2-alst.yaml
│   ├── apertus/
│   │   ├── README.md
│   │   └── apertus-8b-qlora.yaml
│   ├── arcee/
│   │   ├── README.md
│   │   └── afm-4.5b-qlora.yaml
│   ├── archived/
│   │   ├── README.md
│   │   ├── cerebras/
│   │   │   ├── btlm-ft.yml
│   │   │   └── qlora.yml
│   │   ├── code-llama/
│   │   │   ├── 13b/
│   │   │   │   ├── lora.yml
│   │   │   │   └── qlora.yml
│   │   │   ├── 34b/
│   │   │   │   ├── lora.yml
│   │   │   │   └── qlora.yml
│   │   │   ├── 7b/
│   │   │   │   ├── lora.yml
│   │   │   │   └── qlora.yml
│   │   │   └── README.md
│   │   ├── dbrx/
│   │   │   ├── 16bit-lora.yaml
│   │   │   ├── 8bit-lora.yaml
│   │   │   ├── README.md
│   │   │   └── fft-ds-zero3.yaml
│   │   ├── deepcoder/
│   │   │   └── deepcoder-14B-preview-lora.yml
│   │   ├── falcon/
│   │   │   ├── config-7b-lora.yml
│   │   │   ├── config-7b-qlora.yml
│   │   │   └── config-7b.yml
│   │   ├── gemma/
│   │   │   └── qlora.yml
│   │   ├── gptj/
│   │   │   └── qlora.yml
│   │   ├── jeopardy-bot/
│   │   │   └── config.yml
│   │   ├── mpt-7b/
│   │   │   ├── README.md
│   │   │   └── config.yml
│   │   ├── openllama-3b/
│   │   │   ├── README.md
│   │   │   ├── config.yml
│   │   │   ├── lora.yml
│   │   │   └── qlora.yml
│   │   ├── pythia/
│   │   │   └── lora.yml
│   │   ├── pythia-12b/
│   │   │   ├── README.md
│   │   │   └── config.yml
│   │   ├── qwen/
│   │   │   ├── README.md
│   │   │   ├── lora.yml
│   │   │   ├── qlora.yml
│   │   │   ├── qwen2-moe-lora.yaml
│   │   │   └── qwen2-moe-qlora.yaml
│   │   ├── redpajama/
│   │   │   ├── README.md
│   │   │   └── config-3b.yml
│   │   ├── replit-3b/
│   │   │   └── config-lora.yml
│   │   ├── stablelm-2/
│   │   │   ├── 1.6b/
│   │   │   │   ├── fft.yml
│   │   │   │   └── lora.yml
│   │   │   └── README.md
│   │   ├── starcoder2/
│   │   │   └── qlora.yml
│   │   ├── tiny-llama/
│   │   │   ├── README.md
│   │   │   ├── lora-mps.yml
│   │   │   ├── lora.yml
│   │   │   ├── pretrain.yml
│   │   │   └── qlora.yml
│   │   ├── xgen-7b/
│   │   │   └── xgen-7b-8k-qlora.yml
│   │   └── yi-34B-chat/
│   │       ├── README.md
│   │       └── qlora.yml
│   ├── cloud/
│   │   ├── baseten.yaml
│   │   └── modal.yaml
│   ├── cohere/
│   │   └── command-r-7b-qlora.yml
│   ├── colab-notebooks/
│   │   └── colab-axolotl-example.ipynb
│   ├── deepcogito/
│   │   ├── cogito-v1-preview-llama-3B-lora.yml
│   │   └── cogito-v1-preview-qwen-14B-lora.yml
│   ├── deepseek-v2/
│   │   ├── fft-fsdp-16b.yaml
│   │   └── qlora-fsdp-2_5.yaml
│   ├── devstral/
│   │   ├── README.md
│   │   └── devstral-small-qlora.yml
│   ├── distributed-parallel/
│   │   ├── README.md
│   │   ├── llama-3_1-8b-hsdp-tp.yaml
│   │   └── qwen3-8b-fsdp-tp-cp.yaml
│   ├── eaft/
│   │   └── eaft-example.yml
│   ├── falcon-h1/
│   │   ├── falcon-h1-1b-deep-qlora.yaml
│   │   ├── falcon-h1-1b-qlora.yaml
│   │   ├── falcon-h1-34b-qlora.yaml
│   │   ├── falcon-h1-3b-qlora.yaml
│   │   ├── falcon-h1-500m-qlora.yaml
│   │   └── falcon-h1-7b-qlora.yaml
│   ├── gemma2/
│   │   ├── qlora.yml
│   │   └── reward-model.yaml
│   ├── gemma3/
│   │   ├── gemma-3-1b-qlora.yml
│   │   ├── gemma-3-270m-qlora.yml
│   │   ├── gemma-3-4b-qlora.yml
│   │   └── gemma-3-4b-vision-qlora.yml
│   ├── gemma3n/
│   │   ├── README.md
│   │   ├── gemma-3n-e2b-qlora.yml
│   │   ├── gemma-3n-e2b-vision-audio-qlora.yml
│   │   └── gemma-3n-e2b-vision-qlora.yml
│   ├── glm4/
│   │   └── qlora-32b.yaml
│   ├── glm45/
│   │   ├── README.md
│   │   └── glm-45-air-qlora.yaml
│   ├── glm46v/
│   │   ├── README.md
│   │   ├── glm-4-6v-flash-ddp.yaml
│   │   └── glm-4-6v-flash-qlora.yaml
│   ├── glm47-flash/
│   │   ├── README.md
│   │   ├── lora.yaml
│   │   ├── lora_fsdp.yaml
│   │   ├── qlora.yaml
│   │   └── qlora_fsdp.yaml
│   ├── gpt-oss/
│   │   ├── README.md
│   │   ├── gpt-oss-120b-fft-fsdp2-offload.yaml
│   │   ├── gpt-oss-20b-fft-deepspeed-zero3.yaml
│   │   ├── gpt-oss-20b-fft-fsdp2-offload.yaml
│   │   ├── gpt-oss-20b-fft-fsdp2.yaml
│   │   ├── gpt-oss-20b-sft-lora-singlegpu.yaml
│   │   └── gpt-oss-safeguard-20b-sft-lora-singlegpu.yaml
│   ├── granite4/
│   │   ├── README.md
│   │   └── granite-4.0-tiny-fft.yaml
│   ├── hunyuan/
│   │   ├── README.md
│   │   └── hunyuan-v1-dense-qlora.yaml
│   ├── internvl3_5/
│   │   ├── README.md
│   │   └── internvl3_5-8b-qlora.yml
│   ├── jamba/
│   │   ├── README.md
│   │   ├── qlora.yaml
│   │   ├── qlora_deepspeed.yaml
│   │   └── qlora_fsdp_large.yaml
│   ├── kimi-linear/
│   │   ├── README.md
│   │   └── kimi-48b-lora.yaml
│   ├── llama-2/
│   │   ├── README.md
│   │   ├── fft_optimized.yml
│   │   ├── gptq-lora.yml
│   │   ├── lisa.yml
│   │   ├── loftq.yml
│   │   ├── lora.yml
│   │   ├── qlora-fsdp.yml
│   │   ├── qlora.yml
│   │   └── relora.yml
│   ├── llama-3/
│   │   ├── 3b-fp8-fsdp2.yaml
│   │   ├── 3b-qat-fsdp2.yaml
│   │   ├── 3b-qat-mxfp4.yaml
│   │   ├── 3b-qat-nvfp4.yaml
│   │   ├── README.md
│   │   ├── diffusion/
│   │   │   ├── pretrain-1b.yaml
│   │   │   └── sft-1b.yaml
│   │   ├── fft-8b-liger-fsdp.yaml
│   │   ├── fft-8b.yaml
│   │   ├── instruct-dpo-lora-8b.yml
│   │   ├── instruct-lora-8b.yml
│   │   ├── lora-1b-deduplicate-dpo.yml
│   │   ├── lora-1b-deduplicate-sft.yml
│   │   ├── lora-1b-kernels.yml
│   │   ├── lora-1b-ray.yml
│   │   ├── lora-1b-sample-packing-sequentially.yml
│   │   ├── lora-1b.yml
│   │   ├── lora-8b.yml
│   │   ├── opentelemetry-qlora.yml
│   │   ├── qlora-1b-gdpo.yaml
│   │   ├── qlora-1b-kto.yaml
│   │   ├── qlora-1b.yml
│   │   ├── qlora-fsdp-405b.yaml
│   │   ├── qlora-fsdp-70b.yaml
│   │   ├── qlora.yml
│   │   └── sparse-finetuning.yaml
│   ├── llama-3-vision/
│   │   └── lora-11b.yaml
│   ├── llama-4/
│   │   ├── README.md
│   │   ├── do-no-use-fa2/
│   │   │   ├── maverick-qlora-fsdp1.yaml
│   │   │   ├── scout-qlora-fsdp1.yaml
│   │   │   ├── scout-qlora-single-h100.yaml
│   │   │   └── scout-vision-qlora-fsdp.yaml
│   │   ├── scout-qlora-flexattn-fsdp2.yaml
│   │   ├── scout-qlora-single-h100-flex.yaml
│   │   └── scout-vision-qlora-fsdp2-flex.yaml
│   ├── llava/
│   │   └── lora-7b.yaml
│   ├── magistral/
│   │   ├── README.md
│   │   ├── magistral-small-fsdp-qlora.yaml
│   │   ├── magistral-small-qlora.yaml
│   │   ├── think/
│   │   │   ├── README.md
│   │   │   └── magistral-small-think-qlora.yaml
│   │   └── vision/
│   │       ├── README.md
│   │       └── magistral-small-vision-24B-qlora.yml
│   ├── mamba/
│   │   └── config.yml
│   ├── mimo/
│   │   ├── README.md
│   │   └── mimo-7b-qlora.yaml
│   ├── ministral/
│   │   ├── README.md
│   │   └── ministral-small-qlora.yaml
│   ├── ministral3/
│   │   ├── README.md
│   │   ├── ministral3-3b-qlora.yaml
│   │   ├── think/
│   │   │   ├── README.md
│   │   │   └── ministral3-3b-think-qlora.yaml
│   │   └── vision/
│   │       ├── README.md
│   │       └── ministral3-3b-vision-qlora.yml
│   ├── mistral/
│   │   ├── README.md
│   │   ├── bigstral/
│   │   │   └── bigstral-ds-zero3.yaml
│   │   ├── config.yml
│   │   ├── dpo/
│   │   │   └── mistral-dpo-qlora.yml
│   │   ├── lora.yml
│   │   ├── mistral-qlora-fsdp.yml
│   │   ├── mixtral/
│   │   │   ├── mixtral-8x22b-qlora-fsdp.yml
│   │   │   ├── mixtral-qlora-fsdp.yml
│   │   │   ├── mixtral.yml
│   │   │   └── mixtral_22.yml
│   │   ├── mps/
│   │   │   └── lora-mps.yml
│   │   ├── orpo/
│   │   │   └── mistral-qlora-orpo.yml
│   │   └── qlora.yml
│   ├── mistral-small/
│   │   ├── README.md
│   │   └── mistral-small-3.1-24B-lora.yml
│   ├── mistral4/
│   │   ├── README.md
│   │   ├── fft-text.yml
│   │   ├── fft-vision.yml
│   │   ├── qlora-text.yml
│   │   └── qlora-vision.yml
│   ├── nemotron/
│   │   └── nemotron-mini-4b-qlora.yaml
│   ├── olmo3/
│   │   ├── README.md
│   │   └── olmo3-7b-qlora.yaml
│   ├── orpheus/
│   │   ├── README.md
│   │   └── finetune.yml
│   ├── phi/
│   │   ├── README.md
│   │   ├── lora-3.5.yaml
│   │   ├── phi-ft.yml
│   │   ├── phi-qlora.yml
│   │   ├── phi2-ft.yml
│   │   ├── phi3-ft-fsdp.yml
│   │   └── phi3-ft.yml
│   ├── pixtral/
│   │   └── lora-12b.yml
│   ├── plano/
│   │   ├── README.md
│   │   └── plano-4b-qlora.yaml
│   ├── qat_nvfp4/
│   │   ├── Gemma3-12B_baseline.yml
│   │   ├── Gemma3-12B_qat.yml
│   │   ├── Math-Gemma3-12B_baseline.yml
│   │   ├── Math-Gemma3-12B_qat.yml
│   │   ├── Math-Gemma3-27B_baseline.yml
│   │   ├── Math-Gemma3-27B_qat.yml
│   │   ├── Math-Qwen2.5-72B_baseline.yml
│   │   ├── Math-Qwen2.5-72B_qat.yml
│   │   ├── Qwen2.5-72B_baseline.yml
│   │   └── Qwen2.5-72B_qat.yml
│   ├── qwen2/
│   │   ├── adamw-pretrain-fsdp2.yaml
│   │   ├── dpo.yaml
│   │   ├── muon-pretrain-fsdp2.yaml
│   │   ├── prm.yaml
│   │   ├── qlora-fsdp.yaml
│   │   └── reward-model.yaml
│   ├── qwen2-vl/
│   │   └── lora-7b.yaml
│   ├── qwen2_5-vl/
│   │   └── lora-7b.yaml
│   ├── qwen3/
│   │   ├── 32b-qlora.yaml
│   │   ├── 8b-qat-fsdp2.yml
│   │   ├── README.md
│   │   ├── qlora-fsdp.yaml
│   │   └── reward-model.yaml
│   ├── qwen3-next/
│   │   ├── README.md
│   │   └── qwen3-next-80b-a3b-qlora.yaml
│   ├── qwen3.5/
│   │   ├── 122b-a10b-moe-qlora-fsdp.yaml
│   │   ├── 122b-a10b-moe-qlora.yaml
│   │   ├── 27b-fft.yaml
│   │   ├── 27b-qlora-fsdp.yaml
│   │   ├── 27b-qlora.yaml
│   │   ├── 35b-a3b-moe-qlora-fsdp.yaml
│   │   ├── 35b-a3b-moe-qlora.yaml
│   │   ├── 9b-fft-vision.yaml
│   │   ├── 9b-lora-vision.yaml
│   │   └── README.md
│   ├── seed-oss/
│   │   ├── README.md
│   │   └── seed-oss-36b-qlora.yaml
│   ├── slurm/
│   │   ├── README.md
│   │   └── axolotl.slurm
│   ├── smolvlm2/
│   │   ├── README.md
│   │   └── smolvlm2-2B-lora.yaml
│   ├── streaming/
│   │   ├── README.md
│   │   ├── pretrain.yaml
│   │   └── sft.yaml
│   ├── swanlab/
│   │   ├── README.md
│   │   ├── custom_trainer_profiling.py
│   │   ├── dpo-swanlab-completions.yml
│   │   ├── dpo-swanlab-full-featured.yml
│   │   └── lora-swanlab-profiling.yml
│   ├── trinity/
│   │   ├── README.md
│   │   └── trinity-nano-preview-qlora.yaml
│   └── voxtral/
│       ├── README.md
│       ├── voxtral-mini-audio-qlora.yml
│       └── voxtral-mini-qlora.yml
├── index.qmd
├── pyproject.toml
├── requirements-dev.txt
├── requirements-tests.txt
├── requirements.txt
├── scripts/
│   ├── chat_datasets.py
│   ├── cloud-entrypoint-term.sh
│   ├── cloud-entrypoint.sh
│   ├── cutcrossentropy_install.py
│   ├── motd
│   └── unsloth_install.py
├── setup.py
├── src/
│   ├── axolotl/
│   │   ├── __init__.py
│   │   ├── cli/
│   │   │   ├── __init__.py
│   │   │   ├── args.py
│   │   │   ├── art.py
│   │   │   ├── checks.py
│   │   │   ├── cloud/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   ├── baseten/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── template/
│   │   │   │   │       ├── run.sh
│   │   │   │   │       └── train_sft.py
│   │   │   │   └── modal_.py
│   │   │   ├── config.py
│   │   │   ├── delinearize_llama4.py
│   │   │   ├── evaluate.py
│   │   │   ├── inference.py
│   │   │   ├── main.py
│   │   │   ├── merge_lora.py
│   │   │   ├── merge_sharded_fsdp_weights.py
│   │   │   ├── preprocess.py
│   │   │   ├── quantize.py
│   │   │   ├── train.py
│   │   │   ├── utils/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── diffusion.py
│   │   │   │   ├── fetch.py
│   │   │   │   ├── load.py
│   │   │   │   ├── sweeps.py
│   │   │   │   └── train.py
│   │   │   └── vllm_serve.py
│   │   ├── common/
│   │   │   ├── __init__.py
│   │   │   ├── architectures.py
│   │   │   ├── const.py
│   │   │   └── datasets.py
│   │   ├── convert.py
│   │   ├── core/
│   │   │   ├── __init__.py
│   │   │   ├── attention/
│   │   │   │   └── __init__.py
│   │   │   ├── builders/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   ├── causal.py
│   │   │   │   └── rl.py
│   │   │   ├── chat/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── format/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── chatml.py
│   │   │   │   │   ├── llama3x.py
│   │   │   │   │   └── shared.py
│   │   │   │   └── messages.py
│   │   │   ├── datasets/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chat.py
│   │   │   │   └── transforms/
│   │   │   │       ├── __init__.py
│   │   │   │       └── chat_builder.py
│   │   │   ├── trainers/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   ├── dpo/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── args.py
│   │   │   │   │   └── trainer.py
│   │   │   │   ├── grpo/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── args.py
│   │   │   │   │   ├── async_trainer.py
│   │   │   │   │   ├── fast_async_trainer.py
│   │   │   │   │   ├── replay_buffer.py
│   │   │   │   │   ├── sampler.py
│   │   │   │   │   └── trainer.py
│   │   │   │   ├── mamba.py
│   │   │   │   ├── mixins/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── activation_checkpointing.py
│   │   │   │   │   ├── checkpoints.py
│   │   │   │   │   ├── distributed_parallel.py
│   │   │   │   │   ├── optimizer.py
│   │   │   │   │   ├── packing.py
│   │   │   │   │   ├── rng_state_loader.py
│   │   │   │   │   └── scheduler.py
│   │   │   │   ├── trl.py
│   │   │   │   └── utils.py
│   │   │   ├── training_args.py
│   │   │   └── training_args_base.py
│   │   ├── datasets.py
│   │   ├── evaluate.py
│   │   ├── integrations/
│   │   │   ├── LICENSE.md
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── config.py
│   │   │   ├── cut_cross_entropy/
│   │   │   │   ├── ACKNOWLEDGEMENTS.md
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   └── args.py
│   │   │   ├── densemixer/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   └── plugin.py
│   │   │   ├── diffusion/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── callbacks.py
│   │   │   │   ├── generation.py
│   │   │   │   ├── plugin.py
│   │   │   │   ├── trainer.py
│   │   │   │   └── utils.py
│   │   │   ├── grokfast/
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   └── optimizer.py
│   │   │   ├── kd/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── callbacks.py
│   │   │   │   ├── chat_template.py
│   │   │   │   ├── collator.py
│   │   │   │   ├── collator_online_teacher.py
│   │   │   │   ├── kernels/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── liger.py
│   │   │   │   │   └── models.py
│   │   │   │   ├── topk_logprob/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── forward_kl.py
│   │   │   │   ├── trainer.py
│   │   │   │   └── utils.py
│   │   │   ├── kernels/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── autotune_callback.py
│   │   │   │   ├── autotune_collector.py
│   │   │   │   ├── constants.py
│   │   │   │   ├── libs/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── scattermoe_lora/
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── kernels/
│   │   │   │   │       │   ├── __init__.py
│   │   │   │   │       │   ├── lora_ops.py
│   │   │   │   │       │   ├── ops.py
│   │   │   │   │       │   └── single.py
│   │   │   │   │       ├── layers.py
│   │   │   │   │       ├── lora_ops.py
│   │   │   │   │       ├── parallel_experts.py
│   │   │   │   │       ├── parallel_linear_lora.py
│   │   │   │   │       ├── selective_dequant.py
│   │   │   │   │       └── selective_dequant_kernel.py
│   │   │   │   ├── plugin.py
│   │   │   │   └── sonicmoe/
│   │   │   │       ├── __init__.py
│   │   │   │       ├── patch.py
│   │   │   │       ├── routing.py
│   │   │   │       └── weight_converter.py
│   │   │   ├── liger/
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── models/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── deepseekv2.py
│   │   │   │   │   ├── jamba.py
│   │   │   │   │   ├── llama4.py
│   │   │   │   │   ├── qwen3.py
│   │   │   │   │   └── qwen3_moe.py
│   │   │   │   ├── plugin.py
│   │   │   │   └── utils.py
│   │   │   ├── llm_compressor/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── plugin.py
│   │   │   │   └── utils.py
│   │   │   ├── lm_eval/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   └── cli.py
│   │   │   ├── spectrum/
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   └── model_snr_results/
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-1.5B-Instruct.json
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-1.5B.json
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-3B-Instruct.json
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-3B.json
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-7B-Instruct.json
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-7B.json
│   │   │   │       ├── snr_results_google-gemma-2-2b.json
│   │   │   │       ├── snr_results_meta-llama-Llama-3.2-1B-Instruct.json
│   │   │   │       ├── snr_results_meta-llama-Llama-3.2-1B.json
│   │   │   │       ├── snr_results_meta-llama-Llama-3.2-3B-Instruct.json
│   │   │   │       └── snr_results_meta-llama-Llama-3.2-3B.json
│   │   │   └── swanlab/
│   │   │       ├── README.md
│   │   │       ├── __init__.py
│   │   │       ├── args.py
│   │   │       ├── callbacks.py
│   │   │       ├── completion_logger.py
│   │   │       ├── plugins.py
│   │   │       └── profiling.py
│   │   ├── kernels/
│   │   │   ├── __init__.py
│   │   │   ├── geglu.py
│   │   │   ├── lora.py
│   │   │   ├── quantize.py
│   │   │   ├── swiglu.py
│   │   │   └── utils.py
│   │   ├── loaders/
│   │   │   ├── __init__.py
│   │   │   ├── adapter.py
│   │   │   ├── adapters/
│   │   │   │   └── __init__.py
│   │   │   ├── constants.py
│   │   │   ├── model.py
│   │   │   ├── patch_manager.py
│   │   │   ├── processor.py
│   │   │   ├── tokenizer.py
│   │   │   └── utils.py
│   │   ├── logging_config.py
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   └── mamba/
│   │   │       ├── __init__.py
│   │   │       ├── configuration_mamba.py
│   │   │       └── modeling_mamba.py
│   │   ├── monkeypatch/
│   │   │   ├── __init__.py
│   │   │   ├── accelerate/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── fsdp2.py
│   │   │   │   └── parallelism_config.py
│   │   │   ├── attention/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── flash_attn_4.py
│   │   │   │   ├── flex_attn.py
│   │   │   │   ├── sage_attn.py
│   │   │   │   └── xformers.py
│   │   │   ├── btlm_attn_hijack_flash.py
│   │   │   ├── data/
│   │   │   │   ├── __init__.py
│   │   │   │   └── batch_dataset_fetcher.py
│   │   │   ├── deepspeed_utils.py
│   │   │   ├── fsdp2_qlora.py
│   │   │   ├── gradient_checkpointing/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── offload_cpu.py
│   │   │   │   └── offload_disk.py
│   │   │   ├── llama_attn_hijack_flash.py
│   │   │   ├── llama_attn_hijack_xformers.py
│   │   │   ├── lora_kernels.py
│   │   │   ├── loss/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chunked.py
│   │   │   │   └── eaft.py
│   │   │   ├── mistral_attn_hijack_flash.py
│   │   │   ├── mixtral/
│   │   │   │   └── __init__.py
│   │   │   ├── models/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── apertus/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── activation.py
│   │   │   │   ├── kimi_linear/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── configuration_kimi.py
│   │   │   │   │   ├── modeling_kimi.py
│   │   │   │   │   ├── patch_kimi_linear.py
│   │   │   │   │   └── tokenization_kimi.py
│   │   │   │   ├── llama4/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── modeling.py
│   │   │   │   ├── mistral3/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── mistral_common_tokenizer.py
│   │   │   │   ├── pixtral/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── modeling_flash_attention_utils.py
│   │   │   │   ├── qwen3_5/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── modeling.py
│   │   │   │   ├── qwen3_next/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── modeling.py
│   │   │   │   └── voxtral/
│   │   │   │       ├── __init__.py
│   │   │   │       └── modeling.py
│   │   │   ├── moe_quant.py
│   │   │   ├── multipack.py
│   │   │   ├── peft/
│   │   │   │   ├── __init__.py
│   │   │   │   └── utils.py
│   │   │   ├── relora.py
│   │   │   ├── ring_attn/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── adapters/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── batch.py
│   │   │   │   └── patch.py
│   │   │   ├── scaled_softmax_attn.py
│   │   │   ├── stablelm_attn_hijack_flash.py
│   │   │   ├── tiled_mlp/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   └── patch.py
│   │   │   ├── trainer/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── lr.py
│   │   │   │   ├── trl.py
│   │   │   │   ├── trl_vllm.py
│   │   │   │   └── utils.py
│   │   │   ├── trainer_accelerator_args.py
│   │   │   ├── trainer_fsdp_optim.py
│   │   │   ├── transformers/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── trainer_context_parallel.py
│   │   │   │   └── trainer_loss_calc.py
│   │   │   ├── transformers_fa_utils.py
│   │   │   ├── unsloth_.py
│   │   │   ├── utils.py
│   │   │   └── xformers_/
│   │   │       └── __init__.py
│   │   ├── processing_strategies.py
│   │   ├── prompt_strategies/
│   │   │   ├── __init__.py
│   │   │   ├── alpaca_chat.py
│   │   │   ├── alpaca_instruct.py
│   │   │   ├── alpaca_w_system.py
│   │   │   ├── base.py
│   │   │   ├── bradley_terry/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chat_template.py
│   │   │   │   └── llama3.py
│   │   │   ├── chat_template.py
│   │   │   ├── completion.py
│   │   │   ├── context_qa.py
│   │   │   ├── creative_acr.py
│   │   │   ├── dpo/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chat_template.py
│   │   │   │   ├── chatml.py
│   │   │   │   ├── llama3.py
│   │   │   │   ├── passthrough.py
│   │   │   │   ├── user_defined.py
│   │   │   │   └── zephyr.py
│   │   │   ├── input_output.py
│   │   │   ├── jinja_template_analyzer.py
│   │   │   ├── kto/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chatml.py
│   │   │   │   ├── llama3.py
│   │   │   │   └── user_defined.py
│   │   │   ├── llama2_chat.py
│   │   │   ├── messages/
│   │   │   │   ├── __init__.py
│   │   │   │   └── chat.py
│   │   │   ├── metharme.py
│   │   │   ├── orcamini.py
│   │   │   ├── orpo/
│   │   │   │   ├── __init__.py
│   │   │   │   └── chat_template.py
│   │   │   ├── pretrain.py
│   │   │   ├── pygmalion.py
│   │   │   ├── stepwise_supervised.py
│   │   │   └── user_defined.py
│   │   ├── prompt_tokenizers.py
│   │   ├── prompters.py
│   │   ├── scripts/
│   │   │   ├── __init__.py
│   │   │   ├── vllm_serve_lora.py
│   │   │   └── vllm_worker_ext.py
│   │   ├── telemetry/
│   │   │   ├── __init__.py
│   │   │   ├── callbacks.py
│   │   │   ├── errors.py
│   │   │   ├── manager.py
│   │   │   ├── runtime_metrics.py
│   │   │   └── whitelist.yaml
│   │   ├── train.py
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── bench.py
│   │       ├── callbacks/
│   │       │   ├── __init__.py
│   │       │   ├── comet_.py
│   │       │   ├── dynamic_checkpoint.py
│   │       │   ├── generation.py
│   │       │   ├── lisa.py
│   │       │   ├── mlflow_.py
│   │       │   ├── models.py
│   │       │   ├── opentelemetry.py
│   │       │   ├── perplexity.py
│   │       │   ├── profiler.py
│   │       │   ├── qat.py
│   │       │   ├── swanlab.py
│   │       │   ├── tokens_per_second.py
│   │       │   └── trackio_.py
│   │       ├── chat_templates/
│   │       │   ├── __init__.py
│   │       │   ├── base.py
│   │       │   └── templates/
│   │       │       ├── alpaca.jinja
│   │       │       ├── aya.jinja
│   │       │       ├── chatml.jinja
│   │       │       ├── cohere.jinja
│   │       │       ├── command_a.jinja
│   │       │       ├── command_a_rag.jinja
│   │       │       ├── command_a_tool_use.jinja
│   │       │       ├── deepseek_v2.jinja
│   │       │       ├── deepseek_v3.jinja
│   │       │       ├── exaone.jinja
│   │       │       ├── exaone4.jinja
│   │       │       ├── falcon_h1.jinja
│   │       │       ├── gemma.jinja
│   │       │       ├── gemma3.jinja
│   │       │       ├── gemma3n.jinja
│   │       │       ├── jamba.jinja
│   │       │       ├── llama3.jinja
│   │       │       ├── llama3_2_vision.jinja
│   │       │       ├── llama4.jinja
│   │       │       ├── llava.jinja
│   │       │       ├── metharme.jinja
│   │       │       ├── mistral_v1.jinja
│   │       │       ├── mistral_v2v3.jinja
│   │       │       ├── mistral_v3_tekken.jinja
│   │       │       ├── mistral_v7_tekken.jinja
│   │       │       ├── phi_3.jinja
│   │       │       ├── phi_35.jinja
│   │       │       ├── phi_4.jinja
│   │       │       ├── pixtral.jinja
│   │       │       ├── qwen2_vl.jinja
│   │       │       ├── qwen3.jinja
│   │       │       ├── qwen3_5.jinja
│   │       │       └── qwen_25.jinja
│   │       ├── collators/
│   │       │   ├── __init__.py
│   │       │   ├── batching.py
│   │       │   ├── core.py
│   │       │   ├── mamba.py
│   │       │   └── mm_chat.py
│   │       ├── comet_.py
│   │       ├── config/
│   │       │   ├── __init__.py
│   │       │   └── models/
│   │       │       └── __init__.py
│   │       ├── ctx_managers/
│   │       │   ├── __init__.py
│   │       │   └── sequence_parallel.py
│   │       ├── data/
│   │       │   ├── __init__.py
│   │       │   ├── lock.py
│   │       │   ├── rl.py
│   │       │   ├── sft.py
│   │       │   ├── shared.py
│   │       │   ├── streaming.py
│   │       │   ├── utils.py
│   │       │   └── wrappers.py
│   │       ├── datasets.py
│   │       ├── dict.py
│   │       ├── distributed.py
│   │       ├── environment.py
│   │       ├── freeze.py
│   │       ├── generation/
│   │       │   ├── __init__.py
│   │       │   └── sft.py
│   │       ├── import_helper.py
│   │       ├── logging.py
│   │       ├── lora.py
│   │       ├── mistral/
│   │       │   ├── __init__.py
│   │       │   ├── mistral3_processor.py
│   │       │   └── mistral_tokenizer.py
│   │       ├── mlflow_.py
│   │       ├── model_shard_quant.py
│   │       ├── optimizers/
│   │       │   ├── __init__.py
│   │       │   └── adopt.py
│   │       ├── quantization.py
│   │       ├── samplers/
│   │       │   ├── __init__.py
│   │       │   ├── multipack.py
│   │       │   └── utils.py
│   │       ├── schedulers.py
│   │       ├── schemas/
│   │       │   ├── __init__.py
│   │       │   ├── config.py
│   │       │   ├── datasets.py
│   │       │   ├── deprecated.py
│   │       │   ├── dynamic_checkpoint.py
│   │       │   ├── enums.py
│   │       │   ├── fsdp.py
│   │       │   ├── integrations.py
│   │       │   ├── internal/
│   │       │   │   └── __init__.py
│   │       │   ├── model.py
│   │       │   ├── multimodal.py
│   │       │   ├── peft.py
│   │       │   ├── quantization.py
│   │       │   ├── training.py
│   │       │   ├── trl.py
│   │       │   ├── utils.py
│   │       │   ├── validation.py
│   │       │   └── vllm.py
│   │       ├── tee.py
│   │       ├── tokenization.py
│   │       ├── trackio_.py
│   │       ├── train.py
│   │       ├── trainer.py
│   │       └── wandb_.py
│   └── setuptools_axolotl_dynamic_dependencies.py
├── styles.css
└── tests/
    ├── __init__.py
    ├── cli/
    │   ├── __init__.py
    │   ├── conftest.py
    │   ├── test_cli_base.py
    │   ├── test_cli_evaluate.py
    │   ├── test_cli_fetch.py
    │   ├── test_cli_inference.py
    │   ├── test_cli_interface.py
    │   ├── test_cli_merge_lora.py
    │   ├── test_cli_merge_sharded_fsdp_weights.py
    │   ├── test_cli_preprocess.py
    │   ├── test_cli_sweeps.py
    │   ├── test_cli_train.py
    │   ├── test_cli_version.py
    │   ├── test_nested_options.py
    │   └── test_utils.py
    ├── conftest.py
    ├── constants.py
    ├── core/
    │   ├── chat/
    │   │   ├── __init__.py
    │   │   ├── format/
    │   │   │   └── __init__.py
    │   │   └── test_messages.py
    │   ├── test_async_grpo.py
    │   └── test_builders.py
    ├── e2e/
    │   ├── .gitignore
    │   ├── __init__.py
    │   ├── integrations/
    │   │   ├── test_cut_cross_entropy.py
    │   │   ├── test_fp8.py
    │   │   ├── test_hooks.py
    │   │   ├── test_kd.py
    │   │   ├── test_liger.py
    │   │   ├── test_llm_compressor.py
    │   │   ├── test_scattermoe_lora_kernels.py
    │   │   ├── test_scattermoe_lora_olmoe.py
    │   │   └── test_sonicmoe.py
    │   ├── kernels/
    │   │   ├── test_geglu.py
    │   │   ├── test_lora.py
    │   │   ├── test_quantize.py
    │   │   └── test_swiglu.py
    │   ├── multigpu/
    │   │   ├── __init__.py
    │   │   ├── patched/
    │   │   │   ├── __init__.py
    │   │   │   └── test_sp.py
    │   │   ├── solo/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_flex.py
    │   │   │   ├── test_gdpo.py
    │   │   │   └── test_grpo.py
    │   │   ├── test_dist_muon_fsdp2.py
    │   │   ├── test_eval.py
    │   │   ├── test_fp8_fsdp2.py
    │   │   ├── test_fsdp1.py
    │   │   ├── test_fsdp2.py
    │   │   ├── test_gemma3.py
    │   │   ├── test_llama.py
    │   │   ├── test_locking.py
    │   │   ├── test_ray.py
    │   │   └── test_tp.py
    │   ├── patched/
    │   │   ├── __init__.py
    │   │   ├── lora_kernels/
    │   │   │   ├── __init__.py
    │   │   │   └── test_lora_kernel_patching.py
    │   │   ├── test_4d_multipack_llama.py
    │   │   ├── test_activation_checkpointing.py
    │   │   ├── test_cli_integrations.py
    │   │   ├── test_fa_xentropy.py
    │   │   ├── test_falcon_samplepack.py
    │   │   ├── test_flattening.py
    │   │   ├── test_fsdp2_qlora.py
    │   │   ├── test_fused_llama.py
    │   │   ├── test_llama_s2_attention.py
    │   │   ├── test_lora_llama_multipack.py
    │   │   ├── test_mistral_samplepack.py
    │   │   ├── test_mixtral_samplepack.py
    │   │   ├── test_model_patches.py
    │   │   ├── test_peft_embeddings.py
    │   │   ├── test_phi_multipack.py
    │   │   ├── test_resume.py
    │   │   ├── test_unsloth_integration.py
    │   │   └── test_unsloth_qlora.py
    │   ├── solo/
    │   │   ├── __init__.py
    │   │   ├── test_flex.py
    │   │   └── test_relora_llama.py
    │   ├── test_activation_offloading.py
    │   ├── test_deepseekv3.py
    │   ├── test_diffusion.py
    │   ├── test_dpo.py
    │   ├── test_embeddings_lr.py
    │   ├── test_evaluate.py
    │   ├── test_falcon.py
    │   ├── test_gemma2.py
    │   ├── test_gemma3_text.py
    │   ├── test_imports.py
    │   ├── test_llama.py
    │   ├── test_llama_pretrain.py
    │   ├── test_llama_vision.py
    │   ├── test_load_model.py
    │   ├── test_lora_llama.py
    │   ├── test_mamba.py
    │   ├── test_mistral.py
    │   ├── test_mixtral.py
    │   ├── test_optimizers.py
    │   ├── test_packing_loss.py
    │   ├── test_phi.py
    │   ├── test_preprocess.py
    │   ├── test_process_reward_model_smollm2.py
    │   ├── test_profiler.py
    │   ├── test_qat.py
    │   ├── test_quantization.py
    │   ├── test_qwen.py
    │   ├── test_reward_model_smollm2.py
    │   ├── test_save_first_step.py
    │   ├── test_schedulers.py
    │   ├── test_streaming.py
    │   ├── test_tokenizer.py
    │   └── utils.py
    ├── fixtures/
    │   ├── alpaca/
    │   │   └── alpaca.json
    │   ├── conversation.json
    │   ├── conversation.missingturns.json
    │   ├── conversation.tokenized.json
    │   └── conversation.tokenized_llama2chat.json
    ├── hf_offline_utils.py
    ├── integrations/
    │   ├── __init__.py
    │   ├── test_diffusion.py
    │   ├── test_diffusion_callback.py
    │   ├── test_kd_chat_template.py
    │   ├── test_liger.py
    │   ├── test_routing_parity.py
    │   ├── test_scattermoe_autotune_telemetry.py
    │   ├── test_scattermoe_lora.py
    │   ├── test_scattermoe_lora_kernels.py
    │   ├── test_sonicmoe.py
    │   ├── test_sonicmoe_gradients.py
    │   └── test_swanlab.py
    ├── monkeypatch/
    │   ├── test_llama_attn_hijack_flash.py
    │   ├── test_pixtral_flash_attention_patch.py
    │   ├── test_qwen3_next_modeling_patch.py
    │   ├── test_trainer_accelerator_args.py
    │   ├── test_trainer_context_parallel_patch.py
    │   ├── test_trainer_loss_calc.py
    │   ├── test_trl_vllm.py
    │   └── test_voxtral_modeling_patch.py
    ├── patched/
    │   └── test_validation.py
    ├── prompt_strategies/
    │   ├── __init__.py
    │   ├── conftest.py
    │   ├── messages/
    │   │   ├── __init__.py
    │   │   └── test_chat.py
    │   ├── test_alpaca.py
    │   ├── test_chat_template_ds_schema_unification.py
    │   ├── test_chat_template_utils.py
    │   ├── test_chat_templates.py
    │   ├── test_chat_templates_advanced.py
    │   ├── test_chat_templates_mistral.py
    │   ├── test_chat_templates_thinking.py
    │   ├── test_chat_templates_tool_call_string_arguments.py
    │   ├── test_dpo_chat_templates.py
    │   ├── test_dpo_chatml.py
    │   ├── test_jinja_template_analyzer.py
    │   ├── test_raw_io.py
    │   └── test_stepwise.py
    ├── telemetry/
    │   ├── __init__.py
    │   ├── conftest.py
    │   ├── test_callbacks.py
    │   ├── test_errors.py
    │   ├── test_manager.py
    │   └── test_runtime_metrics.py
    ├── test_chunked_xentropy.py
    ├── test_context_parallel_batch_size.py
    ├── test_convert.py
    ├── test_data.py
    ├── test_datasets.py
    ├── test_dict.py
    ├── test_exact_deduplication.py
    ├── test_freeze.py
    ├── test_loaders.py
    ├── test_logging_config_file_capture.py
    ├── test_lora.py
    ├── test_normalize_config.py
    ├── test_opentelemetry_callback.py
    ├── test_packed_batch_sampler.py
    ├── test_packed_dataset.py
    ├── test_packed_pretraining.py
    ├── test_perplexity.py
    ├── test_prompt_tokenizers.py
    ├── test_prompters.py
    ├── test_revision_parameter.py
    ├── test_save_deduplicated.py
    ├── test_schedulers.py
    ├── test_streaming.py
    ├── test_tensor_parallel_batch_size.py
    ├── test_tokenizers.py
    ├── test_train.py
    ├── test_triton_kernels.py
    ├── test_utils_tee.py
    ├── test_validation_dataset.py
    └── utils/
        ├── callbacks/
        │   └── test_dynamic_checkpoint.py
        ├── data/
        │   └── test_utils.py
        ├── lora/
        │   ├── test_config_validation_lora.py
        │   ├── test_freeze_lora.py
        │   └── test_merge_lora.py
        ├── schemas/
        │   └── validation/
        │       ├── test_activation_offloading.py
        │       ├── test_default_values.py
        │       ├── test_fsdp.py
        │       └── test_moe_quant.py
        ├── test_grpo_rw_fnc.py
        ├── test_import_helper.py
        ├── test_mistral3_processor.py
        └── test_train.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .axolotl-complete.bash
================================================
#!/bin/bash

_axolotl_completions() {
    local cur prev
    COMPREPLY=()
    cur="${COMP_WORDS[COMP_CWORD]}"
    prev="${COMP_WORDS[COMP_CWORD-1]}"

    # If we're completing the first argument (the command)
    if [[ $COMP_CWORD -eq 1 ]]; then
        mapfile -t COMPREPLY < <(compgen -W "delinearize-llama4 fetch lm-eval merge-sharded-fsdp-weights quantize vllm-serve evaluate inference merge-lora preprocess train" -- "$cur")
        return 0
    fi

    # Commands that should complete with directories and YAML files
    local -a yaml_commands=("merge-sharded-fsdp-weights" "quantize" "vllm-serve" "evaluate" "inference" "merge-lora" "preprocess" "train")

    # Check if previous word is in our list
    if [[ " ${yaml_commands[*]} " =~ (^|[[:space:]])$prev($|[[:space:]]) ]]; then
        # Use filename completion which handles directories properly
        compopt -o filenames
        mapfile -t COMPREPLY < <(compgen -f -- "$cur")

        # Filter to only include directories and YAML files
        local -a filtered=()
        for item in "${COMPREPLY[@]}"; do
            if [[ -d "$item" ]] || [[ "$item" == *.yaml ]] || [[ "$item" == *.yml ]]; then
                filtered+=("$item")
            fi
        done
        COMPREPLY=("${filtered[@]}")

        return 0
    fi

    # Default: no completion
    return 0
}

# Remove the -o nospace option - let filenames handle it
complete -F _axolotl_completions axolotl


================================================
FILE: .bandit
================================================
[bandit]
exclude = tests
skips = B101,B615,B102,B110


================================================
FILE: .coderabbit.yaml
================================================
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
language: "en-US"
early_access: false
reviews:
  profile: "chill"
  request_changes_workflow: false
  high_level_summary: true
  review_status: true
  collapse_walkthrough: true
  poem: false
  sequence_diagrams: false
  auto_review:
    enabled: true
    drafts: false
    auto_incremental_review: false
chat:
  auto_reply: true


================================================
FILE: .coveragerc
================================================
[run]
source = axolotl
omit =
    */tests/*
    setup.py

[report]
exclude_lines =
    pragma: no cover
    def __repr__
    raise NotImplementedError
    if __name__ == .__main__.:
    pass
    raise ImportError


================================================
FILE: .editorconfig
================================================
root = true

[*]
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

[*.py]
indent_style = space
indent_size = 4

[**.yml]
indent_style = space
indent_size = 2


================================================
FILE: .gitattributes
================================================
data/*.jsonl filter=lfs diff=lfs merge=lfs -text


================================================
FILE: .github/CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
  and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
  overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or
  advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
  address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement on Discord
at https://discord.gg/QYF8QrtEUm

All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series
of actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within
the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.


================================================
FILE: .github/CONTRIBUTING.md
================================================
# Contributing to axolotl

First of all, thank you for your interest in contributing to axolotl! We appreciate the time and effort you're willing to invest in making our project better. This document provides guidelines and information to make the contribution process as smooth as possible.

## Table of Contents

- [Code of Conduct](#code-of-conduct)
- [Getting Started](#getting-started)
- [How to Contribute](#how-to-contribute)
  - [Reporting Bugs](#reporting-bugs)
  - [Suggesting Enhancements](#suggesting-enhancements)
  - [Submitting Pull Requests](#submitting-pull-requests)
- [Style Guidelines](#style-guidelines)
  - [Code Style](#code-style)
  - [Commit Messages](#commit-messages)
- [Additional Resources](#additional-resources)

## Code of Conduct

All contributors are expected to adhere to our [Code of Conduct](CODE_OF_CONDUCT.md). Please read it before participating in the axolotl community.

## Getting Started

Found a bug? Please check for an existing open issue first; if there is none, create a new [Issue](https://github.com/axolotl-ai-cloud/axolotl/issues/new).

PRs are **greatly welcome**!

1. Fork the repository and clone it to your local machine.
2. Set up the development environment by following the instructions in the [README.md](https://github.com/axolotl-ai-cloud/axolotl/tree/main/README.md) file.
3. Explore the codebase, run tests, and verify that everything works as expected.

Run the following to set up your development environment:
```bash
pip3 install -r requirements-dev.txt -r requirements-tests.txt
pre-commit install

# test
pytest tests/
```

## How to Contribute

### Reporting Bugs

If you encounter a bug or issue while using axolotl, please open a new issue on the [GitHub Issues](https://github.com/axolotl-ai-cloud/axolotl/issues) page. Provide a clear and concise description of the problem, steps to reproduce it, and any relevant error messages or logs.

### Suggesting Enhancements

We welcome ideas for improvements and new features. To suggest an enhancement, open a new issue on the [GitHub Issues](https://github.com/axolotl-ai-cloud/axolotl/issues) page. Describe the enhancement in detail, explain the use case, and outline the benefits it would bring to the project.

### Submitting Pull Requests

1. Create a new branch for your feature or bugfix. Use a descriptive name like `feature/your-feature-name` or `fix/your-bugfix-name`.
2. Make your changes, following the [Style Guidelines](#style-guidelines) below.
3. Test your changes and ensure that they don't introduce new issues or break existing functionality.
4. Commit your changes, following the [commit message guidelines](#commit-messages).
5. Push your branch to your fork on GitHub.
6. Open a new pull request against the `main` branch of the axolotl repository. Include a clear and concise description of your changes, referencing any related issues.

#### Skipping CI Checks

You can skip certain CI checks by including specific keywords in your commit messages:

- `[skip ci]` or `skip ci` - Skips all CI checks for that commit
- `[skip-e2e]` or `skip-e2e` - Skips only end-to-end tests while running other CI checks. You may also include this in the title of your PR to disable end-to-end tests for the entire PR.
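
As a rough illustration of how these keywords read in practice, the sketch below classifies a commit message by the keyword it contains. `ci_skip_mode` is a hypothetical helper written for this guide, not part of axolotl's CI; the real workflows may match the keywords differently (for example, with respect to letter case).

```bash
# Hypothetical helper (not part of axolotl's CI): report which CI-skip
# keyword, if any, appears in a commit message.
ci_skip_mode() {
    local msg="$1"
    case "$msg" in
        # "[skip ci]" also contains "skip ci", so one pattern covers both forms.
        *"skip ci"*) echo "all" ;;
        # Likewise, "[skip-e2e]" contains "skip-e2e".
        *"skip-e2e"*) echo "e2e" ;;
        *) echo "none" ;;
    esac
}

ci_skip_mode "Fix typo in docs [skip ci]"        # prints: all
ci_skip_mode "Refactor sampler utils skip-e2e"   # prints: e2e
ci_skip_mode "Add new feature"                   # prints: none
```

Appending the keyword to the end of the commit subject, as in the sample messages, keeps the description itself readable.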

## Style Guidelines

### Code Style

axolotl uses [Ruff](https://docs.astral.sh/ruff/) to lint and format its code. Please ensure that your code passes its checks.

Use the pre-commit linter to ensure that your code is formatted consistently.
```bash
pre-commit run --all-files
```

### Commit Messages

Write clear and concise commit messages that briefly describe the changes made in each commit. Use the imperative mood and start with a capitalized verb, e.g., "Add new feature" or "Fix bug in function".
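
The capitalization part of this guideline can be checked mechanically. The sketch below is one way to do it; `check_commit_subject` is a hypothetical function, not a tool shipped with axolotl, and it cannot judge whether the subject is in the imperative mood.

```bash
# Hypothetical check (not an axolotl tool): verify that a commit subject
# starts with a capital letter. Imperative mood still needs human review.
check_commit_subject() {
    case "$1" in
        [[:upper:]]*) echo "ok" ;;
        *) echo "capitalize the first word" ;;
    esac
}

check_commit_subject "Add new feature"        # prints: ok
check_commit_subject "fix bug in function"    # prints: capitalize the first word
```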

## Additional Resources

- [GitHub Help](https://help.github.com/)
- [GitHub Pull Request Documentation](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests)
- [Ruff](https://docs.astral.sh/ruff/)

Thank you once again for your interest in contributing to axolotl. We look forward to collaborating with you and creating an even better project together!


================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms

github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']


================================================
FILE: .github/ISSUE_TEMPLATE/bug-report.yaml
================================================
name: Bug Report
description: File a bug report
labels: ["bug", "needs triage"]
body:
  - type: markdown
    attributes:
      value: |
        ## Before you start
        Please **make sure you are on the latest version.**
        If you encountered the issue after you installed, updated, or reloaded, **please try restarting before reporting the bug**.

  - type: checkboxes
    id: no-duplicate-issues
    attributes:
      label: "Please check that this issue hasn't been reported before."
      description: "The **Label filters** may help make your search more focussed."
      options:
        - label: "I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports."
          required: true

  - type: textarea
    id: expected
    attributes:
      label: Expected Behavior
      description: Tell us what **should** happen.
    validations:
      required: true

  - type: textarea
    id: what-happened
    attributes:
      label: Current Behavior
      description: |
        Tell us what happens instead of the expected behavior.
        Provide stacktrace and/or screenshots.
    validations:
      required: true

  - type: textarea
    id: reproduce
    attributes:
      label: Steps to reproduce
      description: |
        Which exact steps can a developer take to reproduce the issue?
        The more detail you provide, the easier it will be to narrow down and fix the bug.
        Please paste in tasks and/or queries **as text, not screenshots**.
      placeholder: |
        Example of the level of detail needed to reproduce any bugs efficiently and reliably.
        1. Go to the '...' page.
        2. Click on the '...' button.
        3. Scroll down to '...'.
        4. Observe the error.
    validations:
      required: true

  - type: textarea
    id: config
    attributes:
      label: Config yaml
      description: |
        Please attach the config yaml!
      render: yaml

  - type: textarea
    id: possible-solution
    attributes:
      label: Possible solution
      description: |
        Not obligatory, but please suggest a fix or reason for the bug, if you have an idea.


  - type: checkboxes
    id: operating-systems
    attributes:
      label: Which Operating Systems are you using?
      description: You may select more than one.
      options:
        - label: Linux
        - label: macOS
        - label: Windows

  - type: input
    id: Python-version
    attributes:
      label: Python Version
      description: Which Python version are you using?
      placeholder: 3.10 / please change accordingly
    validations:
      required: true

  - type: input
    id: axolotl-branch-commit
    attributes:
      label: axolotl branch-commit
      description: On which branch/commit are you?
      placeholder: main/4d6490b
    validations:
      required: true

  - type: checkboxes
    id: acknowledgements
    attributes:
      label: 'Acknowledgements'
      description: 'Please confirm the following:'
      options:
        - label: 'My issue title is concise, descriptive, and in title casing.'
          required: true
        - label: 'I have searched the existing issues to make sure this bug has not been reported yet.'
          required: true
        - label: 'I am using the latest version of axolotl.'
          required: true
        - label: 'I have provided enough information for the maintainers to reproduce and diagnose the issue.'
          required: true


================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: false
contact_links:
  - name: Ask a question
    url: https://github.com/axolotl-ai-cloud/axolotl/discussions/categories/q-a
    about: Ask questions and discuss with other community members
  - name: Discuss the Project in Discord
    url: https://discord.gg/HhrNrHJPRb


================================================
FILE: .github/ISSUE_TEMPLATE/docs.yml
================================================
name: Documentation Improvement / Clarity
description: Make a suggestion to improve the project documentation.
labels: ['needs triage', 'docs']
body:
  - type: markdown
    attributes:
      value: '## :book: Documentation :book:'
  - type: markdown
    attributes:
      value: |
        * Ask questions in [Discord](https://discord.gg/HhrNrHJPRb).
        * Before you file an issue read the [Contributing guide](./CONTRIBUTING.md).
        * Check to make sure someone hasn't already opened a [similar issue](https://github.com/axolotl-ai-cloud/axolotl/issues).
  - type: textarea
    attributes:
      label: What piece of documentation is affected?
      description: Please link to the article you'd like to see updated.
    validations:
      required: true
  - type: textarea
    attributes:
      label: What part(s) of the article would you like to see updated?
      description: |
        - Give as much detail as you can to help us understand the change you want to see.
        - Why should the docs be changed? What use cases does it support?
        - What is the expected outcome?
    validations:
      required: true
  - type: textarea
    attributes:
      label: Additional Information
      description: Add any other context or screenshots about the documentation request here.
    validations:
      required: false
  - type: checkboxes
    id: acknowledgements
    attributes:
      label: 'Acknowledgements'
      description: 'Please confirm the following:'
      options:
        - label: 'My issue title is concise, descriptive, and in title casing.'
          required: true
        - label: 'I have searched the existing issues to make sure this documentation issue has not been reported yet.'
          required: true
        - label: 'I have provided enough information for the maintainers to understand and evaluate this request.'
          required: true


================================================
FILE: .github/ISSUE_TEMPLATE/feature-request.yaml
================================================
name: Feature Request / Enhancement
description: Suggest a new feature or feature enhancement for the project
labels: ["enhancement", "needs triage"]
body:
  - type: checkboxes
    id: no-duplicate-issues
    attributes:
      label: "⚠️ Please check that this feature request hasn't been suggested before."
      description: "There are two locations for previous feature requests. Please search in both. Thank you. The **Label filters** may help make your search more focussed."
      options:
        - label: "I searched previous [Ideas in Discussions](https://github.com/axolotl-ai-cloud/axolotl/discussions/categories/ideas) and didn't find any similar feature requests."
          required: true
        - label: "I searched previous [Issues](https://github.com/axolotl-ai-cloud/axolotl/labels/enhancement) and didn't find any similar feature requests."
          required: true

  - type: textarea
    id: feature-description
    validations:
      required: true
    attributes:
      label: "🔖 Feature description"
      description: "A clear and concise description of what the feature request is."
      placeholder: "You should add ..."

  - type: textarea
    id: solution
    validations:
      required: true
    attributes:
      label: "✔️ Solution"
      description: "A clear and concise description of what you want to happen, and why."
      placeholder: "In my use-case, ..."

  - type: textarea
    id: alternatives
    validations:
      required: false
    attributes:
      label: "❓ Alternatives"
      description: "A clear and concise description of any alternative solutions or features you've considered."
      placeholder: "I have considered ..."

  - type: textarea
    id: additional-context
    validations:
      required: false
    attributes:
      label: "📝 Additional Context"
      description: "Add any other context or screenshots about the feature request here."
      placeholder: "..."

  - type: checkboxes
    id: acknowledgements
    attributes:
      label: 'Acknowledgements'
      description: 'Please confirm the following:'
      options:
        - label: 'My issue title is concise, descriptive, and in title casing.'
          required: true
        - label: 'I have searched the existing issues to make sure this feature has not been requested yet.'
          required: true
        - label: 'I have provided enough information for the maintainers to understand and evaluate this request.'
          required: true


================================================
FILE: .github/PULL_REQUEST_TEMPLATE.md
================================================
<!--- Provide a general summary of your changes in the Title above -->

# Description

<!--- Describe your changes in detail -->

## Motivation and Context

<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->

## How has this been tested?

<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, the tests you ran to see how -->
<!--- your change affects other areas of the code, etc. -->

## AI Usage Disclaimer

<!--- Was AI (e.g., ChatGPT, Claude, Copilot) used to generate or assist with this PR? -->
<!--- Please indicate: No / Yes (specify which tool and to what extent) -->

## Screenshots (if appropriate)

## Types of changes

<!--- What types of changes does your code introduce? Put an `x` in all the boxes that apply: -->

## Social Handles (Optional)

<!-- Thanks for submitting a bugfix or enhancement. -->
<!-- We'd love to show our thanks to you on Twitter & Discord if you provide your handle -->


================================================
FILE: .github/SECURITY.md
================================================
# Security Policy

## Supported Versions

Because this project is under rapid development, only the latest released version is supported.

## Reporting a Vulnerability

If you find a vulnerability, please contact us on [Discord](https://discord.gg/xcu3ECkH9a) rather than creating a GitHub issue, so that we have time to fix it before it becomes publicly known.


================================================
FILE: .github/SUPPORT.md
================================================
# Support

If you need help with this project or have questions, please:

1. Check the documentation.
2. Search the existing issues and pull requests.
3. Create a new issue if your question is not answered or your problem is not solved.
4. Have a look in the [Discord server](https://discord.gg/HhrNrHJPRb).

Please note that this project is maintained by volunteers who have limited availability. We'll do our best to address your questions and concerns in a timely manner.


================================================
FILE: .github/release-drafter.yml
================================================
name-template: 'v$RESOLVED_VERSION'
tag-template: 'v$RESOLVED_VERSION'
categories:
  - title: '🚀 Features'
    labels:
      - 'feature'
      - 'enhancement'
  - title: '🐛 Bug Fixes'
    labels:
      - 'fix'
      - 'bugfix'
      - 'bug'
  - title: '🧰 Maintenance'
    label: 'chore'
change-template: '- $TITLE @$AUTHOR (#$NUMBER)'
change-title-escapes: '\<*_&' # You can add # and @ to disable mentions, and add ` to disable code blocks.
version-resolver:
  major:
    labels:
      - 'major'
  minor:
    labels:
      - 'minor'
  patch:
    labels:
      - 'patch'
  default: patch
template: |
  ## What’s Changed

  $CHANGES


================================================
FILE: .github/workflows/base.yml
================================================
name: ci-cd-base

on:
  push:
    branches:
      - "main"
    paths:
      - 'docker/Dockerfile-base'
      - 'docker/Dockerfile-uv-base'
      - '.github/workflows/base.yml'
  pull_request:
    paths:
      - 'docker/Dockerfile-base'
      - 'docker/Dockerfile-uv-base'
      - '.github/workflows/base.yml'
  workflow_dispatch:

permissions:
  contents: read

jobs:
  build-base:
    if: ${{ github.repository_owner == 'axolotl-ai-cloud' && (github.event_name != 'pull_request' || !github.event.pull_request.draft) }}
    timeout-minutes: 480
    # this job needs to be run on self-hosted GPU runners...
    runs-on: ubuntu-latest-m
    env:
      HAS_DOCKERHUB_CREDS: ${{ secrets.DOCKERHUB_USERNAME != '' && secrets.DOCKERHUB_TOKEN != '' }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.11"
            pytorch: 2.8.0
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-base"
            platforms: "linux/amd64"
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.11"
            pytorch: 2.9.0
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.11"
            pytorch: 2.9.1
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.11"
            pytorch: 2.10.0
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.12"
            pytorch: 2.10.0
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-base"
            platforms: "linux/amd64,linux/arm64"
#          - cuda: "129"
#            cuda_version: 12.9.1
#            cudnn_version: ""
#            python_version: "3.12"
#            pytorch: 2.9.1
#            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
#            dockerfile: "Dockerfile-base"
#            platforms: "linux/amd64,linux/arm64"
          - cuda: "130"
            cuda_version: 13.0.0
            cudnn_version: ""
            python_version: "3.11"
            pytorch: 2.9.1
            torch_cuda_arch_list: "9.0+PTX"
            dockerfile: "Dockerfile-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "130"
            cuda_version: 13.0.0
            cudnn_version: ""
            python_version: "3.12"
            pytorch: 2.9.1
            torch_cuda_arch_list: "9.0+PTX"
            dockerfile: "Dockerfile-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "130"
            cuda_version: 13.0.0
            cudnn_version: ""
            python_version: "3.12"
            pytorch: 2.10.0
            torch_cuda_arch_list: "9.0+PTX"
            dockerfile: "Dockerfile-base"
            platforms: "linux/amd64,linux/arm64"
#          - cuda: "128"
#            cuda_version: 12.8.1
#            cudnn_version: ""
#            python_version: "3.11"
#            pytorch: nightly
#            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
#            dockerfile: "Dockerfile-base-nightly"
#          # "next" is for release candidates of pytorch
#          - cuda: "128"
#            cuda_version: 12.8.1
#            cudnn_version: ""
#            python_version: "3.11"
#            pytorch: next
#            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
#            dockerfile: "Dockerfile-base-next"
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Docker metadata
        id: metadata
        uses: docker/metadata-action@v5
        with:
          images: |
            axolotlai/axolotl-base
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        if: ${{ github.event_name != 'pull_request' && env.HAS_DOCKERHUB_CREDS == 'true' }}
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ./docker/${{ matrix.dockerfile }}
          platforms: ${{ matrix.platforms }}
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.metadata.outputs.tags }}-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
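          # Illustrative sketch of the tag this expression is expected to produce. It assumes
          # docker/metadata-action's default tagging on a push to main, where
          # steps.metadata.outputs.tags resolves to the single tag "axolotlai/axolotl-base:main":
          #   matrix entry: python_version "3.11", cuda "128", pytorch 2.8.0
          #   resulting tag: axolotlai/axolotl-base:main-base-py3.11-cu128-2.8.0
          # axolotl_extras is not defined in this matrix, so the optional "-<extras>" suffix
          # should evaluate to an empty string.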
          labels: ${{ steps.metadata.outputs.labels }}
          build-args: |
            CUDA_VERSION=${{ matrix.cuda_version }}
            CUDNN_VERSION=${{ matrix.cudnn_version }}
            CUDA=${{ matrix.cuda }}
            PYTHON_VERSION=${{ matrix.python_version }}
            PYTORCH_VERSION=${{ matrix.pytorch }}
            TORCH_CUDA_ARCH_LIST=${{ matrix.torch_cuda_arch_list }}
  build-base-uv:
    if: ${{ github.repository_owner == 'axolotl-ai-cloud' && (github.event_name != 'pull_request' || !github.event.pull_request.draft) }}
    timeout-minutes: 480
    runs-on: ubuntu-latest-m
    env:
      HAS_DOCKERHUB_CREDS: ${{ secrets.DOCKERHUB_USERNAME != '' && secrets.DOCKERHUB_TOKEN != '' }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.11"
            pytorch: 2.8.0
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-uv-base"
            platforms: "linux/amd64"
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.11"
            pytorch: 2.9.1
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-uv-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.12"
            pytorch: 2.9.1
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-uv-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.11"
            pytorch: 2.9.0
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-uv-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.11"
            pytorch: 2.10.0
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-uv-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "128"
            cuda_version: 12.8.1
            cudnn_version: ""
            python_version: "3.12"
            pytorch: 2.10.0
            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
            dockerfile: "Dockerfile-uv-base"
            platforms: "linux/amd64,linux/arm64"
#          - cuda: "129"
#            cuda_version: 12.9.1
#            cudnn_version: ""
#            python_version: "3.12"
#            pytorch: 2.9.1
#            torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
#            dockerfile: "Dockerfile-uv-base"
#            platforms: "linux/amd64,linux/arm64"
          - cuda: "130"
            cuda_version: 13.0.0
            cudnn_version: ""
            python_version: "3.11"
            pytorch: 2.9.1
            torch_cuda_arch_list: "9.0+PTX"
            dockerfile: "Dockerfile-uv-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "130"
            cuda_version: 13.0.0
            cudnn_version: ""
            python_version: "3.12"
            pytorch: 2.9.1
            torch_cuda_arch_list: "9.0+PTX"
            dockerfile: "Dockerfile-uv-base"
            platforms: "linux/amd64,linux/arm64"
          - cuda: "130"
            cuda_version: 13.0.0
            cudnn_version: ""
            python_version: "3.12"
            pytorch: 2.10.0
            torch_cuda_arch_list: "9.0+PTX"
            dockerfile: "Dockerfile-uv-base"
            platforms: "linux/amd64,linux/arm64"
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Docker metadata
        id: metadata
        uses: docker/metadata-action@v5
        with:
          images: |
            axolotlai/axolotl-base-uv
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        if: ${{ github.event_name != 'pull_request' && env.HAS_DOCKERHUB_CREDS == 'true' }}
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build
        uses: docker/build-push-action@v5
        with:
          context: .
          file: ./docker/${{ matrix.dockerfile }}
          platforms: ${{ matrix.platforms }}
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.metadata.outputs.tags }}-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
          labels: ${{ steps.metadata.outputs.labels }}
          build-args: |
            CUDA_VERSION=${{ matrix.cuda_version }}
            CUDNN_VERSION=${{ matrix.cudnn_version }}
            CUDA=${{ matrix.cuda }}
            PYTHON_VERSION=${{ matrix.python_version }}
            PYTORCH_VERSION=${{ matrix.pytorch }}
            TORCH_CUDA_ARCH_LIST=${{ matrix.torch_cuda_arch_list }}


================================================
FILE: .github/workflows/docs.yml
================================================
name: Publish Docs
on:
  push:
    branches:
      - main

permissions:
    contents: write
    pages: write

jobs:
    build-deploy:
        runs-on: ubuntu-latest
        steps:
        - name: cleanup node
          run: |
            sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL
        - name: Check out repository
          uses: actions/checkout@v4
        - name: Set up Quarto
          uses: quarto-dev/quarto-actions/setup@v2
        - name: Setup Python
          uses: actions/setup-python@v5
          with:
            python-version: '3.11'
        - name: Install dependencies
          run: |
            python3 -m pip install jupyter quartodoc
            python3 -m pip install -e .
        - name: Build autodoc
          run: quartodoc build
        - name: Publish to GitHub Pages (and render)
          uses: quarto-dev/quarto-actions/publish@v2
          with:
            target: gh-pages
          env:
            GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}


================================================
FILE: .github/workflows/lint.yml
================================================
name: lint
on:
  # check on PRs, and manual triggers
  merge_group:
  pull_request:
      types: [opened, synchronize, reopened, ready_for_review]
      paths:
       - '**.py'
       - 'requirements.txt'
       - '.github/workflows/*.yml'
       - "*.[q]md"
       - "examples/**/*.y[a]?ml"
       - ".pre-commit-config.yaml"
  workflow_dispatch:

permissions:
  contents: read

jobs:
  pre-commit:
    name: pre-commit
    runs-on: ubuntu-latest
    if: ${{ !github.event.pull_request.draft }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: 'pip' # caching pip dependencies
      - uses: pre-commit/action@v3.0.1


================================================
FILE: .github/workflows/main.yml
================================================
name: ci-cd

on:
  push:
    branches:
      - "main"
    tags:
      - "v*"
  workflow_dispatch:

permissions:
  contents: read

jobs:
  build-axolotl:
    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.8.0
            axolotl_extras:
            platforms: "linux/amd64"
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.0
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
            is_latest: true
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.12"
            pytorch: 2.10.0
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
#          - cuda: 129
#            cuda_version: 12.9.1
#            python_version: "3.12"
#            pytorch: 2.9.1
#            axolotl_extras:
#            platforms: "linux/amd64,linux/arm64"
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.12"
            pytorch: 2.10.0
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
    runs-on: axolotl-gpu-runner
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Docker metadata
        id: metadata
        uses: docker/metadata-action@v5
        with:
          images: |
            axolotlai/axolotl
          tags: |
            type=ref,event=branch
            type=pep440,pattern={{version}}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      # guidance for testing before pushing: https://docs.docker.com/build/ci/github-actions/test-before-push/
      - name: Build and export to Docker
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: ${{ matrix.platforms }}
          build-args: |
            BASE_TAG=${{ github.ref_type == 'tag' && 'main' || github.ref_name }}-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}
            CUDA=${{ matrix.cuda }}
            PYTORCH_VERSION=${{ matrix.pytorch }}
            AXOLOTL_ARGS=${{ matrix.axolotl_args }}
            AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}
          file: ./docker/Dockerfile
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
            ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
            ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}
            ${{ (matrix.is_latest) && format('{0}-latest', steps.metadata.outputs.tags) || '' }}
          labels: ${{ steps.metadata.outputs.labels }}

  build-axolotl-uv:
    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.12"
            pytorch: 2.9.1
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
            is_latest: true
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.12"
            pytorch: 2.10.0
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.12"
            pytorch: 2.10.0
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
    runs-on: axolotl-gpu-runner
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Docker metadata
        id: metadata
        uses: docker/metadata-action@v5
        with:
          images: |
            axolotlai/axolotl-uv
          tags: |
            type=ref,event=branch
            type=pep440,pattern={{version}}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      # guidance for testing before pushing: https://docs.docker.com/build/ci/github-actions/test-before-push/
      - name: Build and export to Docker
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: ${{ matrix.platforms }}
          build-args: |
            BASE_TAG=${{ github.ref_type == 'tag' && 'main' || github.ref_name }}-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}
            CUDA=${{ matrix.cuda }}
            PYTORCH_VERSION=${{ matrix.pytorch }}
            AXOLOTL_ARGS=${{ matrix.axolotl_args }}
            AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}
          file: ./docker/Dockerfile-uv
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
            ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
            ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}
            ${{ (matrix.is_latest) && format('{0}-latest', steps.metadata.outputs.tags) || '' }}
          labels: ${{ steps.metadata.outputs.labels }}

  build-axolotl-cloud:
    needs: build-axolotl
    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    # this job needs to be run on self-hosted GPU runners...
    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.8.0
            axolotl_extras:
            platforms: "linux/amd64"
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.0
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
            is_latest: true
            platforms: "linux/amd64,linux/arm64"
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.12"
            pytorch: 2.10.0
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
#          - cuda: 129
#            cuda_version: 12.9.1
#            python_version: "3.12"
#            pytorch: 2.9.1
#            axolotl_extras:
#            platforms: "linux/amd64,linux/arm64"
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.12"
            pytorch: 2.10.0
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
    runs-on: axolotl-gpu-runner
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Docker metadata
        id: metadata
        uses: docker/metadata-action@v5
        with:
          images: |
            axolotlai/axolotl-cloud
          tags: |
            type=ref,event=branch
            type=pep440,pattern={{version}}
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: ${{ matrix.platforms }}
          build-args: |
            BASE_TAG=${{ github.ref_type == 'tag' && 'main' || github.ref_name }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
            CUDA=${{ matrix.cuda }}
          file: ./docker/Dockerfile-cloud
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
             ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
             ${{ (matrix.is_latest) && format('{0}-latest', steps.metadata.outputs.tags) || '' }}
          labels: ${{ steps.metadata.outputs.labels }}

  build-axolotl-cloud-uv:
    needs: build-axolotl-uv
    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    # this job needs to be run on self-hosted GPU runners...
    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.12"
            pytorch: 2.9.1
            axolotl_extras:
            is_latest: true
            platforms: "linux/amd64,linux/arm64"
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.12"
            pytorch: 2.10.0
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.12"
            pytorch: 2.10.0
            axolotl_extras:
            platforms: "linux/amd64,linux/arm64"
    runs-on: axolotl-gpu-runner
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Docker metadata
        id: metadata
        uses: docker/metadata-action@v5
        with:
          images: |
            axolotlai/axolotl-cloud-uv
          tags: |
            type=ref,event=branch
            type=pep440,pattern={{version}}
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: ${{ matrix.platforms }}
          build-args: |
            BASE_TAG=${{ github.ref_type == 'tag' && 'main' || github.ref_name }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
            CUDA=${{ matrix.cuda }}
          file: ./docker/Dockerfile-cloud-uv
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
             ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
             ${{ (matrix.is_latest) && format('{0}-latest', steps.metadata.outputs.tags) || '' }}
          labels: ${{ steps.metadata.outputs.labels }}

  build-axolotl-cloud-no-tmux:
    needs: build-axolotl
    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    # this job needs to be run on self-hosted GPU runners...
    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
            is_latest: true
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
            is_latest:
    runs-on: axolotl-gpu-runner
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Docker metadata
        id: metadata
        uses: docker/metadata-action@v5
        with:
          images: |
            axolotlai/axolotl-cloud-term
          tags: |
            type=ref,event=branch
            type=pep440,pattern={{version}}
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          build-args: |
            BASE_TAG=${{ github.ref_type == 'tag' && 'main' || github.ref_name }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
            CUDA=${{ matrix.cuda }}
          file: ./docker/Dockerfile-cloud-no-tmux
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
             ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
             ${{ (matrix.is_latest) && format('{0}-latest', steps.metadata.outputs.tags) || '' }}
          labels: ${{ steps.metadata.outputs.labels }}


================================================
FILE: .github/workflows/multi-gpu-e2e.yml
================================================
name: docker-multigpu-tests-biweekly

on:
  pull_request:
    paths:
      - 'tests/e2e/multigpu/**.py'
      - 'requirements.txt'
      - 'setup.py'
      - 'pyproject.toml'
      - '.github/workflows/multi-gpu-e2e.yml'
      - 'scripts/cutcrossentropy_install.py'
      - 'src/axolotl/core/trainers/mixins/sequence_parallel.py'
      - 'src/axolotl/utils/distributed.py'
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * 1,4'  # Runs at 00:00 UTC every Monday and Thursday

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

permissions:
  contents: read

env:
  MODAL_IMAGE_BUILDER_VERSION: "2025.06"

jobs:
  test-axolotl-multigpu:
    if: ${{ ! contains(github.event.commits[0].message, '[skip e2e]') && github.repository_owner == 'axolotl-ai-cloud' && (github.event_name != 'pull_request' || !github.event.pull_request.draft) }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.8.0
            axolotl_extras: fbgemm-gpu
            num_gpus: 2
#          - cuda: 129
#            cuda_version: 12.9.1
#            python_version: "3.12"
#            pytorch: 2.9.1
#            axolotl_extras: "fbgemm-gpu"
#            num_gpus: 2
#            dockerfile: "Dockerfile-uv.jinja"
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
#            axolotl_extras: fbgemm-gpu
            num_gpus: 2
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.10.0
            axolotl_extras: "fbgemm-gpu"
            num_gpus: 2
            dockerfile: "Dockerfile-uv.jinja"
    runs-on: [self-hosted, modal]
    timeout-minutes: 120
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Modal
        run: |
          python -m pip install --upgrade pip
          pip install modal==1.3.0.post1 jinja2
      - name: Update env vars
        run: |
          echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
          echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
          echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
          echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
          echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
          echo "N_GPUS=${{ matrix.num_gpus }}" >> $GITHUB_ENV
          echo "E2E_DOCKERFILE=${{ matrix.dockerfile || 'Dockerfile.jinja'}}" >> $GITHUB_ENV
      - name: Run tests job on Modal
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
        run: |
          modal run -m cicd.multigpu


================================================
FILE: .github/workflows/nightlies.yml
================================================
name: docker-nightlies

on:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * *'  # Runs at 00:00 UTC every day

permissions:
  contents: read

jobs:
  build-axolotl:
    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.8.0
            axolotl_extras:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
    runs-on: axolotl-gpu-runner
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Docker metadata
        id: metadata
        uses: docker/metadata-action@v5
        with:
          images: |
            axolotlai/axolotl
          tags: |
            type=raw,value={{ branch }}-{{ date 'YYYYMMDD' }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      # guidance for testing before pushing: https://docs.docker.com/build/ci/github-actions/test-before-push/
      - name: Build and export to Docker
        uses: docker/build-push-action@v5
        with:
          context: .
          build-args: |
            BASE_TAG=${{ github.ref_name }}-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}
            CUDA=${{ matrix.cuda }}
            PYTORCH_VERSION=${{ matrix.pytorch }}
            AXOLOTL_ARGS=${{ matrix.axolotl_args }}
          file: ./docker/Dockerfile
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
            ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
          labels: ${{ steps.metadata.outputs.labels }}

  build-axolotl-cloud:
    needs: build-axolotl
    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    # this job needs to be run on self-hosted GPU runners...
    strategy:
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.8.0
            axolotl_extras:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            axolotl_extras:
    runs-on: axolotl-gpu-runner
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Docker metadata
        id: metadata
        uses: docker/metadata-action@v5
        with:
          images: |
            axolotlai/axolotl-cloud
          tags: |
            type=raw,value={{ branch }}-{{ date 'YYYYMMDD' }}
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build
        uses: docker/build-push-action@v5
        with:
          context: .
          build-args: |
            BASE_TAG=${{ github.ref_name }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
            CUDA=${{ matrix.cuda }}
          file: ./docker/Dockerfile-cloud
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
            ${{ steps.metadata.outputs.tags }}-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}${{ matrix.axolotl_extras != '' && '-' || '' }}${{ matrix.axolotl_extras }}
          labels: ${{ steps.metadata.outputs.labels }}


================================================
FILE: .github/workflows/precommit-autoupdate.yml
================================================
name: Pre-commit auto-update

on:
  schedule:
    - cron: '0 0 1 * *'  # Run monthly
  workflow_dispatch:  # Manual kickoff

permissions: {}

jobs:
  auto-update:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Update pre-commit hooks
        id: update
        run: |
          pip install pre-commit
          pre-commit autoupdate
          if [[ -n $(git status --porcelain) ]]; then
            echo "changes=true" >> $GITHUB_OUTPUT
          fi

      - name: Create Pull Request
        if: steps.update.outputs.changes == 'true'
        uses: peter-evans/create-pull-request@v6
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          branch: update/pre-commit-hooks
          delete-branch: true
          title: "chore: update pre-commit hooks"
          commit-message: "chore: update pre-commit hooks"
          body: |
            Automated PR to update pre-commit hooks to their latest versions.


================================================
FILE: .github/workflows/preview-docs.yml
================================================
name: Preview
on:
  workflow_dispatch:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]

    # Run the workflow only when one of these files changes
    paths:
      - '**/*.md'      # any Markdown file
      - '**/*.qmd'     # any Quarto file
      - '_quarto.yml'
      - docs/scripts/generate_config_docs.py
      - src/axolotl/utils/schemas/**.py
      - .github/workflows/preview-docs.yml

permissions:
  contents: read
  pull-requests: write

jobs:
  preview:
    runs-on: ubuntu-latest
    if: ${{ !github.event.pull_request.draft }}
    steps:
      - name: cleanup node
        run: |
          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL

      - name: Check out repository
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python3 -m pip install jupyter quartodoc
          python3 -m pip install -e .

      - name: Build autodoc
        run: quartodoc build

      - name: Quarto render
        run: quarto render

      - name: Netlify Publish
        uses: nwtgck/actions-netlify@v3.0
        if: ${{ github.event.pull_request.head.repo.full_name == github.repository }}
        id: netlify
        with:
          publish-dir: './_site'
          enable-pull-request-comment: false
          enable-github-deployment: false
          github-token: ${{ secrets.GITHUB_TOKEN }}
          deploy-message: "Deployed On Netlify"
          github-deployment-environment: 'preview'
          github-deployment-description: 'Preview Deployment'
        env:
          NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
          NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}

      - name: Update PR with preview link
        if: ${{ steps.netlify.outcome == 'success' }}
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          message: |
            📖 **Documentation Preview**: ${{ steps.netlify.outputs.deploy-url }}

            Deployed on Netlify from commit ${{ github.event.pull_request.head.sha }}


================================================
FILE: .github/workflows/pypi.yml
================================================
name: publish pypi

on:
  push:
    tags:
      - "v*"
  workflow_dispatch:

permissions: {}

jobs:
  setup_release:
    name: Create Release
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Create release
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: gh release create "$GITHUB_REF_NAME" --generate-notes
  pypi-publish:
    name: Upload release to PyPI
    runs-on: ubuntu-latest
    needs: [setup_release]
    environment:
      name: pypi
      url: https://pypi.org/p/axolotl
    permissions:
      contents: read
      id-token: write # IMPORTANT: this permission is mandatory for trusted publishing
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          pip3 install wheel packaging==26.0
          pip3 install --no-build-isolation -e .
          pip3 install -r requirements-dev.txt -r requirements-tests.txt

      - name: Extract tag name
        id: tag
        run: echo "TAG_NAME=$(echo $GITHUB_REF | cut -d / -f 3)" >> "$GITHUB_OUTPUT"

      - name: Update version in VERSION file
        run: |
          echo "${{ steps.tag.outputs.TAG_NAME }}" | sed 's/^v//' > VERSION

      - name: Build a source dist
        run: |
          python setup.py sdist

      - name: Publish package distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1


================================================
FILE: .github/workflows/tests-nightly.yml
================================================
name: Tests Nightly against upstream main
on:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * *'  # Runs at 00:00 UTC every day
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
    paths:
      - '.github/workflows/tests-nightly.yml'

permissions:
  contents: read

jobs:
  pre-commit:
    name: pre-commit
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: 'pip' # caching pip dependencies
      - uses: pre-commit/action@v3.0.1
        env:
          SKIP: no-commit-to-branch

  prime-cdn-s3-cache:
    name: Prefetch S3 once to prime the CDN cache
    runs-on: ubuntu-latest
    if: ${{ !github.event.pull_request.draft }}
    timeout-minutes: 10
    steps:
      - name: Restore Cache from S3
        id: hf-cache-restore-s3
        run: |
          # Range request for the first 1 KiB only; the goal is to warm the CDN, not to download the archive
          curl -v -H "Range: bytes=0-1023" -L https://axolotl-ci.b-cdn.net/hf-cache.tar.zst > /dev/null

  pytest:
    name: PyTest
    runs-on: ubuntu-latest
    needs: [prime-cdn-s3-cache]
    strategy:
      fail-fast: false
      matrix:
        python_version: ["3.12"]  # TODO include py3.14 once https://github.com/mistralai/mistral-common/pull/194 is merged
        pytorch_version: ["2.8.0", "2.9.1", "2.10.0"]
    timeout-minutes: 20

    steps:
      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Restore Cache from S3
        id: hf-cache-restore-s3
        run: |
          mkdir -p /home/runner/.cache/huggingface/hub
          curl -L https://axolotl-ci.b-cdn.net/hf-cache.tar.zst | tar -xf - -C /home/runner/.cache/huggingface/hub/  --use-compress-program unzstd

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python_version }}
          cache: 'pip' # caching pip dependencies

      - name: upgrade pip
        run: |
          pip3 install --upgrade pip
          pip3 install --upgrade packaging==26.0 setuptools==78.1.1 wheel

      - name: Install PyTorch
        run: |
          pip3 install torch==${{ matrix.pytorch_version }} torchvision

      - name: Update requirements.txt
        run: |
          sed -i 's#^transformers.*#transformers @ git+https://github.com/huggingface/transformers.git@main#' requirements.txt
          sed -i 's#^peft.*#peft @ git+https://github.com/huggingface/peft.git@main#' requirements.txt
          sed -i 's#^accelerate.*#accelerate @ git+https://github.com/huggingface/accelerate.git@main#' requirements.txt
          sed -i 's#^trl.*#trl @ git+https://github.com/huggingface/trl.git@main#' requirements.txt
          sed -i 's#^datasets.*#datasets @ git+https://github.com/huggingface/datasets.git@main#' requirements.txt

      - name: Install dependencies
        run: |
          pip3 show torch
          pip3 install --no-build-isolation -U -e .
          python scripts/unsloth_install.py | sh
          python scripts/cutcrossentropy_install.py | sh
          pip3 install -r requirements-dev.txt -r requirements-tests.txt

      - name: Make sure PyTorch version wasn't clobbered
        run: |
          python -c "import torch; assert '${{ matrix.pytorch_version }}' in torch.__version__"

      - name: Ensure axolotl CLI was installed
        run: |
          axolotl --help

      - name: Run tests
        run: |
          pytest -v --durations=10 -n8 --dist loadfile --ignore=tests/e2e/ --ignore=tests/patched/ --ignore=tests/cli/ tests/
          pytest -v --durations=10 tests/patched/
          pytest -v --durations=10 tests/cli/

      - name: cleanup pip cache
        run: |
          find "$(pip cache dir)/http-v2" -type f -mtime +14 -exec rm {} \;

  docker-e2e-tests:
    if: github.repository_owner == 'axolotl-ai-cloud'
    # this job needs to be run on self-hosted GPU runners...
    runs-on: [self-hosted, modal]
    timeout-minutes: 120
    needs: [pre-commit, pytest]

    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            num_gpus: 1
            axolotl_extras:
            nightly_build: "true"
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.10.0
            num_gpus: 1
            axolotl_extras:
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.12"
            pytorch: 2.9.1
            num_gpus: 1
            axolotl_extras:
            dockerfile: "Dockerfile-uv.jinja"
            nightly_build: "true"
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Modal
        run: |
          python -m pip install --upgrade pip
          pip install modal==1.3.0.post1 jinja2
      - name: Update env vars
        run: |
          echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
          echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
          echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
          echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
          echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
          echo "N_GPUS=${{ matrix.num_gpus }}" >> $GITHUB_ENV
          echo "E2E_DOCKERFILE=${{ matrix.dockerfile || 'Dockerfile.jinja'}}" >> $GITHUB_ENV
          echo "NIGHTLY_BUILD=${{ matrix.nightly_build }}" >> $GITHUB_ENV
      - name: Run tests job on Modal
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
        run: |
          modal run cicd.e2e_tests
  docker-e2e-multigpu-tests:
    if: github.repository_owner == 'axolotl-ai-cloud'
    # this job needs to be run on self-hosted GPU runners...
    runs-on: [self-hosted, modal]
    timeout-minutes: 120
    needs: [pre-commit, pytest, docker-e2e-tests]

    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            num_gpus: 2
            axolotl_extras:
            nightly_build: "true"
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Modal
        run: |
          python -m pip install --upgrade pip
          pip install modal==1.3.0.post1 jinja2
      - name: Update env vars
        run: |
          echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
          echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
          echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
          echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
          echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
          echo "N_GPUS=${{ matrix.num_gpus }}" >> $GITHUB_ENV
          echo "NIGHTLY_BUILD=${{ matrix.nightly_build }}" >> $GITHUB_ENV
      - name: Run tests job on Modal
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
        run: |
          modal run cicd.multigpu


================================================
FILE: .github/workflows/tests.yml
================================================
name: Tests
on:
  # check on push/merge to main, PRs, and manual triggers
  merge_group:
  push:
    branches:
      - "main"
    paths:
      - '**.py'
      - 'requirements.txt'
      - '.github/workflows/*.yml'
      - 'requirements-tests.txt'
      - 'cicd/cicd.sh'
      - 'cicd/Dockerfile.jinja'
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
    paths:
      - '**.py'
      - 'requirements.txt'
      - '.github/workflows/*.yml'
      - 'requirements-tests.txt'
      - 'cicd/cicd.sh'
      - 'cicd/Dockerfile.jinja'
  workflow_dispatch:

# Cancel jobs on the same ref if a new one is triggered
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

permissions:
  contents: read

env:
  TRANSFORMERS_IS_CI: "yes"

jobs:
  pre-commit:
    name: pre-commit
    runs-on: ubuntu-latest
    if: ${{ !github.event.pull_request.draft }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: 'pip' # caching pip dependencies
      - uses: pre-commit/action@v3.0.1
        env:
          SKIP: no-commit-to-branch

  prime-cdn-s3-cache:
    name: Prefetch S3 once to prime the CDN cache
    runs-on: ubuntu-latest
    if: ${{ !github.event.pull_request.draft }}
    timeout-minutes: 10
    steps:
      - name: Restore Cache from S3
        id: hf-cache-restore-s3
        run: |
          # Range request for the first 1 KiB only; the goal is to warm the CDN, not to download the archive
          curl -v -H "Range: bytes=0-1023" -L https://axolotl-ci.b-cdn.net/hf-cache.tar.zst > /dev/null

  pytest:
    name: PyTest
    runs-on: ubuntu-latest
    if: ${{ !github.event.pull_request.draft }}
    needs: [prime-cdn-s3-cache]
    strategy:
      fail-fast: false
      matrix:
        python_version: ["3.12"]  # TODO include py3.14 once https://github.com/mistralai/mistral-common/pull/194 is merged
        pytorch_version: ["2.8.0", "2.9.1", "2.10.0"]
#        exclude:
#          - python_version: "3.14"
#            pytorch_version: "2.8.0"
#          - python_version: "3.14"
#            pytorch_version: "2.9.1"
    timeout-minutes: 20

    steps:
      - name: cleanup node
        run: |
          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL

      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Restore Cache from S3
        id: hf-cache-restore-s3
        run: |
          mkdir -p ~/.cache/huggingface/hub
          curl -L https://axolotl-ci.b-cdn.net/hf-cache.tar.zst | tar -xpf - -C ~/.cache/huggingface/hub/  --use-compress-program unzstd --strip-components=1
          ls -ltr ~/.cache/huggingface/hub/

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python_version }}
          cache: 'pip' # caching pip dependencies

      - name: upgrade pip
        run: |
          pip3 install --upgrade pip
          pip3 install --upgrade packaging==26.0 setuptools==75.8.0 wheel

      - name: Install PyTorch
        run: |
          pip3 install --no-cache-dir torch==${{ matrix.pytorch_version }} torchvision

      - name: Install dependencies
        run: |
          pip3 show torch
          pip3 install --no-cache-dir --no-build-isolation -U -e .
          python scripts/unsloth_install.py | sh
          python scripts/cutcrossentropy_install.py | sh
          pip3 install -r requirements-dev.txt -r requirements-tests.txt

      - name: cleanup pip cache
        run: |
          find "$(pip cache dir)/http-v2" -type f -mtime +14 -exec rm {} \;

      - name: Make sure PyTorch version wasn't clobbered
        run: |
          python -c "import torch; assert '${{ matrix.pytorch_version }}' in torch.__version__"

      - name: Ensure axolotl CLI was installed
        run: |
          axolotl --help

      - name: Pre-Download dataset fixture
        run: |
          hf download --repo-type=dataset axolotl-ai-internal/axolotl-oss-dataset-fixtures

      - name: Show HF cache
        run: hf cache ls

      - name: Run tests
        run: |
          df -h
          pytest -v --durations=10 -n4 --dist loadfile --ignore=tests/e2e/ --ignore=tests/patched/ --ignore=tests/cli/ --ignore=tests/monkeypatch/ tests/ --cov=axolotl --cov-report=xml
          df -h
          pytest -v --durations=10 tests/monkeypatch/ --cov=axolotl --cov-append --cov-report=xml
          df -h
          pytest -v --durations=10 tests/patched/ --cov=axolotl --cov-append --cov-report=xml
          df -h
          pytest -v --durations=10 tests/cli/ --cov=axolotl --cov-append --cov-report=xml

      - name: Show HF cache
        run: hf cache ls

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          files: ./coverage.xml
          flags: unittests,pytorch-${{ matrix.pytorch_version }}
          fail_ci_if_error: false

  pytest-sdist:
    name: PyTest from Source Dist
    runs-on: ubuntu-latest
    if: ${{ !github.event.pull_request.draft }}
    needs: [prime-cdn-s3-cache]
    strategy:
      fail-fast: false
      matrix:
        python_version: ["3.12"]  # TODO include py3.14 once https://github.com/mistralai/mistral-common/pull/194 is merged
        pytorch_version: ["2.8.0", "2.9.1", "2.10.0"]
#        exclude:
#          - python_version: "3.14"
#            pytorch_version: "2.8.0"
#          - python_version: "3.14"
#            pytorch_version: "2.9.1"
    timeout-minutes: 30

    steps:
      - name: cleanup node
        run: |
          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache/CodeQL

      - name: Check out repository code
        uses: actions/checkout@v4

      - name: Restore Cache from S3
        id: hf-cache-restore-s3
        run: |
          mkdir -p ~/.cache/huggingface/hub
          curl -L https://axolotl-ci.b-cdn.net/hf-cache.tar.zst | tar -xpf - -C ~/.cache/huggingface/hub/  --use-compress-program unzstd --strip-components=1
          ls -ltr ~/.cache/huggingface/hub/

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python_version }}
          cache: 'pip' # caching pip dependencies

      - name: upgrade pip
        run: |
          pip3 install --upgrade pip
          pip3 install --upgrade packaging==26.0 setuptools==75.8.0 setuptools_scm build wheel psutil

      - name: Install PyTorch
        run: |
          pip3 install --no-cache-dir torch==${{ matrix.pytorch_version }} torchvision

      - name: Install dependencies
        run: |
          pip3 show torch
          python -m build --no-isolation --sdist
          pip3 install --no-cache-dir --no-build-isolation dist/axolotl*.tar.gz
          python scripts/unsloth_install.py | sh
          python scripts/cutcrossentropy_install.py | sh
          pip3 install -r requirements-dev.txt -r requirements-tests.txt

      - name: cleanup pip cache
        run: |
          find "$(pip cache dir)/http-v2" -type f -mtime +14 -exec rm {} \;

      - name: Make sure PyTorch version wasn't clobbered
        run: |
          python -c "import torch; assert '${{ matrix.pytorch_version }}' in torch.__version__"

      - name: Ensure axolotl CLI was installed
        run: |
          axolotl --help

      - name: Show HF cache
        run: hf cache ls

      - name: Run tests
        run: |
          pytest -v --durations=10 -n4 --dist loadfile --ignore=tests/e2e/ --ignore=tests/patched/ --ignore=tests/cli/ --ignore=tests/monkeypatch/ tests/ --cov=axolotl --cov-report=xml
          pytest -v --durations=10 tests/monkeypatch/ --cov=axolotl --cov-append --cov-report=xml
          pytest -v --durations=10 tests/cli/

      - name: Show HF cache
        run: hf cache ls

  gate-skip-e2e:
    needs: [pre-commit]
    runs-on: ubuntu-latest
    outputs:
      skip: ${{ steps.compute.outputs.skip }}
    steps:
      - uses: actions/github-script@v7
        id: compute
        with:
          script: |
            const token = /\[skip-e2e\]/i;
            let msg = '';
            if (context.eventName === 'push') {
              msg = context.payload.head_commit?.message || '';
            } else if (context.eventName === 'pull_request') {
              const { owner, repo } = context.repo;
              const prNumber = context.payload.pull_request.number;
              const commits = await github.paginate(
                github.rest.pulls.listCommits,
                { owner, repo, pull_number: prNumber, per_page: 100 }
              );
              msg = commits.at(-1)?.commit?.message || '';
            }
            const title = context.payload.pull_request?.title || '';
            const body  = context.payload.pull_request?.body  || '';
            const skip = token.test(msg) || token.test(title) || token.test(body);
            core.setOutput('skip', String(skip));

  docker-e2e-tests-1st:
    # Run this job first as a gate for running the remainder of the test matrix
    if: >
      github.repository_owner == 'axolotl-ai-cloud' &&
      (github.event_name != 'pull_request' || !github.event.pull_request.draft) &&
      needs.gate-skip-e2e.outputs.skip != 'true'
    # this job needs to be run on self-hosted GPU runners...
    runs-on: [self-hosted, modal]
    timeout-minutes: 120
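    # gate-skip-e2e must be a direct dependency for needs.gate-skip-e2e.outputs.skip to resolve in the `if` above
    needs: [pre-commit, pytest, gate-skip-e2e]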

    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.12"
            pytorch: 2.9.1
            num_gpus: 1
            axolotl_extras:
            dockerfile: "Dockerfile-uv.jinja"
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Modal
        run: |
          python -m pip install --upgrade pip
          pip install modal==1.3.0.post1 jinja2
      - name: Update env vars
        run: |
          echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
          echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
          echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
          echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
          echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
          echo "MODAL_IMAGE_BUILDER_VERSION=2024.10" >> $GITHUB_ENV
          echo "N_GPUS=${{ matrix.num_gpus }}" >> $GITHUB_ENV
          echo "E2E_DOCKERFILE=${{ matrix.dockerfile || 'Dockerfile.jinja'}}" >> $GITHUB_ENV
      - name: Run tests job on Modal
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
        run: |
          modal run cicd.e2e_tests

  docker-e2e-tests:
    if: >
      github.repository_owner == 'axolotl-ai-cloud' &&
      (github.event_name != 'pull_request' || !github.event.pull_request.draft) &&
      needs.gate-skip-e2e.outputs.skip != 'true'
    # this job needs to be run on self-hosted GPU runners...
    runs-on: [self-hosted, modal]
    timeout-minutes: 120
    # Only run the remainder of the matrix if the first e2e check passed;
    # this is to save on wasted compute costs for known failures that get caught in the first run
    needs: [pre-commit, pytest, gate-skip-e2e, docker-e2e-tests-1st]

    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.8.0
            num_gpus: 1
            gpu_type: "B200"
            axolotl_extras: fbgemm-gpu
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            num_gpus: 1
            axolotl_extras:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.10.0
            num_gpus: 1
            axolotl_extras:
          - cuda: 130
            cuda_version: 13.0.0
            python_version: "3.11"
            pytorch: 2.9.1
            num_gpus: 1
            axolotl_extras:
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Modal
        run: |
          python -m pip install --upgrade pip
          pip install modal==1.3.0.post1 jinja2
      - name: Update env vars
        run: |
          echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
          echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
          echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
          echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
          echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
          echo "MODAL_IMAGE_BUILDER_VERSION=2024.10" >> $GITHUB_ENV
          echo "N_GPUS=${{ matrix.num_gpus }}" >> $GITHUB_ENV
          echo "GPU_TYPE=${{ matrix.gpu_type || 'L40S'}}" >> $GITHUB_ENV
          echo "E2E_DOCKERFILE=${{ matrix.dockerfile || 'Dockerfile.jinja'}}" >> $GITHUB_ENV
      - name: Run tests job on Modal
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
        run: |
          modal run cicd.e2e_tests

  docker-e2e-cleanup:
    runs-on: [self-hosted, modal]
    timeout-minutes: 90
    needs: [docker-e2e-tests]
    if: ${{ !github.event.pull_request.draft }}

    strategy:
      fail-fast: false
      matrix:
        include:
          - cuda: 128
            cuda_version: 12.8.1
            python_version: "3.11"
            pytorch: 2.9.1
            num_gpus: 1
            axolotl_extras:
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Modal
        run: |
          python -m pip install --upgrade pip
          pip install modal==1.3.0.post1 jinja2
      - name: Update env vars
        run: |
          echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
          echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
          echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
          echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
          echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
          echo "MODAL_IMAGE_BUILDER_VERSION=2024.10" >> $GITHUB_ENV
          echo "N_GPUS=${{ matrix.num_gpus }}" >> $GITHUB_ENV
      - name: Run cleanup job on Modal
        run: |
          modal run cicd.cleanup


================================================
FILE: .gitignore
================================================
**/axolotl.egg-info
configs
last_run_prepared/
outputs
.vscode
_site/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
venv3.10/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# WandB
# wandb creates a folder to store logs for training runs
wandb

# Runs
lora-out/*
qlora-out/*
mlruns/*

/.quarto/
prepared-datasets/
submit.sh
*.out*

# Quartodoc generated files
objects.json
site_libs/

typings/
out/

# vim
*.swp

# scm auto-versioning
src/axolotl/_version.py


================================================
FILE: .mypy.ini
================================================
[mypy]
plugins = pydantic.mypy
exclude = venv

[mypy-alpaca_lora_4bit.*]
ignore_missing_imports = True

[mypy-axolotl.monkeypatch.*]
ignore_errors = True

[mypy-axolotl.models.mixtral.*]
ignore_errors = True

[mypy-axolotl.integrations.liger.models.*]
ignore_errors = True

[mypy-axolotl.models.phi.*]
ignore_errors = True

[mypy-flash_attn.*]
ignore_missing_imports = True

[mypy-huggingface_hub]
ignore_missing_imports = True

[mypy-transformers.*]
ignore_missing_imports = True

[mypy-peft]
ignore_missing_imports = True

[mypy-wandb]
ignore_missing_imports = True

[mypy-bitsandbytes]
ignore_missing_imports = True

[mypy-requests]
ignore_missing_imports = True

[mypy-datasets]
ignore_missing_imports = True

[mypy-fire]
ignore_missing_imports = True

[mypy-setuptools]
ignore_missing_imports = True

[mypy-addict]
ignore_missing_imports = True

[mypy-xformers.*]
ignore_missing_imports = True


================================================
FILE: .pre-commit-config.yaml
================================================
default_language_version:
    python: python3

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
    -   id: check-yaml
    -   id: end-of-file-fixer
    -   id: trailing-whitespace
    -   id: no-commit-to-branch
        args: ['--branch', 'main']
-   repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.4
    hooks:
    -   id: ruff
        args: [--fix]
    -   id: ruff-format
-   repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.19.1
    hooks:
    - id: mypy
      additional_dependencies:
        [
            'types-PyYAML',
            'pydantic>=2.5.3',
        ]
-   repo: https://github.com/PyCQA/bandit
    rev: 1.9.4
    hooks:
    -   id: bandit
        args: [
            '--ini',
            '.bandit',
        ]


================================================
FILE: .runpod/.gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
pod/scripts/config.yaml


================================================
FILE: .runpod/Dockerfile
================================================
FROM axolotlai/axolotl-cloud:main-py3.11-cu124-2.6.0

COPY .runpod/requirements.txt /requirements.txt
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install --upgrade pip && \
    python3 -m pip install --upgrade -r /requirements.txt

# Environment settings
ARG BASE_VOLUME="/runpod-volume"
ENV BASE_VOLUME=$BASE_VOLUME
ENV HF_DATASETS_CACHE="${BASE_VOLUME}/huggingface-cache/datasets"
ENV HUGGINGFACE_HUB_CACHE="${BASE_VOLUME}/huggingface-cache/hub"
ENV HF_HUB_CACHE="${BASE_VOLUME}/huggingface-cache/hub"
ENV TRANSFORMERS_CACHE="${BASE_VOLUME}/huggingface-cache/hub"

COPY .runpod/src /src

WORKDIR /src
CMD ["python3", "/src/handler.py"]


================================================
FILE: .runpod/README.md
================================================
<h1>LLM Post-Training: Full fine-tune, LoRA, QLoRA, etc. Llama/Mistral/Gemma and more</h1>

# Configuration Options

This document outlines all available configuration options for training models. The configuration can be provided as a JSON request.

## Usage

You can use these configuration options:

1. As a JSON request body:

```json
{
  "input": {
    "user_id": "user",
    "model_id": "model-name",
    "run_id": "run-id",
    "credentials": {
      "wandb_api_key": "", // add your Weights & Biases key. TODO: you will be able to set this in environment variables.
      "hf_token": "" // add your HF token. TODO: you will be able to set this in environment variables.
    },
    "args": {
      "base_model": "NousResearch/Llama-3.2-1B",
      // ... other options
    }
  }
}
```
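
As a minimal sketch, the request body shown above could be assembled and serialized in Python like this. The helper name `build_request` is hypothetical and not part of this repository; only the field names are taken from the example.

```python
import json


def build_request(user_id, model_id, run_id, args, wandb_api_key="", hf_token=""):
    """Assemble a request body in the shape shown above (hypothetical helper)."""
    return {
        "input": {
            "user_id": user_id,
            "model_id": model_id,
            "run_id": run_id,
            "credentials": {
                "wandb_api_key": wandb_api_key,
                "hf_token": hf_token,
            },
            # Copy so later edits to the caller's dict do not leak into the payload.
            "args": dict(args),
        }
    }


payload = build_request(
    "user",
    "model-name",
    "run-id",
    {"base_model": "NousResearch/Llama-3.2-1B"},
)
body = json.dumps(payload, indent=2)
print(body)
```

Serializing with `json.dumps` also guarantees the body is strict JSON, which the annotated example above is not because of its inline comments.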

## Configuration Options

### Model Configuration

| Option              | Description                                                                                   | Default              |
| ------------------- | --------------------------------------------------------------------------------------------- | -------------------- |
| `base_model`        | Path to the base model (local or HuggingFace)                                                 | Required             |
| `base_model_config` | Configuration path for the base model                                                         | Same as base_model   |
| `revision_of_model` | Specific model revision from HuggingFace hub                                                  | Latest               |
| `tokenizer_config`  | Custom tokenizer configuration path                                                           | Optional             |
| `model_type`        | Type of model to load                                                                         | AutoModelForCausalLM |
| `tokenizer_type`    | Type of tokenizer to use                                                                      | AutoTokenizer        |
| `hub_model_id`      | Repository ID where the model will be pushed on Hugging Face Hub (format: username/repo-name) | Optional             |

## Model Family Identification

| Option                     | Default | Description                    |
| -------------------------- | ------- | ------------------------------ |
| `is_falcon_derived_model`  | `false` | Whether model is Falcon-based  |
| `is_llama_derived_model`   | `false` | Whether model is LLaMA-based   |
| `is_qwen_derived_model`    | `false` | Whether model is Qwen-based    |
| `is_mistral_derived_model` | `false` | Whether model is Mistral-based |

## Model Configuration Overrides

| Option                                          | Default    | Description                        |
| ----------------------------------------------- | ---------- | ---------------------------------- |
| `overrides_of_model_config.rope_scaling.type`   | `"linear"` | RoPE scaling type (linear/dynamic) |
| `overrides_of_model_config.rope_scaling.factor` | `1.0`      | RoPE scaling factor                |

### Model Loading Options

| Option         | Description                   | Default |
| -------------- | ----------------------------- | ------- |
| `load_in_8bit` | Load model in 8-bit precision | false   |
| `load_in_4bit` | Load model in 4-bit precision | false   |
| `bf16`         | Use bfloat16 precision        | false   |
| `fp16`         | Use float16 precision         | false   |
| `tf32`         | Use tensor float 32 precision | false   |
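
To get a rough feel for what the precision options mean for memory, the sketch below estimates the size of the model weights alone. It assumes 2 bytes per parameter for `bf16`/`fp16`, 1 byte for `load_in_8bit`, and 0.5 bytes for `load_in_4bit`, and it ignores quantization metadata, activations, gradients, and optimizer state. The parameter count is a round illustrative number, not a measured value.

```python
# Approximate bytes per parameter for each loading mode (assumption: weights only).
BYTES_PER_PARAM = {"16bit": 2.0, "8bit": 1.0, "4bit": 0.5}


def weight_memory_gb(num_params, precision):
    """Return the approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


num_params = 1_000_000_000  # hypothetical 1B-parameter model
for precision in ("16bit", "8bit", "4bit"):
    print(precision, weight_memory_gb(num_params, precision), "GB")
```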

## Memory and Device Settings

| Option             | Default   | Description             |
| ------------------ | --------- | ----------------------- |
| `gpu_memory_limit` | `"20GiB"` | GPU memory limit        |
| `lora_on_cpu`      | `false`   | Load LoRA on CPU        |
| `device_map`       | `"auto"`  | Device mapping strategy |
| `max_memory`       | `null`    | Max memory per device   |

## Training Hyperparameters

| Option                        | Default   | Description                 |
| ----------------------------- | --------- | --------------------------- |
| `gradient_accumulation_steps` | `1`       | Gradient accumulation steps |
| `micro_batch_size`            | `2`       | Batch size per GPU          |
| `eval_batch_size`             | `null`    | Evaluation batch size       |
| `num_epochs`                  | `4`       | Number of training epochs   |
| `warmup_steps`                | `100`     | Warmup steps                |
| `warmup_ratio`                | `0.05`    | Warmup ratio                |
| `learning_rate`               | `0.00003` | Learning rate               |
| `lr_quadratic_warmup`         | `false`   | Quadratic warmup            |
| `logging_steps`               | `null`    | Logging frequency           |
| `eval_steps`                  | `null`    | Evaluation frequency        |
| `evals_per_epoch`             | `null`    | Evaluations per epoch       |
| `save_strategy`               | `"epoch"` | Checkpoint saving strategy  |
| `save_steps`                  | `null`    | Saving frequency            |
| `saves_per_epoch`             | `null`    | Saves per epoch             |
| `save_total_limit`            | `null`    | Maximum checkpoints to keep |
| `max_steps`                   | `null`    | Maximum training steps      |

### Dataset Configuration

```yaml
datasets:
  - path: vicgalle/alpaca-gpt4 # HuggingFace dataset (TODO: local paths will be supported later)
    type: alpaca # Format type (alpaca, gpteacher, oasst, etc.)
    ds_type: json # Dataset type
    data_files: path/to/data # Source data files
    train_on_split: train # Dataset split to use
```

## Chat Template Settings

| Option                   | Default                          | Description            |
| ------------------------ | -------------------------------- | ---------------------- |
| `chat_template`          | `"tokenizer_default"`            | Chat template type     |
| `chat_template_jinja`    | `null`                           | Custom Jinja template  |
| `default_system_message` | `"You are a helpful assistant."` | Default system message |

## Dataset Processing

| Option                            | Default                    | Description                         |
| --------------------------------- | -------------------------- | ----------------------------------- |
| `dataset_prepared_path`           | `"data/last_run_prepared"` | Path for prepared dataset           |
| `push_dataset_to_hub`             | `""`                       | Push dataset to HF hub              |
| `dataset_num_proc`                | `4`                        | Number of preprocessing processes   |
| `dataset_keep_in_memory`          | `false`                    | Keep dataset in memory              |
| `shuffle_merged_datasets`         | `true`                     | Shuffle merged datasets             |
| `shuffle_before_merging_datasets` | `false`                    | Shuffle each dataset before merging |
| `dataset_exact_deduplication`     | `true`                     | Deduplicate datasets                |

## LoRA Configuration

| Option                     | Default                | Description                    |
| -------------------------- | ---------------------- | ------------------------------ |
| `adapter`                  | `"lora"`               | Adapter type (lora/qlora)      |
| `lora_model_dir`           | `""`                   | Directory with pretrained LoRA |
| `lora_r`                   | `8`                    | LoRA attention dimension       |
| `lora_alpha`               | `16`                   | LoRA alpha parameter           |
| `lora_dropout`             | `0.05`                 | LoRA dropout                   |
| `lora_target_modules`      | `["q_proj", "v_proj"]` | Modules to apply LoRA          |
| `lora_target_linear`       | `false`                | Target all linear modules      |
| `peft_layers_to_transform` | `[]`                   | Layers to transform            |
| `lora_modules_to_save`     | `[]`                   | Modules to save                |
| `lora_fan_in_fan_out`      | `false`                | Fan in/out structure           |
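
The relationship between `lora_r` and `lora_alpha` can be illustrated with a short sketch. It assumes the standard LoRA formulation, in which the low-rank update is scaled by `lora_alpha / lora_r` and each adapted linear layer gains `lora_r * (d_in + d_out)` trainable parameters. Variants such as rank-stabilized LoRA scale differently, and the layer dimensions below are hypothetical.

```python
def lora_scaling(lora_alpha, lora_r):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / lora_r


def lora_trainable_params(d_in, d_out, lora_r):
    """Parameters added by one adapted linear layer: A is (r x d_in), B is (d_out x r)."""
    return lora_r * (d_in + d_out)


# Defaults from the table above: lora_r=8, lora_alpha=16.
print(lora_scaling(16, 8))
# Hypothetical 2048 x 2048 projection adapted with rank 8.
print(lora_trainable_params(2048, 2048, 8))
```

Doubling `lora_r` while keeping `lora_alpha` fixed halves the scaling factor, which is why the two values are often changed together.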

## Optimization Settings

| Option                    | Default | Description                |
| ------------------------- | ------- | -------------------------- |
| `train_on_inputs`         | `false` | Train on input prompts     |
| `group_by_length`         | `false` | Group by sequence length   |
| `gradient_checkpointing`  | `false` | Use gradient checkpointing |
| `early_stopping_patience` | `3`     | Early stopping patience    |

## Learning Rate Scheduling

| Option                     | Default    | Description          |
| -------------------------- | ---------- | -------------------- |
| `lr_scheduler`             | `"cosine"` | Scheduler type       |
| `lr_scheduler_kwargs`      | `{}`       | Scheduler parameters |
| `cosine_min_lr_ratio`      | `null`     | Minimum LR ratio     |
| `cosine_constant_lr_ratio` | `null`     | Constant LR ratio    |
| `lr_div_factor`            | `null`     | LR division factor   |

## Optimizer Settings

| Option                 | Default      | Description         |
| ---------------------- | ------------ | ------------------- |
| `optimizer`            | `"adamw_hf"` | Optimizer choice    |
| `optim_args`           | `{}`         | Optimizer arguments |
| `optim_target_modules` | `[]`         | Target modules      |
| `weight_decay`         | `null`       | Weight decay        |
| `adam_beta1`           | `null`       | Adam beta1          |
| `adam_beta2`           | `null`       | Adam beta2          |
| `adam_epsilon`         | `null`       | Adam epsilon        |
| `max_grad_norm`        | `null`       | Gradient clipping   |

## Attention Implementations

| Option                     | Default | Description                   |
| -------------------------- | ------- | ----------------------------- |
| `flash_optimum`            | `false` | Use better transformers       |
| `xformers_attention`       | `false` | Use xformers                  |
| `flash_attention`          | `false` | Use flash attention           |
| `flash_attn_cross_entropy` | `false` | Flash attention cross entropy |
| `flash_attn_rms_norm`      | `false` | Flash attention RMS norm      |
| `flash_attn_fuse_mlp`      | `false` | Fuse MLP operations           |
| `sdp_attention`            | `false` | Use scaled dot product        |
| `s2_attention`             | `false` | Use shifted sparse attention  |

## Tokenizer Modifications

| Option           | Default | Description                  |
| ---------------- | ------- | ---------------------------- |
| `special_tokens` | -       | Special tokens to add/modify |
| `tokens`         | `[]`    | Additional tokens            |

## Distributed Training

| Option                  | Default | Description           |
| ----------------------- | ------- | --------------------- |
| `fsdp`                  | `null`  | FSDP configuration    |
| `fsdp_config`           | `null`  | FSDP config options   |
| `deepspeed`             | `null`  | Deepspeed config path |
| `ddp_timeout`           | `null`  | DDP timeout           |
| `ddp_bucket_cap_mb`     | `null`  | DDP bucket capacity   |
| `ddp_broadcast_buffers` | `null`  | DDP broadcast buffers |

<details>
<summary><h3>Example Configuration Request:</h3></summary>

Here's a complete example for fine-tuning a LLaMA model using LoRA:

```json
{
  "input": {
    "user_id": "user",
    "model_id": "llama-test",
    "run_id": "test-run",
    "credentials": {
      "wandb_api_key": "",
      "hf_token": ""
    },
    "args": {
      "base_model": "NousResearch/Llama-3.2-1B",
      "load_in_8bit": false,
      "load_in_4bit": false,
      "strict": false,
      "datasets": [
        {
          "path": "teknium/GPT4-LLM-Cleaned",
          "type": "alpaca"
        }
      ],
      "dataset_prepared_path": "last_run_prepared",
      "val_set_size": 0.1,
      "output_dir": "./outputs/lora-out",
      "adapter": "lora",
      "sequence_len": 2048,
      "sample_packing": true,
      "eval_sample_packing": true,
      "pad_to_sequence_len": true,
      "lora_r": 16,
      "lora_alpha": 32,
      "lora_dropout": 0.05,
      "lora_target_modules": [
        "gate_proj",
        "down_proj",
        "up_proj",
        "q_proj",
        "v_proj",
        "k_proj",
        "o_proj"
      ],
      "gradient_accumulation_steps": 2,
      "micro_batch_size": 2,
      "num_epochs": 1,
      "optimizer": "adamw_8bit",
      "lr_scheduler": "cosine",
      "learning_rate": 0.0002,
      "train_on_inputs": false,
      "group_by_length": false,
      "bf16": "auto",
      "tf32": false,
      "gradient_checkpointing": true,
      "logging_steps": 1,
      "flash_attention": true,
      "loss_watchdog_threshold": 5,
      "loss_watchdog_patience": 3,
      "warmup_steps": 10,
      "evals_per_epoch": 4,
      "saves_per_epoch": 1,
      "weight_decay": 0,
      "hub_model_id": "runpod/llama-fr-lora",
      "wandb_name": "test-run-1",
      "wandb_project": "test-run-1",
      "wandb_entity": "axo-test",
      "special_tokens": {
        "pad_token": "<|end_of_text|>"
      }
    }
  }
}
```

</details>
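
The sketch below shows one way such a request might be prepared for submission, assuming the worker is deployed as a RunPod serverless endpoint reachable through the `/run` route of the RunPod API. The endpoint ID and API key are placeholders, the helper `make_run_request` is hypothetical, and the network call itself is left commented out.

```python
import json
import urllib.request

# Placeholders: substitute your own endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"


def make_run_request(payload, endpoint_id=ENDPOINT_ID, api_key=API_KEY):
    """Build (but do not send) a POST request carrying the JSON payload."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/run"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "input": {
        "user_id": "user",
        "model_id": "llama-test",
        "run_id": "test-run",
        "credentials": {"wandb_api_key": "", "hf_token": ""},
        "args": {"base_model": "NousResearch/Llama-3.2-1B", "adapter": "lora"},
    }
}
req = make_run_request(payload)
print(req.get_method(), req.full_url)

# To actually submit the job (requires valid credentials and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```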

### Advanced Features

#### Wandb Integration

- `wandb_project`: Project name for Weights & Biases
- `wandb_entity`: Team name in W&B
- `wandb_watch`: Monitor model with W&B
- `wandb_name`: Name of the W&B run
- `wandb_run_id`: ID for the W&B run

#### Performance Optimization

- `sample_packing`: Enable efficient sequence packing
- `eval_sample_packing`: Use sequence packing during evaluation
- `torch_compile`: Enable PyTorch 2.0 compilation
- `flash_attention`: Use Flash Attention implementation
- `xformers_attention`: Use xFormers attention implementation

### Available Optimizers

The following optimizers are supported:

- `adamw_hf`: HuggingFace's AdamW implementation
- `adamw_torch`: PyTorch's AdamW
- `adamw_torch_fused`: Fused AdamW implementation
- `adamw_torch_xla`: XLA-optimized AdamW
- `adamw_apex_fused`: NVIDIA Apex fused AdamW
- `adafactor`: Adafactor optimizer
- `adamw_anyprecision`: Anyprecision AdamW
- `adamw_bnb_8bit`: 8-bit AdamW from bitsandbytes
- `lion_8bit`: 8-bit Lion optimizer
- `lion_32bit`: 32-bit Lion optimizer
- `sgd`: Stochastic Gradient Descent
- `adagrad`: Adagrad optimizer

## Notes

- Set `load_in_8bit: true` or `load_in_4bit: true` for memory-efficient training
- Enable `flash_attention: true` for faster training on modern GPUs
- Use `gradient_checkpointing: true` to reduce memory usage
- Adjust `micro_batch_size` and `gradient_accumulation_steps` based on your GPU memory
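
When trading `micro_batch_size` against `gradient_accumulation_steps`, it helps to keep the effective batch size in view. The sketch below assumes the usual data-parallel relationship, where the number of samples per optimizer step is the product of the per-GPU batch size, the accumulation steps, and the GPU count. The function name is hypothetical.

```python
def effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_gpus=1):
    """Samples contributing to each optimizer step under data parallelism."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


# Values from the example request: micro_batch_size=2, gradient_accumulation_steps=2.
print(effective_batch_size(2, 2))
# Halving the micro batch and doubling accumulation keeps the product unchanged
# while lowering peak activation memory.
print(effective_batch_size(1, 4))
```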

For more detailed information, please refer to the [documentation](https://axolotl-ai-cloud.github.io/axolotl/docs/config-reference.html).

### Errors:

- If you face any issues with Flash Attention 2, delete your worker and restart it.


================================================
FILE: .runpod/hub.json
================================================
{
  "title": "Axolotl Fine-Tuning",
  "description": "Serverless fine-tuning of open-source LLMs with Axolotl. Supports LoRA, QLoRA, DPO, and more using Hugging Face models and datasets.",
  "type": "serverless",
  "category": "language",
  "iconUrl": "https://avatars.githubusercontent.com/u/167502477",
  "config": {
    "runsOn": "GPU",
    "containerDiskInGb": 200,
    "gpuCount": 1,
    "allowedCudaVersions": [
      "12.8",
      "12.7",
      "12.6",
      "12.5",
      "12.4"
    ],
    "presets": [],
    "env": [
      {
        "key": "TOKENIZER",
        "input": {
          "name": "Tokenizer",
          "type": "string",
          "description": "Name or path of the Hugging Face tokenizer to use.",
          "default": "",
          "advanced": true
        }
      },
      {
        "key": "MAX_NUM_SEQS",
        "input": {
          "name": "Max Num Seqs",
          "type": "number",
          "description": "Maximum number of sequences per iteration.",
          "default": 256,
          "advanced": true
        }
      },
      {
        "key": "DISABLE_LOG_STATS",
        "input": {
          "name": "Disable Log Stats",
          "type": "boolean",
          "description": "Disable logging statistics.",
          "default": false,
          "trueValue": "true",
          "falseValue": "false"
        }
      },
      {
        "key": "LOAD_FORMAT",
        "input": {
          "name": "Load Format",
          "type": "string",
          "description": "The format of the model weights to load.",
          "default": "auto",
          "options": [
            {
              "label": "auto",
              "value": "auto"
            },
            {
              "label": "pt",
              "value": "pt"
            },
            {
              "label": "safetensors",
              "value": "safetensors"
            },
            {
              "label": "npcache",
              "value": "npcache"
            },
            {
              "label": "dummy",
              "value": "dummy"
            },
            {
              "label": "tensorizer",
              "value": "tensorizer"
            },
            {
              "label": "bitsandbytes",
              "value": "bitsandbytes"
            }
          ],
          "advanced": true
        }
      }
    ]
  }
}


================================================
FILE: .runpod/requirements.txt
================================================
# Required Python packages get listed here, one per line.
# Recommended to lock the version number to avoid unexpected changes.

# You can also install packages from a git repository, e.g.:
# git+https://github.com/runpod/runpod-python.git
# To learn more, see https://pip.pypa.io/en/stable/reference/requirements-file-format/
runpod~=1.7.0


================================================
FILE: .runpod/src/config/config.yaml
================================================
# # This is the huggingface model that contains *.pt, *.safetensors, or *.bin files
# # This can also be a relative path to a model on disk
# base_model: ./llama-7b-hf
# # You can specify an ignore pattern if the model repo contains more than 1 model type (*.pt, etc)
# base_model_ignore_patterns:
# # If the base_model repo on hf hub doesn't include configuration .json files,
# # You can set that here, or leave this empty to default to base_model
# base_model_config: ./llama-7b-hf
# # You can specify to choose a specific model revision from huggingface hub
# model_revision:
# # Optional tokenizer configuration override in case you want to use a different tokenizer
# # than the one defined in the base model
# tokenizer_config:
# # If you want to specify the type of model to load, AutoModelForCausalLM is a good choice too
# model_type: AutoModelForCausalLM
# # Corresponding tokenizer for the model AutoTokenizer is a good choice
# tokenizer_type: AutoTokenizer
# # Trust remote code for untrusted source
# trust_remote_code:
# # use_fast option for tokenizer loading from_pretrained, default to True
# tokenizer_use_fast:
# # Whether to use the legacy tokenizer setting, defaults to True
# tokenizer_legacy:
# # Resize the model embeddings when new tokens are added to multiples of 32
# # This is reported to improve training speed on some models
# resize_token_embeddings_to_32x:

# # Used to identify which model family the model is based on
# is_falcon_derived_model:
# is_llama_derived_model:
# # Please note that if you set this to true, `padding_side` will be set to "left" by default
# is_mistral_derived_model:
# is_qwen_derived_model:

# # optional overrides to the base model configuration
# model_config:
#   # RoPE Scaling https://github.com/huggingface/transformers/pull/24653
#   rope_scaling:
#     type: # linear | dynamic
#     factor: # float

# # Whether you are training a 4-bit GPTQ quantized model
# gptq: true
# gptq_groupsize: 128 # group size
# gptq_model_v1: false # v1 or v2

# # This will attempt to quantize the model down to 8 bits and use adam 8 bit optimizer
# load_in_8bit: true
# # Use bitsandbytes 4 bit
# load_in_4bit:

# # Use CUDA bf16
# bf16: true # bool or 'full' for `bf16_full_eval`. require >=ampere
# # Use CUDA fp16
# fp16: true
# # Use CUDA tf32
# tf32: true # require >=ampere

# # No AMP (automatic mixed precision)
# bfloat16: true # require >=ampere
# float16: true

# # A list of one or more datasets to finetune the model with
# datasets:
#   # HuggingFace dataset repo | s3://,gs:// path | "json" for local dataset, make sure to fill data_files
#   - path: vicgalle/alpaca-gpt4
#   # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
#     type: alpaca # format | format:<prompt_style> (chat/instruct) | <prompt_strategies>.load_<load_fn>
#     ds_type: # Optional[str] (json|arrow|parquet|text|csv) defines the datatype when path is a file
#     data_files: # Optional[str] path to source data files
#     shards: # Optional[int] number of shards to split data into
#     name: # Optional[str] name of dataset configuration to load
#     train_on_split: train # Optional[str] name of dataset split to load from

#     # Optional[str] fastchat conversation type, only used with type: sharegpt
#     conversation:  # Options (see Conversation 'name'): https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
#     field_human: # Optional[str]. Human key to use for conversation.
#     field_model: # Optional[str]. Assistant key to use for conversation.

#   # Custom user prompt
#   - path: repo
#     type:
#       # The below are defaults. only set what's needed.
#       system_prompt: ""
#       system_format: "{system}"
#       field_system: system
#       field_instruction: instruction
#       field_input: input
#       field_output: output

#       # Customizable to be single line or multi-line
#       # 'format' can include {input}
#       format: |-
#         User: {instruction} {input}
#         Assistant:
#       # 'no_input_format' cannot include {input}
#       no_input_format: "{instruction} "

#       # For `completion` datasets only, uses the provided field instead of `text` column
#       field:

# # Axolotl attempts to save the dataset as an arrow after packing the data together so
# # subsequent training attempts load faster, relative path
# dataset_prepared_path: data/last_run_prepared
# # Push prepared dataset to hub
# push_dataset_to_hub: # repo path
# # The maximum number of processes to use while preprocessing your input dataset. This defaults to `os.cpu_count()`
# # if not set.
# dataset_num_proc: # defaults to os.cpu_count() if not set
# # push checkpoints to hub
# hub_model_id: # repo path to push finetuned model
# # how to push checkpoints to hub
# # https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
# hub_strategy:
# # Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# # Required to be true when used in combination with `push_dataset_to_hub`
# hf_use_auth_token: # boolean
# # How much of the dataset to set aside as evaluation. 1 = 100%, 0.50 = 50%, etc. 0 for no eval.
# val_set_size: 0.04
# # Num shards for whole dataset
# dataset_shard_num:
# # Index of shard to use for whole dataset
# dataset_shard_idx:

# # The maximum length of an input to train with, this should typically be less than 2048
# # as most models have a token/context limit of 2048
# sequence_len: 2048
# # Pad inputs so each step uses constant sized buffers
# # This will reduce memory fragmentation and may prevent OOMs, by re-using memory more efficiently
# pad_to_sequence_len:
# # Max sequence length to concatenate training samples together up to
# # Inspired by StackLLaMA. see https://huggingface.co/blog/stackllama#supervised-fine-tuning
# # FutureWarning: This will soon be DEPRECATED
# max_packed_sequence_len: 1024
# # Use efficient multi-packing with block diagonal attention and per sequence position_ids. Recommend set to 'true'
# sample_packing:
# # Set to 'false' if getting errors during eval with sample_packing on.
# eval_sample_packing:
# # Set these packing optimizations only AFTER running training at least once.
# # The trainer will then report recommended values for them.
# sample_packing_eff_est:
# total_num_tokens:

# # Set to 'lora' or 'qlora', or leave blank to train all parameters of the original model
# adapter: lora
# # If you already have a lora model trained that you want to load, put that here.
# # This means after training, if you want to test the model, you should set this to the value of `lora_out_dir`.
# lora_model_dir:

# # LoRA hyperparameters
# # For more details about the following options, see:
# # https://www.anyscale.com/blog/fine-tuning-llms-lora-or-full-parameter-an-in-depth-analysis-with-llama-2
# lora_r: 8
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_modules:
#   - q_proj
#   - v_proj
# #  - k_proj
# #  - o_proj
# #  - gate_proj
# #  - down_proj
# #  - up_proj
# lora_target_linear: # If true, will target all linear layers

# # If you added new tokens to the tokenizer, you may need to save some LoRA modules because they need to know the new tokens.
# # For LLaMA and Mistral, you need to save `embed_tokens` and `lm_head`. It may vary for other models.
# # `embed_tokens` converts tokens to embeddings, and `lm_head` converts embeddings to token probabilities.
# # https://github.com/huggingface/peft/issues/334#issuecomment-1561727994
# lora_modules_to_save:
# #  - embed_tokens
# #  - lm_head

# # Once you complete training, the model will be saved to the following directory.
# # If you merge the adapter to the base model, a subdirectory `merged` will be created under this directory.
# # Make sure `lora_model_dir` points to this directory if you want to use the trained model.
# lora_out_dir:
# lora_fan_in_fan_out: false

# # ReLoRA configuration
# # Must use either 'lora' or 'qlora' adapter, and does not support fsdp or deepspeed
# relora_steps: # Number of steps per ReLoRA restart
# relora_warmup_steps: # Number of per-restart warmup steps
# relora_cpu_offload: # True to perform lora weight merges on cpu during restarts, for modest gpu memory savings

# # wandb configuration if you're using it
# wandb_mode: # "offline" to save run metadata locally and not sync to the server, "disabled" to turn off wandb
# wandb_project: # Your wandb project name
# wandb_entity: # A wandb Team name if using a Team
# wandb_watch:
# wandb_run_id: # Set the name of your wandb run
# wandb_log_model: # "checkpoint" to log model to wandb Artifacts every `save_steps` or "end" to log only at the end of training

# # Where to save the full-finetuned model to
# output_dir: ./completed-model

# # Whether to use torch.compile and which backend to use
# torch_compile:  # bool
# torch_compile_backend:  # Optional[str]

# # Training hyperparameters

# # If greater than 1, the optimizer step is deferred and gradients are accumulated over the given number of steps.
# gradient_accumulation_steps: 1
# # The number of samples to include in each batch. This is the number of samples sent to each GPU.
# micro_batch_size: 2
# eval_batch_size:
# num_epochs: 4
# warmup_steps: 100  # cannot use with warmup_ratio
# warmup_ratio: 0.05  # cannot use with warmup_steps
# learning_rate: 0.00003
# lr_quadratic_warmup:
# logging_steps:
# save_strategy: # Set to `no` to skip checkpoint saves
# save_steps: # Leave empty to save at each epoch
# eval_steps: # Leave empty to eval at each epoch, integers for every N steps. decimal for fraction of total steps
# save_total_limit: # Maximum number of checkpoints to keep at a time
# # Maximum number of iterations to train for. It takes precedence over num_epochs, which means that
# # if both are set, the num_epochs count is not guaranteed to be reached.
# # e.g., when 1 epoch is 1000 steps => `num_epochs: 2` and `max_steps: 100` will train for 100 steps
# max_steps:

# eval_table_size: # Approximate number of predictions sent to wandb depending on batch size. Enabled above 0. Default is 0
# eval_table_max_new_tokens: # Total number of tokens generated for predictions sent to wandb. Default is 128

# # Whether to mask out or include the human's prompt from the training labels
# train_on_inputs: false
# # Group similarly sized data to minimize padding.
# # May be slower to start, as it must download and sort the entire dataset.
# # Note that training loss may have an oscillating pattern with this enabled.
# group_by_length: false

# # Whether to use gradient checkpointing https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-checkpointing
# gradient_checkpointing: false

# # Stop training after this many evaluation losses have increased in a row
# # https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
# early_stopping_patience: 3

# # Specify a scheduler and kwargs to use with the optimizer
# lr_scheduler: # 'one_cycle' | empty for cosine
# lr_scheduler_kwargs:

# # For one_cycle optim
# lr_div_factor: # Learning rate div factor

# # Specify optimizer
# # Valid values are driven by the Transformers OptimizerNames class, see:
# # https://github.com/huggingface/transformers/blob/95b374952dc27d8511541d6f5a4e22c9ec11fb24/src/transformers/training_args.py#L134
# #
# # Note that not all optimizers may be available in your environment, ex: 'adamw_anyprecision' is part of
# # torchdistx, 'adamw_bnb_8bit' is part of bnb.optim.Adam8bit, etc. When in doubt, it is recommended to start with the optimizer used
# # in the examples/ for your model and fine-tuning use case.
# #
# # Valid values for 'optimizer' include:
# # - adamw_hf
# # - adamw_torch
# # - adamw_torch_fused
# # - adamw_torch_xla
# # - adamw_apex_fused
# # - adafactor
# # - adamw_anyprecision
# # - sgd
# # - adagrad
# # - adamw_bnb_8bit
# # - lion_8bit
# # - lion_32bit
# # - paged_adamw_32bit
# # - paged_adamw_8bit
# # - paged_lion_32bit
# # - paged_lion_8bit
# optimizer:
# # Specify weight decay
# weight_decay:
# # adamw hyperparams
# adam_beta1:
# adam_beta2:
# adam_epsilon:
# # Gradient clipping max norm
# max_grad_norm:

# # Augmentation techniques
# # NEFT https://arxiv.org/abs/2310.05914, set this to a number (paper default is 5) to add noise to embeddings
# # currently only supported on Llama and Mistral
# noisy_embedding_alpha:

# # Whether to use BetterTransformer
# flash_optimum:
# # Whether to use xformers attention patch https://github.com/facebookresearch/xformers:
# xformers_attention:
# # Whether to use flash attention patch https://github.com/Dao-AILab/flash-attention:
# flash_attention:
# flash_attn_cross_entropy:  # Whether to use flash-attention cross entropy implementation - advanced use only
# flash_attn_rms_norm:  # Whether to use flash-attention rms norm implementation - advanced use only
# flash_attn_fuse_mlp: # Whether to fuse part of the MLP into a single operation
# # Whether to use scaled-dot-product attention
# # https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
# sdp_attention:
# # Landmark attention (only llama)
# landmark_attention:
# # xpos RoPE see https://github.com/kaiokendev/cutoff-len-is-context-len/blob/main/util/xpos_rope_llama_monkey_patch.py
# # LLaMA only
# xpos_rope:

# # Resume from a specific checkpoint dir
# resume_from_checkpoint:
# # Use this if resume_from_checkpoint isn't set and you simply want training to resume where it left off.
# # Be careful with this being turned on between different models.
# auto_resume_from_checkpoints: false

# # Don't mess with this, it's here for accelerate and torchrun
# local_rank:

# # Add or change special tokens.
# # If you add tokens here, you don't need to add them to the `tokens` list.
# special_tokens:
#   # bos_token: "<s>"
#   # eos_token: "</s>"
#   # unk_token: "<unk>"

# # Add extra tokens.
# tokens:

# # FSDP
# fsdp:
# fsdp_config:

# # Deepspeed config path. e.g., deepspeed/zero3.json
# deepspeed:

# # Advanced DDP Arguments
# ddp_timeout:
# ddp_bucket_cap_mb:
# ddp_broadcast_buffers:

# # Path to torch distx for optim 'adamw_anyprecision'
# torchdistx_path:

# # Set to an HF dataset of type 'completion' to stream it instead of pre-tokenizing
# pretraining_dataset:

# # Debug mode
# debug:

# # Seed
# seed:

# # Allow overwriting the yml config from the CLI
# strict:

base_model: ${BASE_MODEL}
base_model_ignore_patterns: ${BASE_MODEL_IGNORE_PATTERNS}
base_model_config: ${BASE_MODEL_CONFIG}
revision_of_model: ${REVISION_OF_MODEL}
tokenizer_config: ${TOKENIZER_CONFIG}
model_type: ${MODEL_TYPE}
tokenizer_type: ${TOKENIZER_TYPE}
trust_remote_code: ${TRUST_REMOTE_CODE}
tokenizer_use_fast: ${TOKENIZER_USE_FAST}
tokenizer_legacy: ${TOKENIZER_LEGACY}
resize_token_embeddings_to_32x: ${RESIZE_TOKEN_EMBEDDINGS_TO_32X}

is_falcon_derived_model: ${IS_FALCON_DERIVED_MODEL}
is_llama_derived_model: ${IS_LLAMA_DERIVED_MODEL}
is_qwen_derived_model: ${IS_QWEN_DERIVED_MODEL}
is_mistral_derived_model: ${IS_MISTRAL_DERIVED_MODEL}

overrides_of_model_config:
  rope_scaling:
    type: ${ROPE_SCALING_TYPE}
    factor: ${ROPE_SCALING_FACTOR}

bnb_config_kwargs:
  llm_int8_has_fp16_weight: ${BNB_LLM_INT8_HAS_FP16_WEIGHT}
  bnb_4bit_quant_type: ${BNB_4BIT_QUANT_TYPE}
  bnb_4bit_use_double_quant: ${BNB_4BIT_USE_DOUBLE_QUANT}

gptq: ${GPTQ}
load_in_8bit: ${LOAD_IN_8BIT}
load_in_4bit: ${LOAD_IN_4BIT}
bf16: ${BF16}
fp16: ${FP16}
tf32: ${TF32}
bfloat16: ${BFLOAT16}
float16: ${FLOAT16}

gpu_memory_limit: ${GPU_MEMORY_LIMIT}
lora_on_cpu: ${LORA_ON_CPU}

datasets:
  - path: ${DATASET_PATH}
    type: ${DATASET_TYPE}
    ds_type: ${DATASET_DS_TYPE}
    data_files: ${DATASET_DATA_FILES}
    shards: ${DATASET_SHARDS}
    name: ${DATASET_NAME}
    train_on_split: ${DATASET_TRAIN_ON_SPLIT}
    revision: ${DATASET_REVISION}
    trust_remote_code: ${DATASET_TRUST_REMOTE_CODE}

rl: ${RL}
dpo_use_weighting: ${DPO_USE_WEIGHTING}

chat_template: ${CHAT_TEMPLATE}
chat_template_jinja: ${CHAT_TEMPLATE_JINJA}
default_system_message: ${DEFAULT_SYSTEM_MESSAGE}
dataset_prepared_path: ${DATASET_PREPARED_PATH}
push_dataset_to_hub: ${PUSH_DATASET_TO_HUB}
dataset_num_proc: ${DATASET_NUM_PROC}
dataset_keep_in_memory: ${DATASET_KEEP_IN_MEMORY}
hub_model_id: ${HUB_MODEL_ID}
hub_strategy: ${HUB_STRATEGY}
hf_use_auth_token: ${HF_USE_AUTH_TOKEN}
val_set_size: ${VAL_SET_SIZE}
dataset_shard_num: ${DATASET_SHARD_NUM}
dataset_shard_idx: ${DATASET_SHARD_IDX}

sequence_len: ${SEQUENCE_LEN}
pad_to_sequence_len: ${PAD_TO_SEQUENCE_LEN}
sample_packing: ${SAMPLE_PACKING}
eval_sample_packing: ${EVAL_SAMPLE_PACKING}
sample_packing_eff_est: ${SAMPLE_PACKING_EFF_EST}
total_num_tokens: ${TOTAL_NUM_TOKENS}
sample_packing_group_size: ${SAMPLE_PACKING_GROUP_SIZE}
sample_packing_bin_size: ${SAMPLE_PACKING_BIN_SIZE}

batch_flattening: ${BATCH_FLATTENING}
device_map: ${DEVICE_MAP}
max_memory: ${MAX_MEMORY}

adapter: ${ADAPTER}
lora_model_dir: ${LORA_MODEL_DIR}

lora_r: ${LORA_R}
lora_alpha: ${LORA_ALPHA}
lora_dropout: ${LORA_DROPOUT}
lora_target_modules:
  - ${LORA_TARGET_MODULES}
lora_target_linear: ${LORA_TARGET_LINEAR}
peft_layers_to_transform: ${PEFT_LAYERS_TO_TRANSFORM}
lora_modules_to_save: ${LORA_MODULES_TO_SAVE}
lora_fan_in_fan_out: ${LORA_FAN_IN_FAN_OUT}

loraplus_lr_ratio: ${LORAPLUS_LR_RATIO}
loraplus_lr_embedding: ${LORAPLUS_LR_EMBEDDING}

peft:
  loftq_config:
    loftq_bits: ${LOFTQ_BITS}

relora_steps: ${RELORA_STEPS}
relora_warmup_steps: ${RELORA_WARMUP_STEPS}
relora_anneal_steps: ${RELORA_ANNEAL_STEPS}
relora_prune_ratio: ${RELORA_PRUNE_RATIO}
relora_cpu_offload: ${RELORA_CPU_OFFLOAD}

wandb_mode: ${WANDB_MODE}
wandb_project: ${WANDB_PROJECT}
wandb_entity: ${WANDB_ENTITY}
wandb_watch: ${WANDB_WATCH}
wandb_name: ${WANDB_NAME}
wandb_run_id: ${WANDB_RUN_ID}
wandb_log_model: ${WANDB_LOG_MODEL}

mlflow_tracking_uri: ${MLFLOW_TRACKING_URI}
mlflow_experiment_name: ${MLFLOW_EXPERIMENT_NAME}
mlflow_run_name: ${MLFLOW_RUN_NAME}
hf_mlflow_log_artifacts: ${HF_MLFLOW_LOG_ARTIFACTS}

use_comet: ${USE_COMET}
comet_api_key: ${COMET_API_KEY}
comet_workspace: ${COMET_WORKSPACE}
comet_project_name: ${COMET_PROJECT_NAME}
comet_experiment_key: ${COMET_EXPERIMENT_KEY}
comet_mode: ${COMET_MODE}
comet_online: ${COMET_ONLINE}
comet_experiment_config: ${COMET_EXPERIMENT_CONFIG}

output_dir: ${OUTPUT_DIR}

torch_compile: ${TORCH_COMPILE}
torch_compile_backend: ${TORCH_COMPILE_BACKEND}

gradient_accumulation_steps: ${GRADIENT_ACCUMULATION_STEPS}
micro_batch_size: ${MICRO_BATCH_SIZE}
eval_batch_size: ${EVAL_BATCH_SIZE}
num_epochs: ${NUM_EPOCHS}
warmup_steps: ${WARMUP_STEPS}
warmup_ratio: ${WARMUP_RATIO}
learning_rate: ${LEARNING_RATE}
lr_quadratic_warmup: ${LR_QUADRATIC_WARMUP}
logging_steps: ${LOGGING_STEPS}
eval_steps: ${EVAL_STEPS}
evals_per_epoch: ${EVALS_PER_EPOCH}
save_strategy: ${SAVE_STRATEGY}
save_steps: ${SAVE_STEPS}
saves_per_epoch: ${SAVES_PER_EPOCH}
save_total_limit: ${SAVE_TOTAL_LIMIT}
max_steps: ${MAX_STEPS}

eval_table_size: ${EVAL_TABLE_SIZE}
eval_max_new_tokens: ${EVAL_MAX_NEW_TOKENS}
eval_causal_lm_metrics: ${EVAL_CAUSAL_LM_METRICS}

profiler_steps: ${PROFILER_STEPS}
loss_watchdog_threshold: ${LOSS_WATCHDOG_THRESHOLD}
loss_watchdog_patience: ${LOSS_WATCHDOG_PATIENCE}

train_on_inputs: ${TRAIN_ON_INPUTS}
group_by_length: ${GROUP_BY_LENGTH}
gradient_checkpointing: ${GRADIENT_CHECKPOINTING}
early_stopping_patience: ${EARLY_STOPPING_PATIENCE}

lr_scheduler: ${LR_SCHEDULER}
lr_scheduler_kwargs: ${LR_SCHEDULER_KWARGS}
cosine_min_lr_ratio: ${COSINE_MIN_LR_RATIO}
cosine_constant_lr_ratio: ${COSINE_CONSTANT_LR_RATIO}
lr_div_factor: ${LR_DIV_FACTOR}

optimizer: ${OPTIMIZER}
optim_args: ${OPTIM_ARGS}
optim_target_modules: ${OPTIM_TARGET_MODULES}
weight_decay: ${WEIGHT_DECAY}
adam_beta1: ${ADAM_BETA1}
adam_beta2: ${ADAM_BETA2}
adam_epsilon: ${ADAM_EPSILON}
max_grad_norm: ${MAX_GRAD_NORM}

neftune_noise_alpha: ${NEFTUNE_NOISE_ALPHA}

flash_optimum: ${FLASH_OPTIMUM}
xformers_attention: ${XFORMERS_ATTENTION}
flash_attention: ${FLASH_ATTENTION}
flash_attn_cross_entropy: ${FLASH_ATTN_CROSS_ENTROPY}
flash_attn_rms_norm: ${FLASH_ATTN_RMS_NORM}
flash_attn_fuse_mlp: ${FLASH_ATTN_FUSE_MLP}
sdp_attention: ${SDP_ATTENTION}
s2_attention: ${S2_ATTENTION}
resume_from_checkpoint: ${RESUME_FROM_CHECKPOINT}
auto_resume_from_checkpoints: ${AUTO_RESUME_FROM_CHECKPOINTS}

local_rank: ${LOCAL_RANK}

special_tokens:
  bos_token: ${SPECIAL_TOKEN_BOS}
  eos_token: ${SPECIAL_TOKEN_EOS}
  unk_token: ${SPECIAL_TOKEN_UNK}
  pad_token: ${SPECIAL_TOKEN_PAD}

tokens: ${TOKENS}

fsdp: ${FSDP}
fsdp_config: ${FSDP_CONFIG}
deepspeed: ${DEEPSPEED}

ddp_timeout: ${DDP_TIMEOUT}
ddp_bucket_cap_mb: ${DDP_BUCKET_CAP_MB}
ddp_broadcast_buffers: ${DDP_BROADCAST_BUFFERS}

torchdistx_path: ${TORCHDISTX_PATH}
pretraining_dataset: ${PRETRAINING_DATASET}
debug: ${DEBUG}
seed: ${SEED}
strict: ${STRICT}


================================================
FILE: .runpod/src/handler.py
================================================
"""
Runpod serverless entrypoint handler
"""

import os

import runpod
import yaml
from huggingface_hub import login
from train import train
from utils import get_output_dir

BASE_VOLUME = os.environ.get("BASE_VOLUME", "/runpod-volume")
os.makedirs(BASE_VOLUME, exist_ok=True)

logger = runpod.RunPodLogger()


async def handler(job):
    runpod_job_id = job["id"]
    inputs = job["input"]
    run_id = inputs.get("run_id", "default_run_id")
    args = inputs.get("args", {})

    # Set output directory
    output_dir = os.path.join(BASE_VOLUME, get_output_dir(run_id))
    args["output_dir"] = output_dir

    # First save args to a temporary config file
    config_path = "/workspace/test_config.yaml"

    # Add run_name and job_id to args before saving
    args["run_name"] = run_id
    args["runpod_job_id"] = runpod_job_id

    yaml_data = yaml.dump(args, default_flow_style=False)
    with open(config_path, "w", encoding="utf-8") as file:
        file.write(yaml_data)

    # Handle credentials
    credentials = inputs.get("credentials", {})

    if "wandb_api_key" in credentials:
        os.environ["WANDB_API_KEY"] = credentials["wandb_api_key"]
    if "hf_token" in credentials:
        os.environ["HF_TOKEN"] = credentials["hf_token"]

    if os.environ.get("HF_TOKEN"):
        login(token=os.environ["HF_TOKEN"])
    else:
        logger.info("No HF_TOKEN provided. Skipping login.")

    logger.info("Starting Training.")
    try:
        async for result in train(config_path):  # Pass the config path instead of args
            logger.info(result)
        logger.info("Training Complete.")
    finally:
        # Cleanup: always clear credentials, even if training raises
        os.environ.pop("WANDB_API_KEY", None)
        os.environ.pop("HF_TOKEN", None)


runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})


================================================
FILE: .runpod/src/test_input.json
================================================
{
  "input": {
    "user_id": "user",
    "model_id": "llama-test",
    "run_id": "llama-test",
    "credentials": {
      "wandb_api_key": "",
      "hf_token": ""
    },
    "args": {
      "base_model": "NousResearch/Meta-Llama-3-8B",
      "model_type": "LlamaForCausalLM",
      "tokenizer_type": "AutoTokenizer",
      "load_in_8bit": true,
      "load_in_4bit": false,
      "strict": false,
      "datasets": [
        {
          "path": "mhenrichsen/alpaca_2k_test",
          "type": "alpaca"
        }
      ],
      "val_set_size": 0.05,
      "output_dir": "./outputs/lora-out",
      "sequence_len": 4096,
      "sample_packing": true,
      "eval_sample_packing": false,
      "pad_to_sequence_len": true,
      "adapter": "lora",
      "lora_r": 32,
      "lora_alpha": 16,
      "lora_dropout": 0.05,
      "lora_target_linear": true,
      "lora_modules_to_save": [
        "embed_tokens",
        "lm_head"
      ],
      "gradient_accumulation_steps": 4,
      "micro_batch_size": 2,
      "num_epochs": 1,
      "optimizer": "adamw_bnb_8bit",
      "lr_scheduler": "cosine",
      "learning_rate": 0.0002,
      "train_on_inputs": false,
      "group_by_length": false,
      "bf16": "auto",
      "tf32": false,
      "gradient_checkpointing": true,
      "logging_steps": 1,
      "flash_attention": true,
      "warmup_steps": 1,
      "evals_per_epoch": 1,
      "eval_max_new_tokens": 128,
      "saves_per_epoch": 1,
      "weight_decay": 0.0,
      "special_tokens": {
        "pad_token": "<|end_of_text|>"
      }
    }
  }
}


================================================
FILE: .runpod/src/train.py
================================================
"""
Runpod train entrypoint
"""

import asyncio


async def train(config_path: str, gpu_id: str = "0", preprocess: bool = True):
    """
    Run preprocessing (if enabled) and training with the given config file,
    yielding each line of subprocess output as it is produced
    :param config_path: Path to the YAML config file
    :param gpu_id: GPU ID exposed to the preprocessing step via CUDA_VISIBLE_DEVICES (default: "0")
    :param preprocess: Whether to run preprocessing (default: True)
    """
    # First check if preprocessing is needed
    if preprocess:
        # Preprocess command
        preprocess_cmd = (
            f"CUDA_VISIBLE_DEVICES={gpu_id} axolotl preprocess {config_path}"
        )
        process = await asyncio.create_subprocess_shell(
            preprocess_cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )

        if process.stdout is not None:
            async for line in process.stdout:
                yield f"Preprocessing: {line.decode().strip()}"
        await process.wait()
        yield "Preprocessing completed."
    else:
        yield "Skipping preprocessing step."

    # Training command
    train_cmd = f"axolotl train {config_path}"
    process = await asyncio.create_subprocess_shell(
        train_cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.STDOUT
    )

    if process.stdout is not None:
        async for line in process.stdout:
            yield f"Training: {line.decode().strip()}"
    await process.wait()


================================================
FILE: .runpod/src/utils.py
================================================
"""
Runpod launcher utils
"""

import os

import yaml


def get_output_dir(run_id):
    """Return the output directory for a run, relative to the base volume"""
    return f"fine-tuning/{run_id}"


def make_valid_config(input_args):
    """
    Creates and saves updated config file, returns the path to the new config
    :param input_args: dict of input args
    :return: str, path to the updated config file
    """
    # Load default config
    with open("config/config.yaml", "r", encoding="utf-8") as fin:
        all_args = yaml.safe_load(fin)

    if not input_args:
        print("No args provided, using defaults")
    else:
        all_args.update(input_args)

    # Create updated config path
    updated_config_path = "config/updated_config.yaml"

    # Save updated config to new file
    with open(updated_config_path, "w", encoding="utf-8") as f:
        yaml.dump(all_args, f)

    return updated_config_path


def set_config_env_vars(args: dict):
    """
    Convert API arguments into environment variables.
    Handles nested dictionaries, lists, and special values.

    Args:
        args (dict): The arguments dictionary from the API request
    """

    def process_value(value):
        """Convert Python values to string format for environment variables"""
        if value is None:
            return ""
        if isinstance(value, bool):
            return str(value).lower()
        # Lists, dicts, numbers and strings all use their default string form
        return str(value)

    def set_env_vars(data, prefix=""):
        """Recursively set environment variables from nested dictionary"""
        for key, value in data.items():
            env_key = prefix + key.upper()

            # Handle special cases
            if isinstance(value, dict):
                # For nested dictionaries (like special_tokens)
                set_env_vars(value, f"{env_key}_")
            elif isinstance(value, list):
                # Handle list of dictionaries (like datasets)
                if value and isinstance(value[0], dict):
                    for i, item in enumerate(value):
                        set_env_vars(item, f"{env_key}_{i}_")
                else:
                    # For simple lists (like lora_target_modules)
                    os.environ[env_key] = process_value(value)
            else:
                # Handle all other cases
                os.environ[env_key] = process_value(value)

    # Clear any existing related environment variables
    # This prevents old values from persisting
    for key in list(os.environ.keys()):
        if key.startswith(
            ("BASE_MODEL", "MODEL_TYPE", "TOKENIZER_TYPE", "DATASET", "LORA_", "WANDB_")
        ):
            del os.environ[key]

    # Set new environment variables
    set_env_vars(args)


================================================
FILE: .runpod/test-input.json
================================================
{
  "input": {
    "name": "quick_smoke_test_sft",
    "user_id": "user",
    "model_id": "llama-test",
    "run_id": "llama-test",
    "credentials": {
      "wandb_api_key": "",
      "hf_token": ""
    },
    "args": {
      "base_model": "HuggingFaceTB/SmolLM2-135M",
      "model_type": "AutoModelForCausalLM",
      "tokenizer_type": "AutoTokenizer",
      "load_in_4bit": true,
      "strict": false,
      "datasets": [
        {
          "path": "mhenrichsen/alpaca_2k_test",
          "type": "alpaca",
          "split": "train[:10%]"
        }
      ],
      "val_set_size": 0.02,
      "output_dir": "./outputs/lora-out",
      "sequence_len": 4096,
      "sample_packing": true,
      "eval_sample_packing": false,
      "pad_to_sequence_len": true,
      "adapter": "qlora",
      "lora_r": 32,
      "lora_alpha": 64,
      "lora_dropout": 0.05,
      "lora_target_linear": true,
      "lora_modules_to_save": [
        "embed_tokens",
        "lm_head"
      ],
      "gradient_accumulation_steps": 2,
      "micro_batch_size": 1,
      "num_epochs": 1,
      "optimizer": "adamw_torch_fused",
      "lr_scheduler": "cosine",
      "learning_rate": 0.0002,
      "train_on_inputs": false,
      "group_by_length": false,
      "bf16": "auto",
      "tf32": true,
      "gradient_checkpointing": true,
      "logging_steps": 1,
      "flash_attention": true,
      "warmup_steps": 1,
      "evals_per_epoch": 1,
      "eval_max_new_tokens": 128,
      "saves_per_epoch": 1,
      "weight_decay": 0.0,
      "special_tokens": {
        "pad_token": "<|endoftext|>"
      },
      "max_steps": 20
    },
    "timeout": 100000
  },
  "config": {
    "gpuTypeId": "NVIDIA GeForce RTX 4090",
    "gpuCount": 1,
    "containerDiskInGb": 200,
    "env": [
      {
        "key": "TOKENIZER",
        "value": ""
      },
      {
        "key": "DISABLE_LOG_STATS",
        "value": "true"
      }
    ],
    "allowedCudaVersions": [
      "12.8",
      "12.7",
      "12.6",
      "12.5",
      "12.4"
    ]
  }
}


================================================
FILE: .runpod/tests.json
================================================
{
  "tests": [
    {
      "name": "quick_smoke_test_sft",
      "input": {
        "user_id": "user",
        "model_id": "llama-test",
        "run_id": "llama-test",
        "credentials": {
          "wandb_api_key": "",
          "hf_token": ""
        },
        "args": {
          "base_model": "HuggingFaceTB/SmolLM2-135M",
          "model_type": "AutoModelForCausalLM",
          "tokenizer_type": "AutoTokenizer",
          "load_in_4bit": true,
          "strict": false,
          "datasets": [
            {
              "path": "mhenrichsen/alpaca_2k_test",
              "type": "alpaca",
              "split": "train[:10%]"
            }
          ],
          "val_set_size": 0.02,
          "output_dir": "./outputs/lora-out",
          "sequence_len": 4096,
          "sample_packing": true,
          "eval_sample_packing": false,
          "pad_to_sequence_len": true,
          "adapter": "qlora",
          "lora_r": 32,
          "lora_alpha": 64,
          "lora_dropout": 0.05,
          "lora_target_linear": true,
          "lora_modules_to_save": [
            "embed_tokens",
            "lm_head"
          ],
          "gradient_accumulation_steps": 2,
          "micro_batch_size": 1,
          "num_epochs": 1,
          "optimizer": "adamw_torch_fused",
          "lr_scheduler": "cosine",
          "learning_rate": 0.0002,
          "train_on_inputs": false,
          "group_by_length": false,
          "bf16": "auto",
          "tf32": true,
          "gradient_checkpointing": true,
          "logging_steps": 1,
          "flash_attention": true,
          "warmup_steps": 1,
          "evals_per_epoch": 1,
          "eval_max_new_tokens": 128,
          "saves_per_epoch": 1,
          "weight_decay": 0.0,
          "special_tokens": {
            "pad_token": "<|endoftext|>"
          },
          "max_steps": 20
        }
      },
      "timeout": 100000
    }
  ],
  "config": {
    "gpuTypeId": "NVIDIA GeForce RTX 4090",
    "gpuCount": 1,
    "containerDiskInGb": 200,
    "env": [
      {
        "key": "TOKENIZER",
        "value": ""
      },
      {
        "key": "DISABLE_LOG_STATS",
        "value": "true"
      }
    ],
    "allowedCudaVersions": [
      "12.8",
      "12.7",
      "12.6",
      "12.5",
      "12.4"
    ]
  }
}


================================================
FILE: CITATION.cff
================================================
cff-version: 1.2.0
type: software
title: "Axolotl: Open Source LLM Post-Training"
message: "If you use this software, please cite it as below."
authors:
  - name: "Axolotl maintainers and contributors"
repository-code: "https://github.com/axolotl-ai-cloud/axolotl"
url: "https://axolotl.ai/"
license: Apache-2.0
date-released: "2023-05-30"


================================================
FILE: CNAME
================================================
docs.axolotl.ai


================================================
FILE: FAQS.md
================================================
# FAQs

- Can you train StableLM with this? Yes, but only on a single GPU at the moment. Multi-GPU support is coming soon; it is waiting on this [PR](https://github.com/huggingface/transformers/pull/22874).
- Will this work with DeepSpeed? That's still a work in progress, but setting `export ACCELERATE_USE_DEEPSPEED=true` should work in some cases.
- `Error invalid argument at line 359 in file /workspace/bitsandbytes/csrc/pythonInterface.c`
  `/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.`
  This could lead to a segmentation fault at exit. Try reinstalling bitsandbytes and transformers from source.
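
As a minimal sketch of the DeepSpeed entry above, the same environment variable can be set from Python for a child process instead of being exported in the shell. The helper names `env_with_deepspeed` and `run_with_deepspeed_flag` are hypothetical, and the echo command is only a stand-in for a real training launch. Whether the flag alone is sufficient still depends on your setup.

```python
import os
import subprocess
import sys


def env_with_deepspeed(base_env=None):
    """Return a copy of the environment with ACCELERATE_USE_DEEPSPEED=true."""
    env = dict(os.environ if base_env is None else base_env)
    env["ACCELERATE_USE_DEEPSPEED"] = "true"
    return env


def run_with_deepspeed_flag(cmd):
    """Run `cmd` (a list of arguments) with the flag set and return its stdout."""
    result = subprocess.run(
        cmd,
        env=env_with_deepspeed(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Placeholder child process (an assumption for illustration): it only echoes
# the variable back. Replace it with your actual training launch command.
ECHO_CMD = [
    sys.executable,
    "-c",
    "import os; print(os.environ['ACCELERATE_USE_DEEPSPEED'])",
]
```

Calling `run_with_deepspeed_flag(ECHO_CMD)` shows the value the child process sees. Because the helper copies the environment rather than mutating `os.environ`, the flag does not leak into later runs in the same parent process.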


================================================
FILE: LICENSE
================================================

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: MANIFEST.in
================================================
include requirements.txt
include README.md
include LICENSE
include src/setuptools_axolotl_dynamic_dependencies.py
include src/axolotl/utils/chat_templates/templates/*.jinja
recursive-include axolotl *.py


================================================
FILE: README.md
================================================
<p align="center">
    <picture>
        <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/887513285d98132142bf5db2a74eb5e0928787f1/image/axolotl_logo_digital_white.svg">
        <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/887513285d98132142bf5db2a74eb5e0928787f1/image/axolotl_logo_digital_black.svg">
        <img alt="Axolotl" src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/887513285d98132142bf5db2a74eb5e0928787f1/image/axolotl_logo_digital_black.svg" width="400" height="104" style="max-width: 100%;">
    </picture>
</p>
  <p align="center">
      <strong>A Free and Open Source LLM Fine-tuning Framework</strong><br>
  </p>

<p align="center">
    <img src="https://img.shields.io/github/license/axolotl-ai-cloud/axolotl.svg?color=blue" alt="GitHub License">
    <img src="https://github.com/axolotl-ai-cloud/axolotl/actions/workflows/tests.yml/badge.svg" alt="tests">
    <a href="https://codecov.io/gh/axolotl-ai-cloud/axolotl"><img src="https://codecov.io/gh/axolotl-ai-cloud/axolotl/branch/main/graph/badge.svg" alt="codecov"></a>
    <a href="https://github.com/axolotl-ai-cloud/axolotl/releases"><img src="https://img.shields.io/github/release/axolotl-ai-cloud/axolotl.svg" alt="Releases"></a>
    <br/>
    <a href="https://github.com/axolotl-ai-cloud/axolotl/graphs/contributors"><img src="https://img.shields.io/github/contributors-anon/axolotl-ai-cloud/axolotl?color=yellow&style=flat-square" alt="contributors" style="height: 20px;"></a>
    <img src="https://img.shields.io/github/stars/axolotl-ai-cloud/axolotl" alt="GitHub Repo stars">
    <br/>
    <a href="https://discord.com/invite/HhrNrHJPRb"><img src="https://img.shields.io/badge/discord-7289da.svg?style=flat-square&logo=discord" alt="discord" style="height: 20px;"></a>
    <a href="https://twitter.com/axolotl_ai"><img src="https://img.shields.io/twitter/follow/axolotl_ai?style=social" alt="twitter" style="height: 20px;"></a>
    <a href="https://colab.research.google.com/github/axolotl-ai-cloud/axolotl/blob/main/examples/colab-notebooks/colab-axolotl-example.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="google-colab" style="height: 20px;"></a>
    <br/>
    <img src="https://github.com/axolotl-ai-cloud/axolotl/actions/workflows/tests-nightly.yml/badge.svg" alt="tests-nightly">
    <img src="https://github.com/axolotl-ai-cloud/axolotl/actions/workflows/multi-gpu-e2e.yml/badge.svg" alt="multigpu-semi-weekly tests">
</p>


## 🎉 Latest Updates

- 2026/03:
  - New model support has been added in Axolotl for [Mistral Small 4](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/mistral4), [Qwen3.5, Qwen3.5 MoE](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen3.5), [GLM-4.7-Flash](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm47-flash), [GLM-4.6V](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm46v), and [GLM-4.5-Air](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/glm45).
  - [MoE expert quantization](https://docs.axolotl.ai/docs/expert_quantization.html) support (via `quantize_moe_experts: true`) greatly reduces VRAM usage when training MoE models and is compatible with FSDP2.
- 2026/02:
  - [ScatterMoE LoRA](https://github.com/axolotl-ai-cloud/axolotl/pull/3410) support: LoRA fine-tuning directly on MoE expert weights using custom Triton kernels.
  - Axolotl now has support for [SageAttention](https://github.com/axolotl-ai-cloud/axolotl/pull/2823) and [GDPO](https://github.com/axolotl-ai-cloud/axolotl/pull/3353) (Generalized DPO).
- 2026/01:
  - New integrations for [EAFT](https://github.com/axolotl-ai-cloud/axolotl/pull/3366) (Entropy-Aware Focal Training), which weights the loss by the entropy of the top-k logit distribution, and [Scalable Softmax](https://github.com/axolotl-ai-cloud/axolotl/pull/3338), which improves long-context behavior in attention.
- 2025/12:
  - Axolotl now includes support for [Kimi-Linear](https://docs.axolotl.ai/docs/models/kimi-linear.html), [Plano-Orchestrator](https://docs.axolotl.ai/docs/models/plano.html), [MiMo](https://docs.axolotl.ai/docs/models/mimo.html), [InternVL 3.5](https://docs.axolotl.ai/docs/models/internvl3_5.html), [Olmo3](https://docs.axolotl.ai/docs/models/olmo3.html), [Trinity](https://docs.axolotl.ai/docs/models/trinity.html), and [Ministral3](https://docs.axolotl.ai/docs/models/ministral3.html).
  - [Distributed Muon Optimizer](https://github.com/axolotl-ai-cloud/axolotl/pull/3264) support has been added for FSDP2 pretraining.
- 2025/10: New model support has been added in Axolotl for: [Qwen3 Next](https://docs.axolotl.ai/docs/models/qwen3-next.html), [Qwen2.5-vl, Qwen3-vl](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen2_5-vl), [Qwen3, Qwen3MoE](https://docs.axolotl.ai/docs/models/qwen3.html), [Granite 4](https://docs.axolotl.ai/docs/models/granite4.html), [HunYuan](https://docs.axolotl.ai/docs/models/hunyuan.html), [Magistral 2509](https://docs.axolotl.ai/docs/models/magistral/vision.html), [Apertus](https://docs.axolotl.ai/docs/models/apertus.html), and [Seed-OSS](https://docs.axolotl.ai/docs/models/seed-oss.html).

<details>

<summary>Expand older updates</summary>

- 2025/09: Axolotl now has text diffusion training. Read more [here](https://github.com/axolotl-ai-cloud/axolotl/tree/main/src/axolotl/integrations/diffusion).
- 2025/08: QAT has been updated to include NVFP4 support. See [PR](https://github.com/axolotl-ai-cloud/axolotl/pull/3107).
- 2025/07:
  - ND Parallelism support has been added into Axolotl. Compose Context Parallelism (CP), Tensor Parallelism (TP), and Fully Sharded Data Parallelism (FSDP) within a single node and across multiple nodes. Check out the [blog post](https://huggingface.co/blog/accelerate-nd-parallel) for more info.
  - Axolotl adds more models: [GPT-OSS](https://docs.axolotl.ai/docs/models/gpt-oss.html), [Gemma 3n](https://docs.axolotl.ai/docs/models/gemma3n.html), [Liquid Foundation Model 2 (LFM2)](https://docs.axolotl.ai/docs/models/LiquidAI.html), and [Arcee Foundation Models (AFM)](https://docs.axolotl.ai/docs/models/arcee.html).
  - FP8 finetuning with fp8 gather op is now possible in Axolotl via `torchao`. Get started [here](https://docs.axolotl.ai/docs/mixed_precision.html#sec-fp8)!
  - [Voxtral](https://docs.axolotl.ai/docs/models/voxtral.html), [Magistral 1.1](https://docs.axolotl.ai/docs/models/magistral.html), and [Devstral](https://docs.axolotl.ai/docs/models/devstral.html) with mistral-common tokenizer support have been integrated in Axolotl!
  - TiledMLP support for single-GPU to multi-GPU training (DDP, DeepSpeed, and FSDP) has been added to enable Arctic Long Sequence Training (ALST). See [examples](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/alst) for using ALST with Axolotl!
- 2025/06: Magistral with mistral-common tokenizer support has been added to Axolotl. See [docs](https://docs.axolotl.ai/docs/models/magistral.html) to start training your own Magistral models with Axolotl!
- 2025/05: Quantization Aware Training (QAT) support has been added to Axolotl. Explore the [docs](https://docs.axolotl.ai/docs/qat.html) to learn more!
- 2025/04: Llama 4 support has been added in Axolotl. See [docs](https://docs.axolotl.ai/docs/models/llama-4.html) to start training your own Llama 4 models with Axolotl's linearized version!
- 2025/03: Axolotl has implemented Sequence Parallelism (SP) support. Read the [blog](https://huggingface.co/blog/axolotl-ai-co/long-context-with-sequence-parallelism-in-axolotl) and [docs](https://docs.axolotl.ai/docs/sequence_parallelism.html) to learn how to scale your context length when fine-tuning.
- 2025/03: (Beta) Fine-tuning Multimodal models is now supported in Axolotl. Check out the [docs](https://docs.axolotl.ai/docs/multimodal.html) to fine-tune your own!
- 2025/02: Axolotl has added LoRA optimizations to reduce memory usage and improve training speed for LoRA and QLoRA in single GPU and multi-GPU training (DDP and DeepSpeed). Jump into the [docs](https://docs.axolotl.ai/docs/lora_optims.html) to give it a try.
- 2025/02: Axolotl has added GRPO support. Dive into our [blog](https://huggingface.co/blog/axolotl-ai-co/training-llms-w-interpreter-feedback-wasm) and [GRPO example](https://github.com/axolotl-ai-cloud/grpo_code) and have some fun!
- 2025/01: Axolotl has added Reward Modelling / Process Reward Modelling fine-tuning support. See [docs](https://docs.axolotl.ai/docs/reward_modelling.html).

</details>

## ✨ Overview

Axolotl is a free and open-source tool designed to streamline post-training and fine-tuning for the latest large language models (LLMs).

Features:

- **Multiple Model Support**: Train various models like GPT-OSS, LLaMA, Mistral, Mixtral, Pythia, and many more models available on the Hugging Face Hub.
- **Multimodal Training**: Fine-tune vision-language models (VLMs) including LLaMA-Vision, Qwen2-VL, Pixtral, LLaVA, SmolVLM2, GLM-4.6V, InternVL 3.5, Gemma 3n, and audio models like Voxtral with image, video, and audio support.
- **Training Methods**: Full fine-tuning, LoRA, QLoRA, GPTQ, QAT, Preference Tuning (DPO, IPO, KTO, ORPO), RL (GRPO, GDPO), and Reward Modelling (RM) / Process Reward Modelling (PRM).
- **Easy Configuration**: Re-use a single YAML configuration file across the full fine-tuning pipeline: dataset preprocessing, training, evaluation, quantization, and inference.
- **Performance Optimizations**: [Multipacking](https://docs.axolotl.ai/docs/multipack.html), [Flash Attention 2/3/4](https://docs.axolotl.ai/docs/attention.html#flash-attention), [Xformers](https://docs.axolotl.ai/docs/attention.html#xformers), [Flex Attention](https://docs.axolotl.ai/docs/attention.html#flex-attention), [SageAttention](https://docs.axolotl.ai/docs/attention.html#sageattention), [Liger Kernel](https://docs.axolotl.ai/docs/custom_integrations.html#liger-kernels), [Cut Cross Entropy](https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy), [ScatterMoE](https://docs.axolotl.ai/docs/custom_integrations.html#kernels-integration), [Sequence Parallelism (SP)](https://docs.axolotl.ai/docs/sequence_parallelism.html), [LoRA optimizations](https://docs.axolotl.ai/docs/lora_optims.html), [Multi-GPU training (FSDP1, FSDP2, DeepSpeed)](https://docs.axolotl.ai/docs/multi-gpu.html), [Multi-node training (Torchrun, Ray)](https://docs.axolotl.ai/docs/multi-node.html), and many more!
- **Flexible Dataset Handling**: Load from local, HuggingFace, and cloud (S3, Azure, GCP, OCI) datasets.
- **Cloud Ready**: We ship [Docker images](https://hub.docker.com/u/axolotlai) and also [PyPI packages](https://pypi.org/project/axolotl/) for use on cloud platforms and local hardware.



## 🚀 Quick Start - LLM Fine-tuning in Minutes

**Requirements**:

- NVIDIA GPU (Ampere or newer for `bf16` and Flash Attention) or AMD GPU
- Python 3.11
- PyTorch ≥2.8.0
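
Before installing, it can help to see what the current environment already provides. The snippet below is a minimal sketch, not part of Axolotl itself: it assumes `python3` is on your `PATH`, and the variable names `py_version` and `torch_version` are illustrative only.

```bash
# Report the Python version (the requirements above ask for 3.11).
py_version=$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || echo "not found")
echo "Python: ${py_version}"

# Report the PyTorch version if it is importable (the requirements ask for >=2.8.0).
torch_version=$(python3 -c 'import torch; print(torch.__version__)' 2>/dev/null || echo "not installed")
echo "PyTorch: ${torch_version}"
```

If either line prints `not found` or `not installed`, install that component first or use the Docker image described below.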

### Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/axolotl-ai-cloud/axolotl/blob/main/examples/colab-notebooks/colab-axolotl-example.ipynb#scrollTo=msOCO4NRmRLa)

### Installation

#### Using pip

```bash
pip3 install -U packaging==26.0 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation axolotl[flash-attn,deepspeed]

# Download example axolotl configs, deepspeed configs
axolotl fetch examples
axolotl fetch deepspeed_configs  # OPTIONAL
```

#### Using Docker

Installing with Docker can be less error-prone than installing in your own environment.
```bash
docker run --gpus '"all"' --rm -it axolotlai/axolotl:main-latest
```

Other installation approaches are described [here](https://docs.axolotl.ai/docs/installation.html).

#### Cloud Providers

<details>

- [RunPod](https://runpod.io/gsc?template=v2ickqhz9s&ref=6i7fkpdz)
- [Vast.ai](https://cloud.vast.ai?ref_id=62897&template_id=bdd4a49fa8bce926defc99471864cace&utm_source=github&utm_medium=developer_community&utm_campaign=template_launch_axolotl&utm_content=readme)
- [PRIME Intellect](https://app.primeintellect.ai/dashboard/create-cluster?image=axolotl&location=Cheapest&security=Cheapest&show_spot=true)
- [Modal](https://www.modal.com?utm_source=github&utm_medium=github&utm_campaign=axolotl)
- [Novita](https://novita.ai/gpus-console?templateId=311)
- [JarvisLabs.ai](https://jarvislabs.ai/templates/axolotl)
- [Latitude.sh](https://latitude.sh/blueprint/989e0e79-3bf6-41ea-a46b-1f246e309d5c)

</details>

### Your First Fine-tune

```bash
# Fetch axolotl examples
axolotl fetch examples

# Or, specify a custom path
axolotl fetch examples --dest path/to/folder

# Train a model using LoRA
axolotl train examples/llama-3/lora-1b.yml
```

That's it! Check out our [Getting Started Guide](https://docs.axolotl.ai/docs/getting-started.html) for a more detailed walkthrough.


## 📚 Documentation

- [Installation Options](https://docs.axolotl.ai/docs/installation.html) - Detailed setup instructions for different environments
- [Configuration Guide](https://docs.axolotl.ai/docs/config-reference.html) - Full configuration options and examples
- [Dataset Loading](https://docs.axolotl.ai/docs/dataset_loading.html) - Loading datasets from various sources
- [Dataset Guide](https://docs.axolotl.ai/docs/dataset-formats/) - Supported formats and how to use them
- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
- [Multipacking](https://docs.axolotl.ai/docs/multipack.html)
- [API Reference](https://docs.axolotl.ai/docs/api/) - Auto-generated code documentation
- [FAQ](https://docs.axolotl.ai/docs/faq.html) - Frequently asked questions

## 🤝 Getting Help

- Join our [Discord community](https://discord.gg/HhrNrHJPRb) for support
- Check out our [Examples](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/) directory
- Read our [Debugging Guide](https://docs.axolotl.ai/docs/debugging.html)
- Need dedicated support? Please contact [✉️wing@axolotl.ai](mailto:wing@axolotl.ai) for options

## 🌟 Contributing

Contributions are welcome! Please see our [Contributing Guide](https://github.com/axolotl-ai-cloud/axolotl/blob/main/.github/CONTRIBUTING.md) for details.

## 📈 Telemetry

Axolotl has opt-out telemetry that helps us understand how the project is being used
and prioritize improvements. We collect basic system information, model types, and
error rates—never personal data or file paths. Telemetry is enabled by default. To
disable it, set `AXOLOTL_DO_NOT_TRACK=1`. For more details, see our [telemetry documentation](https://docs.axolotl.ai/docs/telemetry.html).
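
As a minimal sketch of the opt-out described above, the variable can be exported in the shell before launching Axolotl. Only the variable name and value come from this README; the `printenv` check is just an illustration.

```bash
# Disable telemetry for the current shell session and anything launched from it.
export AXOLOTL_DO_NOT_TRACK=1

# Confirm the variable is visible to child processes.
printenv AXOLOTL_DO_NOT_TRACK
```

To make the setting persistent, the same `export` line can go in your shell profile or in the environment section of your job launcher.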

## ❤️ Sponsors

Interested in sponsoring? Contact us at [wing@axolotl.ai](mailto:wing@axolotl.ai)

## 📝 Citing Axolotl

If you use Axolotl in your research or projects, please cite it as follows:

```bibtex
@software{axolotl,
  title = {Axolotl: Open Source LLM Post-Training},
  author = {{Axolotl maintainers and contributors}},
  url = {https://github.com/axolotl-ai-cloud/axolotl},
  license = {Apache-2.0},
  year = {2023}
}
```

## 📜 License

This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.


================================================
FILE: VERSION
================================================
0.16.0.dev0


================================================
FILE: _quarto.yml
================================================
project:
  type: website
  pre-render:
   - docs/scripts/generate_config_docs.py
   - docs/scripts/generate_examples_docs.py

quartodoc:
  dir: docs/api
  package: axolotl
  title: API Reference
  parser: google

  sections:
    - title: Core
      desc: Core functionality for training
      contents:
        - train
        - evaluate
        - datasets
        - convert
        - prompt_tokenizers
        - logging_config
        - core.builders.base
        - core.builders.causal
        - core.builders.rl
        - core.training_args
        - core.chat.messages
        - core.chat.format.chatml
        - core.chat.format.llama3x
        - core.chat.format.shared
        - core.datasets.chat
        - core.datasets.transforms.chat_builder
    - title: CLI
      desc: Command-line interface
      contents:
        - cli.main
        - cli.train
        - cli.evaluate
        - cli.args
        - cli.art
        - cli.checks
        - cli.config
        - cli.delinearize_llama4
        - cli.inference
        - cli.merge_lora
        - cli.merge_sharded_fsdp_weights
        - cli.preprocess
        - cli.quantize
        - cli.vllm_serve
        - cli.cloud.base
        - cli.cloud.modal_
        - cli.utils
        - cli.utils.args
        - cli.utils.fetch
        - cli.utils.load
        - cli.utils.sweeps
        - cli.utils.train
    - title: Trainers
      desc: Training implementations
      contents:
        - core.trainers.base
        - core.trainers.trl
        - core.trainers.mamba
        - core.trainers.dpo.trainer
        - core.trainers.grpo.trainer
        - core.trainers.grpo.sampler
        - core.trainers.utils
    - title: Model Loading
      desc: Functionality for loading and patching models, tokenizers, etc.
      contents:
        - loaders.model
        - loaders.tokenizer
        - loaders.processor
        - loaders.adapter
        - loaders.patch_manager
        - loaders.constants
    - title: Mixins
      desc: Mixin classes for augmenting trainers
      contents:
        - core.trainers.mixins.optimizer
        - core.trainers.mixins.rng_state_loader
        - core.trainers.mixins.scheduler
    - title: Context Managers
      desc: Context managers for altering trainer behaviors
      contents:
        - utils.ctx_managers.sequence_parallel
    - title: Prompt Strategies
      desc: Prompt formatting strategies
      contents:
        - prompt_strategies.base
        - prompt_strategies.chat_template
        - prompt_strategies.alpaca_chat
        - prompt_strategies.alpaca_instruct
        - prompt_strategies.alpaca_w_system
        - prompt_strategies.user_defined
        - prompt_strategies.llama2_chat
        - prompt_strategies.completion
        - prompt_strategies.input_output
        - prompt_strategies.stepwise_supervised
        - prompt_strategies.metharme
        - prompt_strategies.orcamini
        - prompt_strategies.pygmalion
        - prompt_strategies.messages.chat
        - prompt_strategies.dpo.chat_template
        - prompt_strategies.dpo.llama3
        - prompt_strategies.dpo.chatml
        - prompt_strategies.dpo.zephyr
        - prompt_strategies.dpo.user_defined
        - prompt_strategies.dpo.passthrough
        - prompt_strategies.kto.llama3
        - prompt_strategies.kto.chatml
        - prompt_strategies.kto.user_defined
        - prompt_strategies.orpo.chat_template
        - prompt_strategies.bradley_terry.llama3
    - title: Kernels
      desc: Low-level performance optimizations
      contents:
        - kernels.lora
        - kernels.geglu
        - kernels.swiglu
        - kernels.quantize
        - kernels.utils
    - title: Monkey Patches
      desc: Runtime patches for model optimizations
      contents:
        - monkeypatch.llama_attn_hijack_flash
        - monkeypatch.llama_attn_hijack_xformers
        - monkeypatch.mistral_attn_hijack_flash
        - monkeypatch.multipack
        - monkeypatch.relora
        - monkeypatch.lora_kernels
        - monkeypatch.utils
        - monkeypatch.btlm_attn_hijack_flash
        - monkeypatch.stablelm_attn_hijack_flash
        - monkeypatch.trainer_fsdp_optim
        - monkeypatch.transformers_fa_utils
        - monkeypatch.unsloth_
        - monkeypatch.data.batch_dataset_fetcher
        - monkeypatch.mixtral
        - monkeypatch.gradient_checkpointing.offload_cpu
        - monkeypatch.gradient_checkpointing.offload_disk
    - title: Utils
      desc: Utility functions
      contents:
        - utils.tokenization
        - utils.chat_templates
        - utils.lora
        - utils.model_shard_quant
        - utils.bench
        - utils.freeze
        - utils.trainer
        - utils.schedulers
        - utils.distributed
        - utils.dict
        - utils.optimizers.adopt
        - utils.data.streaming
        - utils.data.sft
        - utils.quantization
    - title: Schemas
      desc: Pydantic data models for Axolotl config
      contents:
        - utils.schemas.config
        - utils.schemas.model
        - utils.schemas.training
        - utils.schemas.datasets
        - utils.schemas.peft
        - utils.schemas.trl
        - utils.schemas.multimodal
        - utils.schemas.integrations
        - utils.schemas.enums
        - utils.schemas.utils
    - title: Integrations
      desc: Third-party integrations and extensions
      contents:
        - integrations.base
        - integrations.cut_cross_entropy.args
        - integrations.grokfast.optimizer
        - integrations.kd.trainer
        - integrations.liger.args
        - integrations.lm_eval.args
        - integrations.spectrum.args
    - title: Common
      desc: Common utilities and shared functionality
      contents:
        - common.architectures
        - common.const
        - common.datasets
    - title: Models
      desc: Custom model implementations
      contents:
        - models.mamba.modeling_mamba
    - title: Data Processing
      desc: Data processing utilities
      contents:
        - utils.collators.core
        - utils.collators.batching
        - utils.collators.mamba
        - utils.collators.mm_chat
        - utils.samplers.multipack
    - title: Callbacks
      desc: Training callbacks
      contents:
        - utils.callbacks.perplexity
        - utils.callbacks.profiler
        - utils.callbacks.lisa
        - utils.callbacks.mlflow_
        - utils.callbacks.comet_
        - utils.callbacks.qat
website:
  title: "Axolotl"
  description: "We make fine-tuning accessible, scalable, and fun"
  favicon: favicon.jpg

  google-analytics: "G-9KYCVJBNMQ"

  navbar:
    logo: image/axolotl_logo_digital_white.svg
    title: false
    background: dark
    pinned: false
    collapse: false
    tools:
    - icon: twitter
      href: https://twitter.com/axolotl_ai
    - icon: github
      href: https://github.com/axolotl-ai-cloud/axolotl/
    - icon: discord
      href: https://discord.gg/7m9sfhzaf3

  sidebar:
      pinned: true
      collapse-level: 2
      style: docked
      contents:
        - text: Home
          href: index.qmd

        - section: "Getting Started"
          contents:
            - docs/getting-started.qmd
            - docs/installation.qmd
            - docs/inference.qmd
            - section: "Model Guides"
              contents:
                - docs/models/kimi-linear.qmd
                - docs/models/plano.qmd
                - docs/models/mimo.qmd
                - docs/models/internvl3_5.qmd
                - docs/models/olmo3.qmd
                - docs/models/trinity.qmd
                - docs/models/arcee.qmd
                - section: "Ministral3"
                  contents:
                    - docs/models/ministral3.qmd
                    - docs/models/ministral3/think.qmd
                    - docs/models/ministral3/vision.qmd
                - section: "Magistral"
                  contents:
                    - docs/models/magistral.qmd
                    - docs/models/magistral/think.qmd
                    - docs/models/magistral/vision.qmd
                - docs/models/ministral.qmd
                - docs/models/mistral-small.qmd
                - docs/models/voxtral.qmd
                - docs/models/devstral.qmd
                - docs/models/mistral.qmd
                - docs/models/llama-4.qmd
                - docs/models/llama-2.qmd
                - docs/models/qwen3-next.qmd
                - docs/models/qwen3.qmd
                - docs/models/gemma3n.qmd
                - docs/models/apertus.qmd
                - docs/models/gpt-oss.qmd
                - docs/models/seed-oss.qmd
                - docs/models/phi.qmd
                - docs/models/smolvlm2.qmd
                - docs/models/granite4.qmd
                - docs/models/LiquidAI.qmd
                - docs/models/hunyuan.qmd
                - docs/models/jamba.qmd
                - docs/models/orpheus.qmd

            - docs/cli.qmd
            - docs/telemetry.qmd
            - docs/config-reference.qmd
            - text: "API Reference"
              href: docs/api

        - section: "Dataset Formats"
          contents: docs/dataset-formats/*

        - section: "Deployments"
          contents:
            - docs/docker.qmd
            - docs/multi-gpu.qmd
            - docs/multi-node.qmd
            - docs/ray-integration.qmd
            - docs/amd_hpc.qmd
            - docs/mac.qmd

        - section: "How To Guides"
          contents:
            - docs/multimodal.qmd
            - docs/rlhf.qmd
            - docs/reward_modelling.qmd
            - docs/lr_groups.qmd
            - docs/lora_optims.qmd
            - docs/dataset_loading.qmd
            - docs/qat.qmd
            - docs/quantize.qmd
            - docs/optimizations.qmd

        - section: "Core Concepts"
          contents:
            - docs/batch_vs_grad.qmd
            - docs/dataset_preprocessing.qmd
            - docs/streaming.qmd
            - docs/multipack.qmd
            - docs/mixed_precision.qmd
            - docs/optimizers.qmd
            - docs/attention.qmd

        - section: "Advanced Features"
          contents:
            - docs/fsdp_qlora.qmd
            - docs/unsloth.qmd
            - docs/torchao.qmd
            - docs/custom_integrations.qmd
            - docs/sequence_parallelism.qmd
            - docs/gradient_checkpointing.qmd
            - docs/nd_parallelism.qmd
            - docs/expert_quantization.qmd

        - section: "Troubleshooting"
          contents:
            - docs/faq.qmd
            - docs/debugging.qmd
            - docs/nccl.qmd

format:
  html:
    theme: darkly
    css: styles.css
    toc: true
    # Enable better handling of line breaks in markdown
    preserve-tabs: true
    html-math-method: mathjax
    # Improved markdown processing options
    md-extensions:
      - markdown_it
      - def_list
      - attr_list
      - fenced_divs
      - tables
      - html_admonition
      - lineblocks
      - fancy_lists
    # Control whitespace handling
    whitespace: preserve
    # Process newlines in paragraphs
    wrap: preserve
    # Better line break handling
    preserve-linebreaks: true


================================================
FILE: benchmarks/bench_entropy.py
================================================
"""Benchmark for entropy_from_logits Triton kernel vs original chunked implementation.

Usage: CUDA_VISIBLE_DEVICES=0 python benchmarks/bench_entropy.py
"""

import gc
import statistics

import torch
import torch.nn.functional as F

from axolotl.monkeypatch.trainer.utils import entropy_from_logits

V = 151936  # Qwen vocab
WARMUP = 5
BENCH_ITERS = 20
MEM_ITERS = 10


def entropy_from_logits_original(logits: torch.Tensor, chunk_size: int = 128):
    """Original chunked implementation (reference)."""
    original_shape = logits.shape[:-1]
    num_classes = logits.shape[-1]
    flat_logits = logits.reshape(-1, num_classes)
    entropies = []
    for chunk in flat_logits.split(chunk_size, dim=0):
        logps = F.log_softmax(chunk, dim=-1)
        chunk_entropy = -(torch.exp(logps) * logps).sum(-1)
        entropies.append(chunk_entropy)
    return torch.cat(entropies, dim=0).reshape(original_shape)
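

# A minimal pure-Python sketch of the quantity benchmarked in this file, for
# readers without a GPU. `entropy_from_logits_py` is a hypothetical helper added
# for illustration only; it is not part of axolotl and is not used by the
# benchmarks below. It computes H = -sum(p * log p) with p = softmax(row),
# using the same log-softmax formulation as the reference above.
import math


def entropy_from_logits_py(row):
    """Entropy (in nats) of softmax(row) for one list of float logits."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    z = sum(exps)
    log_z = math.log(z) + m  # log-sum-exp of the row
    # log p_i = v_i - log_z and p_i = exps_i / z
    return -sum((e / z) * (v - log_z) for e, v in zip(exps, row))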


def _clean_gpu():
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.reset_accumulated_memory_stats()
    torch.cuda.synchronize()


def profile_time(fn, logits, n_iters=BENCH_ITERS):
    """Return per-iteration times in ms, measured with CUDA events after WARMUP runs."""
    for _ in range(WARMUP):
        out = fn(logits, chunk_size=128)
        del out
    torch.cuda.synchronize()

    times = []
    for _ in range(n_iters):
        s = torch.cuda.Event(enable_timing=True)
        e = torch.cuda.Event(enable_timing=True)
        s.record()
        out = fn(logits, chunk_size=128)
        e.record()
        torch.cuda.synchronize()
        times.append(s.elapsed_time(e))
        del out
    return times


def profile_memory(fn, logits, n_iters=MEM_ITERS):
    """Return per-iteration peak memory overhead in MB (peak allocated minus baseline)."""
    for _ in range(WARMUP):
        out = fn(logits, chunk_size=128)
        del out
    torch.cuda.synchronize()

    peaks = []
    for _ in range(n_iters):
        _clean_gpu()
        base = torch.cuda.max_memory_allocated()
        out = fn(logits, chunk_size=128)
        torch.cuda.synchronize()
        peaks.append(torch.cuda.max_memory_allocated() - base)
        del out
    return [p / 1e6 for p in peaks]


def fmt(values, unit=""):
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return f"{mean:8.2f} ± {std:5.2f} {unit}  [min={min(values):.2f}, max={max(values):.2f}]"


def benchmark_contiguous():
    print("=" * 60)
    print(
        f"CONTIGUOUS BENCHMARK  (warmup={WARMUP}, time={BENCH_ITERS}, mem={MEM_ITERS})"
    )
    print("=" * 60)

    configs = [
        (1, 2048),
        (1, 8192),
        (1, 16384),
        (4, 4096),
        (8, 2048),
        (16, 2048),
        (16, 4096),
    ]

    for B, L in configs:
        mem_gb = B * L * V * 2 / 1e9  # bf16 logits: 2 bytes per element
        if mem_gb > 28:
            print(f"\n  skip B={B}, L={L} ({mem_gb:.1f} GB)")
            continue

        N = B * L
        print(f"\n{'─' * 60}")
        print(f"B={B:2d}, L={L:5d}  ({N:6d} rows, logits {mem_gb:.2f} GB)")
        print(f"{'─' * 60}")

        torch.manual_seed(42)
        logits = torch.randn(B, L, V, device="cuda", dtype=torch.bfloat16)

        t_orig = profile_time(entropy_from_logits_original, logits)
        t_triton = profile_time(entropy_from_logits, logits)
        orig_mean = statistics.mean(t_orig)
        triton_mean = statistics.mean(t_triton)

        print("  TIME (ms):")
        print(f"    original: {fmt(t_orig, 'ms')}")
        print(f"    triton:   {fmt(t_triton, 'ms')}")
        print(f"    speedup:  {orig_mean / triton_mean:.2f}x")

        m_orig = profile_memory(entropy_from_logits_original, logits)
        m_triton = profile_memory(entropy_from_logits, logits)
        orig_peak = statistics.mean(m_orig)
        triton_peak = statistics.mean(m_triton)

        print("  MEMORY (peak overhead):")
        print(f"    original: {fmt(m_orig, 'MB')}")
        print(f"    triton:   {fmt(m_triton, 'MB')}")
        print(f"    saved:    {orig_peak - triton_peak:.1f} MB")

        del logits
        _clean_gpu()


def benchmark_noncontiguous():
    print("\n" + "=" * 60)
    print(
        f"NON-CONTIGUOUS BENCHMARK  (warmup={WARMUP}, time={BENCH_ITERS}, mem={MEM_ITERS})"
    )
    print("=" * 60)

    configs = [
        (4, 2048, "transpose"),
        (4, 8192, "transpose"),
        (8, 2048, "transpose"),
        (4, 4096, "slice_batch"),
    ]

    for B, L, method in configs:
        # Pick the raw tensor shape first so oversized configs can be skipped
        # before any GPU memory is allocated.
        if method == "transpose":
            raw_shape = (L, B, V)
        elif method == "slice_batch":
            raw_shape = (B * 2, L, V)
        else:
            continue

        # bf16 uses 2 bytes per element.
        raw_gb = raw_shape[0] * raw_shape[1] * raw_shape[2] * 2 / 1e9
        if raw_gb > 28:
            print(f"\n  skip B={B}, L={L}, {method} ({raw_gb:.1f} GB)")
            continue

        torch.manual_seed(42)
        raw = torch.randn(*raw_shape, device="cuda", dtype=torch.bfloat16)
        logits_nc = raw.transpose(0, 1) if method == "transpose" else raw[::2]

        N = B * L
        print(f"\n{'─' * 60}")
        print(f"B={B}, L={L}  {method}  ({N} rows, raw {raw_gb:.2f} GB)")
        print(f"{'─' * 60}")

        def original_with_copy(logits, chunk_size=128):
            return entropy_from_logits_original(
                logits.contiguous(), chunk_size=chunk_size
            )

        t_orig = profile_time(original_with_copy, logits_nc)
        t_triton = profile_time(entropy_from_logits, logits_nc)
        orig_mean = statistics.mean(t_orig)
        triton_mean = statistics.mean(t_triton)

        print("  TIME (ms):")
        print(f"    orig+copy:     {fmt(t_orig, 'ms')}")
        print(f"    triton-strided:{fmt(t_triton, 'ms')}")
        print(f"    speedup:       {orig_mean / triton_mean:.2f}x")

        m_orig = profile_memory(original_with_copy, logits_nc)
        m_triton = profile_memory(entropy_from_logits, logits_nc)
        orig_peak = statistics.mean(m_orig)
        triton_peak = statistics.mean(m_triton)

        print("  MEMORY (peak overhead):")
        print(f"    orig+copy:     {fmt(m_orig, 'MB')}")
        print(f"    triton-strided:{fmt(m_triton, 'MB')}")
        print(f"    saved:         {orig_peak - triton_peak:.1f} MB")

        del raw, logits_nc
        _clean_gpu()


if __name__ == "__main__":
    benchmark_contiguous()
    benchmark_noncontiguous()


================================================
FILE: benchmarks/bench_scattermoe_lora.py
================================================
"""Benchmark for ScatterMoE LoRA Triton kernels.

Measures forward, backward dX, and backward dA/dB kernels at common MoE
model shapes. Reports per-kernel timings, LoRA overhead vs base scatter2scatter,
and full fwd+bwd autograd throughput.

Usage:
  CUDA_VISIBLE_DEVICES=0 python benchmarks/bench_scattermoe_lora.py
  CUDA_VISIBLE_DEVICES=0 python benchmarks/bench_scattermoe_lora.py --ranks 16 64
  CUDA_VISIBLE_DEVICES=0 python benchmarks/bench_scattermoe_lora.py --models Qwen/Qwen3.5-35B-A3B
"""

import argparse
import gc
import time
from functools import partial

import torch

from axolotl.integrations.kernels.libs.scattermoe_lora.kernels import (
    lora_ops,
    ops as base_ops,
)
from axolotl.integrations.kernels.libs.scattermoe_lora.parallel_experts import (
    flatten_sort_count,
)
from axolotl.integrations.kernels.libs.scattermoe_lora.parallel_linear_lora import (
    ScatterMoELoRA,
)

DEVICE = "cuda"
DTYPE = torch.bfloat16
WARMUP = 5
ITERS = 20

# ─── Model configs ──────────────────────────────────────────────────────────

BUILTIN_CONFIGS = {
    "Qwen3.5-35B-A3B": (256, 2048, 512, 8),  # E, H, I, k
    "Qwen3-30B-A3B": (128, 2048, 768, 8),
    "OLMoE-1B-7B": (64, 2048, 1024, 8),
    "Mixtral-8x7B": (8, 4096, 14336, 2),
}


def _resolve_config(spec):
    """Resolve a model spec to (E, H, I, k). Accepts builtin names or HF IDs."""
    key = spec.lower().replace("/", "-")
    for name, cfg in BUILTIN_CONFIGS.items():
        if key in name.lower() or name.lower() in key:
            return name, cfg

    from transformers import AutoConfig

    hf_cfg = AutoConfig.from_pretrained(spec, trust_remote_code=True)
    if callable(getattr(hf_cfg, "get_text_config", None)):
        tc = hf_cfg.get_text_config()
        if hasattr(tc, "model_type") and tc.model_type != hf_cfg.model_type:
            hf_cfg = tc
    hidden = hf_cfg.hidden_size
    inter = getattr(hf_cfg, "moe_intermediate_size", None) or hf_cfg.intermediate_size
    experts = (
        getattr(hf_cfg, "num_experts", None)
        or getattr(hf_cfg, "num_local_experts", None)
        or getattr(hf_cfg, "n_routed_experts", None)
    )
    top_k = (
        getattr(hf_cfg, "num_experts_per_tok", None)
        or getattr(hf_cfg, "num_experts_per_token", None)
        or 2
    )
    name = spec.split("/")[-1]
    return name, (experts, hidden, inter, top_k)


# ─── Benchmark helpers ──────────────────────────────────────────────────────


def _clean():
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.synchronize()


def _bench(fn, warmup=WARMUP, iters=ITERS):
    """Return the median wall-clock time in ms over `iters` runs, after `warmup` runs."""
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    times = []
    for _ in range(iters):
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        fn()
        torch.cuda.synchronize()
        times.append((time.perf_counter() - t0) * 1000)
    times.sort()
    return times[len(times) // 2]


def _setup(num_experts, K, N, T, top_k, R):
    torch.manual_seed(42)
    x = torch.randn(T, K, device=DEVICE, dtype=DTYPE)
    W = torch.randn(num_experts, K, N, device=DEVICE, dtype=DTYPE) * 0.02
    lora_A = torch.randn(R * num_experts, K, device=DEVICE, dtype=DTYPE) * 0.01
    lora_B = torch.randn(N, R * num_experts, device=DEVICE, dtype=DTYPE) * 0.01
    logits = torch.randn(T, num_experts, device=DEVICE)
    _, top_idx = torch.topk(torch.softmax(logits, dim=-1), top_k, dim=-1)
    sei, ssi, eo = flatten_sort_count(top_idx, num_experts)
    gx = base_ops.group(x, ssi, fan_out=top_k)
    dy = torch.randn(gx.size(0), N, device=DEVICE, dtype=DTYPE)
    return x, W, lora_A, lora_B, sei, ssi, eo, gx, dy


# ─── Kernel wrappers (avoid B023 loop-variable capture) ──────────────────────


def _call_fwd(x, W, sei, ssi, top_k, lA, lB):
    return lora_ops.scatter2scatter_lora(
        X=x,
        W=W,
        sorted_expert_idxs=sei,
        sorted_scattered_idxs=ssi,
        k=top_k,
        lora_A=lA,
        lora_B=lB,
        scaling=2.0,
    )


def _call_base(x, W, sei, ssi,

gitextract_1sp7sr39/

├── .axolotl-complete.bash
├── .bandit
├── .coderabbit.yaml
├── .coveragerc
├── .editorconfig
├── .gitattributes
├── .github/
│   ├── CODE_OF_CONDUCT.md
│   ├── CONTRIBUTING.md
│   ├── FUNDING.yml
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug-report.yaml
│   │   ├── config.yml
│   │   ├── docs.yml
│   │   └── feature-request.yaml
│   ├── PULL_REQUEST_TEMPLATE.md
│   ├── SECURITY.md
│   ├── SUPPORT.md
│   ├── release-drafter.yml
│   └── workflows/
│       ├── base.yml
│       ├── docs.yml
│       ├── lint.yml
│       ├── main.yml
│       ├── multi-gpu-e2e.yml
│       ├── nightlies.yml
│       ├── precommit-autoupdate.yml
│       ├── preview-docs.yml
│       ├── pypi.yml
│       ├── tests-nightly.yml
│       └── tests.yml
├── .gitignore
├── .mypy.ini
├── .pre-commit-config.yaml
├── .runpod/
│   ├── .gitignore
│   ├── Dockerfile
│   ├── README.md
│   ├── hub.json
│   ├── requirements.txt
│   ├── src/
│   │   ├── config/
│   │   │   └── config.yaml
│   │   ├── handler.py
│   │   ├── test_input.json
│   │   ├── train.py
│   │   └── utils.py
│   ├── test-input.json
│   └── tests.json
├── CITATION.cff
├── CNAME
├── FAQS.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── VERSION
├── _quarto.yml
├── benchmarks/
│   ├── bench_entropy.py
│   ├── bench_scattermoe_lora.py
│   └── bench_selective_logsoftmax.py
├── cicd/
│   ├── Dockerfile-uv.jinja
│   ├── Dockerfile.jinja
│   ├── __init__.py
│   ├── cicd.sh
│   ├── cleanup.py
│   ├── cleanup.sh
│   ├── e2e_tests.py
│   ├── multigpu.py
│   ├── multigpu.sh
│   └── single_gpu.py
├── codecov.yml
├── deepspeed_configs/
│   ├── zero1.json
│   ├── zero1_torch_compile.json
│   ├── zero2.json
│   ├── zero2_torch_compile.json
│   ├── zero3.json
│   ├── zero3_bf16.json
│   ├── zero3_bf16_cpuoffload_all.json
│   └── zero3_bf16_cpuoffload_params.json
├── devtools/
│   ├── README.md
│   └── dev_chat_template.yml
├── docker/
│   ├── Dockerfile
│   ├── Dockerfile-base
│   ├── Dockerfile-base-next
│   ├── Dockerfile-base-nightly
│   ├── Dockerfile-cloud
│   ├── Dockerfile-cloud-no-tmux
│   ├── Dockerfile-cloud-uv
│   ├── Dockerfile-tests
│   ├── Dockerfile-uv
│   └── Dockerfile-uv-base
├── docker-compose.yaml
├── docs/
│   ├── .gitignore
│   ├── amd_hpc.qmd
│   ├── attention.qmd
│   ├── batch_vs_grad.qmd
│   ├── checkpoint_saving.qmd
│   ├── cli.qmd
│   ├── custom_integrations.qmd
│   ├── dataset-formats/
│   │   ├── conversation.qmd
│   │   ├── index.qmd
│   │   ├── inst_tune.qmd
│   │   ├── pretraining.qmd
│   │   ├── stepwise_supervised.qmd
│   │   ├── template_free.qmd
│   │   └── tokenized.qmd
│   ├── dataset_loading.qmd
│   ├── dataset_preprocessing.qmd
│   ├── debugging.qmd
│   ├── docker.qmd
│   ├── expert_quantization.qmd
│   ├── faq.qmd
│   ├── fsdp_qlora.qmd
│   ├── getting-started.qmd
│   ├── gradient_checkpointing.qmd
│   ├── inference.qmd
│   ├── input_output.qmd
│   ├── installation.qmd
│   ├── lora_optims.qmd
│   ├── lr_groups.qmd
│   ├── mac.qmd
│   ├── mixed_precision.qmd
│   ├── multi-gpu.qmd
│   ├── multi-node.qmd
│   ├── multimodal.qmd
│   ├── multipack.qmd
│   ├── nccl.qmd
│   ├── nd_parallelism.qmd
│   ├── optimizations.qmd
│   ├── optimizers.qmd
│   ├── qat.qmd
│   ├── quantize.qmd
│   ├── ray-integration.qmd
│   ├── reward_modelling.qmd
│   ├── rlhf.qmd
│   ├── scripts/
│   │   ├── examples-allowlist.yml
│   │   ├── generate_config_docs.py
│   │   └── generate_examples_docs.py
│   ├── sequence_parallelism.qmd
│   ├── streaming.qmd
│   ├── telemetry.qmd
│   ├── torchao.qmd
│   └── unsloth.qmd
├── examples/
│   ├── LiquidAI/
│   │   ├── README.md
│   │   ├── lfm2-350m-fft.yaml
│   │   ├── lfm2-8b-a1b-lora.yaml
│   │   └── lfm2-vl-lora.yaml
│   ├── alst/
│   │   ├── README.md
│   │   ├── llama3-8b-deepspeed-alst.yaml
│   │   └── llama3-8b-fsdp2-alst.yaml
│   ├── apertus/
│   │   ├── README.md
│   │   └── apertus-8b-qlora.yaml
│   ├── arcee/
│   │   ├── README.md
│   │   └── afm-4.5b-qlora.yaml
│   ├── archived/
│   │   ├── README.md
│   │   ├── cerebras/
│   │   │   ├── btlm-ft.yml
│   │   │   └── qlora.yml
│   │   ├── code-llama/
│   │   │   ├── 13b/
│   │   │   │   ├── lora.yml
│   │   │   │   └── qlora.yml
│   │   │   ├── 34b/
│   │   │   │   ├── lora.yml
│   │   │   │   └── qlora.yml
│   │   │   ├── 7b/
│   │   │   │   ├── lora.yml
│   │   │   │   └── qlora.yml
│   │   │   └── README.md
│   │   ├── dbrx/
│   │   │   ├── 16bit-lora.yaml
│   │   │   ├── 8bit-lora.yaml
│   │   │   ├── README.md
│   │   │   └── fft-ds-zero3.yaml
│   │   ├── deepcoder/
│   │   │   └── deepcoder-14B-preview-lora.yml
│   │   ├── falcon/
│   │   │   ├── config-7b-lora.yml
│   │   │   ├── config-7b-qlora.yml
│   │   │   └── config-7b.yml
│   │   ├── gemma/
│   │   │   └── qlora.yml
│   │   ├── gptj/
│   │   │   └── qlora.yml
│   │   ├── jeopardy-bot/
│   │   │   └── config.yml
│   │   ├── mpt-7b/
│   │   │   ├── README.md
│   │   │   └── config.yml
│   │   ├── openllama-3b/
│   │   │   ├── README.md
│   │   │   ├── config.yml
│   │   │   ├── lora.yml
│   │   │   └── qlora.yml
│   │   ├── pythia/
│   │   │   └── lora.yml
│   │   ├── pythia-12b/
│   │   │   ├── README.md
│   │   │   └── config.yml
│   │   ├── qwen/
│   │   │   ├── README.md
│   │   │   ├── lora.yml
│   │   │   ├── qlora.yml
│   │   │   ├── qwen2-moe-lora.yaml
│   │   │   └── qwen2-moe-qlora.yaml
│   │   ├── redpajama/
│   │   │   ├── README.md
│   │   │   └── config-3b.yml
│   │   ├── replit-3b/
│   │   │   └── config-lora.yml
│   │   ├── stablelm-2/
│   │   │   ├── 1.6b/
│   │   │   │   ├── fft.yml
│   │   │   │   └── lora.yml
│   │   │   └── README.md
│   │   ├── starcoder2/
│   │   │   └── qlora.yml
│   │   ├── tiny-llama/
│   │   │   ├── README.md
│   │   │   ├── lora-mps.yml
│   │   │   ├── lora.yml
│   │   │   ├── pretrain.yml
│   │   │   └── qlora.yml
│   │   ├── xgen-7b/
│   │   │   └── xgen-7b-8k-qlora.yml
│   │   └── yi-34B-chat/
│   │       ├── README.md
│   │       └── qlora.yml
│   ├── cloud/
│   │   ├── baseten.yaml
│   │   └── modal.yaml
│   ├── cohere/
│   │   └── command-r-7b-qlora.yml
│   ├── colab-notebooks/
│   │   └── colab-axolotl-example.ipynb
│   ├── deepcogito/
│   │   ├── cogito-v1-preview-llama-3B-lora.yml
│   │   └── cogito-v1-preview-qwen-14B-lora.yml
│   ├── deepseek-v2/
│   │   ├── fft-fsdp-16b.yaml
│   │   └── qlora-fsdp-2_5.yaml
│   ├── devstral/
│   │   ├── README.md
│   │   └── devstral-small-qlora.yml
│   ├── distributed-parallel/
│   │   ├── README.md
│   │   ├── llama-3_1-8b-hsdp-tp.yaml
│   │   └── qwen3-8b-fsdp-tp-cp.yaml
│   ├── eaft/
│   │   └── eaft-example.yml
│   ├── falcon-h1/
│   │   ├── falcon-h1-1b-deep-qlora.yaml
│   │   ├── falcon-h1-1b-qlora.yaml
│   │   ├── falcon-h1-34b-qlora.yaml
│   │   ├── falcon-h1-3b-qlora.yaml
│   │   ├── falcon-h1-500m-qlora.yaml
│   │   └── falcon-h1-7b-qlora.yaml
│   ├── gemma2/
│   │   ├── qlora.yml
│   │   └── reward-model.yaml
│   ├── gemma3/
│   │   ├── gemma-3-1b-qlora.yml
│   │   ├── gemma-3-270m-qlora.yml
│   │   ├── gemma-3-4b-qlora.yml
│   │   └── gemma-3-4b-vision-qlora.yml
│   ├── gemma3n/
│   │   ├── README.md
│   │   ├── gemma-3n-e2b-qlora.yml
│   │   ├── gemma-3n-e2b-vision-audio-qlora.yml
│   │   └── gemma-3n-e2b-vision-qlora.yml
│   ├── glm4/
│   │   └── qlora-32b.yaml
│   ├── glm45/
│   │   ├── README.md
│   │   └── glm-45-air-qlora.yaml
│   ├── glm46v/
│   │   ├── README.md
│   │   ├── glm-4-6v-flash-ddp.yaml
│   │   └── glm-4-6v-flash-qlora.yaml
│   ├── glm47-flash/
│   │   ├── README.md
│   │   ├── lora.yaml
│   │   ├── lora_fsdp.yaml
│   │   ├── qlora.yaml
│   │   └── qlora_fsdp.yaml
│   ├── gpt-oss/
│   │   ├── README.md
│   │   ├── gpt-oss-120b-fft-fsdp2-offload.yaml
│   │   ├── gpt-oss-20b-fft-deepspeed-zero3.yaml
│   │   ├── gpt-oss-20b-fft-fsdp2-offload.yaml
│   │   ├── gpt-oss-20b-fft-fsdp2.yaml
│   │   ├── gpt-oss-20b-sft-lora-singlegpu.yaml
│   │   └── gpt-oss-safeguard-20b-sft-lora-singlegpu.yaml
│   ├── granite4/
│   │   ├── README.md
│   │   └── granite-4.0-tiny-fft.yaml
│   ├── hunyuan/
│   │   ├── README.md
│   │   └── hunyuan-v1-dense-qlora.yaml
│   ├── internvl3_5/
│   │   ├── README.md
│   │   └── internvl3_5-8b-qlora.yml
│   ├── jamba/
│   │   ├── README.md
│   │   ├── qlora.yaml
│   │   ├── qlora_deepspeed.yaml
│   │   └── qlora_fsdp_large.yaml
│   ├── kimi-linear/
│   │   ├── README.md
│   │   └── kimi-48b-lora.yaml
│   ├── llama-2/
│   │   ├── README.md
│   │   ├── fft_optimized.yml
│   │   ├── gptq-lora.yml
│   │   ├── lisa.yml
│   │   ├── loftq.yml
│   │   ├── lora.yml
│   │   ├── qlora-fsdp.yml
│   │   ├── qlora.yml
│   │   └── relora.yml
│   ├── llama-3/
│   │   ├── 3b-fp8-fsdp2.yaml
│   │   ├── 3b-qat-fsdp2.yaml
│   │   ├── 3b-qat-mxfp4.yaml
│   │   ├── 3b-qat-nvfp4.yaml
│   │   ├── README.md
│   │   ├── diffusion/
│   │   │   ├── pretrain-1b.yaml
│   │   │   └── sft-1b.yaml
│   │   ├── fft-8b-liger-fsdp.yaml
│   │   ├── fft-8b.yaml
│   │   ├── instruct-dpo-lora-8b.yml
│   │   ├── instruct-lora-8b.yml
│   │   ├── lora-1b-deduplicate-dpo.yml
│   │   ├── lora-1b-deduplicate-sft.yml
│   │   ├── lora-1b-kernels.yml
│   │   ├── lora-1b-ray.yml
│   │   ├── lora-1b-sample-packing-sequentially.yml
│   │   ├── lora-1b.yml
│   │   ├── lora-8b.yml
│   │   ├── opentelemetry-qlora.yml
│   │   ├── qlora-1b-gdpo.yaml
│   │   ├── qlora-1b-kto.yaml
│   │   ├── qlora-1b.yml
│   │   ├── qlora-fsdp-405b.yaml
│   │   ├── qlora-fsdp-70b.yaml
│   │   ├── qlora.yml
│   │   └── sparse-finetuning.yaml
│   ├── llama-3-vision/
│   │   └── lora-11b.yaml
│   ├── llama-4/
│   │   ├── README.md
│   │   ├── do-no-use-fa2/
│   │   │   ├── maverick-qlora-fsdp1.yaml
│   │   │   ├── scout-qlora-fsdp1.yaml
│   │   │   ├── scout-qlora-single-h100.yaml
│   │   │   └── scout-vision-qlora-fsdp.yaml
│   │   ├── scout-qlora-flexattn-fsdp2.yaml
│   │   ├── scout-qlora-single-h100-flex.yaml
│   │   └── scout-vision-qlora-fsdp2-flex.yaml
│   ├── llava/
│   │   └── lora-7b.yaml
│   ├── magistral/
│   │   ├── README.md
│   │   ├── magistral-small-fsdp-qlora.yaml
│   │   ├── magistral-small-qlora.yaml
│   │   ├── think/
│   │   │   ├── README.md
│   │   │   └── magistral-small-think-qlora.yaml
│   │   └── vision/
│   │       ├── README.md
│   │       └── magistral-small-vision-24B-qlora.yml
│   ├── mamba/
│   │   └── config.yml
│   ├── mimo/
│   │   ├── README.md
│   │   └── mimo-7b-qlora.yaml
│   ├── ministral/
│   │   ├── README.md
│   │   └── ministral-small-qlora.yaml
│   ├── ministral3/
│   │   ├── README.md
│   │   ├── ministral3-3b-qlora.yaml
│   │   ├── think/
│   │   │   ├── README.md
│   │   │   └── ministral3-3b-think-qlora.yaml
│   │   └── vision/
│   │       ├── README.md
│   │       └── ministral3-3b-vision-qlora.yml
│   ├── mistral/
│   │   ├── README.md
│   │   ├── bigstral/
│   │   │   └── bigstral-ds-zero3.yaml
│   │   ├── config.yml
│   │   ├── dpo/
│   │   │   └── mistral-dpo-qlora.yml
│   │   ├── lora.yml
│   │   ├── mistral-qlora-fsdp.yml
│   │   ├── mixtral/
│   │   │   ├── mixtral-8x22b-qlora-fsdp.yml
│   │   │   ├── mixtral-qlora-fsdp.yml
│   │   │   ├── mixtral.yml
│   │   │   └── mixtral_22.yml
│   │   ├── mps/
│   │   │   └── lora-mps.yml
│   │   ├── orpo/
│   │   │   └── mistral-qlora-orpo.yml
│   │   └── qlora.yml
│   ├── mistral-small/
│   │   ├── README.md
│   │   └── mistral-small-3.1-24B-lora.yml
│   ├── mistral4/
│   │   ├── README.md
│   │   ├── fft-text.yml
│   │   ├── fft-vision.yml
│   │   ├── qlora-text.yml
│   │   └── qlora-vision.yml
│   ├── nemotron/
│   │   └── nemotron-mini-4b-qlora.yaml
│   ├── olmo3/
│   │   ├── README.md
│   │   └── olmo3-7b-qlora.yaml
│   ├── orpheus/
│   │   ├── README.md
│   │   └── finetune.yml
│   ├── phi/
│   │   ├── README.md
│   │   ├── lora-3.5.yaml
│   │   ├── phi-ft.yml
│   │   ├── phi-qlora.yml
│   │   ├── phi2-ft.yml
│   │   ├── phi3-ft-fsdp.yml
│   │   └── phi3-ft.yml
│   ├── pixtral/
│   │   └── lora-12b.yml
│   ├── plano/
│   │   ├── README.md
│   │   └── plano-4b-qlora.yaml
│   ├── qat_nvfp4/
│   │   ├── Gemma3-12B_baseline.yml
│   │   ├── Gemma3-12B_qat.yml
│   │   ├── Math-Gemma3-12B_baseline.yml
│   │   ├── Math-Gemma3-12B_qat.yml
│   │   ├── Math-Gemma3-27B_baseline.yml
│   │   ├── Math-Gemma3-27B_qat.yml
│   │   ├── Math-Qwen2.5-72B_baseline.yml
│   │   ├── Math-Qwen2.5-72B_qat.yml
│   │   ├── Qwen2.5-72B_baseline.yml
│   │   └── Qwen2.5-72B_qat.yml
│   ├── qwen2/
│   │   ├── adamw-pretrain-fsdp2.yaml
│   │   ├── dpo.yaml
│   │   ├── muon-pretrain-fsdp2.yaml
│   │   ├── prm.yaml
│   │   ├── qlora-fsdp.yaml
│   │   └── reward-model.yaml
│   ├── qwen2-vl/
│   │   └── lora-7b.yaml
│   ├── qwen2_5-vl/
│   │   └── lora-7b.yaml
│   ├── qwen3/
│   │   ├── 32b-qlora.yaml
│   │   ├── 8b-qat-fsdp2.yml
│   │   ├── README.md
│   │   ├── qlora-fsdp.yaml
│   │   └── reward-model.yaml
│   ├── qwen3-next/
│   │   ├── README.md
│   │   └── qwen3-next-80b-a3b-qlora.yaml
│   ├── qwen3.5/
│   │   ├── 122b-a10b-moe-qlora-fsdp.yaml
│   │   ├── 122b-a10b-moe-qlora.yaml
│   │   ├── 27b-fft.yaml
│   │   ├── 27b-qlora-fsdp.yaml
│   │   ├── 27b-qlora.yaml
│   │   ├── 35b-a3b-moe-qlora-fsdp.yaml
│   │   ├── 35b-a3b-moe-qlora.yaml
│   │   ├── 9b-fft-vision.yaml
│   │   ├── 9b-lora-vision.yaml
│   │   └── README.md
│   ├── seed-oss/
│   │   ├── README.md
│   │   └── seed-oss-36b-qlora.yaml
│   ├── slurm/
│   │   ├── README.md
│   │   └── axolotl.slurm
│   ├── smolvlm2/
│   │   ├── README.md
│   │   └── smolvlm2-2B-lora.yaml
│   ├── streaming/
│   │   ├── README.md
│   │   ├── pretrain.yaml
│   │   └── sft.yaml
│   ├── swanlab/
│   │   ├── README.md
│   │   ├── custom_trainer_profiling.py
│   │   ├── dpo-swanlab-completions.yml
│   │   ├── dpo-swanlab-full-featured.yml
│   │   └── lora-swanlab-profiling.yml
│   ├── trinity/
│   │   ├── README.md
│   │   └── trinity-nano-preview-qlora.yaml
│   └── voxtral/
│       ├── README.md
│       ├── voxtral-mini-audio-qlora.yml
│       └── voxtral-mini-qlora.yml
├── index.qmd
├── pyproject.toml
├── requirements-dev.txt
├── requirements-tests.txt
├── requirements.txt
├── scripts/
│   ├── chat_datasets.py
│   ├── cloud-entrypoint-term.sh
│   ├── cloud-entrypoint.sh
│   ├── cutcrossentropy_install.py
│   ├── motd
│   └── unsloth_install.py
├── setup.py
├── src/
│   ├── axolotl/
│   │   ├── __init__.py
│   │   ├── cli/
│   │   │   ├── __init__.py
│   │   │   ├── args.py
│   │   │   ├── art.py
│   │   │   ├── checks.py
│   │   │   ├── cloud/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   ├── baseten/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── template/
│   │   │   │   │       ├── run.sh
│   │   │   │   │       └── train_sft.py
│   │   │   │   └── modal_.py
│   │   │   ├── config.py
│   │   │   ├── delinearize_llama4.py
│   │   │   ├── evaluate.py
│   │   │   ├── inference.py
│   │   │   ├── main.py
│   │   │   ├── merge_lora.py
│   │   │   ├── merge_sharded_fsdp_weights.py
│   │   │   ├── preprocess.py
│   │   │   ├── quantize.py
│   │   │   ├── train.py
│   │   │   ├── utils/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── diffusion.py
│   │   │   │   ├── fetch.py
│   │   │   │   ├── load.py
│   │   │   │   ├── sweeps.py
│   │   │   │   └── train.py
│   │   │   └── vllm_serve.py
│   │   ├── common/
│   │   │   ├── __init__.py
│   │   │   ├── architectures.py
│   │   │   ├── const.py
│   │   │   └── datasets.py
│   │   ├── convert.py
│   │   ├── core/
│   │   │   ├── __init__.py
│   │   │   ├── attention/
│   │   │   │   └── __init__.py
│   │   │   ├── builders/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   ├── causal.py
│   │   │   │   └── rl.py
│   │   │   ├── chat/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── format/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── chatml.py
│   │   │   │   │   ├── llama3x.py
│   │   │   │   │   └── shared.py
│   │   │   │   └── messages.py
│   │   │   ├── datasets/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chat.py
│   │   │   │   └── transforms/
│   │   │   │       ├── __init__.py
│   │   │   │       └── chat_builder.py
│   │   │   ├── trainers/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   ├── dpo/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── args.py
│   │   │   │   │   └── trainer.py
│   │   │   │   ├── grpo/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── args.py
│   │   │   │   │   ├── async_trainer.py
│   │   │   │   │   ├── fast_async_trainer.py
│   │   │   │   │   ├── replay_buffer.py
│   │   │   │   │   ├── sampler.py
│   │   │   │   │   └── trainer.py
│   │   │   │   ├── mamba.py
│   │   │   │   ├── mixins/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── activation_checkpointing.py
│   │   │   │   │   ├── checkpoints.py
│   │   │   │   │   ├── distributed_parallel.py
│   │   │   │   │   ├── optimizer.py
│   │   │   │   │   ├── packing.py
│   │   │   │   │   ├── rng_state_loader.py
│   │   │   │   │   └── scheduler.py
│   │   │   │   ├── trl.py
│   │   │   │   └── utils.py
│   │   │   ├── training_args.py
│   │   │   └── training_args_base.py
│   │   ├── datasets.py
│   │   ├── evaluate.py
│   │   ├── integrations/
│   │   │   ├── LICENSE.md
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── config.py
│   │   │   ├── cut_cross_entropy/
│   │   │   │   ├── ACKNOWLEDGEMENTS.md
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   └── args.py
│   │   │   ├── densemixer/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   └── plugin.py
│   │   │   ├── diffusion/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── callbacks.py
│   │   │   │   ├── generation.py
│   │   │   │   ├── plugin.py
│   │   │   │   ├── trainer.py
│   │   │   │   └── utils.py
│   │   │   ├── grokfast/
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   └── optimizer.py
│   │   │   ├── kd/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── callbacks.py
│   │   │   │   ├── chat_template.py
│   │   │   │   ├── collator.py
│   │   │   │   ├── collator_online_teacher.py
│   │   │   │   ├── kernels/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── liger.py
│   │   │   │   │   └── models.py
│   │   │   │   ├── topk_logprob/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── forward_kl.py
│   │   │   │   ├── trainer.py
│   │   │   │   └── utils.py
│   │   │   ├── kernels/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── autotune_callback.py
│   │   │   │   ├── autotune_collector.py
│   │   │   │   ├── constants.py
│   │   │   │   ├── libs/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── scattermoe_lora/
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── kernels/
│   │   │   │   │       │   ├── __init__.py
│   │   │   │   │       │   ├── lora_ops.py
│   │   │   │   │       │   ├── ops.py
│   │   │   │   │       │   └── single.py
│   │   │   │   │       ├── layers.py
│   │   │   │   │       ├── lora_ops.py
│   │   │   │   │       ├── parallel_experts.py
│   │   │   │   │       ├── parallel_linear_lora.py
│   │   │   │   │       ├── selective_dequant.py
│   │   │   │   │       └── selective_dequant_kernel.py
│   │   │   │   ├── plugin.py
│   │   │   │   └── sonicmoe/
│   │   │   │       ├── __init__.py
│   │   │   │       ├── patch.py
│   │   │   │       ├── routing.py
│   │   │   │       └── weight_converter.py
│   │   │   ├── liger/
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── models/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── deepseekv2.py
│   │   │   │   │   ├── jamba.py
│   │   │   │   │   ├── llama4.py
│   │   │   │   │   ├── qwen3.py
│   │   │   │   │   └── qwen3_moe.py
│   │   │   │   ├── plugin.py
│   │   │   │   └── utils.py
│   │   │   ├── llm_compressor/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   ├── plugin.py
│   │   │   │   └── utils.py
│   │   │   ├── lm_eval/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   └── cli.py
│   │   │   ├── spectrum/
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── args.py
│   │   │   │   └── model_snr_results/
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-1.5B-Instruct.json
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-1.5B.json
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-3B-Instruct.json
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-3B.json
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-7B-Instruct.json
│   │   │   │       ├── snr_results_Qwen-Qwen2.5-7B.json
│   │   │   │       ├── snr_results_google-gemma-2-2b.json
│   │   │   │       ├── snr_results_meta-llama-Llama-3.2-1B-Instruct.json
│   │   │   │       ├── snr_results_meta-llama-Llama-3.2-1B.json
│   │   │   │       ├── snr_results_meta-llama-Llama-3.2-3B-Instruct.json
│   │   │   │       └── snr_results_meta-llama-Llama-3.2-3B.json
│   │   │   └── swanlab/
│   │   │       ├── README.md
│   │   │       ├── __init__.py
│   │   │       ├── args.py
│   │   │       ├── callbacks.py
│   │   │       ├── completion_logger.py
│   │   │       ├── plugins.py
│   │   │       └── profiling.py
│   │   ├── kernels/
│   │   │   ├── __init__.py
│   │   │   ├── geglu.py
│   │   │   ├── lora.py
│   │   │   ├── quantize.py
│   │   │   ├── swiglu.py
│   │   │   └── utils.py
│   │   ├── loaders/
│   │   │   ├── __init__.py
│   │   │   ├── adapter.py
│   │   │   ├── adapters/
│   │   │   │   └── __init__.py
│   │   │   ├── constants.py
│   │   │   ├── model.py
│   │   │   ├── patch_manager.py
│   │   │   ├── processor.py
│   │   │   ├── tokenizer.py
│   │   │   └── utils.py
│   │   ├── logging_config.py
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   └── mamba/
│   │   │       ├── __init__.py
│   │   │       ├── configuration_mamba.py
│   │   │       └── modeling_mamba.py
│   │   ├── monkeypatch/
│   │   │   ├── __init__.py
│   │   │   ├── accelerate/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── fsdp2.py
│   │   │   │   └── parallelism_config.py
│   │   │   ├── attention/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── flash_attn_4.py
│   │   │   │   ├── flex_attn.py
│   │   │   │   ├── sage_attn.py
│   │   │   │   └── xformers.py
│   │   │   ├── btlm_attn_hijack_flash.py
│   │   │   ├── data/
│   │   │   │   ├── __init__.py
│   │   │   │   └── batch_dataset_fetcher.py
│   │   │   ├── deepspeed_utils.py
│   │   │   ├── fsdp2_qlora.py
│   │   │   ├── gradient_checkpointing/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── offload_cpu.py
│   │   │   │   └── offload_disk.py
│   │   │   ├── llama_attn_hijack_flash.py
│   │   │   ├── llama_attn_hijack_xformers.py
│   │   │   ├── lora_kernels.py
│   │   │   ├── loss/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chunked.py
│   │   │   │   └── eaft.py
│   │   │   ├── mistral_attn_hijack_flash.py
│   │   │   ├── mixtral/
│   │   │   │   └── __init__.py
│   │   │   ├── models/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── apertus/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── activation.py
│   │   │   │   ├── kimi_linear/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── configuration_kimi.py
│   │   │   │   │   ├── modeling_kimi.py
│   │   │   │   │   ├── patch_kimi_linear.py
│   │   │   │   │   └── tokenization_kimi.py
│   │   │   │   ├── llama4/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── modeling.py
│   │   │   │   ├── mistral3/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── mistral_common_tokenizer.py
│   │   │   │   ├── pixtral/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── modeling_flash_attention_utils.py
│   │   │   │   ├── qwen3_5/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── modeling.py
│   │   │   │   ├── qwen3_next/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── modeling.py
│   │   │   │   └── voxtral/
│   │   │   │       ├── __init__.py
│   │   │   │       └── modeling.py
│   │   │   ├── moe_quant.py
│   │   │   ├── multipack.py
│   │   │   ├── peft/
│   │   │   │   ├── __init__.py
│   │   │   │   └── utils.py
│   │   │   ├── relora.py
│   │   │   ├── ring_attn/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── adapters/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── batch.py
│   │   │   │   └── patch.py
│   │   │   ├── scaled_softmax_attn.py
│   │   │   ├── stablelm_attn_hijack_flash.py
│   │   │   ├── tiled_mlp/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── base.py
│   │   │   │   └── patch.py
│   │   │   ├── trainer/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── lr.py
│   │   │   │   ├── trl.py
│   │   │   │   ├── trl_vllm.py
│   │   │   │   └── utils.py
│   │   │   ├── trainer_accelerator_args.py
│   │   │   ├── trainer_fsdp_optim.py
│   │   │   ├── transformers/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── trainer_context_parallel.py
│   │   │   │   └── trainer_loss_calc.py
│   │   │   ├── transformers_fa_utils.py
│   │   │   ├── unsloth_.py
│   │   │   ├── utils.py
│   │   │   └── xformers_/
│   │   │       └── __init__.py
│   │   ├── processing_strategies.py
│   │   ├── prompt_strategies/
│   │   │   ├── __init__.py
│   │   │   ├── alpaca_chat.py
│   │   │   ├── alpaca_instruct.py
│   │   │   ├── alpaca_w_system.py
│   │   │   ├── base.py
│   │   │   ├── bradley_terry/
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chat_template.py
│   │   │   │   └── llama3.py
│   │   │   ├── chat_template.py
│   │   │   ├── completion.py
│   │   │   ├── context_qa.py
│   │   │   ├── creative_acr.py
│   │   │   ├── dpo/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chat_template.py
│   │   │   │   ├── chatml.py
│   │   │   │   ├── llama3.py
│   │   │   │   ├── passthrough.py
│   │   │   │   ├── user_defined.py
│   │   │   │   └── zephyr.py
│   │   │   ├── input_output.py
│   │   │   ├── jinja_template_analyzer.py
│   │   │   ├── kto/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── chatml.py
│   │   │   │   ├── llama3.py
│   │   │   │   └── user_defined.py
│   │   │   ├── llama2_chat.py
│   │   │   ├── messages/
│   │   │   │   ├── __init__.py
│   │   │   │   └── chat.py
│   │   │   ├── metharme.py
│   │   │   ├── orcamini.py
│   │   │   ├── orpo/
│   │   │   │   ├── __init__.py
│   │   │   │   └── chat_template.py
│   │   │   ├── pretrain.py
│   │   │   ├── pygmalion.py
│   │   │   ├── stepwise_supervised.py
│   │   │   └── user_defined.py
│   │   ├── prompt_tokenizers.py
│   │   ├── prompters.py
│   │   ├── scripts/
│   │   │   ├── __init__.py
│   │   │   ├── vllm_serve_lora.py
│   │   │   └── vllm_worker_ext.py
│   │   ├── telemetry/
│   │   │   ├── __init__.py
│   │   │   ├── callbacks.py
│   │   │   ├── errors.py
│   │   │   ├── manager.py
│   │   │   ├── runtime_metrics.py
│   │   │   └── whitelist.yaml
│   │   ├── train.py
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── bench.py
│   │       ├── callbacks/
│   │       │   ├── __init__.py
│   │       │   ├── comet_.py
│   │       │   ├── dynamic_checkpoint.py
│   │       │   ├── generation.py
│   │       │   ├── lisa.py
│   │       │   ├── mlflow_.py
│   │       │   ├── models.py
│   │       │   ├── opentelemetry.py
│   │       │   ├── perplexity.py
│   │       │   ├── profiler.py
│   │       │   ├── qat.py
│   │       │   ├── swanlab.py
│   │       │   ├── tokens_per_second.py
│   │       │   └── trackio_.py
│   │       ├── chat_templates/
│   │       │   ├── __init__.py
│   │       │   ├── base.py
│   │       │   └── templates/
│   │       │       ├── alpaca.jinja
│   │       │       ├── aya.jinja
│   │       │       ├── chatml.jinja
│   │       │       ├── cohere.jinja
│   │       │       ├── command_a.jinja
│   │       │       ├── command_a_rag.jinja
│   │       │       ├── command_a_tool_use.jinja
│   │       │       ├── deepseek_v2.jinja
│   │       │       ├── deepseek_v3.jinja
│   │       │       ├── exaone.jinja
│   │       │       ├── exaone4.jinja
│   │       │       ├── falcon_h1.jinja
│   │       │       ├── gemma.jinja
│   │       │       ├── gemma3.jinja
│   │       │       ├── gemma3n.jinja
│   │       │       ├── jamba.jinja
│   │       │       ├── llama3.jinja
│   │       │       ├── llama3_2_vision.jinja
│   │       │       ├── llama4.jinja
│   │       │       ├── llava.jinja
│   │       │       ├── metharme.jinja
│   │       │       ├── mistral_v1.jinja
│   │       │       ├── mistral_v2v3.jinja
│   │       │       ├── mistral_v3_tekken.jinja
│   │       │       ├── mistral_v7_tekken.jinja
│   │       │       ├── phi_3.jinja
│   │       │       ├── phi_35.jinja
│   │       │       ├── phi_4.jinja
│   │       │       ├── pixtral.jinja
│   │       │       ├── qwen2_vl.jinja
│   │       │       ├── qwen3.jinja
│   │       │       ├── qwen3_5.jinja
│   │       │       └── qwen_25.jinja
│   │       ├── collators/
│   │       │   ├── __init__.py
│   │       │   ├── batching.py
│   │       │   ├── core.py
│   │       │   ├── mamba.py
│   │       │   └── mm_chat.py
│   │       ├── comet_.py
│   │       ├── config/
│   │       │   ├── __init__.py
│   │       │   └── models/
│   │       │       └── __init__.py
│   │       ├── ctx_managers/
│   │       │   ├── __init__.py
│   │       │   └── sequence_parallel.py
│   │       ├── data/
│   │       │   ├── __init__.py
│   │       │   ├── lock.py
│   │       │   ├── rl.py
│   │       │   ├── sft.py
│   │       │   ├── shared.py
│   │       │   ├── streaming.py
│   │       │   ├── utils.py
│   │       │   └── wrappers.py
│   │       ├── datasets.py
│   │       ├── dict.py
│   │       ├── distributed.py
│   │       ├── environment.py
│   │       ├── freeze.py
│   │       ├── generation/
│   │       │   ├── __init__.py
│   │       │   └── sft.py
│   │       ├── import_helper.py
│   │       ├── logging.py
│   │       ├── lora.py
│   │       ├── mistral/
│   │       │   ├── __init__.py
│   │       │   ├── mistral3_processor.py
│   │       │   └── mistral_tokenizer.py
│   │       ├── mlflow_.py
│   │       ├── model_shard_quant.py
│   │       ├── optimizers/
│   │       │   ├── __init__.py
│   │       │   └── adopt.py
│   │       ├── quantization.py
│   │       ├── samplers/
│   │       │   ├── __init__.py
│   │       │   ├── multipack.py
│   │       │   └── utils.py
│   │       ├── schedulers.py
│   │       ├── schemas/
│   │       │   ├── __init__.py
│   │       │   ├── config.py
│   │       │   ├── datasets.py
│   │       │   ├── deprecated.py
│   │       │   ├── dynamic_checkpoint.py
│   │       │   ├── enums.py
│   │       │   ├── fsdp.py
│   │       │   ├── integrations.py
│   │       │   ├── internal/
│   │       │   │   └── __init__.py
│   │       │   ├── model.py
│   │       │   ├── multimodal.py
│   │       │   ├── peft.py
│   │       │   ├── quantization.py
│   │       │   ├── training.py
│   │       │   ├── trl.py
│   │       │   ├── utils.py
│   │       │   ├── validation.py
│   │       │   └── vllm.py
│   │       ├── tee.py
│   │       ├── tokenization.py
│   │       ├── trackio_.py
│   │       ├── train.py
│   │       ├── trainer.py
│   │       └── wandb_.py
│   └── setuptools_axolotl_dynamic_dependencies.py
├── styles.css
└── tests/
    ├── __init__.py
    ├── cli/
    │   ├── __init__.py
    │   ├── conftest.py
    │   ├── test_cli_base.py
    │   ├── test_cli_evaluate.py
    │   ├── test_cli_fetch.py
    │   ├── test_cli_inference.py
    │   ├── test_cli_interface.py
    │   ├── test_cli_merge_lora.py
    │   ├── test_cli_merge_sharded_fsdp_weights.py
    │   ├── test_cli_preprocess.py
    │   ├── test_cli_sweeps.py
    │   ├── test_cli_train.py
    │   ├── test_cli_version.py
    │   ├── test_nested_options.py
    │   └── test_utils.py
    ├── conftest.py
    ├── constants.py
    ├── core/
    │   ├── chat/
    │   │   ├── __init__.py
    │   │   ├── format/
    │   │   │   └── __init__.py
    │   │   └── test_messages.py
    │   ├── test_async_grpo.py
    │   └── test_builders.py
    ├── e2e/
    │   ├── .gitignore
    │   ├── __init__.py
    │   ├── integrations/
    │   │   ├── test_cut_cross_entropy.py
    │   │   ├── test_fp8.py
    │   │   ├── test_hooks.py
    │   │   ├── test_kd.py
    │   │   ├── test_liger.py
    │   │   ├── test_llm_compressor.py
    │   │   ├── test_scattermoe_lora_kernels.py
    │   │   ├── test_scattermoe_lora_olmoe.py
    │   │   └── test_sonicmoe.py
    │   ├── kernels/
    │   │   ├── test_geglu.py
    │   │   ├── test_lora.py
    │   │   ├── test_quantize.py
    │   │   └── test_swiglu.py
    │   ├── multigpu/
    │   │   ├── __init__.py
    │   │   ├── patched/
    │   │   │   ├── __init__.py
    │   │   │   └── test_sp.py
    │   │   ├── solo/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_flex.py
    │   │   │   ├── test_gdpo.py
    │   │   │   └── test_grpo.py
    │   │   ├── test_dist_muon_fsdp2.py
    │   │   ├── test_eval.py
    │   │   ├── test_fp8_fsdp2.py
    │   │   ├── test_fsdp1.py
    │   │   ├── test_fsdp2.py
    │   │   ├── test_gemma3.py
    │   │   ├── test_llama.py
    │   │   ├── test_locking.py
    │   │   ├── test_ray.py
    │   │   └── test_tp.py
    │   ├── patched/
    │   │   ├── __init__.py
    │   │   ├── lora_kernels/
    │   │   │   ├── __init__.py
    │   │   │   └── test_lora_kernel_patching.py
    │   │   ├── test_4d_multipack_llama.py
    │   │   ├── test_activation_checkpointing.py
    │   │   ├── test_cli_integrations.py
    │   │   ├── test_fa_xentropy.py
    │   │   ├── test_falcon_samplepack.py
    │   │   ├── test_flattening.py
    │   │   ├── test_fsdp2_qlora.py
    │   │   ├── test_fused_llama.py
    │   │   ├── test_llama_s2_attention.py
    │   │   ├── test_lora_llama_multipack.py
    │   │   ├── test_mistral_samplepack.py
    │   │   ├── test_mixtral_samplepack.py
    │   │   ├── test_model_patches.py
    │   │   ├── test_peft_embeddings.py
    │   │   ├── test_phi_multipack.py
    │   │   ├── test_resume.py
    │   │   ├── test_unsloth_integration.py
    │   │   └── test_unsloth_qlora.py
    │   ├── solo/
    │   │   ├── __init__.py
    │   │   ├── test_flex.py
    │   │   └── test_relora_llama.py
    │   ├── test_activation_offloading.py
    │   ├── test_deepseekv3.py
    │   ├── test_diffusion.py
    │   ├── test_dpo.py
    │   ├── test_embeddings_lr.py
    │   ├── test_evaluate.py
    │   ├── test_falcon.py
    │   ├── test_gemma2.py
    │   ├── test_gemma3_text.py
    │   ├── test_imports.py
    │   ├── test_llama.py
    │   ├── test_llama_pretrain.py
    │   ├── test_llama_vision.py
    │   ├── test_load_model.py
    │   ├── test_lora_llama.py
    │   ├── test_mamba.py
    │   ├── test_mistral.py
    │   ├── test_mixtral.py
    │   ├── test_optimizers.py
    │   ├── test_packing_loss.py
    │   ├── test_phi.py
    │   ├── test_preprocess.py
    │   ├── test_process_reward_model_smollm2.py
    │   ├── test_profiler.py
    │   ├── test_qat.py
    │   ├── test_quantization.py
    │   ├── test_qwen.py
    │   ├── test_reward_model_smollm2.py
    │   ├── test_save_first_step.py
    │   ├── test_schedulers.py
    │   ├── test_streaming.py
    │   ├── test_tokenizer.py
    │   └── utils.py
    ├── fixtures/
    │   ├── alpaca/
    │   │   └── alpaca.json
    │   ├── conversation.json
    │   ├── conversation.missingturns.json
    │   ├── conversation.tokenized.json
    │   └── conversation.tokenized_llama2chat.json
    ├── hf_offline_utils.py
    ├── integrations/
    │   ├── __init__.py
    │   ├── test_diffusion.py
    │   ├── test_diffusion_callback.py
    │   ├── test_kd_chat_template.py
    │   ├── test_liger.py
    │   ├── test_routing_parity.py
    │   ├── test_scattermoe_autotune_telemetry.py
    │   ├── test_scattermoe_lora.py
    │   ├── test_scattermoe_lora_kernels.py
    │   ├── test_sonicmoe.py
    │   ├── test_sonicmoe_gradients.py
    │   └── test_swanlab.py
    ├── monkeypatch/
    │   ├── test_llama_attn_hijack_flash.py
    │   ├── test_pixtral_flash_attention_patch.py
    │   ├── test_qwen3_next_modeling_patch.py
    │   ├── test_trainer_accelerator_args.py
    │   ├── test_trainer_context_parallel_patch.py
    │   ├── test_trainer_loss_calc.py
    │   ├── test_trl_vllm.py
    │   └── test_voxtral_modeling_patch.py
    ├── patched/
    │   └── test_validation.py
    ├── prompt_strategies/
    │   ├── __init__.py
    │   ├── conftest.py
    │   ├── messages/
    │   │   ├── __init__.py
    │   │   └── test_chat.py
    │   ├── test_alpaca.py
    │   ├── test_chat_template_ds_schema_unification.py
    │   ├── test_chat_template_utils.py
    │   ├── test_chat_templates.py
    │   ├── test_chat_templates_advanced.py
    │   ├── test_chat_templates_mistral.py
    │   ├── test_chat_templates_thinking.py
    │   ├── test_chat_templates_tool_call_string_arguments.py
    │   ├── test_dpo_chat_templates.py
    │   ├── test_dpo_chatml.py
    │   ├── test_jinja_template_analyzer.py
    │   ├── test_raw_io.py
    │   └── test_stepwise.py
    ├── telemetry/
    │   ├── __init__.py
    │   ├── conftest.py
    │   ├── test_callbacks.py
    │   ├── test_errors.py
    │   ├── test_manager.py
    │   └── test_runtime_metrics.py
    ├── test_chunked_xentropy.py
    ├── test_context_parallel_batch_size.py
    ├── test_convert.py
    ├── test_data.py
    ├── test_datasets.py
    ├── test_dict.py
    ├── test_exact_deduplication.py
    ├── test_freeze.py
    ├── test_loaders.py
    ├── test_logging_config_file_capture.py
    ├── test_lora.py
    ├── test_normalize_config.py
    ├── test_opentelemetry_callback.py
    ├── test_packed_batch_sampler.py
    ├── test_packed_dataset.py
    ├── test_packed_pretraining.py
    ├── test_perplexity.py
    ├── test_prompt_tokenizers.py
    ├── test_prompters.py
    ├── test_revision_parameter.py
    ├── test_save_deduplicated.py
    ├── test_schedulers.py
    ├── test_streaming.py
    ├── test_tensor_parallel_batch_size.py
    ├── test_tokenizers.py
    ├── test_train.py
    ├── test_triton_kernels.py
    ├── test_utils_tee.py
    ├── test_validation_dataset.py
    └── utils/
        ├── callbacks/
        │   └── test_dynamic_checkpoint.py
        ├── data/
        │   └── test_utils.py
        ├── lora/
        │   ├── test_config_validation_lora.py
        │   ├── test_freeze_lora.py
        │   └── test_merge_lora.py
        ├── schemas/
        │   └── validation/
        │       ├── test_activation_offloading.py
        │       ├── test_default_values.py
        │       ├── test_fsdp.py
        │       └── test_moe_quant.py
        ├── test_grpo_rw_fnc.py
        ├── test_import_helper.py
        ├── test_mistral3_processor.py
        └── test_train.py

SYMBOL INDEX (3886 symbols across 505 files)

FILE: .runpod/src/handler.py
  function handler (line 20) | async def handler(job):

FILE: .runpod/src/train.py
  function train (line 8) | async def train(config_path: str, gpu_id: str = "0", preprocess: bool = ...

FILE: .runpod/src/utils.py
  function get_output_dir (line 10) | def get_output_dir(run_id):
  function make_valid_config (line 15) | def make_valid_config(input_args):
  function set_config_env_vars (line 40) | def set_config_env_vars(args: dict):

FILE: benchmarks/bench_entropy.py
  function entropy_from_logits_original (line 20) | def entropy_from_logits_original(logits: torch.Tensor, chunk_size: int =...
  function _clean_gpu (line 33) | def _clean_gpu():
  function profile_time (line 41) | def profile_time(fn, logits, n_iters=BENCH_ITERS):
  function profile_memory (line 60) | def profile_memory(fn, logits, n_iters=MEM_ITERS):
  function fmt (line 77) | def fmt(values, unit=""):
  function benchmark_contiguous (line 83) | def benchmark_contiguous():
  function benchmark_noncontiguous (line 138) | def benchmark_noncontiguous():

FILE: benchmarks/bench_scattermoe_lora.py
  function _resolve_config (line 46) | def _resolve_config(spec):
  function _clean (line 79) | def _clean():
  function _bench (line 85) | def _bench(fn, warmup=WARMUP, iters=ITERS):
  function _setup (line 100) | def _setup(num_experts, K, N, T, top_k, R):
  function _call_fwd (line 117) | def _call_fwd(x, W, sei, ssi, top_k, lA, lB):
  function _call_base (line 130) | def _call_base(x, W, sei, ssi, top_k):
  function _call_dx (line 140) | def _call_dx(dy, W, sei, ssi, lA, lB):
  function _call_bwd (line 155) | def _call_bwd(dy, gx, lA, lB, eo, num_experts):
  function main (line 170) | def main():

FILE: benchmarks/bench_selective_logsoftmax.py
  function _clean_gpu (line 22) | def _clean_gpu():
  function profile_time (line 30) | def profile_time(fn, args, n_iters=BENCH_ITERS):
  function profile_memory (line 47) | def profile_memory(fn, args, n_iters=MEM_ITERS):
  function fmt (line 64) | def fmt(values, unit=""):
  function benchmark_forward (line 70) | def benchmark_forward():
  function benchmark_backward (line 123) | def benchmark_backward():

FILE: cicd/cleanup.py
  function cleanup (line 13) | def cleanup():
  function main (line 18) | def main():

FILE: cicd/e2e_tests.py
  function cicd_pytest (line 14) | def cicd_pytest():
  function main (line 19) | def main():

FILE: cicd/multigpu.py
  function run_cmd (line 63) | def run_cmd(cmd: str, run_folder: str):
  function cicd_pytest (line 79) | def cicd_pytest():
  function main (line 84) | def main():

FILE: cicd/single_gpu.py
  function run_cmd (line 64) | def run_cmd(cmd: str, run_folder: str):

FILE: docs/scripts/generate_config_docs.py
  class QuartoGenerator (line 20) | class QuartoGenerator:
    method __init__ (line 23) | def __init__(self):
    method _get_direct_fields (line 28) | def _get_direct_fields(self, cls: Type[BaseModel]) -> FrozenSet[str]:
    method _is_pydantic_model (line 46) | def _is_pydantic_model(self, type_obj) -> bool:
    method _extract_nested_type (line 50) | def _extract_nested_type(self, field_type) -> Any:
    method _extract_all_pydantic_models_from_type (line 126) | def _extract_all_pydantic_models_from_type(
    method _get_nested_models (line 186) | def _get_nested_models(
    method _build_inheritance_map (line 216) | def _build_inheritance_map(self, child_class: Type[BaseModel]):
    method _wrap_comment (line 237) | def _wrap_comment(self, text: str, width: int = 88) -> list[str]:
    method _extract_type_from_source (line 247) | def _extract_type_from_source(
    method _get_type_from_class_source (line 263) | def _get_type_from_class_source(self, class_obj: type, field_name: str...
    method _extract_field_groups_from_all_classes (line 285) | def _extract_field_groups_from_all_classes(
    method _extract_field_groups_from_source (line 319) | def _extract_field_groups_from_source(
    method _generate_field_documentation (line 438) | def _generate_field_documentation(
    method generate_qmd (line 591) | def generate_qmd(
  function main (line 736) | def main():

FILE: docs/scripts/generate_examples_docs.py
  function slugify (line 20) | def slugify(name: str) -> str:
  function read_allowlist (line 27) | def read_allowlist():
  function find_readme (line 36) | def find_readme(folder: Path) -> Path | None:
  function remove_first_h1 (line 44) | def remove_first_h1(md: str) -> tuple[str, str | None]:
  function rewrite_and_copy_assets (line 68) | def rewrite_and_copy_assets(md: str, src_dir: Path, dest_assets_root: Pa...
  function rewrite_readme_links (line 94) | def rewrite_readme_links(
  function write_qmd (line 209) | def write_qmd(out_path: Path, title: str, body_md: str):
  function update_quarto_yml (line 215) | def update_quarto_yml(generated: list[tuple[str, str, str]]):
  function main (line 286) | def main():

FILE: examples/swanlab/custom_trainer_profiling.py
  class CustomTrainerWithProfiling (line 30) | class CustomTrainerWithProfiling(AxolotlTrainer):
    method __init__ (line 39) | def __init__(self, *args, **kwargs):
    method training_step (line 56) | def training_step(self, model, inputs):
    method compute_loss (line 64) | def compute_loss(self, model, inputs, return_outputs=False):
    method prediction_step (line 72) | def prediction_step(self, model, inputs, prediction_loss_only, ignore_...
    method complex_training_step (line 85) | def complex_training_step(self, model, inputs):
    method _prepare_inputs (line 115) | def _prepare_inputs(self, inputs):
    method _prepare_input_for_model (line 130) | def _prepare_input_for_model(self, input_ids):
    method potentially_failing_method (line 147) | def potentially_failing_method(self):
    method _do_risky_computation (line 162) | def _do_risky_computation(self):
  class AdvancedProfilingTrainer (line 172) | class AdvancedProfilingTrainer(AxolotlTrainer):
    method __init__ (line 175) | def __init__(self, *args, **kwargs):
    method training_step (line 197) | def training_step(self, model, inputs):
    method _prepare_inputs (line 204) | def _prepare_inputs(self, inputs):
    method _debug_method (line 211) | def _debug_method(self, data):

FILE: scripts/chat_datasets.py
  function parse_dataset (line 13) | def parse_dataset(dataset=None, split="train"):

FILE: setup.py
  function parse_requirements (line 12) | def parse_requirements(extras_require_map):
  function get_package_version (line 147) | def get_package_version():

FILE: src/axolotl/cli/args.py
  class PreprocessCliArgs (line 8) | class PreprocessCliArgs:
  class TrainerCliArgs (line 29) | class TrainerCliArgs:
  class VllmServeCliArgs (line 40) | class VllmServeCliArgs:
  class QuantizeCliArgs (line 109) | class QuantizeCliArgs:
  class EvaluateCliArgs (line 122) | class EvaluateCliArgs:
  class InferenceCliArgs (line 131) | class InferenceCliArgs:

FILE: src/axolotl/cli/art.py
  function print_axolotl_text_art (line 22) | def print_axolotl_text_art():

FILE: src/axolotl/cli/checks.py
  function check_accelerate_default_config (line 16) | def check_accelerate_default_config() -> None:
  function check_user_token (line 24) | def check_user_token() -> bool:

FILE: src/axolotl/cli/cloud/__init__.py
  function load_cloud_cfg (line 16) | def load_cloud_cfg(cloud_config: Path | str) -> DictDefault:
  function do_cli_preprocess (line 24) | def do_cli_preprocess(
  function do_cli_train (line 35) | def do_cli_train(
  function do_cli_lm_eval (line 66) | def do_cli_lm_eval(

FILE: src/axolotl/cli/cloud/base.py
  class Cloud (line 9) | class Cloud(ABC):
    method preprocess (line 15) | def preprocess(self, config_yaml: str, *args, **kwargs) -> None:
    method train (line 19) | def train(

FILE: src/axolotl/cli/cloud/baseten/__init__.py
  class BasetenCloud (line 14) | class BasetenCloud(Cloud):
    method __init__ (line 17) | def __init__(self, config: dict):
    method preprocess (line 20) | def preprocess(self, config_yaml: str, *args, **kwargs) -> None:
    method train (line 26) | def train(

FILE: src/axolotl/cli/cloud/modal_.py
  function run_cmd (line 18) | def run_cmd(cmd: str, run_folder: str, volumes=None):
  class ModalCloud (line 52) | class ModalCloud(Cloud):
    method __init__ (line 57) | def __init__(self, config, app=None):
    method get_env (line 69) | def get_env(self):
    method get_image (line 84) | def get_image(self):
    method get_secrets (line 128) | def get_secrets(self):
    method create_volume (line 140) | def create_volume(self, volume_config):
    method get_ephemeral_disk_size (line 145) | def get_ephemeral_disk_size(self):
    method get_preprocess_timeout (line 148) | def get_preprocess_timeout(self):
    method get_preprocess_memory (line 153) | def get_preprocess_memory(self):
    method get_preprocess_env (line 161) | def get_preprocess_env(self):
    method preprocess (line 172) | def preprocess(self, config_yaml: str, *args, **kwargs):
    method get_train_timeout (line 183) | def get_train_timeout(self):
    method get_train_gpu (line 188) | def get_train_gpu(self):
    method get_train_memory (line 208) | def get_train_memory(self):
    method get_train_env (line 214) | def get_train_env(self, local_dirs=None):
    method train (line 228) | def train(
    method lm_eval (line 247) | def lm_eval(self, config_yaml: str):
  function _preprocess (line 261) | def _preprocess(config_yaml: str, volumes=None):
  function _train (line 273) | def _train(
  function _lm_eval (line 307) | def _lm_eval(config_yaml: str, volumes=None):

FILE: src/axolotl/cli/config.py
  function _coerce_value (line 36) | def _coerce_value(value: Any, existing: Optional[Any] = None) -> Any:
  function check_remote_config (line 97) | def check_remote_config(config: Union[str, Path]) -> Union[str, Path]:
  function choose_config (line 163) | def choose_config(path: Path) -> str:
  function prepare_plugins (line 208) | def prepare_plugins(cfg: DictDefault):
  function plugin_set_cfg (line 223) | def plugin_set_cfg(cfg: DictDefault):
  function load_cfg (line 230) | def load_cfg(
  function compute_supports_fp8 (line 349) | def compute_supports_fp8() -> bool:

FILE: src/axolotl/cli/delinearize_llama4.py
  function iter_convert_patched_to_hf (line 15) | def iter_convert_patched_to_hf(model_state_dict, num_experts) -> Generator:
  function do_cli (line 71) | def do_cli(model: Union[Path, str], output: Union[Path, str]) -> None:

FILE: src/axolotl/cli/evaluate.py
  function do_evaluate (line 21) | def do_evaluate(cfg: DictDefault, cli_args: TrainerCliArgs) -> None:
  function do_cli (line 44) | def do_cli(config: Union[Path, str] = Path("examples/"), **kwargs) -> None:

FILE: src/axolotl/cli/inference.py
  function get_multi_line_input (line 32) | def get_multi_line_input() -> str:
  function do_inference (line 50) | def do_inference(
  function do_inference_gradio (line 168) | def do_inference_gradio(
  function do_cli (line 299) | def do_cli(

FILE: src/axolotl/cli/main.py
  function cli (line 43) | def cli():
  function preprocess (line 57) | def preprocess(config: str, cloud: Optional[str] = None, **kwargs):
  function train (line 98) | def train(
  function evaluate (line 153) | def evaluate(ctx: click.Context, config: str, launcher: str, **kwargs):
  function inference (line 198) | def inference(ctx: click.Context, config: str, launcher: str, gradio: bo...
  function merge_sharded_fsdp_weights (line 245) | def merge_sharded_fsdp_weights(
  function merge_lora (line 282) | def merge_lora(config: str, **kwargs):
  function fetch (line 299) | def fetch(directory: str, dest: Optional[str]):
  function vllm_serve (line 318) | def vllm_serve(config: str, **cli_args: VllmServeCliArgs):
  function quantize (line 328) | def quantize(config: str, **cli_args: QuantizeCliArgs):
  function delinearize_llama4 (line 337) | def delinearize_llama4(model: str, output: str):
  function main (line 346) | def main():

FILE: src/axolotl/cli/merge_lora.py
  function do_merge_lora (line 18) | def do_merge_lora(*, cfg: DictDefault) -> None:
  function do_cli (line 55) | def do_cli(config: Union[Path, str] = Path("examples/"), **kwargs) -> None:

FILE: src/axolotl/cli/merge_sharded_fsdp_weights.py
  class BFloat16CastPlanner (line 31) | class BFloat16CastPlanner(_EmptyStateDictLoadPlanner):
    method commit_tensor (line 34) | def commit_tensor(self, read_item, tensor):
  function _distributed_checkpoint_to_merged_weights (line 38) | def _distributed_checkpoint_to_merged_weights(
  function merge_fsdp_weights (line 108) | def merge_fsdp_weights(
  function do_cli (line 169) | def do_cli(config: Union[Path, str] = Path("examples/"), **kwargs):

FILE: src/axolotl/cli/preprocess.py
  function do_preprocess (line 29) | def do_preprocess(cfg: DictDefault, cli_args: PreprocessCliArgs) -> None:
  function do_cli (line 99) | def do_cli(

FILE: src/axolotl/cli/quantize.py
  function do_quantize (line 23) | def do_quantize(

FILE: src/axolotl/cli/train.py
  function do_train (line 23) | def do_train(cfg: DictDefault, cli_args: TrainerCliArgs):
  function do_cli (line 55) | def do_cli(config: Union[Path, str] = Path("examples/"), **kwargs):
  function ray_train_func (line 91) | def ray_train_func(kwargs: dict):

FILE: src/axolotl/cli/utils/args.py
  function _strip_optional_type (line 12) | def _strip_optional_type(field_type: type | str | None):
  function filter_none_kwargs (line 32) | def filter_none_kwargs(func: Callable) -> Callable:
  function add_options_from_dataclass (line 53) | def add_options_from_dataclass(config_class: Type[Any]) -> Callable:
  function _is_pydantic_model (line 91) | def _is_pydantic_model(field_type: type) -> bool:
  function _get_field_description (line 99) | def _get_field_description(field) -> str | None:
  function _add_nested_model_options (line 108) | def _add_nested_model_options(
  function add_options_from_config (line 148) | def add_options_from_config(config_class: Type[BaseModel]) -> Callable:

FILE: src/axolotl/cli/utils/diffusion.py
  function diffusion_inference (line 12) | def diffusion_inference(
  function _parse_commands (line 91) | def _parse_commands(text: str):
  function run_diffusion (line 139) | def run_diffusion(
  function render_html (line 199) | def render_html(
  function launch_diffusion_gradio_ui (line 260) | def launch_diffusion_gradio_ui(

FILE: src/axolotl/cli/utils/fetch.py
  function _download_file (line 16) | def _download_file(
  function fetch_from_github (line 70) | def fetch_from_github(

FILE: src/axolotl/cli/utils/load.py
  function load_model_and_tokenizer (line 20) | def load_model_and_tokenizer(

FILE: src/axolotl/cli/utils/sweeps.py
  function generate_sweep_configs (line 9) | def generate_sweep_configs(

FILE: src/axolotl/cli/utils/train.py
  function _add_default_rdzv_args (line 15) | def _add_default_rdzv_args(launcher_args: list[str]) -> list[str]:
  function build_command (line 46) | def build_command(base_cmd: list[str], options: dict[str, Any]) -> list[...
  function generate_config_files (line 69) | def generate_config_files(config: str, sweep: str | None) -> Iterator[tu...
  function launch_training (line 109) | def launch_training(
  function _launch_cloud_training (line 134) | def _launch_cloud_training(
  function _launch_accelerate_training (line 157) | def _launch_accelerate_training(
  function _launch_torchrun_training (line 195) | def _launch_torchrun_training(
  function _launch_python_training (line 221) | def _launch_python_training(cfg_file: str, kwargs: dict) -> None:

FILE: src/axolotl/cli/vllm_serve.py
  class AxolotlScriptArguments (line 15) | class AxolotlScriptArguments(ScriptArguments):
  function do_vllm_serve (line 24) | def do_vllm_serve(

FILE: src/axolotl/common/datasets.py
  class TrainDatasetMeta (line 23) | class TrainDatasetMeta:
  function sample_dataset (line 31) | def sample_dataset(dataset: Dataset, num_samples: int) -> Dataset:
  function load_datasets (line 39) | def load_datasets(
  function load_preference_datasets (line 102) | def load_preference_datasets(

FILE: src/axolotl/convert.py
  class FileReader (line 7) | class FileReader:
    method read (line 12) | def read(self, file_path):
  class FileWriter (line 17) | class FileWriter:
    method __init__ (line 22) | def __init__(self, file_path):
    method write (line 25) | def write(self, content):
  class StdoutWriter (line 30) | class StdoutWriter:
    method write (line 35) | def write(self, content):
  class JsonParser (line 40) | class JsonParser:
    method parse (line 45) | def parse(self, content):
  class JsonlSerializer (line 49) | class JsonlSerializer:
    method serialize (line 54) | def serialize(self, data):
  class JsonToJsonlConverter (line 59) | class JsonToJsonlConverter:
    method __init__ (line 64) | def __init__(self, file_reader, file_writer, json_parser, jsonl_serial...
    method convert (line 70) | def convert(self, input_file_path):

FILE: src/axolotl/core/builders/base.py
  class TrainerBuilderBase (line 55) | class TrainerBuilderBase(abc.ABC):
    method __init__ (line 58) | def __init__(self, cfg, model, tokenizer, processor=None):
    method model_ref (line 78) | def model_ref(self):
    method model_ref (line 82) | def model_ref(self, model):
    method train_dataset (line 86) | def train_dataset(self):
    method train_dataset (line 90) | def train_dataset(self, dataset):
    method eval_dataset (line 94) | def eval_dataset(self):
    method eval_dataset (line 98) | def eval_dataset(self, dataset):
    method peft_config (line 102) | def peft_config(self):
    method peft_config (line 106) | def peft_config(self, peft_config):
    method build (line 110) | def build(self, total_num_steps):
    method get_callbacks (line 113) | def get_callbacks(self) -> list[TrainerCallback]:
    method get_post_trainer_create_callbacks (line 182) | def get_post_trainer_create_callbacks(self, trainer):
    method hook_pre_create_training_args (line 200) | def hook_pre_create_training_args(self, training_arguments_kwargs):
    method hook_post_create_training_args (line 204) | def hook_post_create_training_args(self, training_arguments):
    method hook_pre_create_trainer (line 208) | def hook_pre_create_trainer(self, trainer_kwargs, trainer_cls):
    method hook_post_create_trainer (line 212) | def hook_post_create_trainer(self, trainer):
    method _configure_warmup_and_logging (line 216) | def _configure_warmup_and_logging(
    method _configure_precision_settings (line 251) | def _configure_precision_settings(self, training_args_kwargs: dict):
    method _configure_scheduler (line 261) | def _configure_scheduler(self, training_args_kwargs: dict):
    method _configure_optimizer (line 273) | def _configure_optimizer(self, training_args_kwargs: dict, trainer_kwa...
    method _configure_hub_parameters (line 426) | def _configure_hub_parameters(self, training_args_kwargs: dict):
    method _configure_save_and_eval_strategy (line 439) | def _configure_save_and_eval_strategy(self, training_args_kwargs: dict):
    method _configure_reporting (line 466) | def _configure_reporting(self, training_args_kwargs: dict):
    method _configure_torch_compile (line 490) | def _configure_torch_compile(self, training_args_kwargs: dict):
    method _configure_accelerator_config (line 502) | def _configure_accelerator_config(self, training_args_kwargs: dict):
    method _configure_gradient_checkpointing (line 510) | def _configure_gradient_checkpointing(self, training_args_kwargs: dict):
    method _set_base_training_args (line 528) | def _set_base_training_args(

FILE: src/axolotl/core/builders/causal.py
  class HFCausalTrainerBuilder (line 53) | class HFCausalTrainerBuilder(TrainerBuilderBase):
    method get_callbacks (line 59) | def get_callbacks(self):
    method get_post_trainer_create_callbacks (line 82) | def get_post_trainer_create_callbacks(self, trainer):
    method _get_trainer_cls (line 134) | def _get_trainer_cls(self):
    method build (line 163) | def build(self, total_num_steps):
    method build_collator (line 454) | def build_collator(

FILE: src/axolotl/core/builders/rl.py
  class HFRLTrainerBuilder (line 24) | class HFRLTrainerBuilder(TrainerBuilderBase):
    method get_callbacks (line 27) | def get_callbacks(self):
    method get_post_trainer_create_callbacks (line 35) | def get_post_trainer_create_callbacks(self, trainer):
    method _get_trainer_cls (line 39) | def _get_trainer_cls(self, trainer_kwargs: dict):
    method _build_training_arguments (line 96) | def _build_training_arguments(self, total_num_steps):
    method build (line 206) | def build(self, total_num_steps):

FILE: src/axolotl/core/chat/format/chatml.py
  function format_message (line 11) | def format_message(

FILE: src/axolotl/core/chat/format/llama3x.py
  function format_message (line 11) | def format_message(message: Messages, message_index: Optional[int] = Non...

FILE: src/axolotl/core/chat/format/shared.py
  function wrap_tools (line 8) | def wrap_tools(message: Messages):

FILE: src/axolotl/core/chat/messages.py
  class MessageRoles (line 13) | class MessageRoles(str, Enum):
  class MessageContentTypes (line 28) | class MessageContentTypes(str, Enum):
  class SpecialToken (line 41) | class SpecialToken(str, Enum):
  class ToolCallFunction (line 50) | class ToolCallFunction(BaseModel):
  class Tool (line 59) | class Tool(BaseModel):
  class ToolCallContents (line 69) | class ToolCallContents(BaseModel):
    method __str__ (line 78) | def __str__(self) -> str:
  class ToolResponseContents (line 85) | class ToolResponseContents(BaseModel):
    method __str__ (line 94) | def __str__(self) -> str:
  class MessageContents (line 101) | class MessageContents(BaseModel):
    method __str__ (line 113) | def __str__(self) -> str:
  class Messages (line 120) | class Messages(BaseModel):
    method __str__ (line 131) | def __str__(self) -> str:
    method tokenized (line 134) | def tokenized(
  class Chats (line 180) | class Chats(BaseModel):
    method __str__ (line 187) | def __str__(self) -> str:
    method tokenized (line 190) | def tokenized(
  class ChatFormattedChats (line 208) | class ChatFormattedChats(Chats):
    method model_post_init (line 216) | def model_post_init(self, __context):
  class PreferenceChats (line 223) | class PreferenceChats(BaseModel):

FILE: src/axolotl/core/datasets/chat.py
  class TokenizedChatDataset (line 13) | class TokenizedChatDataset(Dataset):
    method __init__ (line 18) | def __init__(

FILE: src/axolotl/core/datasets/transforms/chat_builder.py
  function chat_message_transform_builder (line 9) | def chat_message_transform_builder(

FILE: src/axolotl/core/trainers/base.py
  class AxolotlTrainer (line 64) | class AxolotlTrainer(
    method axolotl_cfg (line 81) | def axolotl_cfg(self):
    method axolotl_cfg (line 85) | def axolotl_cfg(self, cfg):
    method __init__ (line 88) | def __init__(
    method _create_multipack_sampler (line 109) | def _create_multipack_sampler(
    method _get_train_sampler (line 150) | def _get_train_sampler(
    method _get_eval_sampler (line 187) | def _get_eval_sampler(self, eval_dataset: Dataset | None = None) -> Sa...
    method _get_dataloader (line 219) | def _get_dataloader(
    method _get_bench_sampler (line 316) | def _get_bench_sampler(
    method get_bench_dataloader (line 323) | def get_bench_dataloader(
    method compute_loss (line 344) | def compute_loss(
    method evaluate (line 399) | def evaluate(self, *args, **kwargs):
    method orpo_concatenate_inputs (line 404) | def orpo_concatenate_inputs(inputs, label_pad_token=-100, pad_token=0,...
    method orpo_compute_custom_loss (line 455) | def orpo_compute_custom_loss(self, logits, labels):
    method orpo_compute_logps (line 473) | def orpo_compute_logps(
    method orpo_compute_loss (line 497) | def orpo_compute_loss(
    method push_to_hub (line 565) | def push_to_hub(self, *args, **kwargs) -> str:
    method create_accelerator_and_postprocess (line 578) | def create_accelerator_and_postprocess(self):
    method additional_accelerator_args (line 587) | def additional_accelerator_args(
    method log (line 609) | def log(self, logs: dict[str, float], start_time: float | None = None)...
    method store_metrics (line 672) | def store_metrics(
    method _save_checkpoint (line 695) | def _save_checkpoint(self, model, trial, **kwargs):
    method _save (line 717) | def _save(self, output_dir: Optional[str] = None, state_dict=None):

FILE: src/axolotl/core/trainers/dpo/__init__.py
  class DPOStrategy (line 7) | class DPOStrategy:
    method get_trainer_class (line 11) | def get_trainer_class(cls):
    method get_training_args_class (line 15) | def get_training_args_class(cls):
    method set_training_args_kwargs (line 21) | def set_training_args_kwargs(cls, cfg):

FILE: src/axolotl/core/trainers/dpo/args.py
  class AxolotlDPOConfig (line 13) | class AxolotlDPOConfig(AxolotlTrainingMixins, DPOConfig):

FILE: src/axolotl/core/trainers/dpo/trainer.py
  class AxolotlDPOTrainer (line 23) | class AxolotlDPOTrainer(
    method __init__ (line 35) | def __init__(self, *args, dataset_tags=None, **kwargs):
    method push_to_hub (line 43) | def push_to_hub(self, *args, **kwargs) -> str:
    method tokenize_row (line 57) | def tokenize_row(
    method training_step (line 87) | def training_step(
    method concatenated_forward (line 98) | def concatenated_forward(

FILE: src/axolotl/core/trainers/grpo/__init__.py
  class GRPOStrategy (line 26) | class GRPOStrategy:
    method get_trainer_class (line 30) | def get_trainer_class(
    method get_training_args_class (line 51) | def get_training_args_class(
    method set_training_args_kwargs (line 59) | def set_training_args_kwargs(cls, cfg: DictDefault) -> dict[str, Any]:
    method set_trainer_args (line 205) | def set_trainer_args(cls, cfg: DictDefault) -> list[Any]:
    method set_trainer_kwargs (line 216) | def set_trainer_kwargs(cls, cfg: DictDefault) -> dict[str, Any]:
    method get_collator (line 228) | def get_collator(cls, *args, **kwargs):
    method get_blocklist_args_kwargs (line 233) | def get_blocklist_args_kwargs(cls) -> list[str]:
    method get_reward_func (line 242) | def get_reward_func(cls, reward_func_fqn: str) -> RewardFunc:
    method get_rollout_func (line 286) | def get_rollout_func(cls, rollout_func_fqn: str):

FILE: src/axolotl/core/trainers/grpo/args.py
  class AxolotlGRPOConfig (line 14) | class AxolotlGRPOConfig(AxolotlTrainingMixins, GRPOConfig):
  class AxolotlAsyncGRPOConfig (line 21) | class AxolotlAsyncGRPOConfig(AxolotlTrainingMixins, FastAsyncGRPOConfig):

FILE: src/axolotl/core/trainers/grpo/async_trainer.py
  function disable_gradient_checkpointing (line 68) | def disable_gradient_checkpointing(model, kwargs):
  class AsyncGRPOConfig (line 98) | class AsyncGRPOConfig(GRPOConfig):
  class ProducerConfig (line 184) | class ProducerConfig:
    method __post_init__ (line 211) | def __post_init__(self):
  class DataProducer (line 232) | class DataProducer(ABC):
    method produce (line 242) | def produce(
  class BaseDataProducer (line 256) | class BaseDataProducer(DataProducer):
    method __init__ (line 259) | def __init__(self, config: ProducerConfig | None = None):
    method on_rollout_begin (line 262) | def on_rollout_begin(self, global_step: int) -> None:
    method on_rollout_end (line 265) | def on_rollout_end(self, dataset: Dataset, global_step: int) -> None:
  class AsyncDataProducer (line 269) | class AsyncDataProducer:
    method __init__ (line 282) | def __init__(
    method config (line 303) | def config(self) -> ProducerConfig:
    method produce (line 306) | def produce(self, model: Any, global_step: int, **kwargs) -> Dataset:
    method _broadcast_dataset (line 358) | def _broadcast_dataset(self, dataset) -> Dataset:
    method _locked_produce (line 391) | def _locked_produce(self, model: Any, global_step: int, **kwargs) -> D...
    method on_rollout_begin (line 396) | def on_rollout_begin(self, global_step: int) -> None:
    method on_rollout_end (line 400) | def on_rollout_end(self, dataset: Dataset, global_step: int) -> None:
    method shutdown (line 404) | def shutdown(self) -> None:
  class DataProducerCallback (line 412) | class DataProducerCallback:
  class RolloutDataset (line 424) | class RolloutDataset(Dataset):
    method __init__ (line 434) | def __init__(self, data: dict[str, Any]):
    method __len__ (line 461) | def __len__(self) -> int:
    method __getitem__ (line 464) | def __getitem__(self, idx: int) -> dict[str, Any]:
  function make_rollout_collator (line 473) | def make_rollout_collator(shared_keys: set[str]):
  class GRPODataProducer (line 492) | class GRPODataProducer(BaseDataProducer):
    method __init__ (line 499) | def __init__(
    method set_trainer (line 523) | def set_trainer(self, trainer) -> None:
    method _init_prompt_dataloader (line 528) | def _init_prompt_dataloader(self) -> None:
    method produce (line 570) | def produce(
  class AsyncGRPOTrainer (line 623) | class AsyncGRPOTrainer(GRPOTrainer):
    method __init__ (line 630) | def __init__(self, *args, **kwargs):
    method _create_data_producer (line 709) | def _create_data_producer(self, args, train_dataset):
    method _setup_async (line 743) | def _setup_async(self):
    method _shutdown_async (line 777) | def _shutdown_async(self):
    method _submit_generation (line 782) | def _submit_generation(self):
    method _sync_peft_weights_no_merge (line 792) | def _sync_peft_weights_no_merge(self):
    method _sync_lora_adapter (line 892) | def _sync_lora_adapter(self):
    method _maybe_sync_vllm_weights (line 998) | def _maybe_sync_vllm_weights(self):
    method _zero_pad_embedding_for_fp8 (line 1045) | def _zero_pad_embedding_for_fp8(self):
    method _generate_single_turn (line 1091) | def _generate_single_turn(self, prompts, **kwargs):
    method _generate_rank0_only (line 1129) | def _generate_rank0_only(self, prompts):
    method _generate_only (line 1217) | def _generate_only(self, inputs, rank0_only=False):
    method _compute_rewards_for_batch (line 1414) | def _compute_rewards_for_batch(
    method _launch_reward_workers (line 1422) | def _launch_reward_workers(self, inputs, prompts, completions, complet...
    method _collect_reward_workers (line 1429) | def _collect_reward_workers(
    method _post_advantage_hook (line 1444) | def _post_advantage_hook(
    method _compute_deferred_scores (line 1463) | def _compute_deferred_scores(self, rollout: dict) -> dict:
    method _compute_streaming_group_scores (line 1802) | def _compute_streaming_group_scores(
    method _score_streaming (line 2153) | def _score_streaming(self, rollout: dict) -> list[dict]:
    method _prepare_inputs (line 2215) | def _prepare_inputs(self, generation_batch):
    method _prepare_inputs_data_producer (line 2230) | def _prepare_inputs_data_producer(self, generation_batch):
    method _prepare_inputs_legacy_async (line 2271) | def _prepare_inputs_legacy_async(self, generation_batch):
    method _get_per_token_logps_and_entropies (line 2302) | def _get_per_token_logps_and_entropies(
    method get_off_policy_mask (line 2437) | def get_off_policy_mask(
    method _compute_loss (line 2453) | def _compute_loss(self, model, inputs):

FILE: src/axolotl/core/trainers/grpo/fast_async_trainer.py
  class FastAsyncGRPOConfig (line 51) | class FastAsyncGRPOConfig(AsyncGRPOConfig):
  class RerollDataProducer (line 117) | class RerollDataProducer(GRPODataProducer):
    method _pre_produce_hook (line 125) | def _pre_produce_hook(self, inputs: list, global_step: int) -> list:
  function _persistent_reward_worker (line 169) | def _persistent_reward_worker(conn):
  class FastAsyncGRPOTrainer (line 216) | class FastAsyncGRPOTrainer(AsyncGRPOTrainer):
    method __init__ (line 226) | def __init__(self, *args, **kwargs):
    method _create_data_producer (line 265) | def _create_data_producer(self, args, train_dataset):
    method _get_reward_workers (line 301) | def _get_reward_workers(self):
    method _shutdown_reward_workers (line 328) | def _shutdown_reward_workers(self):
    method _compute_rewards_for_batch (line 346) | def _compute_rewards_for_batch(
    method _launch_reward_workers (line 355) | def _launch_reward_workers(self, inputs, prompts, completions, complet...
    method _collect_reward_workers (line 410) | def _collect_reward_workers(
    method _post_advantage_hook (line 470) | def _post_advantage_hook(
    method compute_liger_loss (line 637) | def compute_liger_loss(self, unwrapped_model, inputs):
    method _compute_loss (line 753) | def _compute_loss(self, model, inputs):

FILE: src/axolotl/core/trainers/grpo/replay_buffer.py
  class ReplayBuffer (line 8) | class ReplayBuffer:
    method __init__ (line 14) | def __init__(self, max_size: int):
    method __len__ (line 19) | def __len__(self):
    method add (line 22) | def add(self, score: float, data: dict):
    method sample (line 32) | def sample(self, num_samples: int) -> list[dict] | None:

FILE: src/axolotl/core/trainers/grpo/sampler.py
  class SequenceParallelRepeatRandomSampler (line 13) | class SequenceParallelRepeatRandomSampler(Sampler):
    method __init__ (line 54) | def __init__(
    method __iter__ (line 109) | def __iter__(self) -> Iterator[int]:
    method __len__ (line 155) | def __len__(self) -> int:
    method set_epoch (line 166) | def set_epoch(self, epoch: int) -> None:

FILE: src/axolotl/core/trainers/grpo/trainer.py
  class AxolotlGRPOTrainer (line 57) | class AxolotlGRPOTrainer(
  class AxolotlAsyncGRPOTrainer (line 70) | class AxolotlAsyncGRPOTrainer(
  class AxolotlGRPOSequenceParallelTrainer (line 83) | class AxolotlGRPOSequenceParallelTrainer(AxolotlGRPOTrainer):
    method __init__ (line 86) | def __init__(
    method train (line 166) | def train(self, *args, **kwargs):
    method _get_train_sampler (line 176) | def _get_train_sampler(self) -> Sampler:
    method _create_dataloader_params (line 198) | def _create_dataloader_params(self, is_eval=False, custom_batch_size=N...
    method _prepare_dataloader (line 221) | def _prepare_dataloader(
    method get_train_dataloader (line 264) | def get_train_dataloader(self) -> DataLoader:
    method _generate_and_score_completions (line 291) | def _generate_and_score_completions(

FILE: src/axolotl/core/trainers/mamba.py
  class AxolotlMambaTrainer (line 8) | class AxolotlMambaTrainer(AxolotlTrainer):
    method compute_loss (line 13) | def compute_loss(

FILE: src/axolotl/core/trainers/mixins/activation_checkpointing.py
  class ActivationOffloadingMixin (line 25) | class ActivationOffloadingMixin(Trainer):
    method __init__ (line 30) | def __init__(self, *args, **kwargs):
    method training_step (line 44) | def training_step(self, *args, **kwargs):
  function ac_wrap_hf_model (line 49) | def ac_wrap_hf_model(model: nn.Module, **kwargs):
  function get_lora_act_offloading_ctx_manager (line 54) | def get_lora_act_offloading_ctx_manager(

FILE: src/axolotl/core/trainers/mixins/checkpoints.py
  class CheckpointSaveMixin (line 10) | class CheckpointSaveMixin(Trainer):
    method _save_optimizer_and_scheduler (line 13) | def _save_optimizer_and_scheduler(self, output_dir):

FILE: src/axolotl/core/trainers/mixins/distributed_parallel.py
  class DistributedParallelMixin (line 9) | class DistributedParallelMixin(Trainer):
    method _save (line 14) | def _save(self, output_dir: str | None = None, state_dict=None):
    method create_accelerator_and_postprocess (line 23) | def create_accelerator_and_postprocess(self):

FILE: src/axolotl/core/trainers/mixins/optimizer.py
  class OptimizerMixin (line 17) | class OptimizerMixin(Trainer):
    method create_optimizer_grouped_parameters (line 22) | def create_optimizer_grouped_parameters(
    method create_optimizer (line 107) | def create_optimizer(self, model=None):
  class OptimizerInitMixin (line 201) | class OptimizerInitMixin:
    method __init__ (line 207) | def __init__(self, *args, **kwargs):

FILE: src/axolotl/core/trainers/mixins/packing.py
  class PackingMixin (line 6) | class PackingMixin(Trainer):
    method _set_signature_columns_if_needed (line 11) | def _set_signature_columns_if_needed(self):

FILE: src/axolotl/core/trainers/mixins/rng_state_loader.py
  class RngLoaderMixin (line 24) | class RngLoaderMixin(Trainer):
    method _load_rng_state (line 29) | def _load_rng_state(self, checkpoint):

FILE: src/axolotl/core/trainers/mixins/scheduler.py
  class SchedulerMixin (line 20) | class SchedulerMixin(Trainer):
    method create_scheduler (line 27) | def create_scheduler(

FILE: src/axolotl/core/trainers/trl.py
  class AxolotlORPOTrainer (line 14) | class AxolotlORPOTrainer(
  class AxolotlKTOTrainer (line 29) | class AxolotlKTOTrainer(
  class AxolotlCPOTrainer (line 44) | class AxolotlCPOTrainer(
  class AxolotlRewardTrainer (line 59) | class AxolotlRewardTrainer(
  class AxolotlPRMTrainer (line 74) | class AxolotlPRMTrainer(

FILE: src/axolotl/core/trainers/utils.py
  function sanitize_kwargs_for_tagging (line 4) | def sanitize_kwargs_for_tagging(tag_names, kwargs=None):
  function sanitize_kwargs_for_ds_tagging (line 20) | def sanitize_kwargs_for_ds_tagging(dataset_tags, kwargs=None):

FILE: src/axolotl/core/training_args.py
  class AxolotlTrainingArguments (line 23) | class AxolotlTrainingArguments(AxolotlTrainingMixins, TrainingArguments):
  class AxolotlORPOConfig (line 33) | class AxolotlORPOConfig(AxolotlTrainingMixins, ORPOConfig):
  class AxolotlKTOConfig (line 40) | class AxolotlKTOConfig(AxolotlTrainingMixins, KTOConfig):
  class AxolotlCPOConfig (line 47) | class AxolotlCPOConfig(AxolotlTrainingMixins, CPOConfig):
  class AxolotlRewardConfig (line 59) | class AxolotlRewardConfig(AxolotlTrainingMixins, RewardConfig):
  class AxolotlPRMConfig (line 66) | class AxolotlPRMConfig(AxolotlTrainingMixins, PRMConfig):

FILE: src/axolotl/core/training_args_base.py
  class AxolotlTrainingMixins (line 12) | class AxolotlTrainingMixins:

FILE: src/axolotl/datasets.py
  class TokenizedPromptDataset (line 18) | class TokenizedPromptDataset(Dataset):
    method __init__ (line 28) | def __init__(
    method process (line 44) | def process(self, dataset):
  function wrap_dataset_for_tokenized_prompt (line 72) | def wrap_dataset_for_tokenized_prompt(

FILE: src/axolotl/evaluate.py
  function evaluate_dataset (line 30) | def evaluate_dataset(
  function evaluate (line 68) | def evaluate(*, cfg: DictDefault, dataset_meta: TrainDatasetMeta) -> Dic...

FILE: src/axolotl/integrations/base.py
  class BasePlugin (line 44) | class BasePlugin:
    method __init__ (line 76) | def __init__(self):
    method register (line 79) | def register(self, cfg: dict):
    method get_input_args (line 86) | def get_input_args(self) -> str | None:
    method get_training_args_mixin (line 89) | def get_training_args_mixin(self) -> str | None:
    method load_datasets (line 94) | def load_datasets(
    method pre_model_load (line 107) | def pre_model_load(self, cfg: DictDefault):
    method post_model_build (line 114) | def post_model_build(self, cfg: DictDefault, model: PreTrainedModel):
    method pre_lora_load (line 121) | def pre_lora_load(self, cfg: DictDefault, model: PreTrainedModel):
    method post_lora_load (line 129) | def post_lora_load(self, cfg: DictDefault, model: PreTrainedModel | Pe...
    method post_model_load (line 137) | def post_model_load(self, cfg: DictDefault, model: PreTrainedModel | P...
    method get_trainer_cls (line 145) | def get_trainer_cls(self, cfg: DictDefault) -> type[Trainer] | None:
    method post_trainer_create (line 155) | def post_trainer_create(self, cfg: DictDefault, trainer: Trainer):
    method get_training_args (line 163) | def get_training_args(self, cfg: DictDefault):
    method get_collator_cls_and_kwargs (line 174) | def get_collator_cls_and_kwargs(self, cfg: DictDefault, is_eval: bool ...
    method create_optimizer (line 186) | def create_optimizer(self, cfg: DictDefault, trainer: Trainer) -> Opti...
    method create_lr_scheduler (line 197) | def create_lr_scheduler(
    method add_callbacks_pre_trainer (line 216) | def add_callbacks_pre_trainer(
    method add_callbacks_post_trainer (line 230) | def add_callbacks_post_trainer(
    method post_train (line 245) | def post_train(self, cfg: DictDefault, model: PreTrainedModel | PeftMo...
    method post_train_unload (line 253) | def post_train_unload(self, cfg: DictDefault):
  function load_plugin (line 261) | def load_plugin(plugin_name: str) -> BasePlugin:
  class PluginManager (line 301) | class PluginManager:
    method __new__ (line 320) | def __new__(cls):
    method get_instance (line 330) | def get_instance() -> "PluginManager":
    method cfg (line 339) | def cfg(self):
    method cfg (line 343) | def cfg(self, cfg):
    method register (line 346) | def register(self, plugin_name: str):
    method get_input_args (line 366) | def get_input_args(self) -> list[str]:
    method get_training_args_mixin (line 379) | def get_training_args_mixin(self):
    method load_datasets (line 393) | def load_datasets(
    method pre_model_load (line 415) | def pre_model_load(self, cfg: DictDefault):
    method post_model_build (line 424) | def post_model_build(self, cfg: DictDefault, model: PreTrainedModel):
    method pre_lora_load (line 435) | def pre_lora_load(self, cfg: DictDefault, model: PreTrainedModel):
    method post_lora_load (line 445) | def post_lora_load(self, cfg: DictDefault, model: PreTrainedModel | Pe...
    method post_model_load (line 455) | def post_model_load(self, cfg: DictDefault, model: PreTrainedModel | P...
    method get_trainer_cls (line 466) | def get_trainer_cls(self, cfg: DictDefault) -> Trainer | None:
    method get_training_args (line 482) | def get_training_args(self, cfg):
    method get_collator_cls_and_kwargs (line 500) | def get_collator_cls_and_kwargs(self, cfg, is_eval=False):
    method post_trainer_create (line 518) | def post_trainer_create(self, cfg: DictDefault, trainer: Trainer):
    method create_optimizer (line 528) | def create_optimizer(self, trainer: Trainer) -> Optimizer | None:
    method create_lr_scheduler (line 544) | def create_lr_scheduler(
    method add_callbacks_pre_trainer (line 568) | def add_callbacks_pre_trainer(
    method add_callbacks_post_trainer (line 587) | def add_callbacks_post_trainer(
    method post_train (line 606) | def post_train(self, cfg: DictDefault, model: PreTrainedModel | PeftMo...
    method post_train_unload (line 616) | def post_train_unload(self, cfg: DictDefault):
  class BaseOptimizerFactory (line 626) | class BaseOptimizerFactory:
    method __call__ (line 629) | def __call__(
    method get_decay_parameter_names (line 635) | def get_decay_parameter_names(self, model) -> list[str]:

FILE: src/axolotl/integrations/config.py
  function merge_input_args (line 27) | def merge_input_args():
  function merge_training_args (line 60) | def merge_training_args() -> Type:

FILE: src/axolotl/integrations/cut_cross_entropy/__init__.py
  class CutCrossEntropyPlugin (line 42) | class CutCrossEntropyPlugin(BasePlugin):
    method get_input_args (line 47) | def get_input_args(self):
    method _check_requirements (line 50) | def _check_requirements(self):
    method pre_model_load (line 86) | def pre_model_load(self, cfg):
    method patch_llama_like (line 105) | def patch_llama_like(

FILE: src/axolotl/integrations/cut_cross_entropy/args.py
  class CutCrossEntropyArgs (line 28) | class CutCrossEntropyArgs(BaseModel):
    method check_dtype_is_half (line 37) | def check_dtype_is_half(cls, data):
    method check_chunked_cross_entropy_not_set (line 48) | def check_chunked_cross_entropy_not_set(cls, data):

FILE: src/axolotl/integrations/densemixer/args.py
  class DenseMixerArgs (line 6) | class DenseMixerArgs(BaseModel):

FILE: src/axolotl/integrations/densemixer/plugin.py
  class DenseMixerPlugin (line 11) | class DenseMixerPlugin(BasePlugin):
    method get_input_args (line 16) | def get_input_args(self) -> str | None:
    method pre_model_load (line 19) | def pre_model_load(self, cfg):

FILE: src/axolotl/integrations/diffusion/args.py
  class DiffusionConfig (line 10) | class DiffusionConfig(BaseModel):
    method _validate_mask_ratios (line 83) | def _validate_mask_ratios(self) -> "DiffusionConfig":
  class DiffusionArgs (line 89) | class DiffusionArgs(BaseModel):

FILE: src/axolotl/integrations/diffusion/callbacks.py
  class DiffusionGenerationCallback (line 23) | class DiffusionGenerationCallback(TrainerCallback):
    method __init__ (line 26) | def __init__(self, trainer):
    method on_step_end (line 29) | def on_step_end(
    method _log_samples (line 70) | def _log_samples(self, samples: list, step: int):

FILE: src/axolotl/integrations/diffusion/generation.py
  function generate_samples (line 15) | def generate_samples(
  function _sample_sequences_from_dataloader (line 103) | def _sample_sequences_from_dataloader(
  function generate (line 196) | def generate(
  function _clean_masked_text (line 321) | def _clean_masked_text(masked_text: str, tokenizer: Any, mask_token_id: ...
  function _diffusion_step (line 339) | def _diffusion_step(

FILE: src/axolotl/integrations/diffusion/plugin.py
  class DiffusionPlugin (line 15) | class DiffusionPlugin(BasePlugin):
    method __init__ (line 23) | def __init__(self):
    method get_input_args (line 27) | def get_input_args(self) -> str:
    method post_model_load (line 31) | def post_model_load(self, cfg: DictDefault, model: PreTrainedModel | P...
    method get_trainer_cls (line 35) | def get_trainer_cls(self, cfg: DictDefault) -> type[DiffusionTrainer] ...
    method post_trainer_create (line 39) | def post_trainer_create(self, cfg: DictDefault, trainer: DiffusionTrai...

FILE: src/axolotl/integrations/diffusion/trainer.py
  class DiffusionTrainer (line 19) | class DiffusionTrainer(AxolotlTrainer):
    method __init__ (line 22) | def __init__(self, *args, **kwargs):
    method set_config (line 27) | def set_config(self, config: DictDefault):
    method _resolve_mask_token_id (line 40) | def _resolve_mask_token_id(self) -> None:
    method compute_loss (line 59) | def compute_loss(
    method _cache_special_token_ids (line 82) | def _cache_special_token_ids(self):
    method _forward_process (line 100) | def _forward_process(
    method _compute_diffusion_loss (line 159) | def _compute_diffusion_loss(

FILE: src/axolotl/integrations/diffusion/utils.py
  function resolve_mask_token_id (line 12) | def resolve_mask_token_id(
  function create_bidirectional_attention_mask (line 125) | def create_bidirectional_attention_mask(
  function shift_logits_to_input_positions (line 162) | def shift_logits_to_input_positions(logits: torch.Tensor) -> torch.Tensor:

FILE: src/axolotl/integrations/grokfast/__init__.py
  class GrokfastCallbackHandler (line 16) | class GrokfastCallbackHandler(TrainerCallback):
    method __init__ (line 21) | def __init__(self, *args_, alpha=0.98, lamb=2.0, **kwargs):
    method on_train_begin (line 27) | def on_train_begin(self, *args_, **kwargs):
    method on_pre_optimizer_step (line 30) | def on_pre_optimizer_step(self, args_, state, control, **kwargs):
  class GrokfastPlugin (line 36) | class GrokfastPlugin(BasePlugin):
    method get_input_args (line 41) | def get_input_args(self):
    method add_callbacks_post_trainer (line 44) | def add_callbacks_post_trainer(self, cfg, trainer):

FILE: src/axolotl/integrations/grokfast/args.py
  class GrokfastArgs (line 10) | class GrokfastArgs(BaseModel):

FILE: src/axolotl/integrations/grokfast/optimizer.py
  function gradfilter_ma (line 11) | def gradfilter_ma(
  function gradfilter_ema (line 44) | def gradfilter_ema(

FILE: src/axolotl/integrations/kd/__init__.py
  class KDPlugin (line 29) | class KDPlugin(BasePlugin):
    method get_input_args (line 34) | def get_input_args(self):
    method get_training_args_mixin (line 37) | def get_training_args_mixin(self):
    method get_trainer_cls (line 40) | def get_trainer_cls(self, cfg):
    method get_training_args (line 47) | def get_training_args(self, cfg):
    method get_collator_cls_and_kwargs (line 56) | def get_collator_cls_and_kwargs(self, cfg, is_eval=False):
    method pre_model_load (line 84) | def pre_model_load(self, cfg):
    method add_callbacks_post_trainer (line 89) | def add_callbacks_post_trainer(self, cfg: Any, trainer: Trainer) -> list:

FILE: src/axolotl/integrations/kd/args.py
  class InferenceServerType (line 25) | class InferenceServerType(str, Enum):
  class KDArgs (line 34) | class KDArgs(BaseModel):
  class KDTrainingArgsMixin (line 63) | class KDTrainingArgsMixin:

FILE: src/axolotl/integrations/kd/callbacks.py
  class KDTemperatureSchedulerCallback (line 10) | class KDTemperatureSchedulerCallback(TrainerCallback):
    method __init__ (line 15) | def __init__(self, temperature_start, temperature_min, trainer):
    method on_step_end (line 22) | def on_step_end(self, args, state, control, **kwargs):

FILE: src/axolotl/integrations/kd/chat_template.py
  class ChatTemplateStrategyWithKD (line 29) | class ChatTemplateStrategyWithKD(ChatTemplateStrategy):
    method __init__ (line 34) | def __init__(
    method supports_batched (line 66) | def supports_batched(self) -> bool:
    method transform_logprobs (line 70) | def transform_logprobs(self, sample):
    method _tokenize_single_prompt (line 178) | def _tokenize_single_prompt(self, prompt):
    method _prepare_kd_fields (line 189) | def _prepare_kd_fields(self, tokenized_prompt, original_prompt):
  class ChatTemplateStrategyWithKDv2 (line 196) | class ChatTemplateStrategyWithKDv2(ChatTemplateStrategyWithKD):
    method transform_logprobs (line 201) | def transform_logprobs(self, sample):
    method _prepare_kd_fields (line 295) | def _prepare_kd_fields(self, tokenized_prompt, original_prompt):
  class KDStrategyLoader (line 305) | class KDStrategyLoader(StrategyLoader):
    method _get_strategy_cls (line 310) | def _get_strategy_cls(self, cfg):
    method _get_strategy_params (line 313) | def _get_strategy_params(self, cfg, ds_cfg: Dict[str, Any]):
  class KDStrategyLoaderV2 (line 325) | class KDStrategyLoaderV2(KDStrategyLoader):
    method _get_strategy_cls (line 330) | def _get_strategy_cls(self, cfg):

FILE: src/axolotl/integrations/kd/collator.py
  class DataCollatorForKD (line 32) | class DataCollatorForKD(DataCollatorForSeq2Seq):
    method __init__ (line 49) | def __init__(self, *args, **kwargs):
    method __call__ (line 53) | def __call__(self, features, return_tensors=None):
  class KDBatchSamplerDataCollatorForSeq2Seq (line 198) | class KDBatchSamplerDataCollatorForSeq2Seq(DataCollatorForKD):
    method __call__ (line 204) | def __call__(self, features, return_tensors=None):

FILE: src/axolotl/integrations/kd/collator_online_teacher.py
  function hmac_sha_from_int_list (line 21) | def hmac_sha_from_int_list(int_list, key, hash_func=hashlib.sha256):
  class OnlineTeacherCollator (line 46) | class OnlineTeacherCollator(KDBatchSamplerDataCollatorForSeq2Seq):
    method __init__ (line 53) | def __init__(
    method _normalize_logprobs (line 85) | def _normalize_logprobs(self, raw_logprobs: List[float]) -> List[float]:
    method fetch_online_logprobs_sglang (line 98) | def fetch_online_logprobs_sglang(
    method fetch_online_logprobs_vllm (line 267) | def fetch_online_logprobs_vllm(
    method __call__ (line 467) | def __call__(

FILE: src/axolotl/integrations/kd/kernels/liger.py
  class LigerFusedLinearKLTopKLogprobFunction (line 14) | class LigerFusedLinearKLTopKLogprobFunction(LigerFusedLinearDistillation...
    method distillation_loss_fn (line 20) | def distillation_loss_fn(
    method _compute_loss_kl_topk (line 120) | def _compute_loss_kl_topk(
    method forward (line 180) | def forward(
    method backward (line 352) | def backward(ctx, grad_output):
  class LigerFusedLinearKLTopKLogprobLoss (line 415) | class LigerFusedLinearKLTopKLogprobLoss(torch.nn.Module):
    method __init__ (line 420) | def __init__(
    method forward (line 458) | def forward(

FILE: src/axolotl/integrations/kd/kernels/models.py
  class TransformersKwargs (line 15) | class TransformersKwargs(FlashAttentionKwargs, LossKwargs):
  function kldiv_forward_llama_like (line 28) | def kldiv_forward_llama_like(
  function apply_kernel (line 98) | def apply_kernel(model_type):

FILE: src/axolotl/integrations/kd/topk_logprob/forward_kl.py
  function loss (line 24) | def loss(
  class ChunkedTopKKDLoss (line 100) | class ChunkedTopKKDLoss(nn.Module):
    method __init__ (line 108) | def __init__(self, num_output_chunks: int = 8, kd_temperature: float =...
    method forward (line 113) | def forward(

FILE: src/axolotl/integrations/kd/trainer.py
  class AxolotlKDTrainer (line 26) | class AxolotlKDTrainer(AxolotlTrainer):
    method __init__ (line 31) | def __init__(self, *args, **kwargs):
    method _set_signature_columns_if_needed (line 52) | def _set_signature_columns_if_needed(self):
    method compute_loss (line 66) | def compute_loss(

FILE: src/axolotl/integrations/kd/utils.py
  function normalize_logprobs (line 11) | def normalize_logprobs(logprobs: FloatTensor, topk: int) -> FloatTensor:
  function strided_chunk_views (line 46) | def strided_chunk_views(
  function chunk_overlap (line 94) | def chunk_overlap(input_tensor: Tensor, chunks: int, dim: int = 0, overl...

FILE: src/axolotl/integrations/kernels/args.py
  class KernelsArgs (line 8) | class KernelsArgs(BaseModel):
    method check_mutually_exclusive (line 14) | def check_mutually_exclusive(cls, data):
    method check_use_kernels (line 24) | def check_use_kernels(cls, data):
    method check_experts_implementation (line 35) | def check_experts_implementation(cls, data):
    method disable_mlp_kernel (line 50) | def disable_mlp_kernel(cls, data):

FILE: src/axolotl/integrations/kernels/autotune_callback.py
  function _get_gpu_info (line 19) | def _get_gpu_info() -> dict:
  function _get_smem_capacity (line 35) | def _get_smem_capacity() -> dict:
  class AutotuneReportCallback (line 53) | class AutotuneReportCallback(TrainerCallback):
    method __init__ (line 67) | def __init__(self):
    method on_step_end (line 71) | def on_step_end(

FILE: src/axolotl/integrations/kernels/autotune_collector.py
  function _parse_key_tuple (line 28) | def _parse_key_tuple(key_tuple: tuple) -> dict[str, Any]:
  function _find_lora_ops_module (line 44) | def _find_lora_ops_module() -> ModuleType | None:
  function collect_autotune_configs (line 68) | def collect_autotune_configs() -> list[dict[str, Any]]:

FILE: src/axolotl/integrations/kernels/constants.py
  function resolve_moe_block_classes (line 47) | def resolve_moe_block_classes(model_type: str):

FILE: src/axolotl/integrations/kernels/libs/scattermoe_lora/kernels/lora_ops.py
  function _next_power_of_2 (line 46) | def _next_power_of_2(n: int) -> int:
  function _block_r_for_rank (line 61) | def _block_r_for_rank(r: int) -> int:
  function round_expert_counts (line 71) | def round_expert_counts(
  function _get_smem_capacity (line 169) | def _get_smem_capacity() -> int:
  function _estimate_smem_usage (line 180) | def _estimate_smem_usage(
  function _estimate_register_pressure (line 198) | def _estimate_register_pressure(
  function _compute_expert_block_lora (line 234) | def _compute_expert_block_lora(
  function _scatter2scatter_lora_configs (line 356) | def _scatter2scatter_lora_configs():
  function _prune_fwd_configs (line 389) | def _prune_fwd_configs(configs, named_args, **kwargs):
  function _scatter2scatter_lora (line 463) | def _scatter2scatter_lora(
  function _scatter2scatter_lora_split (line 590) | def _scatter2scatter_lora_split(
  function scatter2scatter_lora (line 673) | def scatter2scatter_lora(
  function _compute_expert_block_lora_dX (line 807) | def _compute_expert_block_lora_dX(
  function _scatter2scatter_lora_dX_configs (line 936) | def _scatter2scatter_lora_dX_configs():
  function _prune_dX_configs (line 969) | def _prune_dX_configs(configs, named_args, **kwargs):
  function _scatter2scatter_lora_dX (line 1043) | def _scatter2scatter_lora_dX(
  function scatter2scatter_lora_dX (line 1166) | def scatter2scatter_lora_dX(
  function _group_bwd_lora_configs (line 1274) | def _group_bwd_lora_configs():
  function _prune_bwd_lora_configs (line 1308) | def _prune_bwd_lora_configs(configs, named_args, **kwargs):
  function _group_bwd_lora (line 1377) | def _group_bwd_lora(
  function _group_bwd_split_configs (line 1546) | def _group_bwd_split_configs():
  function _prune_split_configs (line 1565) | def _prune_split_configs(configs, named_args, **kwargs):
  function _group_bwd_lora_split (line 1617) | def _group_bwd_lora_split(
  function group_bwd_lora (line 1819) | def group_bwd_lora(
  function _group_bwd_lora_fused (line 1939) | def _group_bwd_lora_fused(
  function group_bwd_lora_fused (line 2145) | def group_bwd_lora_fused(

FILE: src/axolotl/integrations/kernels/libs/scattermoe_lora/kernels/ops.py
  function _compute_expert_block (line 18) | def _compute_expert_block(
  function _scatter2scatter_configs (line 62) | def _scatter2scatter_configs():
  function _scatter2scatter (line 79) | def _scatter2scatter(
  function scatter2scatter (line 169) | def scatter2scatter(
  function scatter2scatter_compileable (line 206) | def scatter2scatter_compileable(
  function _config_XtY (line 264) | def _config_XtY():
  function group_bwd_W (line 272) | def group_bwd_W(DY, X, expert_offsets, E, has_bias=False):
  function groupXtY_compileable (line 284) | def groupXtY_compileable(
  function _groupXtY (line 346) | def _groupXtY(
  function _xty_and_bias (line 469) | def _xty_and_bias(
  function _config_grouping (line 537) | def _config_grouping():
  function group (line 545) | def group(A, sorted_expert_idxs, coeff=None, fan_out=1, out=None):
  function group_compileable (line 558) | def group_compileable(
  function _group (line 595) | def _group(

FILE: src/axolotl/integrations/kernels/libs/scattermoe_lora/kernels/single.py
  function _single2scatter (line 13) | def _single2scatter(
  function single2scatter (line 70) | def single2scatter(X, W, expert_idxs):

FILE: src/axolotl/integrations/kernels/libs/scattermoe_lora/layers.py
  function peft_lora_B_to_scattermoe (line 46) | def peft_lora_B_to_scattermoe(peft_B, num_experts, rank):
  function peft_lora_to_scattermoe (line 62) | def peft_lora_to_scattermoe(peft_A, peft_B, num_experts, rank):
  function peft_down_proj_lora_to_scattermoe (line 109) | def peft_down_proj_lora_to_scattermoe(peft_A, peft_B, num_experts, rank):
  function _unwrap_gate_lora (line 119) | def _unwrap_gate_lora(gate_module):
  function _convert_smoe_lora (line 155) | def _convert_smoe_lora(lora_A, lora_B, num_experts, rank, scaling):
  function _unwrap_experts_lora (line 161) | def _unwrap_experts_lora(experts_module):
  function _softmax_topk_route (line 228) | def _softmax_topk_route(
  function _sigmoid_topk_route (line 251) | def _sigmoid_topk_route(
  function _route (line 319) | def _route(moe_block, base_gate, hidden_states, gate_weight, gate_lora_d...
  function _compute_shared_expert (line 343) | def _compute_shared_expert(moe_block, hidden_states_flat):
  class ScatterMoEGatedMLP (line 380) | class ScatterMoEGatedMLP(nn.Module):
    method forward (line 381) | def forward(self, layer_input):
  class HFScatterMoEGatedMLP (line 434) | class HFScatterMoEGatedMLP(nn.Module):
    method forward (line 451) | def forward(self: nn.Module, layer_input: torch.Tensor):

FILE: src/axolotl/integrations/kernels/libs/scattermoe_lora/lora_ops.py
  class ParallelExperts (line 20) | class ParallelExperts(nn.Module):
    method __init__ (line 29) | def __init__(
    method reset_parameters (line 50) | def reset_parameters(self) -> None:
    method extra_repr (line 55) | def extra_repr(self) -> str:
    method set_lora (line 62) | def set_lora(self, lora_A: torch.Tensor, lora_B: torch.Tensor, scaling...
    method clear_lora (line 68) | def clear_lora(self):
    method forward (line 74) | def forward(

FILE: src/axolotl/integrations/kernels/libs/scattermoe_lora/parallel_experts.py
  function compileable_bincount (line 16) | def compileable_bincount(x: torch.Tensor, minlength: int) -> torch.Tensor:
  function _ (line 21) | def _(x: torch.Tensor, minlength: int) -> torch.Tensor:
  function flatten_sort_count (line 26) | def flatten_sort_count(expert_idxs: torch.Tensor, num_experts: int):
  class ParallelLinear (line 37) | class ParallelLinear(torch.autograd.Function):
    method forward (line 39) | def forward(
    method backward (line 87) | def backward(ctx, grad_out: torch.Tensor):
  function parallel_linear (line 178) | def parallel_linear(
  class ParallelExperts (line 205) | class ParallelExperts(nn.Module):
    method __init__ (line 206) | def __init__(self, num_experts, input_size, output_size, bias=False) -...
    method extra_repr (line 220) | def extra_repr(self):
    method reset_parameters (line 225) | def reset_parameters(self) -> None:
    method forward (line 230) | def forward(

FILE: src/axolotl/integrations/kernels/libs/scattermoe_lora/parallel_linear_lora.py
  class ScatterMoELoRA (line 39) | class ScatterMoELoRA(torch.autograd.Function):
    method forward (line 50) | def forward(
    method backward (line 121) | def backward(ctx, grad_out: torch.Tensor):
  function _compute_lora_input_grad (line 342) | def _compute_lora_input_grad(
  function get_lora_params_from_wrapper (line 390) | def get_lora_params_from_wrapper(module) -> tuple:
  function parallel_linear_lora (line 425) | def parallel_linear_lora(

FILE: src/axolotl/integrations/kernels/libs/scattermoe_lora/selective_dequant.py
  function get_active_experts (line 28) | def get_active_experts(sorted_expert_idxs: torch.Tensor, E: int) -> torc...
  function remap_expert_indices (line 41) | def remap_expert_indices(
  function _selective_dequant_bnb4 (line 76) | def _selective_dequant_bnb4(
  function _selective_index_dense (line 178) | def _selective_index_dense(
  function selective_expert_weights (line 189) | def selective_expert_weights(
  function selective_lora_weights (line 255) | def selective_lora_weights(

FILE: src/axolotl/integrations/kernels/libs/scattermoe_lora/selective_dequant_kernel.py
  function _selective_dequant_nf4_kernel (line 45) | def _selective_dequant_nf4_kernel(
  function selective_dequant_nf4_triton (line 118) | def selective_dequant_nf4_triton(

FILE: src/axolotl/integrations/kernels/plugin.py
  function _check_sonicmoe_gpu_compat (line 13) | def _check_sonicmoe_gpu_compat():
  class KernelsPlugin (line 59) | class KernelsPlugin(BasePlugin):
    method get_input_args (line 60) | def get_input_args(self):
    method pre_model_load (line 63) | def pre_model_load(self, cfg):
    method _register_kernels (line 95) | def _register_kernels(self):
    method add_callbacks_pre_trainer (line 122) | def add_callbacks_pre_trainer(self, cfg, model):
    method _kernelize_model (line 132) | def _kernelize_model(self, model_type: str):

FILE: src/axolotl/integrations/kernels/sonicmoe/patch.py
  function patch_sonicmoe (line 34) | def patch_sonicmoe(model_type: str, torch_compile: bool = False):
  function _try_compile_routing (line 55) | def _try_compile_routing(routing_fn):
  function _patch_forward (line 69) | def _patch_forward(moe_cls, routing_fn, activation, router_attr):
  function _make_general_forward (line 99) | def _make_general_forward(moe_cls, routing_fn, activation):
  function _make_fused_forward (line 152) | def _make_fused_forward(moe_cls, activation, router_attr):
  function _compute_shared_expert (line 199) | def _compute_shared_expert(moe_block, hidden_states_flat):

FILE: src/axolotl/integrations/kernels/sonicmoe/routing.py
  function get_model_moe_config (line 18) | def get_model_moe_config(model_type: str):
  function softmax_topk_routing (line 81) | def softmax_topk_routing(
  function softmax_group_topk_routing (line 132) | def softmax_group_topk_routing(
  function sigmoid_topk_routing (line 188) | def sigmoid_topk_routing(

FILE: src/axolotl/integrations/kernels/sonicmoe/weight_converter.py
  function interleave_gate_up (line 23) | def interleave_gate_up(tensor: torch.Tensor) -> torch.Tensor:
  function deinterleave_gate_up (line 28) | def deinterleave_gate_up(tensor: torch.Tensor) -> torch.Tensor:
  class ConcatenatedToInterleaved (line 33) | class ConcatenatedToInterleaved(ConversionOps):
    method __init__ (line 42) | def __init__(self, dim: int = 1):
    method convert (line 46) | def convert(
    method _get_target_pattern (line 63) | def _get_target_pattern(
    method reverse_op (line 79) | def reverse_op(self) -> ConversionOps:
  class InterleavedToConcatenated (line 83) | class InterleavedToConcatenated(ConversionOps):
    method __init__ (line 92) | def __init__(self, dim: int = 1):
    method convert (line 96) | def convert(
    method _get_target_pattern (line 113) | def _get_target_pattern(
    method reverse_op (line 128) | def reverse_op(self) -> ConversionOps:
  function register_sonicmoe_weight_converter (line 132) | def register_sonicmoe_weight_converter(model_type: str):

FILE: src/axolotl/integrations/liger/args.py
  class LigerArgs (line 26) | class LigerArgs(BaseModel):
    method check_deprecated_swiglu (line 50) | def check_deprecated_swiglu(cls, data):
    method check_tiled_mlp_conflict (line 66) | def check_tiled_mlp_conflict(cls, data):
    method check_liger_rms_norm_tensor_parallel (line 79) | def check_liger_rms_norm_tensor_parallel(cls, data):
    method check_liger_use_token_scaling_flce (line 89) | def check_liger_use_token_scaling_flce(cls, data):
    method check_tensor_parallel_size_liger_fused_linear_cross_entropy (line 100) | def check_tensor_parallel_size_liger_fused_linear_cross_entropy(self):

FILE: src/axolotl/integrations/liger/models/base.py
  function lce_forward (line 18) | def lce_forward(
  function lce_maybe_trainable_lm_head (line 121) | def lce_maybe_trainable_lm_head(
  function _liger_for_causal_lm_loss (line 159) | def _liger_for_causal_lm_loss(
  function patch_lce_forward (line 172) | def patch_lce_forward(

FILE: src/axolotl/integrations/liger/models/deepseekv2.py
  function lce_forward (line 15) | def lce_forward(

FILE: src/axolotl/integrations/liger/models/jamba.py
  function lce_forward (line 19) | def lce_forward(

FILE: src/axolotl/integrations/liger/models/llama4.py
  function lce_forward (line 14) | def lce_forward(
  function apply_liger_kernel_to_llama4 (line 122) | def apply_liger_kernel_to_llama4(

FILE: src/axolotl/integrations/liger/models/qwen3.py
  function lce_forward (line 14) | def lce_forward(
  function apply_liger_kernel_to_qwen3 (line 109) | def apply_liger_kernel_to_qwen3(

FILE: src/axolotl/integrations/liger/models/qwen3_moe.py
  function lce_forward (line 15) | def lce_forward(
  function apply_liger_kernel_to_qwen3_moe (line 131) | def apply_liger_kernel_to_qwen3_moe(

FILE: src/axolotl/integrations/liger/plugin.py
  class LigerPlugin (line 14) | class LigerPlugin(BasePlugin):
    method get_input_args (line 19) | def get_input_args(self):
    method pre_model_load (line 22) | def pre_model_load(self, cfg):

FILE: src/axolotl/integrations/liger/utils.py
  function patch_with_compile_disable (line 10) | def patch_with_compile_disable(module, function_name):

FILE: src/axolotl/integrations/llm_compressor/args.py
  class CompressionArgs (line 11) | class CompressionArgs(BaseModel):
  class LLMCompressorArgs (line 32) | class LLMCompressorArgs(BaseModel):

FILE: src/axolotl/integrations/llm_compressor/plugin.py
  class LLMCompressorCallbackHandler (line 26) | class LLMCompressorCallbackHandler(TrainerCallback):
    method __init__ (line 33) | def __init__(self, trainer: Trainer, recipe: Any):
    method on_train_begin (line 50) | def on_train_begin(
    method on_step_begin (line 75) | def on_step_begin(
    method on_step_end (line 88) | def on_step_end(
    method on_train_end (line 103) | def on_train_end(
  class LLMCompressorPlugin (line 118) | class LLMCompressorPlugin(BasePlugin):
    method get_input_args (line 123) | def get_input_args(self) -> str:
    method add_callbacks_post_trainer (line 132) | def add_callbacks_post_trainer(self, cfg: Any, trainer: Trainer) -> list:
  function compute_loss_wrapper (line 151) | def compute_loss_wrapper(

FILE: src/axolotl/integrations/llm_compressor/utils.py
  function save_compressed_model (line 11) | def save_compressed_model(

FILE: src/axolotl/integrations/lm_eval/__init__.py
  class LMEvalPlugin (line 13) | class LMEvalPlugin(BasePlugin):
    method get_input_args (line 18) | def get_input_args(self):
    method post_train_unload (line 21) | def post_train_unload(self, cfg):

FILE: src/axolotl/integrations/lm_eval/args.py
  class LMEvalArgs (line 10) | class LMEvalArgs(BaseModel):

FILE: src/axolotl/integrations/lm_eval/cli.py
  function get_model_path (line 16) | def get_model_path(cfg: DictDefault) -> str | None:
  function build_lm_eval_command (line 31) | def build_lm_eval_command(
  function lm_eval (line 104) | def lm_eval(config: str, cloud: Optional[str] = None):

FILE: src/axolotl/integrations/spectrum/__init__.py
  function _generate_unfrozen_params_yaml (line 31) | def _generate_unfrozen_params_yaml(snr_data, top_fraction=0.5):
  class SpectrumPlugin (line 55) | class SpectrumPlugin(BasePlugin):
    method get_input_args (line 64) | def get_input_args(self):
    method pre_model_load (line 67) | def pre_model_load(self, cfg):

FILE: src/axolotl/integrations/spectrum/args.py
  class SpectrumArgs (line 24) | class SpectrumArgs(BaseModel):
    method check_fsdp_use_orig_params (line 34) | def check_fsdp_use_orig_params(cls, data):

FILE: src/axolotl/integrations/swanlab/args.py
  class SwanLabConfig (line 6) | class SwanLabConfig(BaseModel):
    method validate_swanlab_mode (line 96) | def validate_swanlab_mode(cls, v):
    method validate_swanlab_project (line 116) | def validate_swanlab_project(cls, v):
    method validate_swanlab_enabled_requires_project (line 128) | def validate_swanlab_enabled_requires_project(self):

FILE: src/axolotl/integrations/swanlab/callbacks.py
  class SwanLabRLHFCompletionCallback (line 20) | class SwanLabRLHFCompletionCallback(TrainerCallback):
    method __init__ (line 41) | def __init__(
    method on_init_end (line 61) | def on_init_end(
    method on_log (line 88) | def on_log(
    method on_train_end (line 166) | def on_train_end(

FILE: src/axolotl/integrations/swanlab/completion_logger.py
  class CompletionLogger (line 16) | class CompletionLogger:
    method __init__ (line 39) | def __init__(self, maxlen: int = 128):
    method add_dpo_completion (line 50) | def add_dpo_completion(
    method add_kto_completion (line 78) | def add_kto_completion(
    method add_orpo_completion (line 106) | def add_orpo_completion(
    method add_grpo_completion (line 134) | def add_grpo_completion(
    method log_to_swanlab (line 163) | def log_to_swanlab(self, table_name: str = "completions") -> bool:
    method clear (line 215) | def clear(self) -> None:
    method __len__ (line 219) | def __len__(self) -> int:
    method __repr__ (line 223) | def __repr__(self) -> str:

FILE: src/axolotl/integrations/swanlab/plugins.py
  class SwanLabPlugin (line 18) | class SwanLabPlugin(BasePlugin):
    method __init__ (line 35) | def __init__(self):
    method get_input_args (line 40) | def get_input_args(self) -> str:
    method register (line 44) | def register(self, cfg: dict):
    method pre_model_load (line 147) | def pre_model_load(self, cfg: DictDefault):
    method add_callbacks_pre_trainer (line 225) | def add_callbacks_pre_trainer(self, cfg: DictDefault, model):
    method post_trainer_create (line 261) | def post_trainer_create(self, cfg: DictDefault, trainer):
    method _get_swanlab_init_kwargs (line 298) | def _get_swanlab_init_kwargs(self, cfg: DictDefault) -> dict:
    method _prepare_config_for_logging (line 350) | def _prepare_config_for_logging(self, cfg: DictDefault) -> dict:
    method _register_lark_callback (line 435) | def _register_lark_callback(self, cfg: DictDefault):
    method _register_completion_callback (line 491) | def _register_completion_callback(self, cfg: DictDefault, trainer):

FILE: src/axolotl/integrations/swanlab/profiling.py
  function swanlab_profiling_context (line 18) | def swanlab_profiling_context(trainer: Any, func_name: str):
  function swanlab_profile (line 61) | def swanlab_profile(func: Callable) -> Callable:
  class ProfilingConfig (line 88) | class ProfilingConfig:
    method __init__ (line 99) | def __init__(
    method should_log (line 117) | def should_log(self, func_name: str, duration_seconds: float) -> bool:
  function swanlab_profiling_context_advanced (line 152) | def swanlab_profiling_context_advanced(

FILE: src/axolotl/kernels/geglu.py
  function _geglu_fwd_kernel (line 14) | def _geglu_fwd_kernel(
  function geglu_forward (line 45) | def geglu_forward(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
  function _geglu_bwd_kernel (line 71) | def _geglu_bwd_kernel(
  function geglu_backward (line 125) | def geglu_backward(

FILE: src/axolotl/kernels/lora.py
  function get_lora_parameters (line 23) | def get_lora_parameters(
  function matmul_lora (line 84) | def matmul_lora(
  class LoRA_MLP (line 132) | class LoRA_MLP(torch.autograd.Function):
    method forward (line 137) | def forward(
    method backward (line 219) | def backward(
  function apply_lora_mlp_swiglu (line 393) | def apply_lora_mlp_swiglu(self, X: torch.Tensor, inplace: bool = True) -...
  function apply_lora_mlp_geglu (line 436) | def apply_lora_mlp_geglu(self, X: torch.Tensor, inplace: bool = True) ->...
  class LoRA_QKV (line 478) | class LoRA_QKV(torch.autograd.Function):
    method forward (line 488) | def forward(
    method backward (line 555) | def backward(
  function apply_lora_qkv (line 698) | def apply_lora_qkv(
  class LoRA_O (line 740) | class LoRA_O(torch.autograd.Function):
    method forward (line 745) | def forward(
    method backward (line 783) | def backward(
  function apply_lora_o (line 831) | def apply_lora_o(self, X: torch.Tensor) -> torch.Tensor:

FILE: src/axolotl/kernels/quantize.py
  function dequantize_fp8 (line 18) | def dequantize_fp8(
  function dequantize (line 59) | def dequantize(

FILE: src/axolotl/kernels/swiglu.py
  function _swiglu_fwd_kernel (line 15) | def _swiglu_fwd_kernel(
  function _swiglu_bwd_kernel (line 50) | def _swiglu_bwd_kernel(
  function swiglu_forward (line 102) | def swiglu_forward(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
  function swiglu_backward (line 130) | def swiglu_backward(

FILE: src/axolotl/loaders/adapter.py
  function setup_quantized_meta_for_peft (line 30) | def setup_quantized_meta_for_peft(model: torch.nn.Module):
  function setup_quantized_peft_meta_for_training (line 42) | def setup_quantized_peft_meta_for_training(model: torch.nn.Module):
  function find_all_linear_names (line 50) | def find_all_linear_names(model):
  function load_lora (line 70) | def load_lora(
  function load_adapter (line 194) | def load_adapter(
  function load_llama_adapter (line 214) | def load_llama_adapter(

FILE: src/axolotl/loaders/model.py
  class ModelLoader (line 67) | class ModelLoader:
    method __init__ (line 98) | def __init__(
    method has_flash_attn (line 147) | def has_flash_attn(self) -> bool:
    method is_fsdp_enabled (line 152) | def is_fsdp_enabled(self):
    method is_qlora_and_fsdp_enabled (line 157) | def is_qlora_and_fsdp_enabled(self):
    method load (line 162) | def load(self) -> tuple[PreTrainedModel | PeftModelForCausalLM, PeftCo...
    method _apply_pre_model_load_setup (line 196) | def _apply_pre_model_load_setup(self):
    method _apply_post_model_load_setup (line 224) | def _apply_post_model_load_setup(self):
    method _configure_experts_implementation (line 241) | def _configure_experts_implementation(self):
    method _apply_activation_checkpointing (line 245) | def _apply_activation_checkpointing(self):
    method _resize_token_embeddings (line 254) | def _resize_token_embeddings(self):
    method _adjust_model_config (line 277) | def _adjust_model_config(self):
    method _configure_embedding_dtypes (line 306) | def _configure_embedding_dtypes(self):
    method _configure_qat (line 368) | def _configure_qat(self):
    method _load_adapters (line 381) | def _load_adapters(self) -> PeftConfig | None:
    method _apply_post_lora_load_setup (line 404) | def _apply_post_lora_load_setup(self, skip_move_to_device: bool):
    method _set_parallel_config (line 437) | def _set_parallel_config(self):
    method _set_auto_model_loader (line 444) | def _set_auto_model_loader(self):
    method _set_device_map_config (line 456) | def _set_device_map_config(self):
    method _set_quantization_config (line 539) | def _set_quantization_config(self):
    method _set_attention_config (line 623) | def _set_attention_config(self):
    method _check_model_requirements (line 650) | def _check_model_requirements(self):
    method _configure_zero3_memory_efficient_loading (line 660) | def _configure_zero3_memory_efficient_loading(
    method _load_model_from_config (line 699) | def _load_model_from_config(self, model_loader_class=None) -> PreTrain...
    method _load_model_from_pretrained (line 716) | def _load_model_from_pretrained(self, model_loader_class=None) -> PreT...
    method _build_model (line 726) | def _build_model(self) -> bool:
    method _set_z3_leaf_modules (line 845) | def _set_z3_leaf_modules(self):
    method _prepare_model_for_quantization (line 860) | def _prepare_model_for_quantization(self):
    method _convert_embedding_modules_dtype (line 897) | def _convert_embedding_modules_dtype(

FILE: src/axolotl/loaders/patch_manager.py
  class PatchManager (line 27) | class PatchManager:
    method apply_pre_config_load_patches (line 31) | def apply_pre_config_load_patches(cfg: DictDefault):
    method apply_pre_tokenizer_load_patches (line 52) | def apply_pre_tokenizer_load_patches(cfg: DictDefault):
    method __init__ (line 72) | def __init__(
    method has_flash_attn (line 90) | def has_flash_attn(self) -> bool:
    method apply_pre_model_load_patches (line 94) | def apply_pre_model_load_patches(self):
    method apply_post_plugin_pre_model_load_patches (line 122) | def apply_post_plugin_pre_model_load_patches(self):
    method _apply_transformers_patches (line 127) | def _apply_transformers_patches(self):
    method apply_post_model_build_patches (line 143) | def apply_post_model_build_patches(self, model: PreTrainedModel):
    method apply_post_model_load_patches (line 147) | def apply_post_model_load_patches(self, model: PreTrainedModel):
    method _apply_flash_attention_patches (line 154) | def _apply_flash_attention_patches(self):
    method _apply_chunked_cross_entropy_patch (line 162) | def _apply_chunked_cross_entropy_patch(self):
    method _apply_fsdp_patches (line 171) | def _apply_fsdp_patches(self):
    method _apply_adapter_patches (line 210) | def _apply_adapter_patches(self):
    method _apply_flex_attention_patches (line 217) | def _apply_flex_attention_patches(self):
    method _apply_sageattn_patches (line 227) | def _apply_sageattn_patches(self):
    method _apply_flash_attn_4_patches (line 234) | def _apply_flash_attn_4_patches(self):
    method _apply_model_specific_patches (line 243) | def _apply_model_specific_patches(self):
    method _apply_fp8_patches (line 294) | def _apply_fp8_patches(self):
    method _apply_flash_attention_peft_patches (line 305) | def _apply_flash_attention_peft_patches(self):
    method _apply_gradient_checkpointing_patches (line 314) | def _apply_gradient_checkpointing_patches(self):
    method _apply_mistral_cross_entropy_patch (line 337) | def _apply_mistral_cross_entropy_patch(self):
    method _apply_self_attention_lora_patch (line 349) | def _apply_self_attention_lora_patch(self):
    method _apply_multipack_patches (line 367) | def _apply_multipack_patches(self):
    method _apply_fsdp2_bnb_patches (line 405) | def _apply_fsdp2_bnb_patches(self):
    method _deactivate_hf_async_load (line 425) | def _deactivate_hf_async_load(self):
    method _apply_moe_expert_quantization_patch (line 430) | def _apply_moe_expert_quantization_patch(self):
    method _finalize_moe_expert_quantization (line 448) | def _finalize_moe_expert_quantization(self, model: PreTrainedModel):
    method _apply_tiled_mlp (line 469) | def _apply_tiled_mlp(self, model_type: str):
    method _apply_voxtral_patches (line 481) | def _apply_voxtral_patches(self):
    method _patch_attention (line 490) | def _patch_attention(self):
    method _patch_loss_llama (line 516) | def _patch_loss_llama(self):
    method _patch_llama_flash_attention (line 546) | def _patch_llama_flash_attention(self):
    method _patch_llama_xformers_attention (line 565) | def _patch_llama_xformers_attention(self):
    method _patch_llama_derived_model (line 574) | def _patch_llama_derived_model(self):
    method _apply_llama_flash_attn_patches (line 590) | def _apply_llama_flash_attn_patches(self, model):
    method _apply_unsloth_patches (line 610) | def _apply_unsloth_patches(self, model):
    method _apply_lora_kernel_patch (line 627) | def _apply_lora_kernel_patch(self, model):
    method _apply_patch_deepspeed_zero3 (line 638) | def _apply_patch_deepspeed_zero3(self):
    method _apply_apertus_patches (line 652) | def _apply_apertus_patches(self):
    method _apply_trl_vllm_patches (line 661) | def _apply_trl_vllm_patches(self):
    method _apply_trl_trainer_utils_patches (line 672) | def _apply_trl_trainer_utils_patches(self):
    method _apply_scaling_softmax_patch (line 705) | def _apply_scaling_softmax_patch(self, model: PreTrainedModel):

FILE: src/axolotl/loaders/processor.py
  function load_processor (line 17) | def load_processor(cfg: DictDefault, tokenizer: PreTrainedTokenizerBase):

FILE: src/axolotl/loaders/tokenizer.py
  function modify_tokenizer_files (line 30) | def modify_tokenizer_files(
  function load_tokenizer (line 130) | def load_tokenizer(cfg: DictDefault) -> PreTrainedTokenizer:

FILE: src/axolotl/loaders/utils.py
  function get_module_class_from_name (line 17) | def get_module_class_from_name(
  function check_model_config (line 45) | def check_model_config(cfg: DictDefault, model_config: PretrainedConfig):
  function load_model_config (line 151) | def load_model_config(cfg: DictDefault) -> PretrainedConfig | addict.Dict:
  function ensure_dtype (line 210) | def ensure_dtype(model: PreTrainedModel, dtype: torch.dtype = torch.bflo...
  function get_linear_embedding_layers (line 231) | def get_linear_embedding_layers(model_type: str) -> list[str]:

FILE: src/axolotl/logging_config.py
  class AxolotlOrWarnErrorFilter (line 15) | class AxolotlOrWarnErrorFilter(logging.Filter):
    method __init__ (line 22) | def __init__(self, **kwargs):
    method filter (line 40) | def filter(self, record: LogRecord) -> bool:
  class AxolotlLogger (line 51) | class AxolotlLogger(Logger):
    method __init__ (line 54) | def __init__(self, name: str, level: int = logging.NOTSET):
  class ColorfulFormatter (line 60) | class ColorfulFormatter(Formatter):
    method format (line 71) | def format(self, record):
  function configure_logging (line 142) | def configure_logging():

FILE: src/axolotl/models/mamba/__init__.py
  function check_mamba_ssm_installed (line 8) | def check_mamba_ssm_installed():
  function fix_mamba_attn_for_loss (line 16) | def fix_mamba_attn_for_loss():

FILE: src/axolotl/models/mamba/configuration_mamba.py
  class MambaConfig (line 8) | class MambaConfig(PretrainedConfig):
    method __init__ (line 15) | def __init__(

FILE: src/axolotl/models/mamba/modeling_mamba.py
  class MambaLMHeadModel (line 16) | class MambaLMHeadModel(nn.Module, GenerationMixin):
    method __init__ (line 17) | def __init__(
    method tie_weights (line 60) | def tie_weights(self):
    method allocate_inference_cache (line 63) | def allocate_inference_cache(self, batch_size, max_seqlen, dtype=None,...
    method forward (line 68) | def forward(
    method save_pretrained (line 110) | def save_pretrained(
    method from_pretrained (line 121) | def from_pretrained(cls, pretrained_model_name, device=None, dtype=Non...

FILE: src/axolotl/monkeypatch/accelerate/fsdp2.py
  function fsdp2_load_full_state_dict (line 20) | def fsdp2_load_full_state_dict(
  function get_state_dict (line 93) | def get_state_dict(self, model, unwrap=True):
  function patch_peft_param_wrapper_for_fsdp2 (line 189) | def patch_peft_param_wrapper_for_fsdp2():
  function _process_lora_module_for_fsdp (line 228) | def _process_lora_module_for_fsdp(module, fsdp2_kwargs):
  function fsdp2_prepare_model (line 272) | def fsdp2_prepare_model(accelerator, model: torch.nn.Module) -> torch.nn...
  function patch_tied_keys_for_meta_device (line 445) | def patch_tied_keys_for_meta_device():
  function patch_initialize_missing_keys_for_fsdp (line 482) | def patch_initialize_missing_keys_for_fsdp():
  function patch_accelerate_fsdp2 (line 522) | def patch_accelerate_fsdp2():

FILE: src/axolotl/monkeypatch/accelerate/parallelism_config.py
  function _validate_accelerator (line 11) | def _validate_accelerator(self, accelerator):
  function patched_is_fsdp2 (line 61) | def patched_is_fsdp2(self) -> bool:
  function patch_parallelism_config (line 73) | def patch_parallelism_config():
  function patch_prepare_cp (line 80) | def patch_prepare_cp():

FILE: src/axolotl/monkeypatch/attention/__init__.py
  function patch_xformers_attn_over_fa2 (line 8) | def patch_xformers_attn_over_fa2():
  function unpatch_xformers_attn_over_fa2 (line 16) | def unpatch_xformers_attn_over_fa2():

FILE: src/axolotl/monkeypatch/attention/flash_attn_4.py
  function _get_head_dims (line 10) | def _get_head_dims(model_config):
  function patch_flash_attn_4 (line 36) | def patch_flash_attn_4(model_config=None):

FILE: src/axolotl/monkeypatch/attention/flex_attn.py
  function patch_flex_wrapper (line 15) | def patch_flex_wrapper(**flex_attn_compile_kwargs):

FILE: src/axolotl/monkeypatch/attention/sage_attn.py
  function _is_sageattn_available (line 18) | def _is_sageattn_available():
  function _check_sageattn_imported (line 33) | def _check_sageattn_imported():
  function sage_attention_forward (line 42) | def sage_attention_forward(
  function patch_sageattn (line 193) | def patch_sageattn():

FILE: src/axolotl/monkeypatch/attention/xformers.py
  function xformers_attention_forward (line 19) | def xformers_attention_forward(

FILE: src/axolotl/monkeypatch/btlm_attn_hijack_flash.py
  function replace_btlm_attn_with_flash_attn (line 18) | def replace_btlm_attn_with_flash_attn(model_name="cerebras/btlm-3b-8k-ba...
  function flashattn_attn (line 31) | def flashattn_attn(

FILE: src/axolotl/monkeypatch/data/batch_dataset_fetcher.py
  class _MapDatasetFetcher (line 12) | class _MapDatasetFetcher(_BaseDatasetFetcher):
    method fetch (line 18) | def fetch(self, possibly_batched_index):
  function patch_fetchers (line 45) | def patch_fetchers():
  function patched_worker_loop (line 51) | def patched_worker_loop(*args, **kwargs):
  function apply_multipack_dataloader_patch (line 57) | def apply_multipack_dataloader_patch():
  function remove_multipack_dataloader_patch (line 79) | def remove_multipack_dataloader_patch():

FILE: src/axolotl/monkeypatch/deepspeed_utils.py
  function patch_checkpoint_wrapper_setattr (line 9) | def patch_checkpoint_wrapper_setattr():
  function apply_deepspeed_patches (line 60) | def apply_deepspeed_patches():

FILE: src/axolotl/monkeypatch/fsdp2_qlora.py
  function apply_init_sharded_param_patch (line 19) | def apply_init_sharded_param_patch():
  function apply_init_unsharded_param_patch (line 96) | def apply_init_unsharded_param_patch():
  function apply_linear8bitlt_save_patch (line 172) | def apply_linear8bitlt_save_patch():
  function apply_init_dtype_attrs_patch (line 205) | def apply_init_dtype_attrs_patch():

FILE: src/axolotl/monkeypatch/gradient_checkpointing/__init__.py
  function uses_gc_layers (line 19) | def uses_gc_layers(decoder_layer):
  function uses_gc_layers (line 24) | def uses_gc_layers(_):
  function hf_grad_checkpoint_offload_wrapper (line 28) | def hf_grad_checkpoint_offload_wrapper(decoder_layer, *args, use_reentra...
  function hf_grad_checkpoint_disk_offload_wrapper (line 45) | def hf_grad_checkpoint_disk_offload_wrapper(decoder_layer, *args, use_re...

FILE: src/axolotl/monkeypatch/gradient_checkpointing/offload_cpu.py
  class CPU_Offloaded_Gradient_Checkpointer (line 38) | class CPU_Offloaded_Gradient_Checkpointer(torch.autograd.Function):
    method forward (line 46) | def forward(ctx, forward_function, hidden_states, *args):
    method backward (line 57) | def backward(ctx, dY):

FILE: src/axolotl/monkeypatch/gradient_checkpointing/offload_disk.py
  class DiskOffloadManager (line 43) | class DiskOffloadManager:
    method __init__ (line 49) | def __init__(
    method _save_worker (line 106) | def _save_worker(self):
    method _save_tensor_to_disk (line 129) | def _save_tensor_to_disk(self, tensor: torch.Tensor, file_path: str):
    method _prefetch_worker (line 155) | def _prefetch_worker(self):
    method save_tensor (line 223) | def save_tensor(self, tensor: torch.Tensor):
    method wait_for_save (line 245) | def wait_for_save(self, file_path, timeout=None) -> None:
    method load_tensor (line 267) | def load_tensor(self, file_path, target_device="cuda"):
    method _safe_delete_file (line 308) | def _safe_delete_file(self, file_path):
    method trigger_prefetch (line 335) | def trigger_prefetch(self, n=None):
    method cleanup_tensor (line 356) | def cleanup_tensor(self, file_path: str):
    method cleanup (line 376) | def cleanup(self):
  class Disco (line 425) | class Disco(torch.autograd.Function):
    method get_instance (line 435) | def get_instance(prefetch_size=1, prefetch_to_gpu=True, save_workers=4):
    method forward (line 447) | def forward(
    method backward (line 481) | def backward(ctx, *grad_outputs):

FILE: src/axolotl/monkeypatch/llama_attn_hijack_flash.py
  function is_xformers_available (line 35) | def is_xformers_available() -> bool:
  function is_xformers_swiglu_available (line 39) | def is_xformers_swiglu_available() -> bool:
  function replace_llama_mlp_with_swiglu (line 54) | def replace_llama_mlp_with_swiglu(model):
  function patch_fa_llama_cross_entropy (line 68) | def patch_fa_llama_cross_entropy():
  function patch_llama_rms_norm (line 96) | def patch_llama_rms_norm():
  function replace_llama_attn_with_flash_attn (line 114) | def replace_llama_attn_with_flash_attn(
  function _prepare_decoder_attention_mask (line 136) | def _prepare_decoder_attention_mask(
  function flashattn_forward_with_s2attn (line 150) | def flashattn_forward_with_s2attn(

FILE: src/axolotl/monkeypatch/llama_attn_hijack_xformers.py
  function hijack_llama_attention (line 23) | def hijack_llama_attention():
  function xformers_forward (line 27) | def xformers_forward(

FILE: src/axolotl/monkeypatch/lora_kernels.py
  function original_apply_qkv (line 94) | def original_apply_qkv(
  function original_apply_o (line 115) | def original_apply_o(self: nn.Module, hidden_states: torch.Tensor) -> to...
  function get_attention_cls_from_config (line 131) | def get_attention_cls_from_config(cfg: DictDefault) -> Type[nn.Module]:
  function patch_self_attn_lora (line 201) | def patch_self_attn_lora(cfg: DictDefault):
  function find_self_attn_in_layer (line 265) | def find_self_attn_in_layer(
  function find_mlp_in_layer (line 277) | def find_mlp_in_layer(
  function get_layers (line 309) | def get_layers(model: PeftModelForCausalLM) -> list[nn.Module]:
  function apply_lora_kernel_patches (line 334) | def apply_lora_kernel_patches(
  class FakeMLP (line 476) | class FakeMLP(nn.Module):
    method __init__ (line 485) | def __init__(self, gate_proj, up_proj, down_proj):

FILE: src/axolotl/monkeypatch/loss/chunked.py
  class CEWithChunkedOutputLoss (line 12) | class CEWithChunkedOutputLoss(torch.nn.Module):
    method __init__ (line 19) | def __init__(self, num_output_chunks: int = 8, ignore_index: int = -100):
    method compute_cross_entropy (line 24) | def compute_cross_entropy(
    method forward (line 37) | def forward(
  function _build_chunked_ce_loss_fn (line 74) | def _build_chunked_ce_loss_fn(num_output_chunks: int = 8, ignore_index: ...
  function get_causal_lm_loss (line 82) | def get_causal_lm_loss(num_output_chunks: int = 8, ignore_index: int = -...
  function patch_chunked_ce_loss_fn (line 127) | def patch_chunked_ce_loss_fn(num_output_chunks: int = 8, ignore_index: i...

FILE: src/axolotl/monkeypatch/loss/eaft.py
  function eaft_loss (line 12) | def eaft_loss(outputs, labels, num_items_in_batch=None, alpha=1.0, k=20):

FILE: src/axolotl/monkeypatch/mistral_attn_hijack_flash.py
  function patch_mistral_cross_entropy (line 12) | def patch_mistral_cross_entropy():

FILE: src/axolotl/monkeypatch/mixtral/__init__.py
  function patch_mixtral_moe_forward_zero3 (line 8) | def patch_mixtral_moe_forward_zero3() -> None:

FILE: src/axolotl/monkeypatch/models/apertus/activation.py
  function patch_apertus_xielu_activation (line 6) | def patch_apertus_xielu_activation():

FILE: src/axolotl/monkeypatch/models/kimi_linear/configuration_kimi.py
  class KimiLinearConfig (line 13) | class KimiLinearConfig(PretrainedConfig):
    method __init__ (line 17) | def __init__(
    method is_mla (line 119) | def is_mla(self):
    method is_moe (line 130) | def is_moe(self):
    method is_linear_attn (line 134) | def is_linear_attn(self) -> bool:
    method is_kda_layer (line 144) | def is_kda_layer(self, layer_idx: int):

FILE: src/axolotl/monkeypatch/models/kimi_linear/modeling_kimi.py
  function load_balancing_loss_func (line 57) | def load_balancing_loss_func(
  class KimiDynamicCache (line 83) | class KimiDynamicCache:
    method __init__ (line 91) | def __init__(self, config: KimiLinearConfig):
    method __len__ (line 123) | def __len__(self):
    method update (line 126) | def update(
    method reorder_cache (line 146) | def reorder_cache(self, beam_idx: torch.LongTensor):
    method get_seq_length (line 172) | def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
    method get_mask_sizes (line 184) | def get_mask_sizes(
    method has_previous_state (line 199) | def has_previous_state(self):
  class KimiRMSNorm (line 206) | class KimiRMSNorm(nn.Module):
    method __init__ (line 207) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 215) | def forward(self, hidden_states):
  class KimiBlockSparseMLP (line 226) | class KimiBlockSparseMLP(nn.Module):
    method __init__ (line 227) | def __init__(
    method forward (line 243) | def forward(self, hidden_states):
  class KimiMLP (line 251) | class KimiMLP(nn.Module):
    method __init__ (line 252) | def __init__(
    method forward (line 266) | def forward(self, x):
  function repeat_kv (line 271) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
  function eager_attention_forward (line 285) | def eager_attention_forward(
  class KimiMLAAttention (line 315) | class KimiMLAAttention(nn.Module):
    method __init__ (line 320) | def __init__(self, config: KimiLinearConfig, layer_idx: int):
    method forward (line 372) | def forward(
  class KimiDeltaAttention (line 451) | class KimiDeltaAttention(nn.Module):
    method __init__ (line 452) | def __init__(self, config: KimiLinearConfig, layer_idx: int):
    method forward (line 510) | def forward(
  class KimiMoEGate (line 619) | class KimiMoEGate(nn.Module):
    method __init__ (line 625) | def __init__(self, config: KimiLinearConfig):
    method reset_parameters (line 635) | def reset_parameters(self) -> None:
    method forward (line 640) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class KimiSparseMoeBlock (line 797) | class KimiSparseMoeBlock(nn.Module):
    method __init__ (line 803) | def __init__(self, config: KimiLinearConfig):
    method route_tokens_to_experts (line 830) | def route_tokens_to_experts(
    method forward (line 887) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
    method _training_forward (line 922) | def _training_forward(
    method _inference_forward (line 994) | def _inference_forward(
  class KimiDecoderLayer (line 1037) | class KimiDecoderLayer(nn.Module):
    method __init__ (line 1038) | def __init__(self, config: KimiLinearConfig, layer_idx: int):
    method forward (line 1063) | def forward(
  class KimiPreTrainedModel (line 1127) | class KimiPreTrainedModel(PreTrainedModel):
    method _init_weights (line 1141) | def _init_weights(self, module):
  class KimiLinearModel (line 1153) | class KimiLinearModel(KimiPreTrainedModel):
    method __init__ (line 1154) | def __init__(self, config: KimiLinearConfig):
    method _update_linear_attn_mask (line 1185) | def _update_linear_attn_mask(self, attention_mask, cache_position):
    method forward (line 1199) | def forward(
  class KimiLinearForCausalLM (line 1272) | class KimiLinearForCausalLM(KimiPreTrainedModel, GenerationMixin):
    method __init__ (line 1275) | def __init__(self, config):
    method forward (line 1285) | def forward(

FILE: src/axolotl/monkeypatch/models/kimi_linear/patch_kimi_linear.py
  function get_patch_file_path (line 13) | def get_patch_file_path(package_dot_path: str, filename: str) -> Path:
  function _load_local_module (line 23) | def _load_local_module(module_name: str, filename: str):
  function _patch_get_class_in_module (line 38) | def _patch_get_class_in_module():
  function patch_kimi (line 73) | def patch_kimi():

FILE: src/axolotl/monkeypatch/models/kimi_linear/tokenization_kimi.py
  class TikTokenTokenizer (line 33) | class TikTokenTokenizer(PreTrainedTokenizer):
    method __init__ (line 78) | def __init__(
    method encode (line 167) | def encode(
    method decode (line 232) | def decode(self, token_ids: Union[int, List[int]], **kwargs) -> str:
    method _split_whitespaces_or_nonwhitespaces (line 253) | def _split_whitespaces_or_nonwhitespaces(
    method pre_tokenizer_process (line 278) | def pre_tokenizer_process(self, text: str) -> List[str]:
    method vocab_size (line 288) | def vocab_size(self) -> int:
    method get_vocab (line 291) | def get_vocab(self) -> Dict[str, int]:
    method _tokenize (line 294) | def _tokenize(self, text: str, **kwargs) -> List[str]:
    method _convert_token_to_id (line 297) | def _convert_token_to_id(self, token: str) -> int:
    method _convert_id_to_token (line 300) | def _convert_id_to_token(self, index: int) -> str:
    method clean_up_tokenization (line 304) | def clean_up_tokenization(out_string: str) -> str:
    method convert_tokens_to_string (line 307) | def convert_tokens_to_string(self, tokens: List[str]) -> str:
    method save_vocabulary (line 314) | def save_vocabulary(
    method apply_chat_template (line 334) | def apply_chat_template(
  function deep_sort_dict (line 352) | def deep_sort_dict(obj: Any) -> Any:

FILE: src/axolotl/monkeypatch/models/llama4/modeling.py
  class Llama4TextExperts (line 13) | class Llama4TextExperts(nn.Module):
    method __init__ (line 18) | def __init__(self, config: Llama4Config):
    method forward (line 50) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  function patch_llama4_linearized_modeling (line 90) | def patch_llama4_linearized_modeling():

FILE: src/axolotl/monkeypatch/models/mistral3/mistral_common_tokenizer.py
  function apply_mistral_tokenizer_image_patch (line 14) | def apply_mistral_tokenizer_image_patch():

FILE: src/axolotl/monkeypatch/models/pixtral/modeling_flash_attention_utils.py
  function apply_patch_is_packed_sequence (line 6) | def apply_patch_is_packed_sequence():

FILE: src/axolotl/monkeypatch/models/qwen3_5/modeling.py
  function get_cu_seqlens (line 24) | def get_cu_seqlens(position_ids):
  function _inject_fla_kernels (line 49) | def _inject_fla_kernels(module) -> None:
  function _patched_decoder_forward (line 68) | def _patched_decoder_forward(
  function _make_qwen3_5_gated_delta_forward (line 113) | def _make_qwen3_5_gated_delta_forward(apply_mask_fn):
  function _apply_packing_patches (line 240) | def _apply_packing_patches(model_type: str, cls_prefix: str, forward_fac...
  function patch_qwen3_5_modeling_packing (line 260) | def patch_qwen3_5_modeling_packing():
  function patch_qwen3_5_moe_modeling_packing (line 264) | def patch_qwen3_5_moe_modeling_packing():
  function patch_qwen3_5_vlm_flash_attention (line 270) | def patch_qwen3_5_vlm_flash_attention():

FILE: src/axolotl/monkeypatch/models/qwen3_next/modeling.py
  function get_cu_seqlens (line 18) | def get_cu_seqlens(position_ids):
  function patch_qwen3_next_decoder_layer (line 39) | def patch_qwen3_next_decoder_layer():
  function patch_qwen3_next_gateddelta_layer (line 110) | def patch_qwen3_next_gateddelta_layer():
  function patch_qwen3_next_imports (line 283) | def patch_qwen3_next_imports():
  function patch_qwen3_next_modeling_packing (line 336) | def patch_qwen3_next_modeling_packing():

FILE: src/axolotl/monkeypatch/models/voxtral/modeling.py
  function patch_voxtral_conditional_generation_forward (line 10) | def patch_voxtral_conditional_generation_forward():

FILE: src/axolotl/monkeypatch/moe_quant.py
  class Bnb8bitParametrization (line 23) | class Bnb8bitParametrization(torch.nn.Module):
    method __init__ (line 26) | def __init__(self, row_stats: torch.Tensor):
    method forward (line 31) | def forward(self, quantized_param: torch.Tensor) -> torch.Tensor:
  function _enable_parametrization_cache (line 40) | def _enable_parametrization_cache(module, inputs):
  function _disable_parametrization_cache (line 44) | def _disable_parametrization_cache(module, inputs, output):
  function replace_parameter_8bit (line 50) | def replace_parameter_8bit(module, param_name):
  function patch_moe_quantization_on_load (line 71) | def patch_moe_quantization_on_load(cfg):
  function get_moe_quantized_count (line 146) | def get_moe_quantized_count():
  function patch_peft_target_parameters_matching (line 151) | def patch_peft_target_parameters_matching():

FILE: src/axolotl/monkeypatch/multipack.py
  function patch_for_multipack (line 66) | def patch_for_multipack(model_type, model_name=None, has_remote_code=Fal...
  function patch_remote (line 80) | def patch_remote(model_name):

FILE: src/axolotl/monkeypatch/peft/utils.py
  function get_peft_prep_code (line 32) | def get_peft_prep_code() -> str:
  function check_peft_prep_code_is_patchable (line 37) | def check_peft_prep_code_is_patchable() -> bool:
  function patch_peft_prep_code (line 43) | def patch_peft_prep_code():

FILE: src/axolotl/monkeypatch/relora.py
  function magnitude_pruning_ (line 33) | def magnitude_pruning_(tensor, prune_ratio):
  function reset_optimizer (line 43) | def reset_optimizer(
  class ReLoRACallback (line 81) | class ReLoRACallback(TrainerCallback):
    method __init__ (line 84) | def __init__(self, cfg: DictDefault):
    method on_train_begin (line 101) | def on_train_begin(
    method on_step_begin (line 120) | def on_step_begin(
    method on_save (line 182) | def on_save(
    method on_log (line 220) | def on_log(
    method on_train_end (line 231) | def on_train_end(
  function sharded_paths (line 255) | def sharded_paths(path: str, module_names: List[str]) -> Dict[str, str]:
  function lora_delta_weight (line 270) | def lora_delta_weight(layer: peft.tuners.lora.LoraLayer, device) -> torc...
  function find_lora_modules (line 289) | def find_lora_modules(model: peft.LoraModel) -> Dict[str, peft.tuners.lo...
  function update_weights (line 305) | def update_weights(
  function merge_and_save (line 330) | def merge_and_save(
  function load_weight_checkpoint (line 420) | def load_weight_checkpoint(model: peft.LoraModel, checkpoint_path: str):

FILE: src/axolotl/monkeypatch/ring_attn/adapters/batch.py
  function create_flash_attn_forward_varlen_llama3 (line 42) | def create_flash_attn_forward_varlen_llama3(
  function substitute_hf_flash_attn (line 156) | def substitute_hf_flash_attn(

FILE: src/axolotl/monkeypatch/ring_attn/patch.py
  function get_ring_attn_group (line 37) | def get_ring_attn_group() -> dist.ProcessGroup:
  function set_ring_attn_group (line 44) | def set_ring_attn_group(ring_attn_group: dist.ProcessGroup | None):
  function create_ring_flash_attention_forward (line 50) | def create_ring_flash_attention_forward(
  function register_ring_attn_from_device_mesh (line 135) | def register_ring_attn_from_device_mesh(
  function update_ring_attn_params (line 214) | def update_ring_attn_params(position_ids: torch.Tensor | None):

FILE: src/axolotl/monkeypatch/scaled_softmax_attn.py
  function patch_scaled_softmax_attention (line 29) | def patch_scaled_softmax_attention(
  function ssmax_flex_attention_forward (line 53) | def ssmax_flex_attention_forward(
  function unpatch_scaled_softmax_attention (line 133) | def unpatch_scaled_softmax_attention():

FILE: src/axolotl/monkeypatch/stablelm_attn_hijack_flash.py
  function replace_stablelm_attn_with_flash_attn (line 42) | def replace_stablelm_attn_with_flash_attn(model_name="stabilityai/stable...
  function rotate_half (line 57) | def rotate_half(x: torch.Tensor):
  function apply_rotary_pos_emb (line 64) | def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
  function repeat_kv (line 76) | def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
  function flashattn_attn (line 90) | def flashattn_attn(
  function decoder_layer_forward (line 200) | def decoder_layer_forward(
  function stablelm_model_forward (line 247) | def stablelm_model_forward(

FILE: src/axolotl/monkeypatch/tiled_mlp/base.py
  class DeepSpeedTiledMLPMoE (line 11) | class DeepSpeedTiledMLPMoE(torch.autograd.Function):
    method forward (line 13) | def forward(
    method backward (line 47) | def backward(ctx, *grads) -> torch.Tensor:
  class TiledMLP (line 99) | class TiledMLP(torch.autograd.Function):
    method forward (line 105) | def forward(
    method backward (line 138) | def backward(ctx, *grads) -> torch.Tensor:
  class GradientAccumulator (line 191) | class GradientAccumulator:
    method __init__ (line 197) | def __init__(
    method install_hooks (line 223) | def install_hooks(self, is_last_shard: bool):
    method cleanup (line 251) | def cleanup(self):

FILE: src/axolotl/monkeypatch/tiled_mlp/patch.py
  function patch_tiled_mlp (line 15) | def patch_tiled_mlp(model_type, use_original_mlp=True, cfg_num_shards=No...

FILE: src/axolotl/monkeypatch/trainer/lr.py
  function _get_learning_rate (line 13) | def _get_learning_rate(self):
  function patch_trainer_get_lr (line 39) | def patch_trainer_get_lr():

FILE: src/axolotl/monkeypatch/trainer/trl.py
  function prepare_fsdp (line 4) | def prepare_fsdp(model, accelerator):
  function patch_trl_prepare_fsdp2 (line 10) | def patch_trl_prepare_fsdp2():

FILE: src/axolotl/monkeypatch/trainer/trl_vllm.py
  function _batch_update_named_params (line 20) | def _batch_update_named_params(
  function _update_model_params (line 65) | def _update_model_params(self, model: nn.Module, chunk_size: int | None ...
  function _patched_extract_logprobs (line 71) | def _patched_extract_logprobs(all_outputs):
  function _patched_split_tensor_dict (line 97) | def _patched_split_tensor_dict(tensor_dict, num_chunks):
  function _patched_shuffle_sequence_dict (line 121) | def _patched_shuffle_sequence_dict(seq_dict):
  function _patch_sync_weights_batched (line 146) | def _patch_sync_weights_batched(original_init):
  function _make_batched_sync_weights (line 157) | def _make_batched_sync_weights(original_sync_weights):
  function patch_trl_vllm (line 214) | def patch_trl_vllm():

FILE: src/axolotl/monkeypatch/trainer/utils.py
  function _entropy_online_kernel (line 22) | def _entropy_online_kernel(
  function _entropy_online_kernel_strided (line 62) | def _entropy_online_kernel_strided(
  function entropy_from_logits (line 108) | def entropy_from_logits(logits: torch.Tensor, chunk_size: int = 128) -> ...
  function selective_log_softmax_original (line 174) | def selective_log_softmax_original(logits, index) -> torch.Tensor:
  function _selective_logsoftmax_fwd_kernel (line 199) | def _selective_logsoftmax_fwd_kernel(
  function _selective_logsoftmax_bwd_kernel (line 255) | def _selective_logsoftmax_bwd_kernel(
  class _SelectiveLogSoftmaxTriton (line 320) | class _SelectiveLogSoftmaxTriton(torch.autograd.Function):
    method forward (line 322) | def forward(ctx, flat_logits, flat_index, K, K_BLOCK, V, BLOCK_V, MAX_...
    method backward (line 352) | def backward(ctx, grad_output):
  function selective_log_softmax (line 390) | def selective_log_softmax(logits, index) -> torch.Tensor:

FILE: src/axolotl/monkeypatch/trainer_accelerator_args.py
  function get_create_accelerate_code (line 30) | def get_create_accelerate_code() -> str:
  function check_create_accelerate_code_is_patchable (line 35) | def check_create_accelerate_code_is_patchable() -> bool:
  function patch_create_accelerate_code_for_fp8 (line 41) | def patch_create_accelerate_code_for_fp8(enable_fsdp_float8_all_gather: ...

FILE: src/axolotl/monkeypatch/trainer_fsdp_optim.py
  function get_training_loop_code (line 25) | def get_training_loop_code() -> str:
  function check_training_loop_is_patchable (line 30) | def check_training_loop_is_patchable() -> bool:
  function patch_training_loop_for_fsdp (line 36) | def patch_training_loop_for_fsdp():

FILE: src/axolotl/monkeypatch/transformers/trainer_context_parallel.py
  function patch_prepare_context_parallel_inputs (line 19) | def patch_prepare_context_parallel_inputs() -> None:

FILE: src/axolotl/monkeypatch/transformers/trainer_loss_calc.py
  function check_evaluation_loop_is_patchable (line 39) | def check_evaluation_loop_is_patchable() -> bool:
  function patch_evaluation_loop (line 44) | def patch_evaluation_loop():
  function check_maybe_log_save_evaluate_is_patchable (line 95) | def check_maybe_log_save_evaluate_is_patchable() -> bool:
  function patch_maybe_log_save_evaluate (line 100) | def patch_maybe_log_save_evaluate():

FILE: src/axolotl/monkeypatch/transformers_fa_utils.py
  function fixed_fa_peft_integration_check (line 15) | def fixed_fa_peft_integration_check(
  function patch_fa_peft_integration (line 63) | def patch_fa_peft_integration():

FILE: src/axolotl/monkeypatch/unsloth_.py
  function original_apply_qkv (line 35) | def original_apply_qkv(self, hidden_states):
  function original_apply_o (line 42) | def original_apply_o(self, hidden_states):
  function get_self_attn_code (line 47) | def get_self_attn_code() -> str:
  function check_self_attn_is_patchable (line 52) | def check_self_attn_is_patchable() -> bool:
  function integrate_cross_entropy_loss_patch (line 58) | def integrate_cross_entropy_loss_patch(model_type: str = "llama") -> None:
  function patch_self_attn_lora (line 91) | def patch_self_attn_lora():
  function integrate_rope_embeddings (line 130) | def integrate_rope_embeddings():
  function integrate_lora_mlp_patch (line 148) | def integrate_lora_mlp_patch(peft_model: PeftModelForCausalLM):
  function integrate_lora_patch (line 183) | def integrate_lora_patch(peft_model: PeftModelForCausalLM, cfg):
  function patch_unsloth_layernorm (line 228) | def patch_unsloth_layernorm():

FILE: src/axolotl/monkeypatch/utils.py
  function get_max_seqlen_in_batch (line 13) | def get_max_seqlen_in_batch(attention_mask: torch.Tensor) -> torch.Tensor:
  function get_unpad_data (line 26) | def get_unpad_data(attention_mask: torch.Tensor):
  function get_cu_seqlens (line 43) | def get_cu_seqlens(attn_mask):
  function get_cu_seqlens_from_pos_ids (line 94) | def get_cu_seqlens_from_pos_ids(
  function set_module_name (line 155) | def set_module_name(model, name, value):
  function detab_code (line 168) | def detab_code(code: str) -> Tuple[str, str]:

FILE: src/axolotl/monkeypatch/xformers_/__init__.py
  class FusedMLP (line 12) | class FusedMLP(torch.nn.Module):
    method __init__ (line 17) | def __init__(
    method _post_training (line 38) | def _post_training(self, model, name):
    method forward (line 51) | def forward(self, x: torch.Tensor) -> torch.Tensor:

FILE: src/axolotl/processing_strategies.py
  class ProcessingStrategy (line 21) | class ProcessingStrategy:
    method __init__ (line 24) | def __init__(
    method __call__ (line 47) | def __call__(self, examples: list[dict]) -> list[dict]:
    method _mask_non_assistant (line 223) | def _mask_non_assistant(self, labels: Tensor) -> Tensor:
    method process_labels (line 230) | def process_labels(self, input_ids: Tensor) -> Tensor:
  class Qwen2VLProcessingStrategy (line 244) | class Qwen2VLProcessingStrategy(ProcessingStrategy):
    method __init__ (line 247) | def __init__(
  class Qwen3_5ProcessingStrategy (line 261) | class Qwen3_5ProcessingStrategy(ProcessingStrategy):
    method __init__ (line 264) | def __init__(
    method process_labels (line 281) | def process_labels(self, input_ids):
  class Gemma3ProcessingStrategy (line 287) | class Gemma3ProcessingStrategy(ProcessingStrategy):
    method __init__ (line 290) | def __init__(
    method process_labels (line 303) | def process_labels(self, input_ids):
  class Gemma3nProcessingStrategy (line 314) | class Gemma3nProcessingStrategy(ProcessingStrategy):
    method _mask_non_assistant (line 317) | def _mask_non_assistant(self, labels: Tensor) -> Tensor:
    method process_labels (line 389) | def process_labels(self, input_ids):
  class VoxtralProcessingStrategy (line 407) | class VoxtralProcessingStrategy(ProcessingStrategy):
    method __init__ (line 410) | def __init__(
    method process_labels (line 425) | def process_labels(self, input_ids):
  class SmolVLM2ProcessingStrategy (line 435) | class SmolVLM2ProcessingStrategy(ProcessingStrategy):
    method __init__ (line 438) | def __init__(
  class Mistral3ProcessingStrategy (line 453) | class Mistral3ProcessingStrategy(ProcessingStrategy):
    method __init__ (line 456) | def __init__(
    method process_labels (line 472) | def process_labels(self, input_ids):
  class InternVLProcessingStrategy (line 483) | class InternVLProcessingStrategy(ProcessingStrategy):
    method __init__ (line 486) | def __init__(
    method process_labels (line 500) | def process_labels(self, input_ids):
  class Glm4vProcessingStrategy (line 514) | class Glm4vProcessingStrategy(ProcessingStrategy):
    method __init__ (line 517) | def __init__(
    method process_labels (line 550) | def process_labels(self, input_ids):
  function get_processing_strategy (line 566) | def get_processing_strategy(

FILE: src/axolotl/prompt_strategies/__init__.py
  function load (line 12) | def load(strategy, tokenizer, cfg, ds_cfg, processor=None):

FILE: src/axolotl/prompt_strategies/alpaca_chat.py
  function load (line 12) | def load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):
  class AlpacaConcisePrompter (line 25) | class AlpacaConcisePrompter(AlpacaPrompter):
  class AlpacaChatPrompter (line 34) | class AlpacaChatPrompter(AlpacaPrompter):
    method __init__ (line 42) | def __init__(self):
  class NoSystemPrompter (line 47) | class NoSystemPrompter(AlpacaPrompter):
    method __init__ (line 57) | def __init__(self):
  class AlpacaQAPromptTokenizingStrategy (line 61) | class AlpacaQAPromptTokenizingStrategy(InstructionPromptTokenizingStrate...
    method parse_instruction_fields (line 66) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class CamelAIPromptTokenizingStrategy (line 74) | class CamelAIPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
    method parse_instruction_fields (line 79) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  function load_concise (line 87) | def load_concise(tokenizer, cfg):
  function load_qa (line 96) | def load_qa(tokenizer, cfg):
  function load_camel_ai (line 105) | def load_camel_ai(tokenizer, cfg):
  function load_no_prompt (line 114) | def load_no_prompt(tokenizer, cfg):

FILE: src/axolotl/prompt_strategies/alpaca_instruct.py
  function load (line 7) | def load(tokenizer, cfg):
  function load_no_prompt (line 16) | def load_no_prompt(tokenizer, cfg):

FILE: src/axolotl/prompt_strategies/alpaca_w_system.py
  class InstructionWSystemPromptTokenizingStrategy (line 11) | class InstructionWSystemPromptTokenizingStrategy(PromptTokenizingStrategy):
    method parse_instruction_fields (line 16) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str, str]:
    method tokenize_prompt (line 24) | def tokenize_prompt(self, prompt):
  class SystemDataPrompter (line 55) | class SystemDataPrompter(AlpacaPrompter):
    method build_prompt_w_system (line 62) | def build_prompt_w_system(
  class OpenOrcaSystemDataPrompter (line 89) | class OpenOrcaSystemDataPrompter(SystemDataPrompter):
    method match_prompt_style (line 94) | def match_prompt_style(self):
  class OpenOrcaPromptTokenizingStrategy (line 111) | class OpenOrcaPromptTokenizingStrategy(InstructionWSystemPromptTokenizin...
    method parse_instruction_fields (line 116) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str, str]:
  function load (line 125) | def load(tokenizer, cfg):
  function load_instruct (line 129) | def load_instruct(tokenizer, cfg):
  function load_chat (line 138) | def load_chat(tokenizer, cfg):
  function load_open_orca (line 147) | def load_open_orca(tokenizer, cfg):
  function load_open_orca_chatml (line 156) | def load_open_orca_chatml(tokenizer, cfg):

FILE: src/axolotl/prompt_strategies/base.py
  function load (line 12) | def load(strategy, cfg, module_base=None, **kwargs):

FILE: src/axolotl/prompt_strategies/bradley_terry/__init__.py
  function load (line 12) | def load(strategy, tokenizer, cfg, ds_cfg):

FILE: src/axolotl/prompt_strategies/bradley_terry/chat_template.py
  class BTChatTemplateStrategy (line 19) | class BTChatTemplateStrategy(ChatTemplateStrategy):
    method supports_batched (line 25) | def supports_batched(self) -> bool:
    method _tokenize_single_prompt (line 28) | def _tokenize_single_prompt(self, prompt):
  function load (line 83) | def load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):

FILE: src/axolotl/prompt_strategies/bradley_terry/llama3.py
  function icr (line 6) | def icr(

FILE: src/axolotl/prompt_strategies/chat_template.py
  class ChatTemplatePrompter (line 28) | class ChatTemplatePrompter(Prompter):
    method __init__ (line 31) | def __init__(
    method chat_template_msg_variables (line 89) | def chat_template_msg_variables(self) -> Set[str]:
    method build_prompt (line 92) | def build_prompt(
    method get_offsets_for_train_detail (line 158) | def get_offsets_for_train_detail(
    method adjust_train_details (line 202) | def adjust_train_details(
    method get_chat_template_msg_variables (line 260) | def get_chat_template_msg_variables(
  class ChatTemplateStrategy (line 267) | class ChatTemplateStrategy(PromptTokenizingStrategy):
    method __init__ (line 272) | def __init__(
    method _validate_eot_and_eos_tokens (line 318) | def _validate_eot_and_eos_tokens(self):
    method supports_batched (line 384) | def supports_batched(self) -> bool:
    method is_prompt_batched (line 388) | def is_prompt_batched(self, prompt: dict[str, Any]) -> bool:
    method tokenize_prompt (line 396) | def tokenize_prompt(self, prompt: dict[str, Any]):
    method _tokenize_single_prompt (line 423) | def _tokenize_single_prompt(self, prompt: dict) -> Dict[str, List[int]]:
    method find_first_eos_token (line 587) | def find_first_eos_token(self, input_ids, start_idx):
    method find_first_eot_token (line 594) | def find_first_eot_token(self, input_ids, start_idx):
    method find_turn (line 613) | def find_turn(
    method get_conversation_thread (line 700) | def get_conversation_thread(self, prompt):
    method transform_message (line 733) | def transform_message(self, message: dict) -> dict:
    method _get_images (line 828) | def _get_images(self, prompt):
    method _get_tools (line 831) | def _get_tools(self, prompt) -> list[dict] | None:
    method _get_messages (line 862) | def _get_messages(self, prompt):
  class MistralStrategy (line 876) | class MistralStrategy(ChatTemplateStrategy):
    method __init__ (line 881) | def __init__(
    method find_first_eot_token (line 931) | def find_first_eot_token(self, input_ids, start_idx):
  class MistralPrompter (line 937) | class MistralPrompter(ChatTemplatePrompter):
    method __init__ (line 942) | def __init__(self, *args, **kwargs):
  class StrategyLoader (line 948) | class StrategyLoader:
    method _get_strategy_cls (line 953) | def _get_strategy_cls(self, cfg):
    method _get_prompter_cls (line 959) | def _get_prompter_cls(self, cfg):
    method _get_strategy_params (line 965) | def _get_strategy_params(self, cfg, ds_cfg: Dict[str, Any]):
    method __call__ (line 976) | def __call__(

FILE: src/axolotl/prompt_strategies/completion.py
  class CompletionPromptTokenizingStrategy (line 11) | class CompletionPromptTokenizingStrategy(InstructionPromptTokenizingStra...
    method __init__ (line 18) | def __init__(self, *args, max_length=None, **kwargs):
    method supports_batched (line 24) | def supports_batched(self):
    method field (line 28) | def field(self) -> str:
    method field (line 32) | def field(self, new_field: str):
    method parse_instruction_fields (line 35) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
    method tokenize_prompt (line 42) | def tokenize_prompt(self, prompt):
    method _build_full_prompt (line 62) | def _build_full_prompt(self, instruction, input, response):
  class CompletionPrompter (line 66) | class CompletionPrompter:
    method build_prompt (line 71) | def build_prompt(
  function load (line 80) | def load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):

FILE: src/axolotl/prompt_strategies/context_qa.py
  function load_404 (line 10) | def load_404(tokenizer, cfg):
  function load (line 19) | def load(tokenizer, cfg):
  function load_v2 (line 28) | def load_v2(tokenizer, cfg):
  class AlpacaContextPrompter (line 37) | class AlpacaContextPrompter(AlpacaPrompter):
  class AlpacaContextPromptTokenizingStrategy (line 50) | class AlpacaContextPromptTokenizingStrategy(InstructionPromptTokenizingS...
    method parse_instruction_fields (line 55) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class ContextQaV2PromptTokenizingStrategy (line 63) | class ContextQaV2PromptTokenizingStrategy(InstructionPromptTokenizingStr...
    method parse_instruction_fields (line 68) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class ContextV2Prompter (line 80) | class ContextV2Prompter(AlpacaPrompter):
    method match_prompt_style (line 88) | def match_prompt_style(self):
  class AlpacaMissingInfoContextPromptTokenizingStrategy (line 94) | class AlpacaMissingInfoContextPromptTokenizingStrategy(
    method parse_instruction_fields (line 102) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:

FILE: src/axolotl/prompt_strategies/creative_acr.py
  class CreativeAnsweringPromptTokenizingStrategy (line 10) | class CreativeAnsweringPromptTokenizingStrategy(InstructionPromptTokeniz...
    method parse_instruction_fields (line 15) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class CreativeCritiquePromptTokenizingStrategy (line 27) | class CreativeCritiquePromptTokenizingStrategy(InstructionPromptTokenizi...
    method parse_instruction_fields (line 63) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class CreativeRevisePromptTokenizingStrategy (line 84) | class CreativeRevisePromptTokenizingStrategy(InstructionPromptTokenizing...
    method parse_instruction_fields (line 103) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class CreativePrompterBase (line 126) | class CreativePrompterBase:
    method build_prompt (line 134) | def build_prompt(
  class CreativeAnswerPrompter (line 149) | class CreativeAnswerPrompter(CreativePrompterBase):
  class CreativeCritiquePrompter (line 157) | class CreativeCritiquePrompter(CreativePrompterBase):
  class CreativeRevisePrompter (line 165) | class CreativeRevisePrompter(CreativePrompterBase):
  function load_answer (line 173) | def load_answer(tokenizer, cfg):
  function load_critique (line 182) | def load_critique(tokenizer, cfg):
  function load_revise (line 191) | def load_revise(tokenizer, cfg):

FILE: src/axolotl/prompt_strategies/dpo/chat_template.py
  function default (line 9) | def default(cfg, dataset_idx=0, **kwargs):
  function argilla_chat (line 125) | def argilla_chat(cfg, dataset_idx=0, **kwargs):

FILE: src/axolotl/prompt_strategies/dpo/chatml.py
  function default (line 6) | def default(
  function argilla_chat (line 46) | def argilla_chat(
  function icr (line 65) | def icr(
  function intel (line 91) | def intel(cfg, **kwargs):
  function prompt_pairs (line 113) | def prompt_pairs(cfg, **kwargs):
  function ultra (line 131) | def ultra(cfg, **kwargs):

FILE: src/axolotl/prompt_strategies/dpo/llama3.py
  function default (line 6) | def default(
  function argilla_chat (line 46) | def argilla_chat(
  function icr (line 65) | def icr(
  function intel (line 91) | def intel(cfg, **kwargs):
  function prompt_pairs (line 113) | def prompt_pairs(cfg, **kwargs):
  function ultra (line 131) | def ultra(cfg, **kwargs):

FILE: src/axolotl/prompt_strategies/dpo/passthrough.py
  function default (line 6) | def default(cfg, dataset_idx=0, **kwargs):

FILE: src/axolotl/prompt_strategies/dpo/user_defined.py
  function default (line 6) | def default(cfg, dataset_idx=0, **kwargs):

FILE: src/axolotl/prompt_strategies/dpo/zephyr.py
  function nectar (line 6) | def nectar(cfg, **kwargs):

FILE: src/axolotl/prompt_strategies/input_output.py
  class RawInputOutputStrategy (line 9) | class RawInputOutputStrategy(PromptTokenizingStrategy):
    method __init__ (line 12) | def __init__(self, *args, eos_token=None, **kwargs):
    method tokenize_prompt (line 18) | def tokenize_prompt(self, prompt):
  class RawInputOutputPrompter (line 40) | class RawInputOutputPrompter(Prompter):
    method build_prompt (line 43) | def build_prompt(self, source) -> Generator[Tuple[bool, str], None, No...
  function load (line 48) | def load(tokenizer, cfg):

FILE: src/axolotl/prompt_strategies/jinja_template_analyzer.py
  class JinjaTemplateAnalysis (line 9) | class JinjaTemplateAnalysis(TypedDict):
  class GenerationTagIgnore (line 31) | class GenerationTagIgnore(Extension):
    method parse (line 38) | def parse(self, parser):
  class JinjaTemplateAnalyzer (line 43) | class JinjaTemplateAnalyzer:
    method __init__ (line 72) | def __init__(self, template: str):
    method _visit_node (line 83) | def _visit_node(self, node) -> None:
    method _get_target_name (line 132) | def _get_target_name(self, node) -> Optional[str]:
    method _get_target_names (line 146) | def _get_target_names(self, node) -> list[str]:
    method _get_base_name (line 167) | def _get_base_name(self, node) -> Optional[str]:
    method get_template_variables (line 180) | def get_template_variables(self) -> Dict[str, Set[str]]:
    method analyze_template (line 212) | def analyze_template(self) -> Dict[str, JinjaTemplateAnalysis]:
    method get_downstream_properties (line 269) | def get_downstream_properties(self, start_var: str) -> Dict[str, Set[s...
    method get_message_vars (line 318) | def get_message_vars(self, field_messages: str = "messages") -> Set[str]:

FILE: src/axolotl/prompt_strategies/kto/chatml.py
  function argilla (line 6) | def argilla(
  function argilla_chat (line 26) | def argilla_chat(
  function intel (line 44) | def intel(cfg, **kwargs):
  function prompt_pairs (line 66) | def prompt_pairs(cfg, **kwargs):
  function ultra (line 83) | def ultra(cfg, **kwargs):

FILE: src/axolotl/prompt_strategies/kto/llama3.py
  function argilla (line 6) | def argilla(
  function argilla_chat (line 26) | def argilla_chat(
  function intel (line 44) | def intel(cfg, **kwargs):
  function prompt_pairs (line 66) | def prompt_pairs(cfg, **kwargs):
  function ultra (line 83) | def ultra(cfg, **kwargs):

FILE: src/axolotl/prompt_strategies/kto/user_defined.py
  function default (line 6) | def default(cfg, dataset_idx=0, **kwargs):

FILE: src/axolotl/prompt_strategies/llama2_chat.py
  class Llama2ChatConversation (line 38) | class Llama2ChatConversation:
    method get_prompt (line 58) | def get_prompt(self) -> str:
    method append_message (line 73) | def append_message(self, role: str, message: str):
  class LLama2ChatTokenizingStrategy (line 78) | class LLama2ChatTokenizingStrategy(PromptTokenizingStrategy):
    method __init__ (line 84) | def __init__(self, *args, **kwargs):
    method tokenize_prompt (line 91) | def tokenize_prompt(self, prompt):
  class Llama2ChatPrompter (line 156) | class Llama2ChatPrompter:
    method build_prompt (line 169) | def build_prompt(self, source) -> Generator[Llama2ChatConversation, No...
  function load (line 202) | def load(tokenizer, cfg) -> LLama2ChatTokenizingStrategy:

FILE: src/axolotl/prompt_strategies/messages/__init__.py
  function load (line 11) | def load(tokenizer, cfg, ds_cfg, processor=None):

FILE: src/axolotl/prompt_strategies/messages/chat.py
  class ChatMessageDatasetWrappingStrategy (line 12) | class ChatMessageDatasetWrappingStrategy(DatasetWrappingStrategy):
    method __init__ (line 17) | def __init__(
    method wrap_dataset (line 33) | def wrap_dataset(
  function load (line 51) | def load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):

FILE: src/axolotl/prompt_strategies/metharme.py
  class MetharmePromptTokenizingStrategy (line 14) | class MetharmePromptTokenizingStrategy(InstructionPromptTokenizingStrate...
    method parse_instruction_fields (line 19) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
    method _tokenize (line 22) | def _tokenize(
  class MetharmePrompter (line 56) | class MetharmePrompter(AlpacaPrompter):
    method __init__ (line 67) | def __init__(self, *args, **kwargs):
  function load (line 71) | def load(tokenizer, cfg):

FILE: src/axolotl/prompt_strategies/orcamini.py
  class OrcaMiniPrompter (line 19) | class OrcaMiniPrompter(AlpacaPrompter):
    method match_prompt_style (line 22) | def match_prompt_style(self):
    method build_prompt_w_system (line 27) | def build_prompt_w_system(
  function load (line 41) | def load(tokenizer, cfg):

FILE: src/axolotl/prompt_strategies/orpo/chat_template.py
  class Message (line 12) | class Message(BaseModel):
  class MessageList (line 20) | class MessageList(BaseModel):
  function load (line 26) | def load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None, **kwar...
  class ORPODatasetParsingStrategy (line 44) | class ORPODatasetParsingStrategy:
    method get_chosen_conversation_thread (line 47) | def get_chosen_conversation_thread(self, prompt) -> MessageList:
    method get_rejected_conversation_thread (line 63) | def get_rejected_conversation_thread(self, prompt) -> MessageList:
    method get_prompt (line 79) | def get_prompt(self, prompt) -> MessageList:
    method get_chosen (line 112) | def get_chosen(self, prompt) -> MessageList:
    method get_rejected (line 121) | def get_rejected(self, prompt) -> MessageList:
  class ORPOTokenizingStrategy (line 131) | class ORPOTokenizingStrategy(PromptTokenizingStrategy):
    method __init__ (line 141) | def __init__(
    method tokenize_prompt (line 150) | def tokenize_prompt(self, prompt):
  class ORPOPrompter (line 205) | class ORPOPrompter(Prompter):
    method __init__ (line 208) | def __init__(self, chat_template, tokenizer):
    method build_prompt (line 212) | def build_prompt(
  function argilla (line 251) | def argilla(cfg, **kwargs):

FILE: src/axolotl/prompt_strategies/pretrain.py
  class PretrainTokenizer (line 10) | class PretrainTokenizer:
    method build_prompt (line 13) | def build_prompt(self, prompt) -> Generator[str, None, None]:
  class PretrainTokenizationStrategy (line 17) | class PretrainTokenizationStrategy(PromptTokenizingStrategy):
    method supports_batched (line 21) | def supports_batched(self):
    method __init__ (line 24) | def __init__(self, *args, max_length=None, text_column="text", **kwargs):
    method _tokenize (line 30) | def _tokenize(
    method tokenize_prompt (line 48) | def tokenize_prompt(self, prompt):
  function load (line 52) | def load(tokenizer, cfg):

FILE: src/axolotl/prompt_strategies/pygmalion.py
  class PygmalionPromptTokenizingStrategy (line 19) | class PygmalionPromptTokenizingStrategy(PromptTokenizingStrategy):
    method __init__ (line 26) | def __init__(self, prompter, tokenizer, *args, **kwargs):
    method tokenize_prompt (line 31) | def tokenize_prompt(self, prompt):
  class PygmalionPrompter (line 82) | class PygmalionPrompter:
    method __init__ (line 87) | def __init__(self, *args, **kwargs):
    method build_prompt (line 90) | def build_prompt(
  function load (line 100) | def load(tokenizer, cfg):

FILE: src/axolotl/prompt_strategies/stepwise_supervised.py
  class StepwiseSupervisedPromptTokenizingStrategy (line 15) | class StepwiseSupervisedPromptTokenizingStrategy:
    method __init__ (line 24) | def __init__(
    method tokenize_prompt (line 38) | def tokenize_prompt(
    method supports_batched (line 101) | def supports_batched(self):
  function load (line 105) | def load(

FILE: src/axolotl/prompt_strategies/user_defined.py
  class UserDefinedDatasetConfig (line 16) | class UserDefinedDatasetConfig:
    method __getitem__ (line 30) | def __getitem__(self, item):
  class UserDefinedPromptTokenizationStrategy (line 34) | class UserDefinedPromptTokenizationStrategy(InstructionWSystemPromptToke...
  function load (line 40) | def load(tokenizer, cfg, ds_cfg: Optional[UserDefinedDatasetConfig] = No...

FILE: src/axolotl/prompt_tokenizers.py
  class InvalidDataException (line 21) | class InvalidDataException(Exception):
  class DatasetWrappingStrategy (line 27) | class DatasetWrappingStrategy(abc.ABC):
    method wrap_dataset (line 33) | def wrap_dataset(
  class PromptTokenizingStrategy (line 43) | class PromptTokenizingStrategy(abc.ABC):
    method __init__ (line 50) | def __init__(
    method tokenize_prompt (line 66) | def tokenize_prompt(self, prompt):
    method supports_batched (line 70) | def supports_batched(self):
    method _tokenize (line 73) | def _tokenize(
  class InstructionPromptTokenizingStrategy (line 108) | class InstructionPromptTokenizingStrategy(PromptTokenizingStrategy):
    method parse_instruction_fields (line 113) | def parse_instruction_fields(
    method tokenize_prompt (line 118) | def tokenize_prompt(self, prompt):
    method _build_full_prompt (line 146) | def _build_full_prompt(
  class AlpacaPromptTokenizingStrategy (line 163) | class AlpacaPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
    method parse_instruction_fields (line 168) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class AlpacaMultipleChoicePromptTokenizingStrategy (line 176) | class AlpacaMultipleChoicePromptTokenizingStrategy(InstructionPromptToke...
    method parse_instruction_fields (line 181) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class JeopardyPromptTokenizingStrategy (line 189) | class JeopardyPromptTokenizingStrategy(InstructionPromptTokenizingStrate...
    method parse_instruction_fields (line 194) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class OpenAssistantPromptTokenizingStrategy (line 202) | class OpenAssistantPromptTokenizingStrategy(InstructionPromptTokenizingS...
    method parse_instruction_fields (line 207) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class SummarizeTLDRPromptTokenizingStrategy (line 215) | class SummarizeTLDRPromptTokenizingStrategy(InstructionPromptTokenizingS...
    method parse_instruction_fields (line 220) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class GPTeacherPromptTokenizingStrategy (line 228) | class GPTeacherPromptTokenizingStrategy(InstructionPromptTokenizingStrat...
    method parse_instruction_fields (line 233) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class NomicGPT4AllPromptTokenizingStrategy (line 241) | class NomicGPT4AllPromptTokenizingStrategy(InstructionPromptTokenizingSt...
    method parse_instruction_fields (line 246) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str]:
  class ReflectionPromptTokenizingStrategy (line 254) | class ReflectionPromptTokenizingStrategy(PromptTokenizingStrategy):
    method parse_instruction_fields (line 259) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str, str...
    method tokenize_prompt (line 262) | def tokenize_prompt(self, prompt):
    method _build_full_prompt (line 292) | def _build_full_prompt(self, instruction, input, output, reflection, c...
    method _tokenize (line 305) | def _tokenize(self, prompt, add_eos_token=True, strip_bos_token=False):
  class AlpacaReflectionPTStrategy (line 325) | class AlpacaReflectionPTStrategy(ReflectionPromptTokenizingStrategy):
    method parse_instruction_fields (line 330) | def parse_instruction_fields(self, prompt) -> Tuple[str, str, str, str...
  function tokenize_prompt_default (line 340) | def tokenize_prompt_default() -> Tuple[Dict[str, List[int]], int]:
  function parse_tokenized_to_result (line 354) | def parse_tokenized_to_result(

FILE: src/axolotl/prompters.py
  class PromptStyle (line 15) | class PromptStyle(Enum):
  class Prompter (line 26) | class Prompter:
  class AlpacaPrompter (line 32) | class AlpacaPrompter(Prompter):
    method __init__ (line 44) | def __init__(self, prompt_style: Optional[str] = PromptStyle.INSTRUCT....
    method match_prompt_style (line 48) | def match_prompt_style(self):
    method _build_result (line 72) | def _build_result(self, instruction, input_text, output):
    method build_prompt (line 92) | def build_prompt(
    method __repr__ (line 100) | def __repr__(self) -> str:
  class UnpromptedPrompter (line 106) | class UnpromptedPrompter(AlpacaPrompter):
  class JeopardyPrompter (line 115) | class JeopardyPrompter(AlpacaPrompter):
  class MultipleChoiceExplainPrompter (line 123) | class MultipleChoiceExplainPrompter(AlpacaPrompter):
  class MultipleChoiceConcisePrompter (line 136) | class MultipleChoiceConcisePrompter(AlpacaPrompter):
    method match_prompt_style (line 144) | def match_prompt_style(self):
  class SummarizeTLDRPrompter (line 149) | class SummarizeTLDRPrompter(AlpacaPrompter):
    method match_prompt_style (line 157) | def match_prompt_style(self):
  class GPTeacherPrompter (line 162) | class GPTeacherPrompter(AlpacaPrompter):
  class NomicGPT4AllPrompter (line 168) | class NomicGPT4AllPrompter(AlpacaPrompter):
  class ReflectAlpacaPrompter (line 174) | class ReflectAlpacaPrompter(Prompter):
    method __init__ (line 189) | def __init__(self, prompt_style="instruct"):
    method match_prompt_style (line 193) | def match_prompt_style(self):
    method _build_result (line 217) | def _build_result(
    method build_prompt (line 241) | def build_prompt(
    method __repr__ (line 257) | def __repr__(self) -> str:
  class UnsupportedPrompter (line 268) | class UnsupportedPrompter(Prompter):
    method __init__ (line 273) | def __init__(self) -> None:
    method __repr__ (line 276) | def __repr__(self):

FILE: src/axolotl/scripts/vllm_serve_lora.py
  class LoRAScriptArguments (line 38) | class LoRAScriptArguments(ScriptArguments):
  function llm_worker (line 59) | def llm_worker(
  function main (line 125) | def main(script_args: ScriptArguments):

FILE: src/axolotl/scripts/vllm_worker_ext.py
  class BatchWeightSyncWorkerExtension (line 33) | class BatchWeightSyncWorkerExtension(WeightSyncWorkerExtension):
    method init_communicator (line 36) | def init_communicator(self, host, port, world_size, client_device_uuid):
    method _direct_set_weight (line 42) | def _direct_set_weight(self, name: str, weight: torch.Tensor) -> None:
    method update_named_param (line 112) | def update_named_param(self, name, dtype, shape):
    method batch_update_named_params (line 129) | def batch_update_named_params(self, params_list: list[tuple[str, str, ...

FILE: src/axolotl/telemetry/callbacks.py
  class TelemetryCallback (line 21) | class TelemetryCallback(TrainerCallback):
    method __init__ (line 31) | def __init__(self):
    method on_train_begin (line 41) | def on_train_begin(
    method on_train_end (line 52) | def on_train_end(
    method on_epoch_begin (line 68) | def on_epoch_begin(
    method on_epoch_end (line 80) | def on_epoch_end(
    method on_step_end (line 91) | def on_step_end(
    method _extract_last_metrics (line 155) | def _extract_last_metrics(self, state: TrainerState) -> dict:

FILE: src/axolotl/telemetry/errors.py
  function sanitize_stack_trace (line 18) | def sanitize_stack_trace(stack_trace: str) -> str:
  function send_errors (line 104) | def send_errors(func: Callable) -> Callable:

FILE: src/axolotl/telemetry/manager.py
  function is_main_process (line 61) | def is_main_process() -> bool:
  class TelemetryManager (line 98) | class TelemetryManager:
    method __new__ (line 104) | def __new__(cls):
    method __init__ (line 115) | def __init__(self):
    method get_instance (line 140) | def get_instance(cls) -> "TelemetryManager":
    method _check_telemetry_enabled (line 146) | def _check_telemetry_enabled(self) -> bool:
    method _load_whitelist (line 187) | def _load_whitelist(self) -> dict:
    method _is_whitelisted (line 199) | def _is_whitelisted(self, value: str) -> bool:
    method _init_posthog (line 220) | def _init_posthog(self):
    method _redact_paths (line 226) | def _redact_paths(self, properties: dict[str, Any]) -> dict[str, Any]:
    method _get_system_info (line 267) | def _get_system_info(self) -> dict[str, Any]:
    method send_event (line 359) | def send_event(self, event_type: str, properties: dict[str, Any] | Non...
    method send_system_info (line 387) | def send_system_info(self):
    method shutdown (line 392) | def shutdown(self):

FILE: src/axolotl/telemetry/runtime_metrics.py
  class RuntimeMetrics (line 17) | class RuntimeMetrics:
    method __post_init__ (line 34) | def __post_init__(self):
    method elapsed_time (line 41) | def elapsed_time(self) -> float:
    method epoch_time (line 45) | def epoch_time(self, epoch: int) -> float | None:
    method average_epoch_time (line 52) | def average_epoch_time(self) -> float | None:
    method steps_per_second (line 68) | def steps_per_second(self) -> float | None:
    method to_dict (line 75) | def to_dict(self) -> dict[str, Any]:
  class RuntimeMetricsTracker (line 112) | class RuntimeMetricsTracker:
    method __init__ (line 117) | def __init__(self):
    method start_epoch (line 123) | def start_epoch(self, epoch: int):
    method end_epoch (line 129) | def end_epoch(self, epoch: int):
    method update_step (line 133) | def update_step(self, step: int):
    method _get_allocated_memory (line 142) | def _get_allocated_memory(self) -> dict[int, int]:
    method update_memory_metrics (line 182) | def update_memory_metrics(self):
    method get_memory_metrics (line 195) | def get_memory_metrics(self) -> dict[str, Any]:

FILE: src/axolotl/train.py
  function setup_model_and_tokenizer (line 54) | def setup_model_and_tokenizer(
  function setup_reference_model (line 120) | def setup_reference_model(
  function setup_signal_handler (line 149) | def setup_signal_handler(cfg: DictDefault, model: PreTrainedModel):
  function execute_training (line 175) | def execute_training(
  function save_trained_model (line 224) | def save_trained_model(
  function create_model_card (line 354) | def create_model_card(cfg: DictDefault, trainer: Trainer):
  function save_initial_configs (line 390) | def save_initial_configs(
  function setup_model_card (line 430) | def setup_model_card(cfg: DictDefault):
  function handle_untrained_tokens_fix (line 447) | def handle_untrained_tokens_fix(
  function setup_model_and_trainer (line 487) | def setup_model_and_trainer(
  function train (line 558) | def train(

FILE: src/axolotl/utils/__init__.py
  function is_mlflow_available (line 12) | def is_mlflow_available():
  function is_comet_available (line 16) | def is_comet_available():
  function is_opentelemetry_available (line 20) | def is_opentelemetry_available():
  function is_trackio_available (line 27) | def is_trackio_available():
  function get_pytorch_version (line 31) | def get_pytorch_version() -> tuple[int, int, int]:
  function set_pytorch_cuda_alloc_conf (line 47) | def set_pytorch_cuda_alloc_conf():
  function set_misc_env (line 66) | def set_misc_env():
  function get_not_null (line 71) | def get_not_null(value, default=None):

FILE: src/axolotl/utils/bench.py
  function check_cuda_device (line 25) | def check_cuda_device(default_value):
  function gpu_memory_usage (line 54) | def gpu_memory_usage(device=0):
  function gpu_memory_usage_all (line 59) | def gpu_memory_usage_all(device=0):
  function mps_memory_usage_all (line 67) | def mps_memory_usage_all():
  function npu_memory_usage_all (line 73) | def npu_memory_usage_all(device=0):
  function gpu_memory_usage_smi (line 80) | def gpu_memory_usage_smi(device=0):
  function get_gpu_memory_usage (line 96) | def get_gpu_memory_usage(device: int | torch.device = 0):
  function log_gpu_memory_usage (line 110) | def log_gpu_memory_usage(

FILE: src/axolotl/utils/callbacks/__init__.py
  class LossWatchDogCallback (line 57) | class LossWatchDogCallback(TrainerCallback):
    method __init__ (line 60) | def __init__(self, cfg):
    method on_step_end (line 66) | def on_step_end(
  class SaveModelOnFirstStepCallback (line 86) | class SaveModelOnFirstStepCallback(TrainerCallback):
    method on_step_end (line 89) | def on_step_end(
  function bench_eval_callback_factory (line 101) | def bench_eval_callback_factory(trainer, tokenizer):
  function causal_lm_bench_eval_callback_factory (line 303) | def causal_lm_bench_eval_callback_factory(trainer: Trainer, tokenizer):
  function log_prediction_callback_factory (line 512) | def log_prediction_callback_factory(trainer: Trainer, tokenizer, logger:...
  class SaveAxolotlConfigtoWandBCallback (line 731) | class SaveAxolotlConfigtoWandBCallback(TrainerCallback):
    method __init__ (line 734) | def __init__(self, axolotl_config_path):
    method on_train_begin (line 737) | def on_train_begin(
  class GCCallback (line 829) | class GCCallback(TrainerCallback):
    method __init__ (line 832) | def __init__(self, gc_steps: int | None = -1):
    method _gc (line 836) | def _gc(self):
    method on_train_begin (line 840) | def on_train_begin(
    method on_step_begin (line 849) | def on_step_begin(
    method on_step_end (line 859) | def on_step_end(
    method on_epoch_end (line 885) | def on_epoch_end(
  function colab_inference_post_train_callback (line 895) | def colab_inference_post_train_callback(trainer: Trainer):

FILE: src/axolotl/utils/callbacks/comet_.py
  class SaveAxolotlConfigtoCometCallback (line 17) | class SaveAxolotlConfigtoCometCallback(TrainerCallback):
    method __init__ (line 20) | def __init__(self, axolotl_config_path):
    method on_train_begin (line 23) | def on_train_begin(

FILE: src/axolotl/utils/callbacks/dynamic_checkpoint.py
  class DynamicCheckpointCallback (line 22) | class DynamicCheckpointCallback(TrainerCallback):
    method _get_config_value (line 34) | def _get_config_value(self, config, key, default=None):
    method __init__ (line 40) | def __init__(self, cfg):
    method on_step_end (line 64) | def on_step_end(

FILE: src/axolotl/utils/callbacks/generation.py
  class SFTGenerationCallback (line 12) | class SFTGenerationCallback(TrainerCallback):
    method __init__ (line 15) | def __init__(self, trainer):
    method on_evaluate (line 18) | def on_evaluate(
    method _log_samples (line 62) | def _log_samples(self, samples: list, step: int):

FILE: src/axolotl/utils/callbacks/lisa.py
  function lisa_callback_factory (line 23) | def lisa_callback_factory(trainer: "AxolotlTrainer"):

FILE: src/axolotl/utils/callbacks/mlflow_.py
  function should_log_artifacts (line 20) | def should_log_artifacts() -> bool:
  class SaveAxolotlConfigtoMlflowCallback (line 25) | class SaveAxolotlConfigtoMlflowCallback(TrainerCallback):
    method __init__ (line 28) | def __init__(self, axolotl_config_path):
    method on_train_begin (line 31) | def on_train_begin(

FILE: src/axolotl/utils/callbacks/models.py
  function get_causal_lm_model_cls_prefix (line 8) | def get_causal_lm_model_cls_prefix(model_type: str) -> Tuple[str, str]:

FILE: src/axolotl/utils/callbacks/opentelemetry.py
  class OpenTelemetryMetricsCallback (line 30) | class OpenTelemetryMetricsCallback(TrainerCallback):
    method __init__ (line 45) | def __init__(self, cfg):
    method _create_metrics (line 76) | def _create_metrics(self):
    method _start_metrics_server (line 120) | def _start_metrics_server(self):
    method on_train_begin (line 135) | def on_train_begin(
    method on_log (line 149) | def on_log(
    method on_step_end (line 178) | def on_step_end(
    method on_evaluate (line 194) | def on_evaluate(
    method on_train_end (line 224) | def on_train_end(

FILE: src/axolotl/utils/callbacks/perplexity.py
  class Perplexity (line 19) | class Perplexity:
    method __init__ (line 25) | def __init__(
    method _feature_names (line 36) | def _feature_names(self) -> List[str]:
    method compute (line 39) | def compute(

FILE: src/axolotl/utils/callbacks/profiler.py
  class PytorchProfilerCallback (line 17) | class PytorchProfilerCallback(TrainerCallback):
    method __init__ (line 24) | def __init__(self, steps_to_profile: int = 5, profiler_steps_start: in...
    method on_step_begin (line 36) | def on_step_begin(
    method on_step_end (line 60) | def on_step_end(
    method on_train_end (line 82) | def on_train_end(

FILE: src/axolotl/utils/callbacks/qat.py
  function toggle_fake_quant (line 16) | def toggle_fake_quant(mod: nn.Module, enable: bool):
  class QATCallback (line 33) | class QATCallback(TrainerCallback):
    method __init__ (line 38) | def __init__(self, cfg: QATConfig):
    method on_step_begin (line 41) | def on_step_begin(self, args, state, control, model, **kwargs):

FILE: src/axolotl/utils/callbacks/swanlab.py
  class CustomSwanLabCallback (line 26) | class CustomSwanLabCallback(TrainerCallback):
    method __init__ (line 34) | def __init__(self):
    method setup (line 38) | def setup(self):
    method on_train_begin (line 58) | def on_train_begin(
    method on_log (line 95) | def on_log(
    method on_train_end (line 129) | def on_train_end(
  class SaveAxolotlConfigtoSwanLabCallback (line 146) | class SaveAxolotlConfigtoSwanLabCallback(TrainerCallback):
    method __init__ (line 149) | def __init__(self, axolotl_config_path):
    method on_train_begin (line 152) | def on_train_begin(

FILE: src/axolotl/utils/callbacks/tokens_per_second.py
  class TokensPerSecondCallback (line 22) | class TokensPerSecondCallback(TrainerCallback):
    method __init__ (line 28) | def __init__(
    method on_train_begin (line 41) | def on_train_begin(
    method on_step_begin (line 61) | def on_step_begin(
    method on_step_end (line 73) | def on_step_end(
    method on_log (line 95) | def on_log(

FILE: src/axolotl/utils/callbacks/trackio_.py
  class SaveAxolotlConfigtoTrackioCallback (line 18) | class SaveAxolotlConfigtoTrackioCallback(TrainerCallback):
    method __init__ (line 21) | def __init__(self, axolotl_config_path):
    method on_train_begin (line 24) | def on_train_begin(

FILE: src/axolotl/utils/chat_templates/base.py
  function get_chat_template (line 26) | def get_chat_template(
  function extract_chat_template_args (line 88) | def extract_chat_template_args(cfg, ds_cfg: Dict[str, Any] | None = None):
  function get_chat_template_from_config (line 98) | def get_chat_template_from_config(
  function register_chat_template (line 113) | def register_chat_template(template_name: str, chat_template: str):

FILE: src/axolotl/utils/collators/batching.py
  class DataCollatorForSeq2Seq (line 12) | class DataCollatorForSeq2Seq:
    method __call__ (line 55) | def __call__(self, features, return_tensors=None):
  class BatchSamplerDataCollatorForSeq2Seq (line 129) | class BatchSamplerDataCollatorForSeq2Seq(DataCollatorForSeq2Seq):
    method __call__ (line 134) | def __call__(self, features, return_tensors=None):
  class V2BatchSamplerDataCollatorForSeq2Seq (line 159) | class V2BatchSamplerDataCollatorForSeq2Seq(DataCollatorForSeq2Seq):
    method __call__ (line 166) | def __call__(self, features, return_tensors=None):
  class PretrainingBatchSamplerDataCollatorForSeq2Seq (line 200) | class PretrainingBatchSamplerDataCollatorForSeq2Seq(DataCollatorForSeq2S...
    method __init__ (line 205) | def __init__(self, *args, multipack_attn=True, **kwargs):
    method __call__ (line 209) | def __call__(self, features, return_tensors=None):

FILE: src/axolotl/utils/collators/mamba.py
  class MambaDataCollator (line 15) | class MambaDataCollator:
    method __call__ (line 22) | def __call__(self, instances: Sequence[Dict]) -> Dict[str, torch.Tensor]:

FILE: src/axolotl/utils/collators/mm_chat.py
  class MultiModalChatDataCollator (line 17) | class MultiModalChatDataCollator(DataCollatorMixin):
    method __post_init__ (line 29) | def __post_init__(self):
    method torch_call (line 33) | def torch_call(self, examples: list[dict]) -> dict[str, Any]:
    method process_rows (line 36) | def process_rows(

FILE: src/axolotl/utils/comet_.py
  function python_value_to_environ_value (line 45) | def python_value_to_environ_value(python_value):
  function setup_comet_env_vars (line 61) | def setup_comet_env_vars(cfg: DictDefault):

FILE: src/axolotl/utils/config/__init__.py
  function choose_device (line 30) | def choose_device(cfg):
  function resolve_dtype (line 64) | def resolve_dtype(cfg):
  function normalize_config (line 107) | def normalize_config(cfg):
  function normalize_cfg_datasets (line 269) | def normalize_cfg_datasets(cfg):
  function validate_config (line 288) | def validate_config(
  function prepare_plugins (line 337) | def prepare_plugins(cfg):

FILE: src/axolotl/utils/ctx_managers/sequence_parallel.py
  function apply_sequence_parallelism (line 24) | def apply_sequence_parallelism(
  class SequenceParallelContextManager (line 170) | class SequenceParallelContextManager:
    method __init__ (line 189) | def __init__(
    method __enter__ (line 233) | def __enter__(self):
    method __exit__ (line 238) | def __exit__(self, exc_type, exc_val, exc_tb):
    method _register_ring_attn (line 246) | def _register_ring_attn(self):
    method _register_model_hooks (line 255) | def _register_model_hooks(self):
    method _gather_outputs (line 359) | def _gather_outputs(self, output: CausalLMOutputWithPast) -> CausalLMO...
  class AllGatherWithGrad (line 368) | class AllGatherWithGrad(torch.autograd.Function):
    method forward (line 372) | def forward(
    method backward (line 419) | def backward(

FILE: src/axolotl/utils/data/lock.py
  class FileLockLoader (line 17) | class FileLockLoader:
    method __init__ (line 24) | def __init__(self, cfg: DictDefault):
    method load (line 33) | def load(self, load_fn: Callable[[], Any]) -> Any:
    method _increment_counter (line 46) | def _increment_counter(self):
    method cleanup (line 55) | def cleanup(self):

FILE: src/axolotl/utils/data/rl.py
  function prepare_preference_datasets (line 37) | def prepare_preference_datasets(
  function _map_dataset (line 86) | def _map_dataset(
  function _drop_long_sequences (line 125) | def _drop_long_sequences(
  function _load_split (line 182) | def _load_split(cfg: DictDefault, split: Literal["train", "test"]) -> Da...
  function _load_or_create_dataset_split (line 262) | def _load_or_create_dataset_split(

FILE: src/axolotl/utils/data/sft.py
  function prepare_datasets (line 48) | def prepare_datasets(
  function _prepare_standard_dataset (line 68) | def _prepare_standard_dataset(
  function _prepare_streaming_dataset (line 125) | def _prepare_streaming_dataset(
  function _extract_pretraining_config (line 176) | def _extract_pretraining_config(cfg: DictDefault) -> DictDefault:
  function _load_streaming_dataset (line 205) | def _load_streaming_dataset(
  function _create_placeholder_dataset (line 251) | def _create_placeholder_dataset() -> IterableDataset:
  function _load_tokenized_prepared_datasets (line 260) | def _load_tokenized_prepared_datasets(
  function _load_raw_datasets (line 311) | def _load_raw_datasets(
  function _load_and_process_single_dataset (line 369) | def _load_and_process_single_dataset(
  function _parse_dataset_type (line 420) | def _parse_dataset_type(d_type: str) -> tuple[str | None, str | None]:
  function _handle_train_dataset_split (line 432) | def _handle_train_dataset_split(
  function _apply_dataset_sharding (line 451) | def _apply_dataset_sharding(dataset: Dataset, cfg: DictDefault) -> Dataset:
  function _load_and_prepare_datasets (line 472) | def _load_and_prepare_datasets(

FILE: src/axolotl/utils/data/shared.py
  function get_dataset_type (line 48) | def get_dataset_type(dataset_config: DictDefault) -> str:
  function datasets_with_name_generator (line 60) | def datasets_with_name_generator(
  function load_dataset_with_config (line 93) | def load_dataset_with_config(
  function _check_if_hub_dataset (line 151) | def _check_if_hub_dataset(dataset_config: DictDefault, use_auth_token: b...
  function _get_remote_filesystem (line 173) | def _get_remote_filesystem(
  function _load_from_local_path (line 222) | def _load_from_local_path(
  function _load_from_hub (line 258) | def _load_from_hub(
  function _load_from_cloud (line 271) | def _load_from_cloud(
  function _load_from_url (line 298) | def _load_from_url(
  function _load_from_data_files (line 310) | def _load_from_data_files(
  function generate_split_fingerprints (line 339) | def generate_split_fingerprints(
  function get_prepared_dataset_path (line 354) | def get_prepared_dataset_path(cfg: DictDefault, dataset_hash: str) -> Path:
  function create_train_validation_split (line 368) | def create_train_validation_split(
  function _generate_from_iterable_dataset (line 400) | def _generate_from_iterable_dataset(
  function save_preprocessed_dataset (line 409) | def save_preprocessed_dataset(
  function load_preprocessed_dataset (line 457) | def load_preprocessed_dataset(cfg: DictDefault, dataset_hash: str) -> Da...
  function try_load_from_hub (line 486) | def try_load_from_hub(
  function generate_dataset_hash_from_config (line 506) | def generate_dataset_hash_from_config(
  function merge_datasets (line 529) | def merge_datasets(datasets: list[Dataset], cfg: DictDefault) -> Dataset:

FILE: src/axolotl/utils/data/streaming.py
  function encode_streaming (line 20) | def encode_streaming(
  function wrap_streaming_dataset (line 179) | def wrap_streaming_dataset(
  function encode_packed_streaming (line 254) | def encode_packed_streaming(

FILE: src/axolotl/utils/data/utils.py
  class RetryStrategy (line 23) | class RetryStrategy(Enum):
  function retry_on_request_exceptions (line 31) | def retry_on_request_exceptions(
  function md5 (line 73) | def md5(to_hash: str, encoding: str = "utf-8") -> str:
  function sha256 (line 81) | def sha256(to_hash: str, encoding: str = "utf-8") -> str:
  function _deduplicate_dataset (line 86) | def _deduplicate_dataset(
  function deduplicate_and_log_datasets (line 112) | def deduplicate_and_log_datasets(
  function keep_min_len (line 151) | def keep_min_len(sample, min_sequence_len=2):
  function truncate_long_seq (line 167) | def truncate_long_seq(sample, sequence_len=2048):
  function _should_skip_processing (line 188) | def _should_skip_processing(dataset: Dataset) -> bool:
  function _log_dataset_stats (line 208) | def _log_dataset_stats(dataset: Dataset) -> None:
  function _build_filter_kwargs (line 216) | def _build_filter_kwargs(dataset: Dataset, cfg: DictDefault) -> dict:
  function _filter_short_sequences (line 225) | def _filter_short_sequences(
  function _truncate_long_sequences (line 251) | def _truncate_long_sequences(
  function _drop_outside_range (line 269) | def _drop_outside_range(
  function handle_long_seq_in_dataset (line 314) | def handle_long_seq_in_dataset(

FILE: src/axolotl/utils/data/wrappers.py
  function handle_unknown_dataset_strategy (line 45) | def handle_unknown_dataset_strategy(dataset_config: DictDefault) -> NoRe...
  function get_dataset_wrapper (line 57) | def get_dataset_wrapper(
  function _is_dataset_already_tokenized (line 134) | def _is_dataset_already_tokenized(dataset: Dataset | IterableDataset) ->...
  function _handle_custom_dataset_type (line 144) | def _handle_custom_dataset_type(
  function _handle_bradley_terry_dataset (line 165) | def _handle_bradley_terry_dataset(
  function _handle_stepwise_supervised_dataset (line 189) | def _handle_stepwise_supervised_dataset(
  function _handle_loaded_strategy (line 213) | def _handle_loaded_strategy(
  function _handle_alpaca_dataset (line 231) | def _handle_alpaca_dataset(
  function _handle_explainchoice_dataset (line 254) | def _handle_explainchoice_dataset(
  function _handle_concisechoice_dataset (line 277) | def _handle_concisechoice_dataset(
  function _handle_summarizetldr_dataset (line 300) | def _handle_summarizetldr_dataset(
  function _handle_jeopardy_dataset (line 323) | def _handle_jeopardy_dataset(
  function _handle_oasst_dataset (line 346) | def _handle_oasst_dataset(
  function _handle_gpteacher_dataset (line 369) | def _handle_gpteacher_dataset(
  function _handle_reflection_dataset (line 392) | def _handle_reflection_dataset(

FILE: src/axolotl/utils/datasets.py
  function get_default_process_count (line 10) | def get_default_process_count():

FILE: src/axolotl/utils/dict.py
  class DictDefault (line 6) | class DictDefault(Dict):
    method __missing__ (line 11) | def __missing__(self, key):
    method __or__ (line 14) | def __or__(self, other):
    method __setitem__ (line 17) | def __setitem__(self, name, value):
  function remove_none_values (line 41) | def remove_none_values(obj):

FILE: src/axolotl/utils/distributed.py
  function get_device_type (line 21) | def get_device_type() -> torch.device:
  function get_device_count (line 32) | def get_device_count() -> int:
  function get_current_device (line 41) | def get_current_device() -> int:
  function init_distributed_state (line 50) | def init_distributed_state():
  function get_distributed_state (line 60) | def get_distributed_state() -> PartialState | None:
  function is_distributed (line 64) | def is_distributed() -> bool:
  function barrier (line 74) | def barrier():
  function is_main_process (line 83) | def is_main_process() -> bool:
  function is_local_main_process (line 101) | def is_local_main_process() -> bool:
  function get_world_size (line 107) | def get_world_size() -> int:
  function cleanup_distributed (line 111) | def cleanup_distributed():
  function zero_first (line 129) | def zero_first(is_main: bool):
  function gather_scalar_from_all_ranks (line 140) | def gather_scalar_from_all_ranks(fn, world_size=1):
  function broadcast_dict (line 176) | def broadcast_dict(vals: dict):
  function compute_and_broadcast (line 204) | def compute_and_broadcast(fn):
  function gather_from_all_ranks (line 237) | def gather_from_all_ranks(fn, world_size=1):
  function reduce_and_broadcast (line 274) | def reduce_and_broadcast(fn1, fn2):
  function build_parallelism_config (line 299) | def build_parallelism_config(cfg):
  function _get_parallel_config_kwargs (line 319) | def _get_parallel_config_kwargs(

FILE: src/axolotl/utils/environment.py
  function check_cuda_p2p_ib_support (line 19) | def check_cuda_p2p_ib_support():
  function check_cuda_p2p_support (line 27) | def check_cuda_p2p_support() -> bool:
  function get_package_version (line 49) | def get_package_version(package: str) -> Version:
  function is_package_version_ge (line 54) | def is_package_version_ge(package: str, version_: str) -> bool:

FILE: src/axolotl/utils/freeze.py
  function freeze_layers_except (line 14) | def freeze_layers_except(model, regex_patterns):
  function _invert_ranges (line 72) | def _invert_ranges(
  function _merge_ranges (line 102) | def _merge_ranges(
  function _create_freeze_parameters_hook (line 145) | def _create_freeze_parameters_hook(ranges_to_freeze: List[Tuple[int, int...
  class LayerNamePattern (line 173) | class LayerNamePattern:
    method __init__ (line 178) | def __init__(self, pattern: str):
    method match (line 189) | def match(self, name: str) -> bool:
    method _parse_pattern (line 201) | def _parse_pattern(

FILE: src/axolotl/utils/generation/sft.py
  function generate_samples (line 14) | def generate_samples(
  function format_generation_for_logging (line 143) | def format_generation_for_logging(

FILE: src/axolotl/utils/import_helper.py
  function get_cls_from_module_str (line 8) | def get_cls_from_module_str(module_str: str):

FILE: src/axolotl/utils/logging.py
  class MultiProcessAdapter (line 20) | class MultiProcessAdapter(logging.LoggerAdapter):
    method _should_log (line 26) | def _should_log(main_process_only: bool):
    method log (line 29) | def log(self, level, msg, *args, **kwargs):
    method warning_once (line 38) | def warning_once(self, *args, **kwargs):
  function get_logger (line 49) | def get_logger(name: str, log_level: str | None = None) -> MultiProcessA...

FILE: src/axolotl/utils/lora.py
  function get_lora_merged_state_dict (line 24) | def get_lora_merged_state_dict(

FILE: src/axolotl/utils/mistral/mistral3_processor.py
  class Mistral3ProcessorKwargs (line 14) | class Mistral3ProcessorKwargs(ProcessingKwargs):
  class Mistral3Processor (line 27) | class Mistral3Processor(ProcessorMixin):
    method __init__ (line 33) | def __init__(self, tokenizer: HFMistralTokenizer):
    method audio_tokenizer (line 37) | def audio_tokenizer(self) -> None:
    method _merge_kwargs (line 41) | def _merge_kwargs(
    method apply_chat_template (line 64) | def apply_chat_template(
    method __call__ (line 140) | def __call__(

FILE: src/axolotl/utils/mistral/mistral_tokenizer.py
  class HFMistralTokenizer (line 14) | class HFMistralTokenizer(MistralCommonBackend):
    method __init__ (line 20) | def __init__(self, name_or_path: str, **kwargs):
    method name_or_path (line 37) | def name_or_path(self) -> str:
    method name_or_path (line 41) | def name_or_path(self, name_or_path: str) -> None:
    method chat_template (line 45) | def chat_template(self) -> str | None:
    method chat_template (line 50) | def chat_template(self, chat_template: str | None) -> None:
    method _set_mode (line 53) | def _set_mode(self, mode: ValidationMode):
    method apply_chat_template (line 82) | def apply_chat_template(  # type: ignore
    method decode (line 107) | def decode(  # type: ignore
    method from_pretrained (line 124) | def from_pretrained(
    method save_pretrained (line 233) | def save_pretrained(self, *args, **kwargs) -> tuple[str, ...]:

FILE: src/axolotl/utils/mlflow_.py
  function setup_mlflow_env_vars (line 8) | def setup_mlflow_env_vars(cfg: DictDefault):

FILE: src/axolotl/utils/model_shard_quant.py
  function _replace_linear (line 21) | def _replace_linear(
  function load_and_quantize (line 62) | def load_and_quantize(
  function n_loading_workers (line 128) | def n_loading_workers(quant_method: str, param_count: float):
  function load_sharded_model (line 140) | def load_sharded_model(
  function load_sharded_model_quant (line 167) | def load_sharded_model_quant(

FILE: src/axolotl/utils/optimizers/adopt.py
  class ADOPT (line 39) | class ADOPT(Optimizer):
    method __init__ (line 40) | def __init__(
    method __setstate__ (line 104) | def __setstate__(self, state):
    method _init_group (line 126) | def _init_group(
    method step (line 192) | def step(self, closure=None):
  function _single_tensor_adopt (line 249) | def _single_tensor_adopt(
  function _multi_tensor_adopt (line 331) | def _multi_tensor_adopt(
  function adopt (line 460) | def adopt(

FILE: src/axolotl/utils/quantization.py
  function get_quantization_config (line 52) | def get_quantization_config(
  function quantize_model (line 138) | def quantize_model(
  function prepare_model_for_qat (line 176) | def prepare_model_for_qat(
  function convert_qat_model (line 231) | def convert_qat_model(

FILE: src/axolotl/utils/samplers/multipack.py
  function ffd_check (line 25) | def ffd_check(sequence_lengths: np.ndarray, bin_capacity: int, num_bins:...
  function pack_group (line 61) | def pack_group(
  function _process_group (line 115) | def _process_group(
  function pack_parallel (line 125) | def pack_parallel(
  function allocate_sequentially (line 194) | def allocate_sequentially(
  class MultipackBatchSampler (line 244) | class MultipackBatchSampler(BatchSampler):
    method __init__ (line 257) | def __init__(
    method set_epoch (line 305) | def set_epoch(self, epoch: int):
    method generate_batches (line 310) | def generate_batches(self, set_stats: bool = False) -> list[list[list[...
    method __iter__ (line 383) | def __iter__(self) -> Iterator[list[list[int]]]:
    method efficiency (line 395) | def efficiency(self) -> float:
    method gather_efficiency (line 406) | def gather_efficiency(self) -> float:
    method gather_len_batches (line 432) | def gather_len_batches(self, num: int) -> int:
    method __len__ (line 445) | def __len__(self) -> int:

FILE: src/axolotl/utils/samplers/utils.py
  function get_dataset_lengths (line 8) | def get_dataset_lengths(dataset, from_arrow=False):

FILE: src/axolotl/utils/schedulers.py
  class RexLR (line 12) | class RexLR(LRScheduler):
    method __init__ (line 29) | def __init__(
    method last_step (line 57) | def last_step(self):
    method last_step (line 61) | def last_step(self, value):
    method get_lr (line 64) | def get_lr(self):
  class InterpolatingLogScheduler (line 88) | class InterpolatingLogScheduler(LRScheduler):
    method __init__ (line 93) | def __init__(self, optimizer, num_steps, min_lr, max_lr, last_epoch=-1):
    method get_lr (line 113) | def get_lr(self):
  function _get_cosine_schedule_with_quadratic_warmup_lr_lambda (line 127) | def _get_cosine_schedule_with_quadratic_warmup_lr_lambda(
  function get_cosine_schedule_with_quadratic_warmup (line 144) | def get_cosine_schedule_with_quadratic_warmup(
  function _get_cosine_schedule_with_min_lr_lambda (line 182) | def _get_cosine_schedule_with_min_lr_lambda(
  function get_cosine_schedule_with_min_lr (line 201) | def get_cosine_schedule_with_min_lr(
  function _get_cosine_schedule_with_warmup_decay_constant_lr_lambda (line 222) | def _get_cosine_schedule_with_warmup_decay_constant_lr_lambda(
  function get_cosine_schedule_with_warmup_decay_constant (line 252) | def get_cosine_schedule_with_warmup_decay_constant(
  class JaggedLRRestartScheduler (line 299) | class JaggedLRRestartScheduler(LRScheduler):
    method __init__ (line 302) | def __init__(
    method get_lr (line 318) | def get_lr(self) -> float | Sequence[float]:

FILE: src/axolotl/utils/schemas/config.py
  class AxolotlInputConfig (line 57) | class AxolotlInputConfig(
    method datasets_serializer (line 1150) | def datasets_serializer(
    method warn_peft_trainable_token_to_fix_untrained (line 1159) | def warn_peft_trainable_token_to_fix_untrained(cls, data):
    method check_sageattn_wo_sample_packing (line 1179) | def check_sageattn_wo_sample_packing(cls, data):
    method check_sageattn_fft (line 1190) | def check_sageattn_fft(cls, data):
  class AxolotlConfigWCapabilities (line 1199) | class AxolotlConfigWCapabilities(AxolotlInputConfig):
    method check_bf16 (line 1206) | def check_bf16(self):
    method check_tf32 (line 1224) | def check_tf32(self):
    method check_fp8 (line 1230) | def check_fp8(self):
    method check_sample_packing_w_sdpa_bf16 (line 1241) | def check_sample_packing_w_sdpa_bf16(cls, data):
    method check_compute_capability_w_sageattn (line 1262) | def check_compute_capability_w_sageattn(cls, data):
    method check_multigpu_unsloth (line 1277) | def check_multigpu_unsloth(cls, data):
    method check_multigpu_lora_kernels (line 1292) | def check_multigpu_lora_kernels(cls, data):
    method check_quantize_moe_experts (line 1311) | def check_quantize_moe_experts(cls, data):
    method check_auto_enable_lora_kernels (line 1336) | def check_auto_enable_lora_kernels(cls, data):
    method check_adopt_torch_version (line 1392) | def check_adopt_torch_version(cls, data):
    method check_flex_torch_version (line 1410) | def check_flex_torch_version(cls, data):
    method check_torch_compile_auto (line 1428) | def check_torch_compile_auto(cls, data):
    method check_beta_and_trl_beta_match (line 1447) | def check_beta_and_trl_beta_match(cls, data):
    method check_min_torch_version (line 1454) | def check_min_torch_version(self):
    method check_qat_config (line 1466) | def check_qat_config(cls, data):
    method check_fsdp_torch_version (line 1495) | def check_fsdp_torch_version(cls, data):
    method default_dataloader_opts (line 1512) | def default_dataloader_opts(cls, data):
    method default_dataset_num_proc (line 1526) | def default_dataset_num_proc(cls, data):
    method check_deduplication_with_streaming (line 1546) | def check_deduplication_with_streaming(cls, data):
    method check_deduplication_with_skip_prepare (line 1557) | def check_deduplication_with_skip_prepare(cls, data):

FILE: src/axolotl/utils/schemas/datasets.py
  class UserDefinedPrompterType (line 11) | class UserDefinedPrompterType(BaseModel):
  class SFTDataset (line 39) | class SFTDataset(BaseModel):
    method handle_legacy_message_fields (line 200) | def handle_legacy_message_fields(cls, data):
    method check_chat_template_config (line 206) | def check_chat_template_config(cls, data):
  class PretrainingDataset (line 229) | class PretrainingDataset(BaseModel):
  class UserDefinedDPOType (line 242) | class UserDefinedDPOType(BaseModel):
  class DPODataset (line 254) | class DPODataset(BaseModel):
  class StepwiseSupervisedDataset (line 265) | class StepwiseSupervisedDataset(BaseModel):
  class UserDefinedKTOType (line 277) | class UserDefinedKTOType(BaseModel):
  class KTODataset (line 288) | class KTODataset(BaseModel):

FILE: src/axolotl/utils/schemas/deprecated.py
  class DeprecatedParameters (line 12) | class DeprecatedParameters(BaseModel):
    method validate_max_packed_sequence_len (line 27) | def validate_max_packed_sequence_len(cls, max_packed_sequence_len):
    method validate_rope_scaling (line 34) | def validate_rope_scaling(cls, rope_scaling):
    method validate_noisy_embedding_alpha (line 43) | def validate_noisy_embedding_alpha(cls, noisy_embedding_alpha):
    method validate_dpo_beta (line 50) | def validate_dpo_beta(cls, dpo_beta):
    method validate_evaluation_strategy (line 57) | def validate_evaluation_strategy(cls, evaluation_strategy):
    method validate_eval_table_size (line 64) | def validate_eval_table_size(cls, eval_table_size):
    method validate_eval_max_new_tokens (line 75) | def validate_eval_max_new_tokens(cls, eval_max_new_tokens):
    method validate_dpo_use_logits_to_keep (line 85) | def validate_dpo_use_logits_to_keep(cls, dpo_use_logits_to_keep):
    method validate_dpo_generate_during_eval (line 95) | def validate_dpo_generate_during_eval(cls, dpo_generate_during_eval):
  class RemappedParameters (line 104) | class RemappedParameters(BaseModel):

FILE: src/axolotl/utils/schemas/dynamic_checkpoint.py
  class DynamicCheckpointConfig (line 6) | class DynamicCheckpointConfig(BaseModel):

FILE: src/axolotl/utils/schemas/enums.py
  class TorchAOQuantDType (line 8) | class TorchAOQuantDType(Enum):
    method from_string (line 15) | def from_string(str):
  class RLType (line 28) | class RLType(str, Enum):
  class ChatTemplate (line 40) | class ChatTemplate(str, Enum):
  class CustomSupportedOptimizers (line 79) | class CustomSupportedOptimizers(str, Enum):
  class RingAttnFunc (line 97) | class RingAttnFunc(str, Enum):

FILE: src/axolotl/utils/schemas/fsdp.py
  class FSDPConfig (line 10) | class FSDPConfig(BaseModel):

FILE: src/axolotl/utils/schemas/integrations.py
  class MLFlowConfig (line 12) | class MLFlowConfig(BaseModel):
  class LISAConfig (line 33) | class LISAConfig(BaseModel):
  class WandbConfig (line 50) | class WandbConfig(BaseModel):
    method check_wandb_run (line 84) | def check_wandb_run(cls, data):
  class CometConfig (line 95) | class CometConfig(BaseModel):
  class GradioConfig (line 146) | class GradioConfig(BaseModel):
  class RayConfig (line 157) | class RayConfig(BaseModel):
  class OpenTelemetryConfig (line 181) | class OpenTelemetryConfig(BaseModel):
  class TrackioConfig (line 205) | class TrackioConfig(BaseModel):

FILE: src/axolotl/utils/schemas/internal/__init__.py
  class GPUCapabilities (line 8) | class GPUCapabilities(BaseModel):
  class EnvCapabilities (line 19) | class EnvCapabilities(BaseModel):

FILE: src/axolotl/utils/schemas/model.py
  class ModelInputConfig (line 12) | class ModelInputConfig(BaseModel):
    method hint_trust_remote_code (line 101) | def hint_trust_remote_code(cls, trust_remote_code):
  class ModelOutputConfig (line 109) | class ModelOutputConfig(BaseModel):
    method validate_save_safetensors (line 138) | def validate_save_safetensors(cls, v):
  class SpecialTokensConfig (line 149) | class SpecialTokensConfig(BaseModel):

FILE: src/axolotl/utils/schemas/multimodal.py
  class MultiModalConfig (line 9) | class MultiModalConfig(BaseModel):
    method convert_image_resize_algorithm (line 32) | def convert_image_resize_algorithm(cls, image_resize_algorithm):

FILE: src/axolotl/utils/schemas/peft.py
  class LoftQConfig (line 8) | class LoftQConfig(BaseModel):
  class PeftConfig (line 17) | class PeftConfig(BaseModel):
  class LoraConfig (line 28) | class LoraConfig(BaseModel):
    method validate_adapter (line 161) | def validate_adapter(cls, data):
    method validate_qlora (line 174) | def validate_qlora(self):
    method convert_loraplus_lr_embedding (line 200) | def convert_loraplus_lr_embedding(cls, loraplus_lr_embedding):
    method validate_lora_dropout (line 207) | def validate_lora_dropout(cls, data):
    method validate_lora_target_parameters_dropout (line 213) | def validate_lora_target_parameters_dropout(self):
  class ReLoRAConfig (line 226) | class ReLoRAConfig(BaseModel):

FILE: src/axolotl/utils/schemas/quantization.py
  function validate_ao_dtype (line 12) | def validate_ao_dtype(v: Any) -> TorchAOQuantDType | None:
  class QATConfig (line 31) | class QATConfig(BaseModel):
    method validate_dtype (line 57) | def validate_dtype(cls, v: Any) -> TorchAOQuantDType | None:
  class PTQConfig (line 61) | class PTQConfig(BaseModel):
    method validate_dtype (line 84) | def validate_dtype(cls, v: Any) -> TorchAOQuantDType | None:

FILE: src/axolotl/utils/schemas/training.py
  class LrGroup (line 15) | class LrGroup(BaseModel):
  class HyperparametersConfig (line 23) | class HyperparametersConfig(BaseModel):
    method hint_batch_size_set (line 168) | def hint_batch_size_set(cls, batch_size):
    method convert_learning_rate (line 179) | def convert_learning_rate(cls, learning_rate):
  class JaggedLRConfig (line 185) | class JaggedLRConfig(BaseModel):

FILE: src/axolotl/utils/schemas/trl.py
  class TRLConfig (line 8) | class TRLConfig(BaseModel):

FILE: src/axolotl/utils/schemas/utils.py
  function handle_legacy_message_fields_logic (line 8) | def handle_legacy_message_fields_logic(data: dict) -> dict:

FILE: src/axolotl/utils/schemas/validation.py
  class DatasetValidationMixin (line 22) | class DatasetValidationMixin:
    method set_default_seed (line 27) | def set_default_seed(cls, seed):
    method deprecate_sharegpt_datasets (line 35) | def deprecate_sharegpt_datasets(cls, datasets):
    method check_dataset_or_pretraining_dataset (line 57) | def check_dataset_or_pretraining_dataset(cls, data):
    method check_pretraining_streaming_deprecation (line 64) | def check_pretraining_streaming_deprecation(cls, data):
    method check_push_ds_auth (line 78) | def check_push_ds_auth(cls, data):
    method check_val_w_test_datasets (line 90) | def check_val_w_test_datasets(cls, data):
    method check_test_datasets_bench (line 99) | def check_test_datasets_bench(cls, data):
    method check_eval_packing (line 113) | def check_eval_packing(cls, data):
    method check_mm_prepare (line 148) | def check_mm_prepare(cls, data):
  class AttentionValidationMixin (line 159) | class AttentionValidationMixin:
    method check_attention_fields (line 164) | def check_attention_fields(cls, data):
    method check_sample_packing_without_attention (line 181) | def check_sample_packing_without_attention(cls, data):
    method check_sample_packing_with_s2attn (line 197) | def check_sample_packing_with_s2attn(cls, data):
    method check_scaling_softmax_requires_flex (line 207) | def check_scaling_softmax_requires_flex(cls, data):
  class TrainingValidationMixin (line 216) | class TrainingValidationMixin:
    method check_batch_size_fields (line 221) | def check_batch_size_fields(cls, data):
    method hint_sample_packing_padding (line 231) | def hint_sample_packing_padding(cls, data):
    method hint_reward_model_pad (line 247) | def hint_reward_model_pad(cls, data):
    method set_reward_model_defaults (line 258) | def set_reward_model_defaults(cls, data):
    method check_gas_bsz (line 275) | def check_gas_bsz(cls, data):
    method hint_eval_train_mbsz (line 284) | def hint_eval_train_mbsz(cls, data):
    method check_warmup (line 297) | def check_warmup(cls, data):
    method check_saves (line 304) | def check_saves(cls, data):
    method check_push_save (line 321) | def check_push_save(cls, data):
    method check_evals (line 332) | def check_evals(cls, data):
    method check_neftune (line 374) | def check_neftune(cls, data):
    method check_multipack_buffer_size (line 386) | def check_multipack_buffer_size(cls, data):
    method check_fft_possible_bad_config (line 409) | def check_fft_possible_bad_config(self):
    method check_fp8_config (line 427) | def check_fp8_config(cls, data):
    method check_use_reentrant_mismatch (line 457) | def check_use_reentrant_mismatch(cls, data):
    method check_eval_strategy (line 472) | def check_eval_strategy(cls, data):
    method check_causal_lm_evals (line 485) | def check_causal_lm_evals(cls, data):
    method check_tokenizer_use_mistral_common (line 503) | def check_tokenizer_use_mistral_common(cls, data):
    method check_mistral_common_import (line 522) | def check_mistral_common_import(cls, tokenizer_use_mistral_common):
    method check_mistral_common_incompatible_options (line 535) | def check_mistral_common_incompatible_options(cls, data):
    method pretrain_with_tps (line 565) | def pretrain_with_tps(cls, data):
  class LoRAValidationMixin (line 578) | class LoRAValidationMixin:
    method check_lr_groups (line 583) | def check_lr_groups(cls, data):
    method check_frozen (line 590) | def check_frozen(cls, data):
    method check_peft_layers_pattern (line 603) | def check_peft_layers_pattern(cls, data):
    method check_qlora_unsloth (line 612) | def check_qlora_unsloth(cls, data):
    method check_lora_axolotl_unsloth (line 626) | def check_lora_axolotl_unsloth(cls, data):
    method check_fused_lora (line 641) | def check_fused_lora(self):
    method warn_qlora_zero3_w_use_reentrant (line 648) | def warn_qlora_zero3_w_use_reentrant(cls, data):
    method check_lora_kernels_8bit (line 668) | def check_lora_kernels_8bit(cls, data):
    method check_lora_kernels_dora (line 683) | def check_lora_kernels_dora(cls, data):
    method check_lora_kernels_trust_remote_code (line 697) | def check_lora_kernels_trust_remote_code(cls, data):
  class RLValidationMixin (line 711) | class RLValidationMixin:
    method check_sample_packing_w_rl (line 716) | def check_sample_packing_w_rl(cls, data):
    method check_kto_config (line 723) | def check_kto_config(cls, data):
    method check_grpo_liger_sequence_parallel (line 734) | def check_grpo_liger_sequence_parallel(cls, data):
    method check_rl_config_gradient_checkpointing (line 746) | def check_rl_config_gradient_checkpointing(cls, data):
    method check_gdpo (line 770) | def check_gdpo(cls, data):
  class OptimizationValidationMixin (line 782) | class OptimizationValidationMixin:
    method check_adamw_optimizer_params (line 786) | def check_adamw_optimizer_params(self):
    method _resolve_fsdp_version (line 794) | def _resolve_fsdp_version(data):
    method check_muon_deepspeed_fsdp (line 803) | def check_muon_deepspeed_fsdp(cls, data):
    method check_flashoptim_deepspeed_fsdp (line 819) | def check_flashoptim_deepspeed_fsdp(cls, data):
    method check_batch_flattening_fa (line 838) | def check_batch_flattening_fa(cls, data):
    method check_xentropy_patch_conflicts (line 862) | def check_xentropy_patch_conflicts(cls, data):
    method check_cross_entropy_conflicts (line 873) | def check_cross_entropy_conflicts(cls, data):
    method check_fsdp_version (line 903) | def check_fsdp_version(cls, data):
    method check_fsdp2_cpu_offload_pin_memory (line 916) | def check_fsdp2_cpu_offload_pin_memory(cls, data):
    method check_fsdp2_base_model_quant_rl (line 933) | def check_fsdp2_base_model_quant_rl(cls, data):
    method check_fsdp_config_kwargs_prefix (line 949) | def check_fsdp_config_kwargs_prefix(cls, data):
    method check_fsdp_version_in_fsdp_config (line 971) | def check_fsdp_version_in_fsdp_config(cls, data):
    method check_fsdp_offload_w_8bit_optimizer (line 985) | def check_fsdp_offload_w_8bit_optimizer(self):
    method check_fsdp2_w_8bit_optimizer (line 1000) | def check_fsdp2_w_8bit_optimizer(self):
    method check_tensor_parallel_size_update_ds_json (line 1018) | def check_tensor_parallel_size_update_ds_json(cls, data):
    method check_deepcompile (line 1050) | def check_deepcompile(cls, data):
  class SystemValidationMixin (line 1069) | class SystemValidationMixin:
    method check_mem_mismatch (line 1074) | def check_mem_mismatch(cls, data):
    method check_fsdp_deepspeed (line 1086) | def check_fsdp_deepspeed(cls, data):
    method check_model_quantization_config_vs_bnb (line 1093) | def check_model_quantization_config_vs_bnb(cls, data):
    method check_npu_config (line 1103) | def check_npu_config(cls, data):
  class ChatTemplateValidationMixin (line 1136) | class ChatTemplateValidationMixin:
    method check_chat_template_config (line 1141) | def check_chat_template_config(cls, data):
  class PretrainingValidationMixin (line 1157) | class PretrainingValidationMixin:
    method check_pretraining_w_max_steps (line 1162) | def check_pretraining_w_max_steps(cls, data):
    method check_pretraining_w_group_by_length (line 1171) | def check_pretraining_w_group_by_length(cls, data):
    method check_pretraining_split_batches_accelerate (line 1180) | def check_pretraining_split_batches_accelerate(cls, data):
    method check_pretraining_w_val_set_size (line 1198) | def check_pretraining_w_val_set_size(cls, data):
    method check_streaming_w_val_set_size (line 1208) | def check_streaming_w_val_set_size(cls, data):
    method check_streaming_w_max_steps (line 1218) | def check_streaming_w_max_steps(cls, data):
    method check_streaming_w_multiple_datasets (line 1228) | def check_streaming_w_multiple_datasets(cls, data):
  class ModelCompatibilityValidationMixin (line 1241) | class ModelCompatibilityValidationMixin:
    method check_falcon_fsdp (line 1245) | def check_falcon_fsdp(self):
    method check_mpt_checkpointing (line 1251) | def check_mpt_checkpointing(self):
    method check_gradient_checkpointing_w_offload (line 1259) | def check_gradient_checkpointing_w_offload(self):
    method check_activation_offloading_wo_gc (line 1278) | def check_activation_offloading_wo_gc(self):
    method check_better_transformers (line 1284) | def check_better_transformers(self):
    method check_gptq_w_revision (line 1301) | def check_gptq_w_revision(cls, data):
    method check_gpt_oss_fsdp_loading (line 1312) | def check_gpt_oss_fsdp_loading(cls, data):
  class ComplexValidationMixin (line 1322) | class ComplexValidationMixin:
    method validate_neftune_noise_alpha (line 1327) | def validate_neftune_noise_alpha(cls, neftune_noise_alpha):
    method check_rl_beta (line 1333) | def check_rl_beta(self):
    method check_simpo_warmup (line 1340) | def check_simpo_warmup(self):
    method check_relora (line 1348) | def check_relora(self):
    method check_early_stopping (line 1371) | def check_early_stopping(self):
    method check_tensor_parallel_size (line 1384) | def check_tensor_parallel_size(self):
    method check_context_parallel_size (line 1390) | def check_context_parallel_size(self):
    method validate_ring_attn_func (line 1446) | def validate_ring_attn_func(self):
    method hint_gradient_checkpointing_dpo_lora_ddp (line 1463) | def hint_gradient_checkpointing_dpo_lora_ddp(self):
  class DistributedValidationMixin (line 1479) | class DistributedValidationMixin:
    method check_tensor_parallel_optimizer (line 1483) | def check_tensor_parallel_optimizer(self):
  class GRPOVllmValidationMixin (line 1493) | class GRPOVllmValidationMixin:
    method check_vllm_mode_set (line 1497) | def check_vllm_mode_set(self):
  class ValidationMixin (line 1506) | class ValidationMixin(

FILE: src/axolotl/utils/schemas/vllm.py
  class VllmConfig (line 8) | class VllmConfig(BaseModel):

FILE: src/axolotl/utils/tee.py
  class _FileOnlyWriter (line 23) | class _FileOnlyWriter(io.TextIOBase):
    method write (line 29) | def write(self, s: str) -> int:  # type: ignore[override]
    method flush (line 36) | def flush(self) -> None:  # type: ignore[override]
  class _StreamTee (line 48) | class _StreamTee(io.TextIOBase):
    method __init__ (line 54) | def __init__(self, stream: io.TextIOBase):
    method write (line 57) | def write(self, s: str) -> int:  # type: ignore[override]
    method flush (line 64) | def flush(self) -> None:  # type: ignore[override]
    method encoding (line 74) | def encoding(self):  # type: ignore[override]
    method errors (line 78) | def errors(self):  # type: ignore[override]
    method isatty (line 81) | def isatty(self):  # type: ignore[override]
    method fileno (line 84) | def fileno(self):  # type: ignore[override]
  function prepare_debug_log (line 90) | def prepare_debug_log(cfg, filename: str = "debug.log") -> str:
  function close_debug_log (line 140) | def close_debug_log() -> None:

FILE: src/axolotl/utils/tokenization.py
  function check_dataset_labels (line 10) | def check_dataset_labels(
  function check_example_labels (line 25) | def check_example_labels(example, tokenizer, text_only=False):
  function color_token_for_rl_debug (line 57) | def color_token_for_rl_debug(decoded_token, encoded_token, color, text_o...
  function process_tokens_for_rl_debug (line 67) | def process_tokens_for_rl_debug(tokens, color, tokenizer, text_only):
  function check_rl_example_labels (line 76) | def check_rl_example_labels(example, tokenizer, text_only=False):

FILE: src/axolotl/utils/trackio_.py
  function setup_trackio_env_vars (line 8) | def setup_trackio_env_vars(cfg: DictDefault):

FILE: src/axolotl/utils/train.py
  function determine_last_checkpoint (line 11) | def determine_last_checkpoint(cfg: DictDefault, update: bool = True) -> ...

FILE: src/axolotl/utils/trainer.py
  function weighted_cross_entropy (line 29) | def weighted_cross_entropy(
  function create_weighted_mask (line 47) | def create_weighted_mask(labels: torch.Tensor):
  function trainer_weighted_loss (line 78) | def trainer_weighted_loss(model_output, labels, shift_labels=True):
  function disable_datasets_caching (line 91) | def disable_datasets_caching():
  function add_position_ids (line 99) | def add_position_ids(sample):
  function add_pose_position_ids (line 138) | def add_pose_position_ids(
  function add_length (line 203) | def add_length(sample):
  function filter_sequences_by_length (line 208) | def filter_sequences_by_length(
  function process_datasets_for_packing (line 252) | def process_datasets_for_packing(cfg, train_dataset, eval_dataset):
  function process_pretraining_datasets_for_packing (line 386) | def process_pretraining_datasets_for_packing(
  function calculate_total_num_steps (line 408) | def calculate_total_num_steps(cfg, train_dataset, update=True):
  function setup_torch_compile_env (line 525) | def setup_torch_compile_env(cfg):
  function setup_deepspeed_env (line 533) | def setup_deepspeed_env(cfg, stage=None):
  function setup_fsdp_envs (line 589) | def setup_fsdp_envs(cfg):
  function setup_parallelism_envs (line 621) | def setup_parallelism_envs(cfg):
  function prepare_optim_env (line 643) | def prepare_optim_env(cfg):
  function setup_trainer (line 679) | def setup_trainer(

FILE: src/axolotl/utils/wandb_.py
  function setup_wandb_env_vars (line 8) | def setup_wandb_env_vars(cfg: DictDefault):

FILE: src/setuptools_axolotl_dynamic_dependencies.py
  function parse_requirements (line 12) | def parse_requirements():
  class BuildPyCommand (line 94) | class BuildPyCommand(_build_py):
    method finalize_options (line 99) | def finalize_options(self):

FILE: tests/cli/conftest.py
  function cli_runner (line 22) | def cli_runner():
  function valid_test_config (line 27) | def valid_test_config():
  function config_path (line 32) | def config_path(tmp_path):

FILE: tests/cli/test_cli_base.py
  class BaseCliTest (line 9) | class BaseCliTest:
    method _test_cli_validation (line 12) | def _test_cli_validation(self, cli_runner, command: str):
    method _test_basic_execution (line 30) | def _test_basic_execution(
    method _test_cli_overrides (line 78) | def _test_cli_overrides(self, tmp_path: Path, valid_test_config: str):

FILE: tests/cli/test_cli_evaluate.py
  class TestEvaluateCommand (line 10) | class TestEvaluateCommand(BaseCliTest):
    method test_evaluate_cli_validation (line 15) | def test_evaluate_cli_validation(self, cli_runner):
    method test_evaluate_basic_execution (line 19) | def test_evaluate_basic_execution(self, cli_runner, tmp_path, valid_te...
    method test_evaluate_basic_execution_no_accelerate (line 25) | def test_evaluate_basic_execution_no_accelerate(
    method test_evaluate_cli_overrides (line 47) | def test_evaluate_cli_overrides(self, cli_runner, tmp_path, valid_test...
    method test_evaluate_with_launcher_args_torchrun (line 73) | def test_evaluate_with_launcher_args_torchrun(
    method test_evaluate_with_launcher_args_accelerate (line 106) | def test_evaluate_with_launcher_args_accelerate(
    method test_evaluate_backward_compatibility_no_launcher_args (line 140) | def test_evaluate_backward_compatibility_no_launcher_args(

FILE: tests/cli/test_cli_fetch.py
  function test_fetch_cli_examples (line 8) | def test_fetch_cli_examples(cli_runner):
  function test_fetch_cli_deepspeed (line 17) | def test_fetch_cli_deepspeed(cli_runner):
  function test_fetch_cli_with_dest (line 26) | def test_fetch_cli_with_dest(cli_runner, tmp_path):
  function test_fetch_cli_invalid_directory (line 36) | def test_fetch_cli_invalid_directory(cli_runner):

FILE: tests/cli/test_cli_inference.py
  function test_inference_basic (line 8) | def test_inference_basic(cli_runner, config_path):
  function test_inference_gradio (line 21) | def test_inference_gradio(cli_runner, config_path):
  function test_inference_with_launcher_args_torchrun (line 34) | def test_inference_with_launcher_args_torchrun(cli_runner, config_path):
  function test_inference_with_launcher_args_accelerate (line 63) | def test_inference_with_launcher_args_accelerate(cli_runner, config_path):
  function test_inference_gradio_with_launcher_args (line 93) | def test_inference_gradio_with_launcher_args(cli_runner, config_path):
  function test_inference_backward_compatibility_no_launcher_args (line 123) | def test_inference_backward_compatibility_no_launcher_args(cli_runner, c...

FILE: tests/cli/test_cli_interface.py
  function test_build_command (line 6) | def test_build_command():
  function test_invalid_command_options (line 28) | def test_invalid_command_options(cli_runner):
  function test_required_config_argument (line 43) | def test_required_config_argument(cli_runner):

FILE: tests/cli/test_cli_merge_lora.py
  function test_merge_lora_basic (line 8) | def test_merge_lora_basic(cli_runner, config_path):
  function test_merge_lora_with_dirs (line 18) | def test_merge_lora_with_dirs(cli_runner, config_path, tmp_path):
  function test_merge_lora_nonexistent_config (line 44) | def test_merge_lora_nonexistent_config(cli_runner, tmp_path):
  function test_merge_lora_nonexistent_lora_dir (line 51) | def test_merge_lora_nonexistent_lora_dir(cli_runner, config_path, tmp_pa...

FILE: tests/cli/test_cli_merge_sharded_fsdp_weights.py
  function test_merge_sharded_fsdp_weights_no_accelerate (line 8) | def test_merge_sharded_fsdp_weights_no_accelerate(cli_runner, config_path):
  function test_merge_sharded_fsdp_weights_with_launcher_args_torchrun (line 21) | def test_merge_sharded_fsdp_weights_with_launcher_args_torchrun(
  function test_merge_sharded_fsdp_weights_with_launcher_args_accelerate (line 52) | def test_merge_sharded_fsdp_weights_with_launcher_args_accelerate(
  function test_merge_sharded_fsdp_weights_backward_compatibility_no_launcher_args (line 84) | def test_merge_sharded_fsdp_weights_backward_compatibility_no_launcher_a...

FILE: tests/cli/test_cli_preprocess.py
  function cleanup_last_run_prepared (line 13) | def cleanup_last_run_prepared():
  function test_preprocess_config_not_found (line 20) | def test_preprocess_config_not_found(cli_runner):
  function test_preprocess_basic (line 26) | def test_preprocess_basic(cli_runner, config_path):
  function test_preprocess_without_download (line 40) | def test_preprocess_without_download(cli_runner, config_path):
  function test_preprocess_custom_path (line 53) | def test_preprocess_custom_path(cli_runner, tmp_path, valid_test_config):

FILE: tests/cli/test_cli_sweeps.py
  function test_generate_sweep_configs_no_pairs (line 8) | def test_generate_sweep_configs_no_pairs():
  function test_generate_sweep_configs_with_pairs (line 33) | def test_generate_sweep_configs_with_pairs():

FILE: tests/cli/test_cli_train.py
  class TestTrainCommand (line 10) | class TestTrainCommand(BaseCliTest):
    method test_train_cli_validation (line 15) | def test_train_cli_validation(self, cli_runner):
    method test_train_basic_execution (line 19) | def test_train_basic_execution(self, cli_runner, tmp_path, valid_test_...
    method test_train_basic_execution_no_accelerate (line 25) | def test_train_basic_execution_no_accelerate(
    method test_train_cli_overrides (line 51) | def test_train_cli_overrides(self, cli_runner, tmp_path, valid_test_co...
    method test_train_with_launcher_args_torchrun (line 79) | def test_train_with_launcher_args_torchrun(
    method test_train_with_launcher_args_accelerate (line 112) | def test_train_with_launcher_args_accelerate(
    method test_train_backward_compatibility_no_launcher_args (line 147) | def test_train_backward_compatibility_no_launcher_args(
    method test_train_mixed_args_with_launcher_args (line 182) | def test

Condensed preview: 1070 files, each showing path, character count, and a content snippet. The full structured content is 6,005K chars.

[
  {
    "path": ".axolotl-complete.bash",
    "chars": 1435,
    "preview": "#!/bin/bash\n\n_axolotl_completions() {\n    local cur prev\n    COMPREPLY=()\n    cur=\"${COMP_WORDS[COMP_CWORD]}\"\n    prev=\""
  },
  {
    "path": ".bandit",
    "chars": 53,
    "preview": "[bandit]\nexclude = tests\nskips = B101,B615,B102,B110\n"
  },
  {
    "path": ".coderabbit.yaml",
    "chars": 412,
    "preview": "# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json\nlanguage: \"en-US\"\nearly_access: false\n"
  },
  {
    "path": ".coveragerc",
    "chars": 213,
    "preview": "[run]\nsource = axolotl\nomit =\n    */tests/*\n    setup.py\n\n[report]\nexclude_lines =\n    pragma: no cover\n    def __repr__"
  },
  {
    "path": ".editorconfig",
    "chars": 186,
    "preview": "root = true\n\n[*]\nend_of_line = lf\ninsert_final_newline = true\ntrim_trailing_whitespace = true\n\n[*.py]\nindent_style = spa"
  },
  {
    "path": ".gitattributes",
    "chars": 49,
    "preview": "data/*.jsonl filter=lfs diff=lfs merge=lfs -text\n"
  },
  {
    "path": ".github/CODE_OF_CONDUCT.md",
    "chars": 5242,
    "preview": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nWe as members, contributors, and leaders pledge to make participa"
  },
  {
    "path": ".github/CONTRIBUTING.md",
    "chars": 4073,
    "preview": "# Contributing to axolotl\n\nFirst of all, thank you for your interest in contributing to axolotl! We appreciate the time "
  },
  {
    "path": ".github/FUNDING.yml",
    "chars": 803,
    "preview": "# These are supported funding model platforms\n\ngithub: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [u"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug-report.yaml",
    "chars": 3509,
    "preview": "name: Bug Report\ndescription: File a bug report\nlabels: [\"bug\", \"needs triage\"]\nbody:\n  - type: markdown\n    attributes:"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 294,
    "preview": "blank_issues_enabled: false\ncontact_links:\n  - name: Ask a question\n    url: https://github.com/axolotl-ai-cloud/axolotl"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/docs.yml",
    "chars": 1873,
    "preview": "name: Documentation Improvement / Clarity\ndescription: Make a suggestion to improve the project documentation.\nlabels: ["
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature-request.yaml",
    "chars": 2466,
    "preview": "name: Feature Request / Enhancement\ndescription: Suggest a new feature or feature enhancement for the project\nlabels: [\""
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.md",
    "chars": 1049,
    "preview": "<!--- Provide a general summary of your changes in the Title above -->\n\n# Description\n\n<!--- Describe your changes in de"
  },
  {
    "path": ".github/SECURITY.md",
    "chars": 407,
    "preview": "# Security Policy\n\n## Supported Versions\n\nDue to the nature of the fast development that is happening in this project, o"
  },
  {
    "path": ".github/SUPPORT.md",
    "chars": 474,
    "preview": "# Support\n\nIf you need help with this project or have questions, please:\n\n1. Check the documentation.\n2. Search the exis"
  },
  {
    "path": ".github/release-drafter.yml",
    "chars": 632,
    "preview": "name-template: 'v$RESOLVED_VERSION'\ntag-template: 'v$RESOLVED_VERSION'\ncategories:\n  - title: '🚀 Features'\n    labels:\n "
  },
  {
    "path": ".github/workflows/base.yml",
    "chars": 10420,
    "preview": "name: ci-cd-base\n\non:\n  push:\n    branches:\n      - \"main\"\n    paths:\n      - 'docker/Dockerfile-base'\n      - 'docker/D"
  },
  {
    "path": ".github/workflows/docs.yml",
    "chars": 1030,
    "preview": "name: Publish Docs\non:\n  push:\n    branches:\n      - main\n\npermissions:\n    contents: write\n    pages: write\n\njobs:\n    "
  },
  {
    "path": ".github/workflows/lint.yml",
    "chars": 715,
    "preview": "name: lint\non:\n  # check on PRs, and manual triggers\n  merge_group:\n  pull_request:\n      types: [opened, synchronize, r"
  },
  {
    "path": ".github/workflows/main.yml",
    "chars": 14905,
    "preview": "name: ci-cd\n\non:\n  push:\n    branches:\n      - \"main\"\n    tags:\n      - \"v*\"\n  workflow_dispatch:\n\npermissions:\n  conten"
  },
  {
    "path": ".github/workflows/multi-gpu-e2e.yml",
    "chars": 3018,
    "preview": "name: docker-multigpu-tests-biweekly\n\non:\n  pull_request:\n    paths:\n      - 'tests/e2e/multigpu/**.py'\n      - 'require"
  },
  {
    "path": ".github/workflows/nightlies.yml",
    "chars": 4037,
    "preview": "name: docker-nightlies\n\non:\n  workflow_dispatch:\n  schedule:\n    - cron: '0 0 * * *'  # Runs at 00:00 UTC every day\n\nper"
  },
  {
    "path": ".github/workflows/precommit-autoupdate.yml",
    "chars": 1113,
    "preview": "name: Pre-commit auto-update\n\non:\n  schedule:\n    - cron: '0 0 1 * *'  # Run monthly\n  workflow_dispatch:  # Manual kick"
  },
  {
    "path": ".github/workflows/preview-docs.yml",
    "chars": 2377,
    "preview": "name: Preview\non:\n  workflow_dispatch:\n  pull_request:\n    types: [opened, synchronize, reopened, ready_for_review]\n\n   "
  },
  {
    "path": ".github/workflows/pypi.yml",
    "chars": 1617,
    "preview": "name: publish pypi\n\non:\n  push:\n    tags:\n      - \"v*\"\n  workflow_dispatch:\n\npermissions: {}\n\njobs:\n  setup_release:\n   "
  },
  {
    "path": ".github/workflows/tests-nightly.yml",
    "chars": 7380,
    "preview": "name: Tests Nightly against upstream main\non:\n  workflow_dispatch:\n  schedule:\n    - cron: '0 0 * * *'  # Runs at 00:00 "
  },
  {
    "path": ".github/workflows/tests.yml",
    "chars": 14804,
    "preview": "name: Tests\non:\n  # check on push/merge to main, PRs, and manual triggers\n  merge_group:\n  push:\n    branches:\n      - \""
  },
  {
    "path": ".gitignore",
    "chars": 3445,
    "preview": "**/axolotl.egg-info\nconfigs\nlast_run_prepared/\noutputs\n.vscode\n_site/\n\n# Byte-compiled / optimized / DLL files\n__pycache"
  },
  {
    "path": ".mypy.ini",
    "chars": 899,
    "preview": "[mypy]\nplugins = pydantic.mypy\nexclude = venv\n\n[mypy-alpaca_lora_4bit.*]\nignore_missing_imports = True\n\n[mypy-axolotl.mo"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 799,
    "preview": "default_language_version:\n    python: python3\n\nrepos:\n-   repo: https://github.com/pre-commit/pre-commit-hooks\n    rev: "
  },
  {
    "path": ".runpod/.gitignore",
    "chars": 3102,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": ".runpod/Dockerfile",
    "chars": 662,
    "preview": "FROM axolotlai/axolotl-cloud:main-py3.11-cu124-2.6.0\n\nCOPY .runpod/requirements.txt /requirements.txt\nRUN --mount=type=c"
  },
  {
    "path": ".runpod/README.md",
    "chars": 14939,
    "preview": "<h1>LLM Post Training- Full fine-tune, LoRA, QLoRa etc. Llama/Mistral/Gemma and more</h1>\n\n# Configuration Options\n\nThis"
  },
  {
    "path": ".runpod/hub.json",
    "chars": 2334,
    "preview": "{\n  \"title\": \"Axolotl Fine-Tuning\",\n  \"description\": \"Serverless fine-tuning of open-source LLMs with Axolotl. Supports "
  },
  {
    "path": ".runpod/requirements.txt",
    "chars": 341,
    "preview": "# Required Python packages get listed here, one per line.\n# Reccomended to lock the version number to avoid unexpected c"
  },
  {
    "path": ".runpod/src/config/config.yaml",
    "chars": 20840,
    "preview": "# # This is the huggingface model that contains *.pt, *.safetensors, or *.bin files\n# # This can also be a relative path"
  },
  {
    "path": ".runpod/src/handler.py",
    "chars": 1865,
    "preview": "\"\"\"\nRunpod serverless entrypoint handler\n\"\"\"\n\nimport os\n\nimport runpod\nimport yaml\nfrom huggingface_hub._login import lo"
  },
  {
    "path": ".runpod/src/test_input.json",
    "chars": 1557,
    "preview": "{\n  \"input\": {\n    \"user_id\": \"user\",\n    \"model_id\": \"llama-test\",\n    \"run_id\": \"llama-test\",\n    \"credentials\": {\n   "
  },
  {
    "path": ".runpod/src/train.py",
    "chars": 1430,
    "preview": "\"\"\"\nRunpod train entrypoint\n\"\"\"\n\nimport asyncio\n\n\nasync def train(config_path: str, gpu_id: str = \"0\", preprocess: bool "
  },
  {
    "path": ".runpod/src/utils.py",
    "chars": 2740,
    "preview": "\"\"\"\nRunpod launcher utils\n\"\"\"\n\nimport os\n\nimport yaml\n\n\ndef get_output_dir(run_id):\n    path = f\"fine-tuning/{run_id}\"\n "
  },
  {
    "path": ".runpod/test-input.json",
    "chars": 2024,
    "preview": "{\n  \"input\": {\n    \"name\": \"quick_smoke_test_sft\",\n    \"user_id\": \"user\",\n    \"model_id\": \"llama-test\",\n    \"run_id\": \"l"
  },
  {
    "path": ".runpod/tests.json",
    "chars": 2297,
    "preview": "{\n  \"tests\": [\n    {\n      \"name\": \"quick_smoke_test_sft\",\n      \"input\": {\n        \"user_id\": \"user\",\n        \"model_id"
  },
  {
    "path": "CITATION.cff",
    "chars": 340,
    "preview": "cff-version: 1.2.0\ntype: software\ntitle: \"Axolotl: Open Source LLM Post-Training\"\nmessage: \"If you use this software, pl"
  },
  {
    "path": "CNAME",
    "chars": 16,
    "preview": "docs.axolotl.ai\n"
  },
  {
    "path": "FAQS.md",
    "chars": 648,
    "preview": "# FAQs\n\n- Can you train StableLM with this? Yes, but only with a single GPU atm. Multi GPU support is coming soon! Just "
  },
  {
    "path": "LICENSE",
    "chars": 11358,
    "preview": "\n                                 Apache License\n                           Version 2.0, January 2004\n                  "
  },
  {
    "path": "MANIFEST.in",
    "chars": 204,
    "preview": "include requirements.txt\ninclude README.md\ninclude LICENSE\ninclude src/setuptools_axolotl_dynamic_dependencies.py\ninclud"
  },
  {
    "path": "README.md",
    "chars": 15284,
    "preview": "<p align=\"center\">\n    <picture>\n        <source media=\"(prefers-color-scheme: dark)\" srcset=\"https://raw.githubusercont"
  },
  {
    "path": "VERSION",
    "chars": 12,
    "preview": "0.16.0.dev0\n"
  },
  {
    "path": "_quarto.yml",
    "chars": 11132,
    "preview": "project:\n  type: website\n  pre-render:\n   - docs/scripts/generate_config_docs.py\n   - docs/scripts/generate_examples_doc"
  },
  {
    "path": "benchmarks/bench_entropy.py",
    "chars": 6320,
    "preview": "\"\"\"Benchmark for entropy_from_logits Triton kernel vs original chunked implementation.\n\nUsage: CUDA_VISIBLE_DEVICES=0 py"
  },
  {
    "path": "benchmarks/bench_scattermoe_lora.py",
    "chars": 8845,
    "preview": "\"\"\"Benchmark for ScatterMoE LoRA Triton kernels.\n\nMeasures forward, backward dX, and backward dA/dB kernels at common Mo"
  },
  {
    "path": "benchmarks/bench_selective_logsoftmax.py",
    "chars": 5661,
    "preview": "\"\"\"Benchmark for selective_log_softmax Triton kernel vs original implementation.\n\nUsage: CUDA_VISIBLE_DEVICES=0 python b"
  },
  {
    "path": "cicd/Dockerfile-uv.jinja",
    "chars": 2355,
    "preview": "FROM axolotlai/axolotl-base-uv:{{ BASE_TAG }}\n\nENV TORCH_CUDA_ARCH_LIST=\"7.0 7.5 8.0 8.6 9.0+PTX\"\nENV AXOLOTL_EXTRAS=\"{{"
  },
  {
    "path": "cicd/Dockerfile.jinja",
    "chars": 2347,
    "preview": "FROM axolotlai/axolotl-base:{{ BASE_TAG }}\n\nENV TORCH_CUDA_ARCH_LIST=\"7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX\"\nENV AXOLOTL_EXTRA"
  },
  {
    "path": "cicd/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "cicd/cicd.sh",
    "chars": 1901,
    "preview": "#!/bin/bash\nset -e\n\npython -c \"import torch; assert '$PYTORCH_VERSION' in torch.__version__\"\n\ncurl -L https://axolotl-ci"
  },
  {
    "path": "cicd/cleanup.py",
    "chars": 358,
    "preview": "\"\"\"Modal app to run axolotl GPU cleanup\"\"\"\n\nfrom .single_gpu import VOLUME_CONFIG, app, cicd_image, run_cmd\n\n\n@app.funct"
  },
  {
    "path": "cicd/cleanup.sh",
    "chars": 297,
    "preview": "#!/bin/bash\nset -e\n\n# cleanup old cache files for datasets processing and intermediate mappings\nfind /workspace/data/hug"
  },
  {
    "path": "cicd/e2e_tests.py",
    "chars": 404,
    "preview": "\"\"\"Modal app to run axolotl GPU tests\"\"\"\n\nfrom .single_gpu import GPU_CONFIG, VOLUME_CONFIG, app, cicd_image, run_cmd\n\n\n"
  },
  {
    "path": "cicd/multigpu.py",
    "chars": 2403,
    "preview": "\"\"\"\nmodal application to run axolotl gpu tests in Modal\n\"\"\"\n\nimport os\nimport pathlib\nimport tempfile\n\nimport jinja2\nimp"
  },
  {
    "path": "cicd/multigpu.sh",
    "chars": 852,
    "preview": "#!/bin/bash\nset -e\n\n# Only run two tests at a time to avoid OOM on GPU (with coverage collection)\npytest -v --durations="
  },
  {
    "path": "cicd/single_gpu.py",
    "chars": 2370,
    "preview": "\"\"\"Modal app to run axolotl GPU tests\"\"\"\n\nimport os\nimport pathlib\nimport tempfile\n\nimport jinja2\nimport modal\nimport mo"
  },
  {
    "path": "codecov.yml",
    "chars": 1046,
    "preview": "codecov:\n  require_ci_to_pass: yes\n  notify:\n    wait_for_ci: true\n\ncoverage:\n  precision: 2\n  round: down\n  range: \"70."
  },
  {
    "path": "deepspeed_configs/zero1.json",
    "chars": 484,
    "preview": "{\n  \"zero_optimization\": {\n    \"stage\": 1,\n    \"overlap_comm\": true\n  },\n  \"bf16\": {\n    \"enabled\": \"auto\"\n  },\n  \"fp16\""
  },
  {
    "path": "deepspeed_configs/zero1_torch_compile.json",
    "chars": 552,
    "preview": "{\n  \"zero_optimization\": {\n    \"stage\": 1,\n    \"overlap_comm\": true\n  },\n  \"bf16\": {\n    \"enabled\": \"auto\"\n  },\n  \"fp16\""
  },
  {
    "path": "deepspeed_configs/zero2.json",
    "chars": 574,
    "preview": "{\n  \"zero_optimization\": {\n    \"stage\": 2,\n    \"offload_optimizer\": {\n      \"device\": \"cpu\"\n    },\n    \"contiguous_gradi"
  },
  {
    "path": "deepspeed_configs/zero2_torch_compile.json",
    "chars": 642,
    "preview": "{\n  \"compile\": {\n    \"disable\": false,\n    \"backend\": \"inductor\"\n  },\n  \"zero_optimization\": {\n    \"stage\": 2,\n    \"offl"
  },
  {
    "path": "deepspeed_configs/zero3.json",
    "chars": 777,
    "preview": "{\n  \"zero_optimization\": {\n    \"stage\": 3,\n    \"overlap_comm\": true,\n    \"contiguous_gradients\": true,\n    \"sub_group_si"
  },
  {
    "path": "deepspeed_configs/zero3_bf16.json",
    "chars": 583,
    "preview": "{\n  \"zero_optimization\": {\n    \"stage\": 3,\n    \"overlap_comm\": true,\n    \"contiguous_gradients\": true,\n    \"sub_group_si"
  },
  {
    "path": "deepspeed_configs/zero3_bf16_cpuoffload_all.json",
    "chars": 824,
    "preview": "{\n  \"zero_force_ds_cpu_optimizer\": false,\n  \"zero_allow_untested_optimizer\": true,\n  \"zero_optimization\": {\n    \"stage\":"
  },
  {
    "path": "deepspeed_configs/zero3_bf16_cpuoffload_params.json",
    "chars": 742,
    "preview": "{\n  \"zero_force_ds_cpu_optimizer\": false,\n  \"zero_allow_untested_optimizer\": true,\n  \"zero_optimization\": {\n    \"stage\":"
  },
  {
    "path": "devtools/README.md",
    "chars": 158,
    "preview": "This directory contains example config files that might be useful for debugging. Please see [docs/debugging.qmd](../docs"
  },
  {
    "path": "devtools/dev_chat_template.yml",
    "chars": 939,
    "preview": "# Example config for debugging the chat_template prompt format\nbase_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0\nmodel_type"
  },
  {
    "path": "docker/Dockerfile",
    "chars": 1691,
    "preview": "ARG BASE_TAG=main-base\nFROM axolotlai/axolotl-base:$BASE_TAG\n\nARG TORCH_CUDA_ARCH_LIST=\"7.0 7.5 8.0 8.6+PTX\"\nARG AXOLOTL"
  },
  {
    "path": "docker/Dockerfile-base",
    "chars": 3340,
    "preview": "ARG CUDA_VERSION=\"11.8.0\"\nARG CUDNN_VERSION=\"8\"\nARG UBUNTU_VERSION=\"22.04\"\nARG MAX_JOBS=4\nARG TARGETARCH\n\nFROM nvidia/cu"
  },
  {
    "path": "docker/Dockerfile-base-next",
    "chars": 1501,
    "preview": "ARG CUDA_VERSION=\"12.8.1\"\nARG CUDNN_VERSION=\"8\"\nARG UBUNTU_VERSION=\"22.04\"\nARG MAX_JOBS=4\n\nFROM nvidia/cuda:$CUDA_VERSIO"
  },
  {
    "path": "docker/Dockerfile-base-nightly",
    "chars": 1849,
    "preview": "ARG CUDA_VERSION=\"12.8.1\"\nARG CUDNN_VERSION=\"8\"\nARG UBUNTU_VERSION=\"22.04\"\nARG MAX_JOBS=4\n\nFROM nvidia/cuda:$CUDA_VERSIO"
  },
  {
    "path": "docker/Dockerfile-cloud",
    "chars": 1139,
    "preview": "ARG BASE_TAG=main\nFROM axolotlai/axolotl:$BASE_TAG\n\nENV HF_DATASETS_CACHE=\"/workspace/data/huggingface-cache/datasets\"\nE"
  },
  {
    "path": "docker/Dockerfile-cloud-no-tmux",
    "chars": 1037,
    "preview": "ARG BASE_TAG=main\nFROM axolotlai/axolotl:$BASE_TAG\n\nENV HF_DATASETS_CACHE=\"/workspace/data/huggingface-cache/datasets\"\nE"
  },
  {
    "path": "docker/Dockerfile-cloud-uv",
    "chars": 1145,
    "preview": "ARG BASE_TAG=main\nFROM axolotlai/axolotl-uv:$BASE_TAG\n\nENV HF_DATASETS_CACHE=\"/workspace/data/huggingface-cache/datasets"
  },
  {
    "path": "docker/Dockerfile-tests",
    "chars": 1183,
    "preview": "ARG BASE_TAG=main-base\nFROM axolotlai/axolotl-base:$BASE_TAG\n\nARG TORCH_CUDA_ARCH_LIST=\"7.0 7.5 8.0 8.6+PTX\"\nARG AXOLOTL"
  },
  {
    "path": "docker/Dockerfile-uv",
    "chars": 1713,
    "preview": "ARG BASE_TAG=main-base\nFROM axolotlai/axolotl-base-uv:$BASE_TAG\n\nARG TORCH_CUDA_ARCH_LIST=\"7.0 7.5 8.0 8.6+PTX\"\nARG AXOL"
  },
  {
    "path": "docker/Dockerfile-uv-base",
    "chars": 2113,
    "preview": "ARG CUDA_VERSION=\"12.6.3\"\nARG CUDNN_VERSION=\"\"\nARG UBUNTU_VERSION=\"22.04\"\nARG MAX_JOBS=4\nARG TARGETARCH\n\nFROM nvidia/cud"
  },
  {
    "path": "docker-compose.yaml",
    "chars": 701,
    "preview": "# version: '3.8'\nservices:\n  axolotl:\n    build:\n      context: .\n      dockerfile: ./docker/Dockerfile\n    volumes:\n   "
  },
  {
    "path": "docs/.gitignore",
    "chars": 94,
    "preview": "/.quarto/\n_site/\n/api/*.qmd\n/api/*.html\nconfig-reference.qmd\nmodels/**/*.qmd\nmodels/**/*.html\n"
  },
  {
    "path": "docs/amd_hpc.qmd",
    "chars": 3341,
    "preview": "---\ntitle: AMD GPUs on HPC Systems\ndescription: A comprehensive guide for using Axolotl on distributed systems with AMD "
  },
  {
    "path": "docs/attention.qmd",
    "chars": 3820,
    "preview": "---\ntitle: Attention\ndescription: Supported attention modules in Axolotl\n---\n\n## SDP Attention\n\nThis is the default buil"
  },
  {
    "path": "docs/batch_vs_grad.qmd",
    "chars": 3008,
    "preview": "---\ntitle: Batch size vs Gradient accumulation\ndescription: Understanding of batch size and gradient accumulation steps\n"
  },
  {
    "path": "docs/checkpoint_saving.qmd",
    "chars": 2394,
    "preview": "---\ntitle: \"Checkpoint Saving\"\nformat:\n  html:\n    toc: true\n    toc-depth: 2\n    number-sections: true\nexecute:\n  enabl"
  },
  {
    "path": "docs/cli.qmd",
    "chars": 8582,
    "preview": "---\ntitle: \"Command Line Interface (CLI)\"\nformat:\n  html:\n    toc: true\n    toc-expand: 2\n    toc-depth: 3\nexecute:\n  en"
  },
  {
    "path": "docs/custom_integrations.qmd",
    "chars": 3984,
    "preview": "---\ntitle: Custom Integrations\ntoc: true\ntoc-depth: 3\n---\n\n```{python}\n#| echo: false\n\nimport os\nimport re\n\ndef process_"
  },
  {
    "path": "docs/dataset-formats/conversation.qmd",
    "chars": 9552,
    "preview": "---\ntitle: Conversation\ndescription: Conversation format for supervised fine-tuning.\norder: 3\n---\n\n## chat_template\n\nCha"
  },
  {
    "path": "docs/dataset-formats/index.qmd",
    "chars": 17703,
    "preview": "---\ntitle: Dataset Formats\ndescription: Guide to Dataset Formats in Axolotl\nback-to-top-navigation: true\ntoc: true\ntoc-d"
  },
  {
    "path": "docs/dataset-formats/inst_tune.qmd",
    "chars": 3818,
    "preview": "---\ntitle: Instruction Tuning\ndescription: Instruction tuning formats for supervised fine-tuning.\norder: 2\n---\n\n## alpac"
  },
  {
    "path": "docs/dataset-formats/pretraining.qmd",
    "chars": 764,
    "preview": "---\ntitle: Pre-training\ndescription: Data format for a pre-training completion task.\norder: 1\n---\n\nFor pretraining, ther"
  },
  {
    "path": "docs/dataset-formats/stepwise_supervised.qmd",
    "chars": 683,
    "preview": "---\ntitle: Stepwise Supervised Format\ndescription: Format for datasets with stepwise completions and labels\norder: 3\n---"
  },
  {
    "path": "docs/dataset-formats/template_free.qmd",
    "chars": 6465,
    "preview": "---\ntitle: Template-Free\ndescription: Construct prompts without a template.\ntoc: true\ntoc-depth: 3\norder: 4\n---\n\n## Back"
  },
  {
    "path": "docs/dataset-formats/tokenized.qmd",
    "chars": 910,
    "preview": "---\ntitle: Custom Pre-Tokenized Dataset\ndescription: How to use a custom pre-tokenized dataset.\norder: 5\n---\n\n- Pass an "
  },
  {
    "path": "docs/dataset_loading.qmd",
    "chars": 6700,
    "preview": "---\ntitle: Dataset Loading\ndescription: Understanding how to load datasets from different sources\nback-to-top-navigation"
  },
  {
    "path": "docs/dataset_preprocessing.qmd",
    "chars": 2025,
    "preview": "---\ntitle: Dataset Preprocessing\ndescription: How datasets are processed\n---\n\n## Overview\n\nDataset pre-processing is the"
  },
  {
    "path": "docs/debugging.qmd",
    "chars": 14344,
    "preview": "---\ntitle: Debugging\ndescription: How to debug Axolotl\n---\n\n\nThis document provides some tips and tricks for debugging A"
  },
  {
    "path": "docs/docker.qmd",
    "chars": 3189,
    "preview": "---\ntitle: \"Docker\"\nformat:\n  html:\n    toc: true\n    toc-depth: 4\n---\n\nThis section describes the different Docker imag"
  },
  {
    "path": "docs/expert_quantization.qmd",
    "chars": 2831,
    "preview": "---\ntitle: \"MoE Expert Quantization\"\ndescription: \"Reduce VRAM usage when training MoE model adapters by quantizing expe"
  },
  {
    "path": "docs/faq.qmd",
    "chars": 8564,
    "preview": "---\ntitle: FAQ\ndescription: Frequently asked questions\n---\n\n### General\n\n**Q: The trainer stopped and hasn't progressed "
  },
  {
    "path": "docs/fsdp_qlora.qmd",
    "chars": 2221,
    "preview": "---\ntitle: \"FSDP + QLoRA\"\ndescription: Use FSDP with QLoRA to fine-tune large LLMs on consumer GPUs.\nformat:\n  html:\n   "
  },
  {
    "path": "docs/getting-started.qmd",
    "chars": 4624,
    "preview": "---\ntitle: \"Quickstart\"\nformat:\n  html:\n    toc: true\n    toc-depth: 3\n    number-sections: true\nexecute:\n  enabled: fal"
  },
  {
    "path": "docs/gradient_checkpointing.qmd",
    "chars": 997,
    "preview": "---\ntitle: Gradient Checkpointing and Activation Offloading\n---\n\nGradient checkpointing and activation offloading are te"
  },
  {
    "path": "docs/inference.qmd",
    "chars": 2824,
    "preview": "---\ntitle: \"Inference and Merging\"\nformat:\n  html:\n    toc: true\n    toc-depth: 3\n    number-sections: true\nexecute:\n  e"
  },
  {
    "path": "docs/input_output.qmd",
    "chars": 200,
    "preview": "---\ntitle: Template-free prompt construction\ndescription: \"Template-free prompt construction with the `input_output` for"
  },
  {
    "path": "docs/installation.qmd",
    "chars": 5355,
    "preview": "---\ntitle: \"Installation\"\nformat:\n  html:\n    toc: true\n    toc-depth: 3\n    number-sections: true\nexecute:\n  enabled: f"
  },
  {
    "path": "docs/lora_optims.qmd",
    "chars": 4694,
    "preview": "---\ntitle: \"LoRA Optimizations\"\ndescription: \"Custom autograd functions and Triton kernels in Axolotl for optimized LoRA"
  },
  {
    "path": "docs/lr_groups.qmd",
    "chars": 1041,
    "preview": "---\ntitle: Learning Rate Groups\ndescription: \"Setting different learning rates by module name\"\n---\n\n## Background\n\nInspi"
  },
  {
    "path": "docs/mac.qmd",
    "chars": 608,
    "preview": "---\ntitle: Mac M-series\ndescription: Mac M-series support\n---\n\nCurrently Axolotl on Mac is partially usable, many of the"
  },
  {
    "path": "docs/mixed_precision.qmd",
    "chars": 4855,
    "preview": "---\ntitle: \"Mixed Precision Training\"\nformat:\n  html:\n    toc: true\n    toc-depth: 3\n    number-sections: true\n    code-"
  },
  {
    "path": "docs/multi-gpu.qmd",
    "chars": 5576,
    "preview": "---\ntitle: \"Multi-GPU\"\nformat:\n  html:\n    toc: true\n    toc-depth: 3\n    # number-sections: true\n    code-tools: true\ne"
  },
  {
    "path": "docs/multi-node.qmd",
    "chars": 3263,
    "preview": "---\ntitle: Multi Node\ndescription: How to use Axolotl on multiple machines\n---\n\nThe below are three ways to train multi-"
  },
  {
    "path": "docs/multimodal.qmd",
    "chars": 7696,
    "preview": "---\ntitle: MultiModal / Vision Language Models (BETA)\nformat:\n  html:\n    toc: true\n    toc-depth: 3\n---\n\n## Supported M"
  },
  {
    "path": "docs/multipack.qmd",
    "chars": 2016,
    "preview": "---\ntitle: Multipack (Sample Packing)\ndescription: Multipack is a technique to pack multiple sequences into a single bat"
  },
  {
    "path": "docs/nccl.qmd",
    "chars": 3106,
    "preview": "---\ntitle: NCCL\ndescription: Troubleshooting NCCL issues\n---\n\nNVIDIA NCCL is a library to facilitate and optimize multi-"
  },
  {
    "path": "docs/nd_parallelism.qmd",
    "chars": 7587,
    "preview": "---\ntitle: \"N-D Parallelism (Beta)\"\n---\n\nAxolotl enables training models at scale by composing different parallelism tec"
  },
  {
    "path": "docs/optimizations.qmd",
    "chars": 6738,
    "preview": "---\ntitle: Optimizations Guide\ndescription: A guide to the performance and memory optimizations available in Axolotl.\n--"
  },
  {
    "path": "docs/optimizers.qmd",
    "chars": 3015,
    "preview": "---\ntitle: Optimizers\ndescription: Configuring optimizers\n---\n\n## Overview\n\nAxolotl supports all optimizers supported by"
  },
  {
    "path": "docs/qat.qmd",
    "chars": 2169,
    "preview": "---\ntitle: \"Quantization Aware Training (QAT)\"\nback-to-top-navigation: true\ntoc: true\ntoc-expand: 2\ntoc-depth: 4\n---\n\n##"
  },
  {
    "path": "docs/quantize.qmd",
    "chars": 2224,
    "preview": "---\ntitle: \"Quantization with torchao\"\nback-to-top-navigation: true\ntoc: true\ntoc-expand: 2\ntoc-depth: 4\n---\n\nQuantizati"
  },
  {
    "path": "docs/ray-integration.qmd",
    "chars": 3876,
    "preview": "---\ntitle: Ray Train\ndescription: How to use Axolotl with Ray Train\n---\n\nAxolotl supports using Ray as an alternative to"
  },
  {
    "path": "docs/reward_modelling.qmd",
    "chars": 2234,
    "preview": "---\ntitle: \"Reward Modelling\"\ndescription: \"Reward models are used to guide models towards behaviors which is preferred "
  },
  {
    "path": "docs/rlhf.qmd",
    "chars": 26077,
    "preview": "---\ntitle: \"RLHF (Beta)\"\ndescription: \"Reinforcement Learning from Human Feedback is a method whereby a language model i"
  },
  {
    "path": "docs/scripts/examples-allowlist.yml",
    "chars": 1487,
    "preview": "examples:\n  # December 2025\n  - name: kimi-linear\n    title: Kimi Linear\n  - name: plano\n    title: Plano Orchestrator\n "
  },
  {
    "path": "docs/scripts/generate_config_docs.py",
    "chars": 29346,
    "preview": "# type: ignore\n\n\"\"\"\nQuarto documentation generation from Pydantic models. Uses Pydantic model source code\nto automatical"
  },
  {
    "path": "docs/scripts/generate_examples_docs.py",
    "chars": 15598,
    "preview": "\"\"\"\nauto generate example docs from allowlist\n\"\"\"\n\nimport re\nimport shutil\nimport sys\nfrom pathlib import Path\n\nimport y"
  },
  {
    "path": "docs/sequence_parallelism.qmd",
    "chars": 3735,
    "preview": "---\ntitle: Sequence Parallelism\ndescription: Train with long sequences split across multiple GPUs.\n---\n\nSequence paralle"
  },
  {
    "path": "docs/streaming.qmd",
    "chars": 3425,
    "preview": "---\ntitle: Streaming Datasets\ndescription: How to use streaming mode for large-scale datasets and memory-efficient train"
  },
  {
    "path": "docs/telemetry.qmd",
    "chars": 2440,
    "preview": "---\ntitle: Telemetry\ndescription: A description of the telemetry implementation in Axolotl.\n---\n\n# Telemetry in Axolotl\n"
  },
  {
    "path": "docs/torchao.qmd",
    "chars": 731,
    "preview": "---\ntitle: \"PyTorch ao\"\ndescription: \"Custom data types and layouts for training and inference\"\n---\n\nTo use experimental"
  },
  {
    "path": "docs/unsloth.qmd",
    "chars": 1414,
    "preview": "---\ntitle: \"Unsloth\"\ndescription: \"Hyper-optimized QLoRA finetuning for single GPUs\"\n---\n\n### Overview\n\nUnsloth provides"
  },
  {
    "path": "examples/LiquidAI/README.md",
    "chars": 3153,
    "preview": "# Finetune Liquid Foundation Models 2 (LFM2) with Axolotl\n\n[Liquid Foundation Models 2 (LFM2)](https://huggingface.co/co"
  },
  {
    "path": "examples/LiquidAI/lfm2-350m-fft.yaml",
    "chars": 947,
    "preview": "base_model: LiquidAI/LFM2-350M\n\nplugins:\n  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin\n\neot_tokens:\n "
  },
  {
    "path": "examples/LiquidAI/lfm2-8b-a1b-lora.yaml",
    "chars": 1141,
    "preview": "base_model: LiquidAI/LFM2-8B-A1B\n\nplugins:\n  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin\n\nload_in_8bi"
  },
  {
    "path": "examples/LiquidAI/lfm2-vl-lora.yaml",
    "chars": 1278,
    "preview": "base_model: LiquidAI/LFM2-VL-450M\ntrust_remote_code: true\nmodel_type: AutoModelForImageTextToText\nprocessor_type: AutoPr"
  },
  {
    "path": "examples/alst/README.md",
    "chars": 1051,
    "preview": "# Arctic Long Sequence Training (ALST)\n\nArtic Long Sequence Training (ALST) is a technique for training long context mod"
  },
  {
    "path": "examples/alst/llama3-8b-deepspeed-alst.yaml",
    "chars": 1191,
    "preview": "base_model: meta-llama/Llama-3.1-8B\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: username/cus"
  },
  {
    "path": "examples/alst/llama3-8b-fsdp2-alst.yaml",
    "chars": 1416,
    "preview": "base_model: meta-llama/Llama-3.1-8B\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: username/cus"
  },
  {
    "path": "examples/apertus/README.md",
    "chars": 3838,
    "preview": "# Finetune Swiss-AI's Apertus with Axolotl\n\n[Apertus](https://huggingface.co/collections/swiss-ai/apertus-llm-68b699e654"
  },
  {
    "path": "examples/apertus/apertus-8b-qlora.yaml",
    "chars": 1161,
    "preview": "base_model: swiss-ai/Apertus-8B-Instruct-2509\n\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: u"
  },
  {
    "path": "examples/arcee/README.md",
    "chars": 2350,
    "preview": "# Finetune ArceeAI's AFM with Axolotl\n\n[Arcee Foundation Models (AFM)](https://huggingface.co/collections/arcee-ai/afm-4"
  },
  {
    "path": "examples/arcee/afm-4.5b-qlora.yaml",
    "chars": 1145,
    "preview": "base_model: arcee-ai/AFM-4.5B\n\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: username/custom_m"
  },
  {
    "path": "examples/archived/README.md",
    "chars": 195,
    "preview": "# Archived Examples\n\nThis directory contains examples that are no longer maintained and may no longer be functional.\n\nWe"
  },
  {
    "path": "examples/archived/cerebras/btlm-ft.yml",
    "chars": 1587,
    "preview": "base_model: cerebras/btlm-3b-8k-base\n# optionally might have model_type or tokenizer_type\nmodel_type: AutoModelForCausal"
  },
  {
    "path": "examples/archived/cerebras/qlora.yml",
    "chars": 993,
    "preview": "base_model: cerebras/Cerebras-GPT-1.3B\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: username/"
  },
  {
    "path": "examples/archived/code-llama/13b/lora.yml",
    "chars": 1051,
    "preview": "base_model: codellama/CodeLlama-13b-hf\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCausalLM"
  },
  {
    "path": "examples/archived/code-llama/13b/qlora.yml",
    "chars": 1057,
    "preview": "base_model: codellama/CodeLlama-13b-hf\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCausalLM"
  },
  {
    "path": "examples/archived/code-llama/34b/lora.yml",
    "chars": 1051,
    "preview": "base_model: codellama/CodeLlama-34b-hf\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCausalLM"
  },
  {
    "path": "examples/archived/code-llama/34b/qlora.yml",
    "chars": 1057,
    "preview": "base_model: codellama/CodeLlama-34b-hf\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCausalLM"
  },
  {
    "path": "examples/archived/code-llama/7b/lora.yml",
    "chars": 1050,
    "preview": "base_model: codellama/CodeLlama-7b-hf\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCausalLM\n"
  },
  {
    "path": "examples/archived/code-llama/7b/qlora.yml",
    "chars": 1056,
    "preview": "base_model: codellama/CodeLlama-7b-hf\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCausalLM\n"
  },
  {
    "path": "examples/archived/code-llama/README.md",
    "chars": 769,
    "preview": "# Overview\n\nThis is an example of CodeLLaMA configuration for 7b, 13b and 34b.\n\nThe 7b variant fits on any 24GB VRAM GPU"
  },
  {
    "path": "examples/archived/dbrx/16bit-lora.yaml",
    "chars": 1510,
    "preview": "base_model: LnL-AI/dbrx-base-converted-v2\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: userna"
  },
  {
    "path": "examples/archived/dbrx/8bit-lora.yaml",
    "chars": 1550,
    "preview": "base_model: LnL-AI/dbrx-base-converted-v2\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: userna"
  },
  {
    "path": "examples/archived/dbrx/README.md",
    "chars": 1132,
    "preview": "# DBRX MoE\n\nCurrently, for LoRA, only the `q_proj`, `k_proj`, `v_proj` `out_proj` and `layer` Linear layers are trainabl"
  },
  {
    "path": "examples/archived/dbrx/fft-ds-zero3.yaml",
    "chars": 925,
    "preview": "base_model: LnL-AI/dbrx-base-converted-v2\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: userna"
  },
  {
    "path": "examples/archived/deepcoder/deepcoder-14B-preview-lora.yml",
    "chars": 941,
    "preview": "base_model: agentica-org/DeepCoder-14B-Preview\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: u"
  },
  {
    "path": "examples/archived/falcon/config-7b-lora.yml",
    "chars": 1264,
    "preview": "base_model: tiiuae/falcon-7b\n# optionally might have model_type or tokenizer_type\nmodel_type: AutoModelForCausalLM\ntoken"
  },
  {
    "path": "examples/archived/falcon/config-7b-qlora.yml",
    "chars": 2193,
    "preview": "# 1b: tiiuae/falcon-rw-1b\n# 40b: tiiuae/falcon-40b\nbase_model: tiiuae/falcon-7b\n# optionally might have model_type or to"
  },
  {
    "path": "examples/archived/falcon/config-7b.yml",
    "chars": 1219,
    "preview": "base_model: tiiuae/falcon-7b\n# optionally might have model_type or tokenizer_type\nmodel_type: AutoModelForCausalLM\ntoken"
  },
  {
    "path": "examples/archived/gemma/qlora.yml",
    "chars": 1028,
    "preview": "# use google/gemma-7b if you have access\nbase_model: mhenrichsen/gemma-7b\n# optionally might have model_type or tokenize"
  },
  {
    "path": "examples/archived/gptj/qlora.yml",
    "chars": 980,
    "preview": "base_model: EleutherAI/gpt-j-6b\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: username/custom_"
  },
  {
    "path": "examples/archived/jeopardy-bot/config.yml",
    "chars": 1069,
    "preview": "base_model: huggyllama/llama-7b\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCausalLM\ntokeni"
  },
  {
    "path": "examples/archived/mpt-7b/README.md",
    "chars": 89,
    "preview": "# MPT-7B\n\n```shell\naccelerate launch scripts/finetune.py examples/mpt-7b/config.yml\n\n```\n"
  },
  {
    "path": "examples/archived/mpt-7b/config.yml",
    "chars": 1200,
    "preview": "base_model: mosaicml/mpt-7b\n# optionally might have model_type or tokenizer_type\ntokenizer_type: AutoTokenizer\n# Automat"
  },
  {
    "path": "examples/archived/openllama-3b/README.md",
    "chars": 294,
    "preview": "# openllama-3b\n\nBasic full tune\n```shell\naccelerate launch scripts/finetune.py examples/openllama-3b/config.yml\n```\n\nLoR"
  },
  {
    "path": "examples/archived/openllama-3b/config.yml",
    "chars": 1105,
    "preview": "base_model: openlm-research/open_llama_3b_v2\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCa"
  },
  {
    "path": "examples/archived/openllama-3b/lora.yml",
    "chars": 1201,
    "preview": "base_model: openlm-research/open_llama_3b_v2\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCa"
  },
  {
    "path": "examples/archived/openllama-3b/qlora.yml",
    "chars": 1127,
    "preview": "base_model: openlm-research/open_llama_3b_v2\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCa"
  },
  {
    "path": "examples/archived/pythia/lora.yml",
    "chars": 778,
    "preview": "base_model: EleutherAI/pythia-1.4b-deduped\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: usern"
  },
  {
    "path": "examples/archived/pythia-12b/README.md",
    "chars": 197,
    "preview": "# Pythia 12B\n\n- Single-GPU A100 only (?)\n\n```shell\npython scripts/finetune.py examples/pythia-12b/config.yml\n```\n\n⚠️ Mul"
  },
  {
    "path": "examples/archived/pythia-12b/config.yml",
    "chars": 1005,
    "preview": "base_model: EleutherAI/pythia-12b-deduped\nbase_model_ignore_patterns: pytorch*  # prefer safetensors\n# optionally might "
  },
  {
    "path": "examples/archived/qwen/README.md",
    "chars": 110,
    "preview": "# Qwen\n\nTODO\n\n# Qwen2 MoE\n\n✅ multipack\n✅ qwen2_moe 4-bit QLoRA\n✅ qwen2_moe 16-bit LoRA\n❓ qwen2_moe 8-bit LoRA\n"
  },
  {
    "path": "examples/archived/qwen/lora.yml",
    "chars": 1041,
    "preview": "base_model: Qwen/Qwen-7B\n# optionally might have model_type or tokenizer_type\nmodel_type: AutoModelForCausalLM\ntokenizer"
  },
  {
    "path": "examples/archived/qwen/qlora.yml",
    "chars": 1042,
    "preview": "base_model: Qwen/Qwen-7B\n# optionally might have model_type or tokenizer_type\nmodel_type: AutoModelForCausalLM\ntokenizer"
  },
  {
    "path": "examples/archived/qwen/qwen2-moe-lora.yaml",
    "chars": 954,
    "preview": "base_model: Qwen/Qwen1.5-MoE-A2.7B\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: username/cust"
  },
  {
    "path": "examples/archived/qwen/qwen2-moe-qlora.yaml",
    "chars": 995,
    "preview": "base_model: Qwen/Qwen1.5-MoE-A2.7B\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: username/cust"
  },
  {
    "path": "examples/archived/redpajama/README.md",
    "chars": 117,
    "preview": "# RedPajama 3B preview release\n\n```shell\naccelerate launch scripts/finetune.py examples/redpajama/config-3b.yml\n\n```\n"
  },
  {
    "path": "examples/archived/redpajama/config-3b.yml",
    "chars": 1173,
    "preview": "base_model: togethercomputer/RedPajama-INCITE-Chat-3B-v1\n# optionally might have model_type or tokenizer_type\nmodel_type"
  },
  {
    "path": "examples/archived/replit-3b/config-lora.yml",
    "chars": 907,
    "preview": "base_model: replit/replit-code-v1-3b\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: username/cu"
  },
  {
    "path": "examples/archived/stablelm-2/1.6b/fft.yml",
    "chars": 1111,
    "preview": "base_model: stabilityai/stablelm-2-1_6b\n# optionally might have model_type or tokenizer_type\nmodel_type: AutoModelForCau"
  },
  {
    "path": "examples/archived/stablelm-2/1.6b/lora.yml",
    "chars": 1074,
    "preview": "base_model: stabilityai/stablelm-2-1_6b\n# optionally might have model_type or tokenizer_type\nmodel_type: AutoModelForCau"
  },
  {
    "path": "examples/archived/stablelm-2/README.md",
    "chars": 1577,
    "preview": "# StableLM 2\n\nThis repository contains examples for training and processing using StableLM-2. It also includes a section"
  },
  {
    "path": "examples/archived/starcoder2/qlora.yml",
    "chars": 921,
    "preview": "base_model: bigcode/starcoder2-3b\n# Automatically upload checkpoint and final model to HF\n# hub_model_id: username/custo"
  },
  {
    "path": "examples/archived/tiny-llama/README.md",
    "chars": 319,
    "preview": "# Overview\n\nThis is a simple example of how to finetune TinyLlama1.1B using either lora or qlora:\n\nLoRa:\n\n```\naccelerate"
  },
  {
    "path": "examples/archived/tiny-llama/lora-mps.yml",
    "chars": 1024,
    "preview": "base_model: TinyLlama/TinyLlama_v1.1\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCausalLM\nt"
  },
  {
    "path": "examples/archived/tiny-llama/lora.yml",
    "chars": 982,
    "preview": "base_model: TinyLlama/TinyLlama_v1.1\n# optionally might have model_type or tokenizer_type\ntokenizer_type: AutoTokenizer\n"
  },
  {
    "path": "examples/archived/tiny-llama/pretrain.yml",
    "chars": 876,
    "preview": "base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaFor"
  },
  {
    "path": "examples/archived/tiny-llama/qlora.yml",
    "chars": 1018,
    "preview": "base_model: TinyLlama/TinyLlama_v1.1\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCausalLM\nt"
  },
  {
    "path": "examples/archived/xgen-7b/xgen-7b-8k-qlora.yml",
    "chars": 2200,
    "preview": "# An example finetuning Saleforce's XGen-7b model with 8k context using qlora\n# on Tim Dettmer's Guanaco dataset.\nbase_m"
  },
  {
    "path": "examples/archived/yi-34B-chat/README.md",
    "chars": 348,
    "preview": "# Overview\n\nThis is an example of a Yi-34B-Chat configuration. It demonstrates that it is possible to finetune a 34B mod"
  },
  {
    "path": "examples/archived/yi-34B-chat/qlora.yml",
    "chars": 1146,
    "preview": "base_model: 01-ai/Yi-34B-Chat\n# optionally might have model_type or tokenizer_type\nmodel_type: LlamaForCausalLM\ntokenize"
  },
  {
    "path": "examples/cloud/baseten.yaml",
    "chars": 111,
    "preview": "provider: baseten\nproject_name:\n\nsecrets:\n  - HF_TOKEN\n  - WANDB_API_KEY\n\ngpu: h100\ngpu_count: 8\nnode_count: 1\n"
  },
  {
    "path": "examples/cloud/modal.yaml",
    "chars": 523,
    "preview": "project_name:\nvolumes:\n  - name: axolotl-data\n    mount: /workspace/data\n  - name: axolotl-artifacts\n    mount: /workspa"
  }
]

// ... and 870 more files (download for full content)
