Repository: huggingface/peft
Branch: main
Commit: 3fb7842e8f51
Files: 709
Total size: 20.5 MB

Directory structure:
gitextract___y2fwgs/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug-report.yml
│   │   └── feature-request.yml
│   ├── dependabot.yml
│   ├── workflows/
│   │   ├── build_docker_images.yml
│   │   ├── build_documentation.yml
│   │   ├── build_pr_documentation.yml
│   │   ├── deploy_method_comparison_app.yml
│   │   ├── integrations_tests.yml
│   │   ├── nightly.yml
│   │   ├── stale.yml
│   │   ├── test-docker-build.yml
│   │   ├── tests-main.yml
│   │   ├── tests.yml
│   │   ├── torch_compile_tests.yml
│   │   ├── trufflehog.yml
│   │   ├── upload_pr_documentation.yml
│   │   └── zizmor.yaml
│   └── zizmor.yml
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── Makefile
├── README.md
├── docker/
│   ├── README.md
│   ├── peft-cpu/
│   │   └── Dockerfile
│   └── peft-gpu/
│       └── Dockerfile
├── docs/
│   ├── Makefile
│   ├── README.md
│   └── source/
│       ├── _config.py
│       ├── _toctree.yml
│       ├── accelerate/
│       │   ├── deepspeed.md
│       │   └── fsdp.md
│       ├── conceptual_guides/
│       │   ├── adapter.md
│       │   ├── ia3.md
│       │   ├── oft.md
│       │   └── prompting.md
│       ├── developer_guides/
│       │   ├── checkpoint.md
│       │   ├── contributing.md
│       │   ├── custom_models.md
│       │   ├── lora.md
│       │   ├── low_level_api.md
│       │   ├── mixed_models.md
│       │   ├── model_merging.md
│       │   ├── quantization.md
│       │   ├── torch_compile.md
│       │   └── troubleshooting.md
│       ├── index.md
│       ├── install.md
│       ├── package_reference/
│       │   ├── adalora.md
│       │   ├── adapter_utils.md
│       │   ├── auto_class.md
│       │   ├── boft.md
│       │   ├── c3a.md
│       │   ├── cartridges.md
│       │   ├── config.md
│       │   ├── cpt.md
│       │   ├── delora.md
│       │   ├── fourierft.md
│       │   ├── functional.md
│       │   ├── gralora.md
│       │   ├── helpers.md
│       │   ├── hotswap.md
│       │   ├── hra.md
│       │   ├── ia3.md
│       │   ├── layernorm_tuning.md
│       │   ├── lily.md
│       │   ├── llama_adapter.md
│       │   ├── loha.md
│       │   ├── lokr.md
│       │   ├── lora.md
│       │   ├── lora_conversion.md
│       │   ├── merge_utils.md
│       │   ├── miss.md
│       │   ├── multitask_prompt_tuning.md
│       │   ├── oft.md
│       │   ├── osf.md
│       │   ├── p_tuning.md
│       │   ├── peanut.md
│       │   ├── peft_model.md
│       │   ├── peft_types.md
│       │   ├── poly.md
│       │   ├── prefix_tuning.md
│       │   ├── prompt_tuning.md
│       │   ├── psoft.md
│       │   ├── pvera.md
│       │   ├── randlora.md
│       │   ├── road.md
│       │   ├── shira.md
│       │   ├── trainable_tokens.md
│       │   ├── tuners.md
│       │   ├── vblora.md
│       │   ├── vera.md
│       │   ├── waveft.md
│       │   └── xlora.md
│       ├── quicktour.md
│       ├── task_guides/
│       │   ├── ia3.md
│       │   ├── lora_based_methods.md
│       │   └── prompt_based_methods.md
│       └── tutorial/
│           ├── peft_integrations.md
│           └── peft_model_config.md
├── examples/
│   ├── alora_finetuning/
│   │   ├── README.md
│   │   └── alora_finetuning.py
│   ├── arrow_multitask/
│   │   ├── arrow_phi3_mini.py
│   │   └── requirements.txt
│   ├── bdlora_finetuning/
│   │   ├── README.md
│   │   ├── bdlora_peft_demo.ipynb
│   │   ├── chat.py
│   │   └── vllm_server.bash
│   ├── boft_controlnet/
│   │   ├── __init__.py
│   │   ├── boft_controlnet.md
│   │   ├── eval.py
│   │   ├── eval.sh
│   │   ├── requirements.txt
│   │   ├── test_controlnet.py
│   │   ├── test_controlnet.sh
│   │   ├── train_controlnet.py
│   │   ├── train_controlnet.sh
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── args_loader.py
│   │       ├── dataset.py
│   │       ├── light_controlnet.py
│   │       ├── pipeline_controlnet.py
│   │       ├── tracemalloc.py
│   │       └── unet_2d_condition.py
│   ├── boft_dreambooth/
│   │   ├── .gitignore
│   │   ├── __init__.py
│   │   ├── boft_dreambooth.md
│   │   ├── dreambooth_inference.ipynb
│   │   ├── requirements.txt
│   │   ├── train_dreambooth.py
│   │   ├── train_dreambooth.sh
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── args_loader.py
│   │       ├── dataset.py
│   │       └── tracemalloc.py
│   ├── cartridge_self_study/
│   │   ├── README.md
│   │   ├── arxiv_synthesize.py
│   │   ├── arxiv_train.py
│   │   ├── requirements.txt
│   │   ├── synthesize.py
│   │   └── train_distill.py
│   ├── causal_language_modeling/
│   │   ├── accelerate_ds_zero3_cpu_offload_config.yaml
│   │   ├── peft_ln_tuning_clm.ipynb
│   │   ├── peft_lora_clm_accelerate_ds_zero3_offload.py
│   │   ├── peft_lora_clm_with_additional_tokens.ipynb
│   │   ├── peft_prefix_tuning_clm.ipynb
│   │   ├── peft_prompt_tuning_clm.ipynb
│   │   └── requirements.txt
│   ├── conditional_generation/
│   │   ├── accelerate_ds_zero3_cpu_offload_config.yaml
│   │   ├── multitask_prompt_tuning.ipynb
│   │   ├── peft_adalora_seq2seq.py
│   │   ├── peft_ia3_seq2seq.ipynb
│   │   ├── peft_lora_seq2seq.ipynb
│   │   ├── peft_lora_seq2seq_accelerate_ds_zero3_offload.py
│   │   ├── peft_lora_seq2seq_accelerate_fsdp.py
│   │   ├── peft_prefix_tuning_seq2seq.ipynb
│   │   ├── peft_prompt_tuning_seq2seq.ipynb
│   │   ├── peft_prompt_tuning_seq2seq_with_generate.ipynb
│   │   └── requirements.txt
│   ├── corda_finetuning/
│   │   ├── README.md
│   │   ├── corda_finetuning.py
│   │   ├── datautils.py
│   │   └── preprocess.py
│   ├── cpt_finetuning/
│   │   ├── README.md
│   │   └── cpt_train_and_inference.ipynb
│   ├── delora_finetuning/
│   │   ├── README.md
│   │   └── delora_finetuning.py
│   ├── dna_language_models/
│   │   └── dna_lm.ipynb
│   ├── dora_finetuning/
│   │   ├── QDoRA_finetuning.ipynb
│   │   ├── README.md
│   │   ├── dora-caching.py
│   │   └── dora_finetuning.py
│   ├── ephemeral_gpu_offloading/
│   │   └── load_with_dora.py
│   ├── eva_finetuning/
│   │   ├── README.md
│   │   ├── eva_finetuning.py
│   │   ├── eva_finetuning_multi_accelerator.py
│   │   └── utils.py
│   ├── evaluation/
│   │   └── lora-lm-eval.ipynb
│   ├── feature_extraction/
│   │   ├── peft_lora_embedding_semantic_search.py
│   │   ├── peft_lora_embedding_semantic_similarity_inference.ipynb
│   │   └── requirements.txt
│   ├── fp4_finetuning/
│   │   └── finetune_fp4_opt_bnb_peft.py
│   ├── gralora_finetuning/
│   │   ├── README.md
│   │   └── gralora_finetuning.py
│   ├── hra_dreambooth/
│   │   ├── README.md
│   │   ├── dreambooth_inference.ipynb
│   │   ├── requirements.txt
│   │   ├── train_dreambooth.py
│   │   ├── train_dreambooth.sh
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── args_loader.py
│   │       ├── dataset.py
│   │       └── tracemalloc.py
│   ├── image_classification/
│   │   ├── README.md
│   │   ├── image_classification_peft_lora.ipynb
│   │   └── image_classification_timm_peft_lora.ipynb
│   ├── int8_training/
│   │   ├── Finetune_flan_t5_large_bnb_peft.ipynb
│   │   ├── Finetune_opt_bnb_peft.ipynb
│   │   ├── config.yaml
│   │   ├── fine_tune_blip2_int8.py
│   │   ├── peft_adalora_whisper_large_training.py
│   │   ├── peft_bnb_whisper_large_v2_training.ipynb
│   │   ├── requirements.txt
│   │   └── run_adalora_whisper_int8.sh
│   ├── lily_finetuning/
│   │   ├── README.md
│   │   └── lily_finetuning.py
│   ├── loftq_finetuning/
│   │   ├── LoftQ_weight_replacement.ipynb
│   │   ├── README.md
│   │   ├── int8_correction.py
│   │   ├── quantize_save_load.py
│   │   └── train_gsm8k_llama.py
│   ├── lora_dreambooth/
│   │   ├── colab_notebook.ipynb
│   │   ├── convert_kohya_ss_sd_lora_to_peft.py
│   │   ├── convert_peft_sd_lora_to_kohya_ss.py
│   │   ├── lora_dreambooth_inference.ipynb
│   │   ├── requirements.txt
│   │   └── train_dreambooth.py
│   ├── lora_finetuning_transformer_engine/
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   ├── lora_finetuning_te.py
│   │   └── requirements.txt
│   ├── lora_ga_finetuning/
│   │   ├── README.md
│   │   └── lora_ga_finetuning.py
│   ├── lorafa_finetune/
│   │   ├── README.md
│   │   └── lorafa_finetuning.py
│   ├── miss_finetuning/
│   │   ├── README.md
│   │   └── miss_finetuning.py
│   ├── multi_adapter_examples/
│   │   ├── Lora_Merging.ipynb
│   │   ├── PEFT_Multi_LoRA_Inference.ipynb
│   │   └── multi_adapter_weighted_inference_diffusers.ipynb
│   ├── multilayer_perceptron/
│   │   ├── README.md
│   │   └── multilayer_perceptron_lora.ipynb
│   ├── oft_dreambooth/
│   │   ├── oft_dreambooth_inference.ipynb
│   │   └── train_dreambooth.py
│   ├── olora_finetuning/
│   │   ├── README.md
│   │   └── olora_finetuning.py
│   ├── orthogonal_subspace_learning/
│   │   ├── README.md
│   │   ├── osf_continual_learning.py
│   │   └── utils.py
│   ├── peanut_finetuning/
│   │   ├── README.md
│   │   └── peanut_finetuning.py
│   ├── pissa_finetuning/
│   │   ├── README.md
│   │   ├── pissa_finetuning.py
│   │   └── preprocess.py
│   ├── poly/
│   │   └── peft_poly_seq2seq_with_generate.ipynb
│   ├── psoft_finetuning/
│   │   ├── README.md
│   │   └── psoft_finetuning.py
│   ├── pvera/
│   │   ├── README.md
│   │   └── confidence_interval_generation.py
│   ├── qalora_finetuning/
│   │   ├── README.md
│   │   └── qalora_gptq_finetuning.py
│   ├── randlora_finetuning/
│   │   ├── README.md
│   │   ├── qrandlora_finetuning.ipynb
│   │   └── randlora_finetuning.py
│   ├── road_finetuning/
│   │   ├── README.md
│   │   └── road_finetuning.py
│   ├── semantic_segmentation/
│   │   ├── README.md
│   │   └── semantic_segmentation_peft_lora.ipynb
│   ├── sequence_classification/
│   │   ├── C3A.ipynb
│   │   ├── FourierFT.ipynb
│   │   ├── IA3.ipynb
│   │   ├── LoRA-torchao-8bit-dynamic-activation.ipynb
│   │   ├── LoRA-torchao-8bit.ipynb
│   │   ├── LoRA.ipynb
│   │   ├── P_Tuning.ipynb
│   │   ├── Prompt_Tuning.ipynb
│   │   ├── VBLoRA.ipynb
│   │   ├── VeRA.ipynb
│   │   ├── peft_no_lora_accelerate.py
│   │   ├── prefix_tuning.ipynb
│   │   └── requirements.txt
│   ├── sft/
│   │   ├── README.md
│   │   ├── configs/
│   │   │   ├── deepspeed_config.yaml
│   │   │   ├── deepspeed_config_z3_qlora.yaml
│   │   │   ├── fsdp_config.yaml
│   │   │   └── fsdp_config_qlora.yaml
│   │   ├── requirements.txt
│   │   ├── requirements_colab.txt
│   │   ├── requirements_xpu.txt
│   │   ├── run_peft.sh
│   │   ├── run_peft_deepspeed.sh
│   │   ├── run_peft_fsdp.sh
│   │   ├── run_peft_fsdp_gptq.sh
│   │   ├── run_peft_multigpu.sh
│   │   ├── run_peft_qlora_deepspeed_stage3.sh
│   │   ├── run_peft_qlora_fsdp.sh
│   │   ├── run_unsloth_peft.sh
│   │   ├── train.py
│   │   └── utils.py
│   ├── shira_finetuning/
│   │   ├── README.md
│   │   └── shira_finetuning.py
│   ├── stable_diffusion/
│   │   ├── convert_sd_adapter_to_peft.py
│   │   ├── inc_flux_lora_hpu.py
│   │   └── train_dreambooth.py
│   ├── token_classification/
│   │   ├── peft_lora_ner.ipynb
│   │   ├── peft_lora_token_cls.ipynb
│   │   └── requirements.txt
│   ├── waveft_finetuning/
│   │   ├── README.md
│   │   └── waveft_finetuning.py
│   └── xlora/
│       ├── README.md
│       └── xlora_inference_mistralrs.py
├── method_comparison/
│   ├── MetaMathQA/
│   │   ├── Makefile
│   │   ├── README.md
│   │   ├── data.py
│   │   ├── default_training_params.json
│   │   ├── experiments/
│   │   │   ├── adalora/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       └── adapter_config.json
│   │   │   ├── adaptionprompt/
│   │   │   │   └── llama-3.2-3B-lr_0.0005/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── boft/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── bone/
│   │   │   │   ├── llama-3.2-3B-bat/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── c3a/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── delora/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── fourierft/
│   │   │   │   ├── llama-3.2-3B-default/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-n_frequency-5000/
│   │   │   │       └── adapter_config.json
│   │   │   ├── full-finetuning/
│   │   │   │   └── llama-3.2-3B-lr_0.00001/
│   │   │   │       └── training_params.json
│   │   │   ├── gralora/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── ia3/
│   │   │   │   ├── llama-3.2-3B-default/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-lr_0.001/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── lily/
│   │   │   │   ├── llama-3.2-3B-rank140-mlp-a2-b2-s8.0/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-rank896-a2-b2-s2.0/
│   │   │   │       └── adapter_config.json
│   │   │   ├── ln_tuning/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── loha/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       └── adapter_config.json
│   │   │   ├── lokr/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       └── adapter_config.json
│   │   │   ├── lora/
│   │   │   │   ├── llama-3.2-3B-rank10-target-mlp/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-rank14-target-mlp-bdlora/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-rank32/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-rank32-dora/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-rank32-lorafa/
│   │   │   │   │   ├── adapter_config.json
│   │   │   │   │   └── training_params.json
│   │   │   │   ├── llama-3.2-3B-rank64/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-rank64-rslora/
│   │   │   │       └── adapter_config.json
│   │   │   ├── miss/
│   │   │   │   ├── llama-3.2-3B-bat/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-default/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-mini/
│   │   │   │       └── adapter_config.json
│   │   │   ├── oft/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       └── adapter_config.json
│   │   │   ├── osf/
│   │   │   │   └── llama-3.2-3B-rank128/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── peanut/
│   │   │   │   ├── llama-3.2-3B-rank1-relu-depth0-s32.0/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-rank32-relu-depth0-s2.0/
│   │   │   │       └── adapter_config.json
│   │   │   ├── prefixtuning/
│   │   │   │   └── llama-3.2-3B-lr_0.001/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── prompt_tuning/
│   │   │   │   ├── llama-3.2-3B-default/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-lr_0.001/
│   │   │   │   │   ├── adapter_config.json
│   │   │   │   │   └── training_params.json
│   │   │   │   └── llama-3.2-3B-sample_vocab-lr_0.001/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── psoft/
│   │   │   │   ├── llama-3.2-3B-default/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-fast/
│   │   │   │       └── adapter_config.json
│   │   │   ├── ptuning/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── pvera/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── randlora/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── road/
│   │   │   │   └── llama-3.2-3B-lr_0.001/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── shira/
│   │   │   │   └── llama-3.2-3B-lr_0.0003-random_seed_42/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── trainable_tokens/
│   │   │   │   └── llama-3.2-3B-sos+eos/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── vblora/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── vera/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   └── waveft/
│   │   │       └── llama-3.2-3B-n_frequency-5000/
│   │   │           └── adapter_config.json
│   │   ├── requirements.txt
│   │   ├── results/
│   │   │   ├── .gitkeep
│   │   │   ├── adalora--llama-3.2-3B-rank32.json
│   │   │   ├── adaptionprompt--llama-3.2-3B-lr_0.0005.json
│   │   │   ├── boft--llama-3.2-3B-default.json
│   │   │   ├── bone--llama-3.2-3B-bat.json
│   │   │   ├── bone--llama-3.2-3B-default.json
│   │   │   ├── c3a--llama-3.2-3B-default.json
│   │   │   ├── delora--llama-3.2-3B-rank32.json
│   │   │   ├── fourierft--llama-3.2-3B-default.json
│   │   │   ├── fourierft--llama-3.2-3B-n_frequency-5000.json
│   │   │   ├── full-finetuning--llama-3.2-3B-lr_0.00001.json
│   │   │   ├── gralora--llama-3.2-3B-rank32.json
│   │   │   ├── ia3--llama-3.2-3B-default.json
│   │   │   ├── ia3--llama-3.2-3B-lr_0.001.json
│   │   │   ├── ln_tuning--llama-3.2-3B-default.json
│   │   │   ├── loha--llama-3.2-3B-rank32.json
│   │   │   ├── lokr--llama-3.2-3B-rank32.json
│   │   │   ├── lora--llama-3.2-3B-rank10-target-mlp.json
│   │   │   ├── lora--llama-3.2-3B-rank14-target-mlp-bdlora.json
│   │   │   ├── lora--llama-3.2-3B-rank32-dora.json
│   │   │   ├── lora--llama-3.2-3B-rank32-lorafa.json
│   │   │   ├── lora--llama-3.2-3B-rank32.json
│   │   │   ├── lora--llama-3.2-3B-rank64-rslora.json
│   │   │   ├── lora--llama-3.2-3B-rank64.json
│   │   │   ├── miss--llama-3.2-3B-bat.json
│   │   │   ├── miss--llama-3.2-3B-default.json
│   │   │   ├── miss--llama-3.2-3B-mini.json
│   │   │   ├── oft--llama-3.2-3B-rank32.json
│   │   │   ├── osf--llama-3.2-3B-rank128.json
│   │   │   ├── prefixtuning--llama-3.2-3B-lr_0.001.json
│   │   │   ├── prompt_tuning--llama-3.2-3B-default.json
│   │   │   ├── prompt_tuning--llama-3.2-3B-lr_0.001.json
│   │   │   ├── prompt_tuning--llama-3.2-3B-sample_vocab-lr_0.001.json
│   │   │   ├── ptuning--llama-3.2-3B-default.json
│   │   │   ├── randlora--llama-3.2-3B-default.json
│   │   │   ├── road--llama-3.2-3B-lr_0.001.json
│   │   │   ├── shira--llama-3.2-3B-lr_0.0003-random_seed_42.json
│   │   │   ├── trainable_tokens--llama-3.2-3B-sos+eos.json
│   │   │   ├── vblora--llama-3.2-3B-default.json
│   │   │   ├── vera--llama-3.2-3B-default.json
│   │   │   └── waveft--llama-3.2-3B-n_frequency-5000.json
│   │   ├── run.py
│   │   └── utils.py
│   ├── README.md
│   ├── __init__.py
│   ├── app.py
│   ├── processing.py
│   ├── requirements-app.txt
│   ├── sanitizer.py
│   ├── test_sanitizer.py
│   └── text_generation_benchmark/
│       ├── README.md
│       ├── cancelled_results/
│       │   └── .gitkeep
│       ├── configs/
│       │   └── prompts.json
│       ├── data.py
│       ├── default_benchmark_params.json
│       ├── experiments/
│       │   └── lora/
│       │       └── lora_r8/
│       │           └── adapter_config.json
│       ├── results/
│       │   └── .gitkeep
│       ├── run.py
│       ├── run_base.py
│       ├── temporary_results/
│       │   └── .gitkeep
│       └── utils.py
├── pyproject.toml
├── requirements.txt
├── scripts/
│   ├── ci_clean_cache.py
│   ├── convert-bone-to-miss.py
│   ├── evaluate-lora-conversion.py
│   ├── launch_notebook_mp.py
│   ├── log_reports.py
│   ├── stale.py
│   └── train_memory.py
├── setup.py
├── src/
│   └── peft/
│       ├── __init__.py
│       ├── auto.py
│       ├── config.py
│       ├── functional.py
│       ├── helpers.py
│       ├── import_utils.py
│       ├── mapping.py
│       ├── mapping_func.py
│       ├── mixed_model.py
│       ├── optimizers/
│       │   ├── __init__.py
│       │   ├── lorafa.py
│       │   └── loraplus.py
│       ├── peft_model.py
│       ├── py.typed
│       ├── tuners/
│       │   ├── __init__.py
│       │   ├── _buffer_dict.py
│       │   ├── adalora/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── gptq.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── adaption_prompt/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   ├── model.py
│       │   │   └── utils.py
│       │   ├── boft/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── fbd/
│       │   │   │   ├── __init__.py
│       │   │   │   ├── fbd_cuda.cpp
│       │   │   │   └── fbd_cuda_kernel.cu
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── c3a/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   ├── model.py
│       │   │   └── utils.py
│       │   ├── cartridge/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── model.py
│       │   │   └── utils.py
│       │   ├── cpt/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   └── model.py
│       │   ├── delora/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── fourierft/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── gralora/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── hra/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── ia3/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── lily/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── ln_tuning/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── loha/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── lokr/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── lora/
│       │   │   ├── __init__.py
│       │   │   ├── aqlm.py
│       │   │   ├── arrow.py
│       │   │   ├── awq.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── conversion.py
│       │   │   ├── corda.py
│       │   │   ├── dora.py
│       │   │   ├── eetq.py
│       │   │   ├── eva.py
│       │   │   ├── gptq.py
│       │   │   ├── hqq.py
│       │   │   ├── inc.py
│       │   │   ├── intruders.py
│       │   │   ├── layer.py
│       │   │   ├── loraga.py
│       │   │   ├── model.py
│       │   │   ├── te.py
│       │   │   ├── torchao.py
│       │   │   ├── tp_layer.py
│       │   │   └── variants.py
│       │   ├── lycoris_utils.py
│       │   ├── miss/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── mixed/
│       │   │   ├── __init__.py
│       │   │   └── model.py
│       │   ├── multitask_prompt_tuning/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   └── model.py
│       │   ├── oft/
│       │   │   ├── __init__.py
│       │   │   ├── aqlm.py
│       │   │   ├── awq.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── eetq.py
│       │   │   ├── gptq.py
│       │   │   ├── hqq.py
│       │   │   ├── inc.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── osf/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   ├── model.py
│       │   │   └── utils.py
│       │   ├── p_tuning/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   └── model.py
│       │   ├── peanut/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── poly/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   ├── model.py
│       │   │   └── router.py
│       │   ├── prefix_tuning/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   └── model.py
│       │   ├── prompt_tuning/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   └── model.py
│       │   ├── psoft/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── pvera/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── randlora/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── road/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── shira/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   ├── mask_functions.py
│       │   │   └── model.py
│       │   ├── trainable_tokens/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── tuners_utils.py
│       │   ├── vblora/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── vera/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── waveft/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── constants.py
│       │   │   ├── layer.py
│       │   │   ├── model.py
│       │   │   ├── wavelet.py
│       │   │   └── waverec2d.py
│       │   └── xlora/
│       │       ├── __init__.py
│       │       ├── classifier.py
│       │       ├── config.py
│       │       ├── layer.py
│       │       └── model.py
│       └── utils/
│           ├── __init__.py
│           ├── constants.py
│           ├── hotswap.py
│           ├── incremental_pca.py
│           ├── integrations.py
│           ├── loftq_utils.py
│           ├── merge_utils.py
│           ├── other.py
│           ├── peft_types.py
│           ├── save_and_load.py
│           └── warning.py
└── tests/
    ├── __init__.py
    ├── conftest.py
    ├── regression/
    │   ├── __init__.py
    │   └── test_regression.py
    ├── test_adaption_prompt.py
    ├── test_arrow.py
    ├── test_auto.py
    ├── test_boft.py
    ├── test_bufferdict.py
    ├── test_cartridge.py
    ├── test_common_gpu.py
    ├── test_config.py
    ├── test_cpt.py
    ├── test_custom_models.py
    ├── test_decoder_models.py
    ├── test_encoder_decoder_models.py
    ├── test_feature_extraction_models.py
    ├── test_gptqmodel.py
    ├── test_gpu_examples.py
    ├── test_helpers.py
    ├── test_hub_features.py
    ├── test_incremental_pca.py
    ├── test_initialization.py
    ├── test_integrations.py
    ├── test_lora_conversion.py
    ├── test_lora_ga.py
    ├── test_lora_intruders.py
    ├── test_lora_megatron.py
    ├── test_lora_variants.py
    ├── test_lorafa.py
    ├── test_loraplus.py
    ├── test_low_level_api.py
    ├── test_mapping.py
    ├── test_mixed.py
    ├── test_multitask_prompt_tuning.py
    ├── test_osf.py
    ├── test_other.py
    ├── test_poly.py
    ├── test_pvera.py
    ├── test_randlora.py
    ├── test_seq_classifier.py
    ├── test_shira.py
    ├── test_stablediffusion.py
    ├── test_target_parameters.py
    ├── test_torch_compile.py
    ├── test_trainable_tokens.py
    ├── test_tuners_utils.py
    ├── test_vblora.py
    ├── test_vera.py
    ├── test_vision_models.py
    ├── test_xlora.py
    ├── testing_common.py
    ├── testing_utils.py
    └── training/
        ├── adapters.py
        ├── deepspeed_config.yaml
        ├── fsdp2_config.yaml
        ├── fsdp_config.yaml
        ├── lora_tp.py
        ├── tp_config.yaml
        └── training.py
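
In the listing above, each `method_comparison/MetaMathQA/experiments/<method>/<experiment>/` directory has a matching `results/<method>--<experiment>.json` file. For example, `experiments/lora/llama-3.2-3B-rank32/` pairs with `results/lora--llama-3.2-3B-rank32.json`. The sketch below encodes that naming convention. The convention is inferred from the directory listing, not taken from `run.py`. The helpers `result_filename` and `missing_results` are hypothetical and are not part of PEFT.

```python
from pathlib import PurePosixPath


def result_filename(experiment_dir: str) -> str:
    """Map '.../experiments/<method>/<experiment>' to '<method>--<experiment>.json'.

    Assumption: this convention is inferred from the directory listing,
    not from method_comparison/MetaMathQA/run.py.
    """
    parts = PurePosixPath(experiment_dir).parts
    # The last two path components are the method name and the experiment name.
    method, experiment = parts[-2], parts[-1]
    return f"{method}--{experiment}.json"


def missing_results(experiment_dirs, result_files):
    """Return the experiment directories that have no matching result file."""
    existing = set(result_files)
    return [d for d in experiment_dirs if result_filename(d) not in existing]


if __name__ == "__main__":
    experiments = [
        "experiments/lora/llama-3.2-3B-rank32",
        "experiments/peanut/llama-3.2-3B-rank1-relu-depth0-s32.0",
    ]
    results = ["lora--llama-3.2-3B-rank32.json"]
    print(missing_results(experiments, results))
```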

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/ISSUE_TEMPLATE/bug-report.yml
================================================
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve the library
body:
  - type: textarea
    id: system-info
    attributes:
      label: System Info
      description: Please share your relevant system information with us
      placeholder: peft & accelerate & transformers version, platform, python version, ...
    validations:
      required: true

  - type: textarea
    id: who-can-help
    attributes:
      label: Who can help?
      description: |
        Your issue will be replied to more quickly if you can figure out the right person to tag with @.
        If you know how to use git blame, that is the easiest way; otherwise, here is a rough guide to **who to tag**.

        All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
        a core maintainer will ping the right person.

        Please tag fewer than 3 people.

        Library: @benjaminbossan @githubnemo

        diffusers integration: @benjaminbossan @sayakpaul

        Documentation: @stevhliu

      placeholder: "@Username ..."

  - type: textarea
    id: reproduction
    validations:
      required: true
    attributes:
      label: Reproduction
      description: |
        Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
        Please provide the simplest possible reproducer so that we can quickly fix the issue. When you paste
        the error message, please include the full traceback.

      placeholder: |
        Reproducer:

  - type: textarea
    id: expected-behavior
    validations:
      required: true
    attributes:
      label: Expected behavior
      description: "A clear and concise description of what you would expect to happen."


================================================
FILE: .github/ISSUE_TEMPLATE/feature-request.yml
================================================
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new feature
labels: [ "feature" ]
body:
  - type: textarea
    id: feature-request
    validations:
      required: true
    attributes:
      label: Feature request
      description: |
        A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist.

  - type: textarea
    id: contribution
    validations:
      required: true
    attributes:
      label: Your contribution
      description: |
        Is there any way that you could help, e.g. by submitting a PR?


================================================
FILE: .github/dependabot.yml
================================================
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "monthly"
    groups:
      ci-actions:
        patterns:
          - "actions/*"
      third-party-actions:
        patterns:
          - "*"
        exclude-patterns:
          - "actions/*"

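Dependabot sorts each GitHub Action dependency into one of the two groups above by matching its name against the glob patterns. The sketch below approximates that matching with Python's `fnmatch.fnmatchcase`; this is an assumption for illustration rather than Dependabot's actual matcher, and `assign_group` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Group definitions mirroring the dependabot.yml above (approximation).
GROUPS = {
    "ci-actions": {"patterns": ["actions/*"], "exclude": []},
    "third-party-actions": {"patterns": ["*"], "exclude": ["actions/*"]},
}


def assign_group(dependency: str) -> Optional[str]:
    """Hypothetical helper: return the group whose patterns match and whose excludes don't."""
    for name, rules in GROUPS.items():
        included = any(fnmatchcase(dependency, p) for p in rules["patterns"])
        excluded = any(fnmatchcase(dependency, p) for p in rules["exclude"])
        if included and not excluded:
            return name
    return None


for dep in ["actions/checkout", "docker/login-action", "tj-actions/changed-files"]:
    print(dep, "->", assign_group(dep))
```

Because `third-party-actions` explicitly excludes `actions/*`, the two groups are disjoint, so in this sketch the iteration order does not affect the result. Note that `tj-actions/changed-files` does not match `actions/*`, since the pattern must match the whole name from its start.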

================================================
FILE: .github/workflows/build_docker_images.yml
================================================
name: Build Docker images (scheduled)

on:
  workflow_dispatch:
  workflow_call:
  schedule:
    - cron: "0 1 * * *"

concurrency:
  group: docker-image-builds
  cancel-in-progress: false

permissions: {}

env:
  CI_SLACK_CHANNEL: ${{ secrets.CI_DOCKER_CHANNEL }}

jobs:
  latest-cpu:
    name: "Latest Peft CPU [dev]"
    runs-on:
      group: aws-general-8-plus
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3.12.0
      - name: Check out code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
      - name: Login to DockerHub
        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3.7.0
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_PASSWORD }}

      - name: Build and Push CPU
        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6.19.2
        with:
          context: ./docker/peft-cpu
          push: true
          tags: huggingface/peft-cpu

      - name: Post to Slack
        if: always()
        uses: huggingface/hf-workflows/.github/actions/post-slack@3f88d63d3761558a32e8e46fc2a8536e04bb2aea  # main from 2025-02-24
        with:
          slack_channel: ${{ env.CI_SLACK_CHANNEL }}
          title: 🤗 Results of the PEFT-CPU docker build
          status: ${{ job.status }}
          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

  latest-cuda:
    name: "Latest Peft GPU [dev]"
    runs-on:
      group: aws-general-8-plus
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3.12.0
      - name: Check out code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
      - name: Login to DockerHub
        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9  # v3.7.0
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_PASSWORD }}

      - name: Build and Push GPU
        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6.19.2
        with:
          context: ./docker/peft-gpu
          push: true
          tags: huggingface/peft-gpu

      - name: Post to Slack
        if: always()
        uses: huggingface/hf-workflows/.github/actions/post-slack@3f88d63d3761558a32e8e46fc2a8536e04bb2aea  # main from 2025-02-24
        with:
          slack_channel: ${{ env.CI_SLACK_CHANNEL }}
          title: 🤗 Results of the PEFT-GPU docker build
          status: ${{ job.status }}
          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}


================================================
FILE: .github/workflows/build_documentation.yml
================================================
name: Build documentation

on:
  push:
    branches:
      - main
      - doc-builder*
      - v*-release

permissions: {}

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@607843a35ee10949e5bea9ca2f206831b0305b4c  # main
    with:
      commit_sha: ${{ github.sha }}
      package: peft
      notebook_folder: peft_docs
      custom_container: huggingface/transformers-doc-builder
    secrets:
      token: ${{ secrets.HUGGINGFACE_PUSH }}
      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}


================================================
FILE: .github/workflows/build_pr_documentation.yml
================================================
name: Build PR Documentation

on:
  pull_request:

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

permissions: {}

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@607843a35ee10949e5bea9ca2f206831b0305b4c  # main
    with:
      commit_sha: ${{ github.event.pull_request.head.sha }}
      pr_number: ${{ github.event.number }}
      package: peft
      custom_container: huggingface/transformers-doc-builder


================================================
FILE: .github/workflows/deploy_method_comparison_app.yml
================================================
name: Deploy "method_comparison" Gradio to Spaces

on:
  push:
    branches: [ main ]
    paths:
      - "method_comparison/**"
  workflow_dispatch:

permissions: {}

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          fetch-depth: 0  # full history needed for subtree
          persist-credentials: false

      - name: Authenticate via ~/.netrc
        env:
          HF_TOKEN: ${{ secrets.PEFT_INTERNAL_REPO_READ_WRITE }}
        run: |
          # netrc needs BOTH login and password entries
          printf "machine huggingface.co\nlogin hf\npassword ${HF_TOKEN}\n" >> ~/.netrc
          chmod 600 ~/.netrc

      - name: Deploy method_comparison app to HF Spaces
        run: |
          cd method_comparison
          git init
          # Spaces expect requirements.txt
          mv requirements-app.txt requirements.txt
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git remote add gradio-app https://huggingface.co/spaces/peft-internal-testing/PEFT-method-comparison
          git add .
          git commit -m "🚀 Deploy method comparison app from GH action"
          git push -f gradio-app HEAD:main

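The `printf` line writes a single `machine`/`login`/`password` entry to `~/.netrc`, which git's HTTPS transport can read credentials from when pushing to `huggingface.co`. As a local check of how that entry looks to a netrc parser, the sketch below writes the same format to a temporary file and reads it back with Python's standard `netrc` module. The token value is a placeholder, not a real credential.

```python
import netrc
import os
import tempfile

# Same layout the workflow's printf produces: machine / login / password.
token = "hf_dummy_token_for_illustration"  # placeholder, not a real token
content = f"machine huggingface.co\nlogin hf\npassword {token}\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".netrc")
    with open(path, "w") as f:
        f.write(content)
    os.chmod(path, 0o600)  # mirrors `chmod 600 ~/.netrc`

    # authenticators() returns a (login, account, password) tuple for the host.
    login, account, password = netrc.netrc(path).authenticators("huggingface.co")

print(login, password == token)
```

The comment in the workflow ("netrc needs BOTH login and password entries") matches this shape: without a `login` line, the entry would not carry a username for the host.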

================================================
FILE: .github/workflows/integrations_tests.yml
================================================
name: integration tests

on:
  workflow_dispatch:
    inputs:
      branch:
        description: 'Branch to test on'
        required: true

permissions: {}

jobs:
  run_transformers_integration_tests:
    strategy:
      fail-fast: false
      matrix:
        transformers-version: ['main', 'latest']
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          ref: ${{ github.event.inputs.branch }}
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          persist-credentials: false
      - name: Set up Python
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
        with:
          python-version: "3.10"
          cache: "pip"
          cache-dependency-path: "setup.py"
      - name: print environment variables
        run: |
          echo "env.CI_BRANCH = ${CI_BRANCH}"
          echo "env.CI_SHA = ${CI_SHA}"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install .[test]
          if [ "${{ matrix.transformers-version }}" == "main" ]; then
              pip install -U git+https://github.com/huggingface/transformers.git
          else
              echo "Nothing to do, as the latest transformers is already installed"
          fi

      - name: Test transformers integration
        run: |
          cd .. && git clone https://github.com/huggingface/transformers.git && cd transformers/ && git rev-parse HEAD
          RUN_SLOW=1 pytest tests/peft_integration/test_peft_integration.py
  run_diffusers_integration_tests:
    strategy:
      fail-fast: false
      matrix:
        # For now diffusers integration is not on PyPI
        diffusers-version: ['main']
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          ref: ${{ github.event.inputs.branch }}
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          persist-credentials: false
      - name: Set up Python
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
        with:
          python-version: "3.10"
          cache: "pip"
          cache-dependency-path: "setup.py"
      - name: print environment variables
        run: |
          echo "env.CI_BRANCH = ${CI_BRANCH}"
          echo "env.CI_SHA = ${CI_SHA}"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install .[test]

          if [ "${{ matrix.diffusers-version }}" == "main" ]; then
              pip install -U git+https://github.com/huggingface/diffusers.git
          else
              echo "Nothing to do, as the latest diffusers is already installed"
          fi

      - name: Test diffusers integration
        run: |
          cd .. && git clone https://github.com/huggingface/diffusers.git && cd diffusers/ && git rev-parse HEAD
          pytest tests/lora/test_lora_layers_peft.py


================================================
FILE: .github/workflows/nightly.yml
================================================
name: Self-hosted runner with slow tests (scheduled)

on:
  workflow_dispatch:
  schedule:
    - cron: "0 2 * * *"

env:
  RUN_SLOW: "yes"
  IS_GITHUB_CI: "1"
  # To be able to run tests on CUDA 12.2
  NVIDIA_DISABLE_REQUIRE: "1"
  SLACK_API_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

permissions: {}

jobs:
  run_all_tests_single_gpu:
    strategy:
      fail-fast: false
    runs-on:
      group: aws-g6-4xlarge-plus
    env:
      CUDA_VISIBLE_DEVICES: "0"
      TEST_TYPE: "single_gpu"
    container:
      image: huggingface/peft-gpu:latest
      options: --gpus all --shm-size "16gb" -e NVIDIA_DISABLE_REQUIRE=true
    defaults:
      run:
        shell: bash
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
      - name: Pip install
        run: |
          source activate peft
          pip install -e . --no-deps
          pip install pytest-reportlog

      - name: Run common tests on single GPU
        id: common_tests
        continue-on-error: true
        run: |
          source activate peft
          make tests_common_gpu

      - name: Run examples on single GPU
        id: examples
        continue-on-error: true
        run: |
          source activate peft
          make tests_examples_single_gpu

      - name: Run core tests on single GPU
        id: core_tests
        continue-on-error: true
        run: |
          source activate peft
          make tests_core_single_gpu

      - name: Run regression tests on single GPU
        id: regression
        continue-on-error: true
        run: |
          source activate peft
          make tests_regression

      - name: Generate Report
        if: always()
        run: |
          pip install slack_sdk tabulate
          python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY

      - name: Check for test failures
        if: |
          steps.common_tests.outcome == 'failure' ||
          steps.examples.outcome == 'failure' ||
          steps.core_tests.outcome == 'failure' ||
          steps.regression.outcome == 'failure'
        run: |
          echo "One or more test suites failed. Check the logs above."
          exit 1

  run_all_tests_multi_gpu:
    strategy:
      fail-fast: false
    runs-on:
      group: aws-g6-12xlarge-plus
    env:
      CUDA_VISIBLE_DEVICES: "0,1"
      TEST_TYPE: "multi_gpu"
    container:
      image: huggingface/peft-gpu:latest
      options: --gpus all --shm-size "16gb" -e NVIDIA_DISABLE_REQUIRE=true
    defaults:
      run:
        shell: bash
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
      - name: Pip install
        run: |
          source activate peft
          pip install -e . --no-deps
          pip install pytest-reportlog

      - name: Run common tests on multi GPU
        id: common_tests
        continue-on-error: true
        run: |
          source activate peft
          make tests_common_gpu

      - name: Run examples on multi GPU
        id: examples
        continue-on-error: true
        run: |
          source activate peft
          make tests_examples_multi_gpu

      - name: Run core tests on multi GPU
        id: core_tests
        continue-on-error: true
        run: |
          source activate peft
          make tests_core_multi_gpu

      - name: Run training on multi GPU
        id: training
        continue-on-error: true
        run: |
          source activate peft
          make tests_training

      - name: Generate Report
        if: always()
        run: |
          pip install slack_sdk tabulate
          python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY

      - name: Check for test failures
        if: |
          steps.common_tests.outcome == 'failure' ||
          steps.examples.outcome == 'failure' ||
          steps.core_tests.outcome == 'failure' ||
          steps.training.outcome == 'failure'
        run: |
          echo "One or more test suites failed. Check the logs above."
          exit 1


================================================
FILE: .github/workflows/stale.yml
================================================
name: Stale Bot

on:
  schedule:
    - cron: "0 15 * * *"

permissions: {}

jobs:
  close_stale_issues:
    name: Close Stale Issues
    if: github.repository == 'huggingface/peft'
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    steps:
    - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
      with:
        persist-credentials: false

    - name: Setup Python
      uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
      with:
        python-version: 3.11

    - name: Install requirements
      run: |
        pip install PyGithub
    - name: Close stale issues
      run: |
        python scripts/stale.py


================================================
FILE: .github/workflows/test-docker-build.yml
================================================
name: Test Docker images (on PR)

on:
  pull_request:
    paths:
      # Run only when DockerFile files are modified
      - "docker/*/Dockerfile"

permissions: {}

jobs:
  get_changed_files:
    name: "Build all modified docker images"
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - name: Check out code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
      - name: Get changed files
        id: changed-files
        uses: tj-actions/changed-files@7dee1b0c1557f278e5c7dc244927139d78c0e22a  # v47.0.4
        with:
          files: docker/*/Dockerfile
          json: "true"
      - name: Run step if only the files listed above change
        if: steps.changed-files.outputs.any_changed == 'true'
        id: set-matrix
        env:
          CHANGED_FILES: "${{ steps.changed-files.outputs.all_changed_files }}"
        run: |
          echo "matrix=$(echo ${CHANGED_FILES} | sed -e 's/\\\"/\"/g')" >> $GITHUB_OUTPUT
  build_modified_files:
    needs: get_changed_files
    name: Build Docker images on modified files
    runs-on: ubuntu-latest
    if: ${{ needs.get_changed_files.outputs.matrix != '[]' }}
    strategy:
      fail-fast: false
      matrix:
        docker-file: ${{ fromJson(needs.get_changed_files.outputs.matrix) }}
    steps:
      - name: Cleanup disk
        run: |
          sudo ls -l /usr/local/lib/
          sudo ls -l /usr/share/
          sudo du -sh /usr/local/lib/
          sudo du -sh /usr/share/
          sudo rm -rf /usr/local/lib/android
          sudo rm -rf /usr/share/dotnet
          sudo du -sh /usr/local/lib/
          sudo du -sh /usr/share/
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f  # v3.12.0
      - name: Check out code
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
      - name: Build Docker image
        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8  # v6.19.2
        with:
          file: ${{ matrix.docker-file }}
          context: .
          push: False

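The `set-matrix` step turns the changed-files output into a JSON list that `fromJson` expands into the build matrix. Judging from the `sed` expression, the output arrives with its quotes escaped as `\"`; assuming that, the sketch below reproduces the same unescaping in Python and parses the result. The sample input string is illustrative only.

```python
import json

# Illustrative changed-files output with json: "true", assuming the quotes
# arrive backslash-escaped (which is what the workflow's sed step undoes).
raw = r'[\"docker/peft-cpu/Dockerfile\",\"docker/peft-gpu/Dockerfile\"]'

# Equivalent of: sed -e 's/\\\"/\"/g'  (replace every \" with ")
unescaped = raw.replace('\\"', '"')

matrix = json.loads(unescaped)
print(matrix)  # one matrix entry per modified Dockerfile
```

When no Dockerfile changed, the list is `[]`, which is exactly what the `build_modified_files` job's `if:` condition checks for before skipping.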

================================================
FILE: .github/workflows/tests-main.yml
================================================
name: tests on transformers main

on:
  push:
    branches: [main]
    paths-ignore:
        - 'docs/**'

permissions: {}

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
      - name: Make space for cache + models
        # Ubuntu runners have less free space, which is problematic since the model
        # cache + dependencies fill up the disk, leaving no space for execution.
        # So we remove some of the stuff we don't need (Java, .NET, etc.)
        #
        # Idea: https://dev.to/mathio/squeezing-disk-space-from-github-actions-runners-an-engineers-guide-3pjg
        if: matrix.os != 'windows-latest'
        run: |
          df -h

          # Remove Java (JDKs)
          sudo rm -rf /usr/lib/jvm

          # Remove .NET SDKs
          sudo rm -rf /usr/share/dotnet

          # Remove Swift toolchain
          sudo rm -rf /usr/share/swift

          # Remove Haskell (GHC)
          sudo rm -rf /usr/local/.ghcup

          # Remove Julia
          sudo rm -rf /usr/local/julia*

          # Remove Android SDKs
          sudo rm -rf /usr/local/lib/android

          # Remove Chromium (optional if not using for browser tests)
          sudo rm -rf /usr/local/share/chromium

          # Remove Microsoft/Edge and Google Chrome builds
          sudo rm -rf /opt/microsoft /opt/google

          # Remove Azure CLI
          sudo rm -rf /opt/az

          # Remove PowerShell
          sudo rm -rf /usr/local/share/powershell

          # Remove CodeQL and other toolcaches
          sudo rm -rf /opt/hostedtoolcache

          df -h
      - name: Set up Python 3.11
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
        with:
          python-version: 3.11
          cache: "pip"
          cache-dependency-path: "setup.py"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          # cpu version of pytorch
          pip install -U git+https://github.com/huggingface/transformers.git
          pip install -e .[test]
      - name: Test with pytest
        env:
          TRANSFORMERS_IS_CI: 1
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          make test
      - name: Post to Slack
        if: always()
        uses: huggingface/hf-workflows/.github/actions/post-slack@3f88d63d3761558a32e8e46fc2a8536e04bb2aea  # main from 2025-02-24
        with:
          slack_channel: ${{ secrets.SLACK_CHANNEL_ID }}
          title: 🤗 Results of transformers main tests
          status: ${{ job.status }}
          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}


================================================
FILE: .github/workflows/tests.yml
================================================
name: tests

on:
  push:
    branches: [main]
    paths-ignore:
      - 'docs/**'
  pull_request:
    paths-ignore:
      - 'docs/**'

env:
  HF_HOME: .cache/huggingface

permissions: {}

jobs:
  check_code_quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
      - name: Set up Python
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
        with:
          python-version: "3.11"
          cache: "pip"
          cache-dependency-path: "setup.py"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .[dev]
      - name: Check quality
        run: |
          make quality

  tests:
    needs: check_code_quality
    # dependabot updates (which don't require approval for CI to run) shouldn't trigger unit tests
    if: ${{ !(github.event_name == 'pull_request' && github.event.pull_request.user.login == 'dependabot[bot]') }}
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
        os: ["ubuntu-latest", "windows-latest"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
      - name: Make space for cache + models
        # Ubuntu runners have less free space, which is problematic since the model
        # cache + dependencies fill up the disk, leaving no space for execution.
        # So we remove some of the stuff we don't need (Java, .NET, etc.)
        #
        # Idea: https://dev.to/mathio/squeezing-disk-space-from-github-actions-runners-an-engineers-guide-3pjg
        if: matrix.os != 'windows-latest'
        run: |
          df -h

          # Remove Java (JDKs)
          sudo rm -rf /usr/lib/jvm

          # Remove .NET SDKs
          sudo rm -rf /usr/share/dotnet

          # Remove Swift toolchain
          sudo rm -rf /usr/share/swift

          # Remove Haskell (GHC)
          sudo rm -rf /usr/local/.ghcup

          # Remove Julia
          sudo rm -rf /usr/local/julia*

          # Remove Android SDKs
          sudo rm -rf /usr/local/lib/android

          # Remove Chromium (optional if not using for browser tests)
          sudo rm -rf /usr/local/share/chromium

          # Remove Microsoft/Edge and Google Chrome builds
          sudo rm -rf /opt/microsoft /opt/google

          # Remove Azure CLI
          sudo rm -rf /opt/az

          # Remove PowerShell
          sudo rm -rf /usr/local/share/powershell

          # Remove CodeQL and other toolcaches
          sudo rm -rf /opt/hostedtoolcache

          df -h
      - name: Model cache
        uses: actions/cache/restore@cdf6c1fa76f9f475f3d7449005a359c84ca0f306  # v5.0.3
        with:
          # Avoid caching HF_HOME/modules and Python cache files to prevent interoperability
          # issues and potential cache poisoning. We also avoid caching lock files so that runs
          # don't skip re-downloading a model just because they see a stale lock file.
          path: |
            ${{ env.HF_HOME }}/hub/**
            !${{ env.HF_HOME }}/**/*.pyc
          key: model-cache-${{ github.run_id }}
          restore-keys: model-cache-
          enableCrossOsArchive: true

      - name: Dump cache content
        # TODO: remove this step after 2025-02-15
        if: matrix.os != 'windows-latest'
        run: |
          SHASUM=sha256sum
          [ -f "$(which shasum)" ] && SHASUM=shasum
          find "${{ env.HF_HOME }}/hub" -type f -exec "$SHASUM" {} \; > cache_content_initial || true
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6.2.0
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
          cache-dependency-path: "setup.py"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install setuptools
          # cpu version of pytorch
          pip install -e .[test]
      - name: Test with pytest
        shell: bash
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
          TRANSFORMERS_IS_CI: 1
          CI: 1
        run: |
          make test
          # clean up all pytest temporary directories that are kept due to retention since space
          # is a scarce resource on the runners and tasks like model cache creation (further below)
          # fail if there's not enough space available.
          (rm -r "/tmp/pytest-of-$(id -u -n)" || true)
      - name: Dump cache content and diff
        # This is just debug info so that we can monitor if the model cache diverges substantially
        # over time and what the diverging model is.
        # TODO: remove after 2025-02-15
        if: matrix.os != 'windows-latest'
        run: |
          SHASUM=sha256sum
          [ -f "$(which shasum)" ] && SHASUM=shasum
          find "${{ env.HF_HOME }}/hub" -type f -exec "$SHASUM" {} \; > cache_content_after || true
          diff -udp cache_content_initial cache_content_after || true
      - name: Delete old model cache entries
        run: |
          # make sure that cache cleaning doesn't break the pipeline
          python scripts/ci_clean_cache.py -d || true
      - name: Update model cache
        uses: actions/cache/save@cdf6c1fa76f9f475f3d7449005a359c84ca0f306  # v5.0.3
        # Only let one runner (preferably the one that covers most tests) update the model cache
        # after *every* run. This way we make sure that our cache is never outdated and we don't
        # have to keep track of hashes.
        if: always() && matrix.os == 'ubuntu-latest' && matrix.python-version == '3.10'
        with:
          path: |
            ${{ env.HF_HOME }}/hub/**
            !${{ env.HF_HOME }}/**/*.pyc
          key: model-cache-${{ github.run_id }}

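The restore step uses a unique key per run (`model-cache-${{ github.run_id }}`), so it normally has no exact hit and falls back to `restore-keys`, which the cache action treats as a prefix: among entries whose key starts with `model-cache-`, the most recently created one is restored. Together with the save step writing a fresh entry after every qualifying run, this is how the workflow keeps picking up the newest model cache without tracking hashes. The sketch below models that lookup with an in-memory list; `CacheEntry` and `restore` are hypothetical and ignore details such as branch scoping, cache versions, and `enableCrossOsArchive`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheEntry:
    key: str
    created_at: int  # e.g. a Unix timestamp


def restore(entries: List[CacheEntry], key: str, restore_keys: List[str]) -> Optional[CacheEntry]:
    """Hypothetical model: exact key first, then the newest prefix match per prefix, in order."""
    for entry in entries:
        if entry.key == key:
            return entry
    for prefix in [key, *restore_keys]:
        candidates = [e for e in entries if e.key.startswith(prefix)]
        if candidates:
            return max(candidates, key=lambda e: e.created_at)
    return None


entries = [
    CacheEntry("model-cache-1001", created_at=100),
    CacheEntry("model-cache-1002", created_at=200),
    CacheEntry("pip-cache-xyz", created_at=300),
]
hit = restore(entries, key="model-cache-1003", restore_keys=["model-cache-"])
print(hit)  # the newest "model-cache-" entry, not the newer pip cache
```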

================================================
FILE: .github/workflows/torch_compile_tests.yml
================================================
name: torch compile tests

on:
  workflow_dispatch:
    inputs:
      branch:
        description: 'Branch to test on'
        required: true
      pytorch_nightly:
        description: 'Whether to use PyTorch nightly (true/false)'
        required: false
        default: false

env:
  RUN_SLOW: "yes"
  IS_GITHUB_CI: "1"
  # To be able to run tests on CUDA 12.2
  NVIDIA_DISABLE_REQUIRE: "1"

permissions: {}

jobs:
  run_tests_with_compile:
    runs-on:
      group: aws-g6-4xlarge-plus
    env:
      PEFT_DEBUG_WITH_TORCH_COMPILE: 1
      CUDA_VISIBLE_DEVICES: "0"
      TEST_TYPE: "single_gpu_huggingface/peft-gpu:latest"
      USE_PYTORCH_NIGHTLY: "${{ github.event.inputs.pytorch_nightly }}"
    container:
      image: "huggingface/peft-gpu:latest"
      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
    defaults:
      run:
        shell: bash
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          ref: ${{ github.event.inputs.branch }}
          repository: ${{ github.event.pull_request.head.repo.full_name }}
          persist-credentials: false
      - name: Pip install
        run: |
          source activate peft
          pip install -e . --no-deps
          pip install pytest-cov pytest-reportlog parameterized datasets scipy einops
          pip install "pytest>=7.2.0,<8.0.0" # see: https://github.com/huggingface/transformers/blob/ce4fff0be7f6464d713f7ac3e0bbaafbc6959ae5/setup.py#L148C6-L148C26
          if [ "${USE_PYTORCH_NIGHTLY}" = "true" ]; then
            python -m pip install --upgrade --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
          fi
      - name: Test compile with pytest
        run: |
          source activate peft
          echo "PEFT_DEBUG_WITH_TORCH_COMPILE=$PEFT_DEBUG_WITH_TORCH_COMPILE"
          make tests_torch_compile


================================================
FILE: .github/workflows/trufflehog.yml
================================================
on:
  push:

name: Secret Leaks

permissions: {}

jobs:
  trufflehog:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
      with:
        fetch-depth: 0
        persist-credentials: false
    - name: Secret Scanning
      uses: trufflesecurity/trufflehog@041f07e9df901a1038a528e5525b0226d04dd5ea  # v3.93.6


================================================
FILE: .github/workflows/upload_pr_documentation.yml
================================================
name: Upload PR Documentation

on:
  workflow_run:
    workflows: ["Build PR Documentation"]
    types:
      - completed

permissions: {}

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@607843a35ee10949e5bea9ca2f206831b0305b4c  # main
    with:
      package_name: peft
    secrets:
      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
      comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}


================================================
FILE: .github/workflows/zizmor.yaml
================================================
name: CI security linting

on:
  push:
    branches: ["main"]
  pull_request:
    branches: ["*"]
    paths:
      - '.github/**'

permissions: {}

jobs:
  zizmor:
    name: zizmor latest via Cargo
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
        with:
          persist-credentials: false
      - name: Install zizmor
        run: cargo install --locked zizmor
      - name: Run zizmor
        run: zizmor .github/workflows


================================================
FILE: .github/zizmor.yml
================================================
rules:
  dangerous-triggers:
    ignore:
      # this workflow is only triggered after maintainer approval
      - upload_pr_documentation.yml:3:1
  cache-poisoning:
    ignore:
      # The docker buildx binary is cached, and zizmor warns about a possible cache poisoning attack.
      # On the other hand, this cache would make us more resilient against an intrusion on docker-buildx' side.
      # There is no obvious benefit to disabling it, so we leave it as it is.
      - build_docker_images.yml:37:9
      - build_docker_images.yml:70:9
      - build_docker_images.yml:103:9
      - build_docker_images.yml:136:9
      - build_docker_images.yml:169:9
  unpinned-images:
    ignore:
      # We want to test these images with the latest version and we're not using them
      # to deploy anything so we deem it safe to use those, even if they are unpinned.
      - nightly-bnb.yml:30:7
      - nightly-bnb.yml:155:7
      - nightly.yml:27:7
      - nightly.yml:95:7
      - torch_compile_tests.yml:32:7


================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# VSCode
.vscode

# IntelliJ
.idea

# Mac .DS_Store
.DS_Store

# More test things
wandb

# method_comparison logs
method_comparison/MetaMathQA/cancelled_results/
method_comparison/MetaMathQA/temporary_results/


================================================
FILE: .pre-commit-config.yaml
================================================
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.12.8
    hooks:
      - id: ruff
        args:
          - --fix
      - id: ruff-format
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: check-merge-conflict
      - id: check-yaml


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: Makefile
================================================
.PHONY: quality style test docs

check_dirs := src tests examples docs scripts docker

# Check that source code meets quality standards

# this target runs checks on all files
quality:
	ruff check $(check_dirs)
	ruff format --check $(check_dirs)
	doc-builder style src/peft tests docs/source --max_len 119 --check_only

# Format source code automatically and check if there are any problems left that need manual fixing
style:
	ruff check --fix $(check_dirs)
	ruff format $(check_dirs)
	doc-builder style src/peft tests docs/source --max_len 119

test:
	python -m pytest -n 3 tests/ $(if $(IS_GITHUB_CI),--report-log "ci_tests.log",)

tests_examples_multi_gpu:
	python -m pytest -m multi_gpu_tests tests/test_gpu_examples.py $(if $(IS_GITHUB_CI),--report-log "multi_gpu_examples.log",)

tests_examples_single_gpu:
	python -m pytest -m single_gpu_tests tests/test_gpu_examples.py $(if $(IS_GITHUB_CI),--report-log "single_gpu_examples.log",)

tests_core_multi_gpu:
	python -m pytest -m multi_gpu_tests tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_multi_gpu.log",)

tests_core_single_gpu:
	python -m pytest -m single_gpu_tests tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_single_gpu.log",)

# exclude gemma tests, as generation fails with torch.compile, these failures
# trigger side effects that make other tests fail with 'RuntimeError: Offset
# increment outside graph capture encountered unexpectedly.' 
# TODO re-enable gemma once/if it is fixed
tests_common_gpu:
	python -m pytest tests/test_decoder_models.py -k "not gemma" $(if $(IS_GITHUB_CI),--report-log "common_decoder.log",)
	python -m pytest tests/test_encoder_decoder_models.py $(if $(IS_GITHUB_CI),--report-log "common_encoder_decoder.log",)
	python -m pytest tests/test_gptqmodel.py $(if $(IS_GITHUB_CI),--report-log "gptqmodel_gpu.log",)

tests_examples_multi_gpu_bnb:
	python -m pytest -m "multi_gpu_tests and bitsandbytes" tests/test_gpu_examples.py $(if $(IS_GITHUB_CI),--report-log "multi_gpu_examples.log",)

tests_examples_single_gpu_bnb:
	python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_gpu_examples.py $(if $(IS_GITHUB_CI),--report-log "single_gpu_examples.log",)

tests_core_multi_gpu_bnb:
	python -m pytest -m "multi_gpu_tests and bitsandbytes" tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_multi_gpu.log",)

tests_core_single_gpu_bnb:
	python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_single_gpu.log",)

# For testing transformers tests for bnb runners
transformers_tests:
	RUN_SLOW=1 python -m pytest transformers-clone/tests/quantization/bnb $(if $(IS_GITHUB_CI),--report-log "transformers_tests.log",)

tests_regression:
	python -m pytest -s --regression tests/regression/ $(if $(IS_GITHUB_CI),--report-log "regression_tests.log",)

tests_torch_compile:
	python -m pytest tests/test_torch_compile.py $(if $(IS_GITHUB_CI),--report-log "compile_tests.log",)

tests_training:
	accelerate launch --config_file tests/training/deepspeed_config.yaml tests/training/training.py -- $(if $(IS_GITHUB_CI),--report-log "training_deepspeed.log",)
	accelerate launch --config_file tests/training/deepspeed_config.yaml tests/training/training.py --quant 4bit -- $(if $(IS_GITHUB_CI),--report-log "training_deepspeed_4bit.log",)
	accelerate launch --config_file tests/training/deepspeed_config.yaml tests/training/training.py --quant 8bit -- $(if $(IS_GITHUB_CI),--report-log "training_deepspeed_8bit.log",)
	accelerate launch --config_file tests/training/fsdp_config.yaml tests/training/training.py -- $(if $(IS_GITHUB_CI),--report-log "training_fsdp.log",)
	accelerate launch --config_file tests/training/fsdp_config.yaml tests/training/training.py --quant 4bit -- $(if $(IS_GITHUB_CI),--report-log "training_fsdp_4bit.log",)
	accelerate launch --config_file tests/training/fsdp2_config.yaml tests/training/training.py -- $(if $(IS_GITHUB_CI),--report-log "training_fsdp2.log",)
	accelerate launch --config_file tests/training/fsdp2_config.yaml tests/training/training.py --quant 4bit -- $(if $(IS_GITHUB_CI),--report-log "training_fsdp2_4bit.log",)
	accelerate launch --config_file tests/training/fsdp2_config.yaml tests/training/training.py --quant 4bit --target_modules q_proj --target_parameters v_proj.weight -- $(if $(IS_GITHUB_CI),--report-log "training_fsdp2_target_params.log",)
	accelerate launch --config_file tests/training/fsdp_config.yaml tests/training/adapters.py -- $(if $(IS_GITHUB_CI),--report-log "training_fsdp_adapters.log",)
	accelerate launch --config_file tests/training/tp_config.yaml tests/training/lora_tp.py


================================================
FILE: README.md
================================================
<!---
Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<h1 align="center"> <p>🤗 PEFT</p></h1>
<h3 align="center">
    <p>State-of-the-art Parameter-Efficient Fine-Tuning (PEFT) methods</p>
</h3>

Fine-tuning large pretrained models is often prohibitively costly due to their scale. Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model's parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.

PEFT is integrated with Transformers for easy model training and inference, Diffusers for conveniently managing different adapters, and Accelerate for distributed training and inference of very large models.

> [!TIP]
> Visit the [PEFT](https://huggingface.co/PEFT) organization to read about the PEFT methods implemented in the library and to see notebooks demonstrating how to apply these methods to a variety of downstream tasks. Click the "Watch repos" button on the organization page to be notified of newly implemented methods and notebooks!

Check the PEFT Adapters API Reference section for a list of supported PEFT methods, and read the [Adapters](https://huggingface.co/docs/peft/en/conceptual_guides/adapter), [Soft prompts](https://huggingface.co/docs/peft/en/conceptual_guides/prompting), and [IA3](https://huggingface.co/docs/peft/en/conceptual_guides/ia3) conceptual guides to learn more about how these methods work.

## Quickstart

Install PEFT from pip:

```bash
pip install peft
```

Prepare a model for training with a PEFT method such as LoRA by wrapping the base model and PEFT configuration with `get_peft_model`. For the Qwen/Qwen2.5-3B-Instruct model, you're only training about 0.12% of the parameters!

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model_id = "Qwen/Qwen2.5-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type=TaskType.CAUSAL_LM,
    # target_modules=["q_proj", "v_proj", ...]  # optionally indicate target modules
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# prints: trainable params: 3,686,400 || all params: 3,089,625,088 || trainable%: 0.1193

# now perform training on your dataset, e.g. using transformers Trainer, then save the model
model.save_pretrained("qwen2.5-3b-lora")
```
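
As a sanity check on the printed numbers, the trainable-parameter count can be reproduced by hand. A LoRA adapter on a linear layer with input size `d_in` and output size `d_out` adds two matrices, `A` (`r x d_in`) and `B` (`d_out x r`), i.e. `r * (d_in + d_out)` parameters. The sketch below assumes Qwen2.5-3B's published config (hidden size 2048, 36 layers, 2 key/value heads of dimension 128) and that PEFT's default LoRA targets for this architecture are `q_proj` and `v_proj`; both are assumptions, not stated in this README.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Assumed Qwen2.5-3B dimensions (verify against the model's config.json).
hidden_size = 2048
num_layers = 36
num_kv_heads = 2
head_dim = 128
r = 16  # same rank as in the LoraConfig above

q_proj = lora_params(hidden_size, hidden_size, r)              # 2048 -> 2048
v_proj = lora_params(hidden_size, num_kv_heads * head_dim, r)  # 2048 -> 256 (grouped-query attention)
trainable = num_layers * (q_proj + v_proj)

all_params = 3_089_625_088  # total reported by print_trainable_parameters()
print(f"trainable params: {trainable:,} || trainable%: {100 * trainable / all_params:.4f}")
# trainable params: 3,686,400 || trainable%: 0.1193
```

The key/value projections are much smaller than the query projection because of grouped-query attention, which is why `v_proj` contributes fewer LoRA parameters than `q_proj`.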

To load a PEFT model for inference:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model_id = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
model = PeftModel.from_pretrained(model, "qwen2.5-3b-lora")

inputs = tokenizer("Preheat the oven to 350 degrees and place the cookie dough", return_tensors="pt")
outputs = model.generate(**inputs.to(device), max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# prints something like: Preheat the oven to 350 degrees and place the cookie dough in a baking dish [...]
```

## Why you should use PEFT

There are many benefits of using PEFT, but the main one is the huge savings in compute and storage, which makes PEFT applicable to many different use cases.

### High performance on consumer hardware

Consider the memory requirements for training the following models on the [ought/raft/twitter_complaints](https://huggingface.co/datasets/ought/raft/viewer/twitter_complaints) dataset with an A100 80GB GPU with more than 64GB of CPU RAM.

|   Model         | Full Finetuning | PEFT-LoRA PyTorch  | PEFT-LoRA DeepSpeed with CPU Offloading |
| --------- | ---- | ---- | ---- |
| bigscience/T0_3B (3B params) | 47.14GB GPU / 2.96GB CPU  | 14.4GB GPU / 2.96GB CPU | 9.8GB GPU / 17.8GB CPU |
| bigscience/mt0-xxl (12B params) | OOM GPU | 56GB GPU / 3GB CPU | 22GB GPU / 52GB CPU |
| bigscience/bloomz-7b1 (7B params) | OOM GPU | 32GB GPU / 3.8GB CPU | 18.1GB GPU / 35GB CPU |
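
The OOM entries for the 7B and 12B models are consistent with a common rule of thumb: full fine-tuning with Adam in mixed precision needs roughly 16 bytes per parameter for weights, gradients, and optimizer states, before counting activations. The byte breakdown below is an assumption about a typical setup, not a measurement taken from the table.

```python
# Assumed per-parameter cost of mixed-precision Adam training (activations excluded).
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "Adam first moment (fp32)": 4,
    "Adam second moment (fp32)": 4,
}

def full_finetune_gb(num_params: float) -> float:
    """Rough GPU memory in GB (1e9 bytes) for weights + gradients + Adam states."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 1e9

GPU_GB = 80
for name, n in [("3B", 3e9), ("7B", 7e9), ("12B", 12e9)]:
    need = full_finetune_gb(n)
    print(f"{name}: ~{need:.0f} GB -> {'fits' if need <= GPU_GB else 'OOM'} on {GPU_GB} GB")
# 3B: ~48 GB -> fits on 80 GB
# 7B: ~112 GB -> OOM on 80 GB
# 12B: ~192 GB -> OOM on 80 GB
```

With LoRA, gradients and optimizer states are only kept for the small adapter matrices, so most of this per-parameter overhead disappears for the frozen base weights.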

With LoRA you can fine-tune a 12B parameter model that would otherwise run out of memory on the 80GB GPU, and comfortably fit and train a 3B parameter model. The 3B parameter model's performance is comparable to that of a fully finetuned model at a fraction of the GPU memory.

|   Submission Name        | Accuracy |
| --------- | ---- |
| Human baseline (crowdsourced) | 0.897 |
| Flan-T5 | 0.892 |
| lora-t0-3b | 0.863 |

> [!TIP]
> The bigscience/T0_3B model performance isn't optimized in the table above. You can squeeze even more performance out of it by playing around with the input instruction templates, LoRA hyperparameters, and other training related hyperparameters. The final checkpoint size of this model is just 19MB compared to 11GB of the full bigscience/T0_3B model. Learn more about the advantages of finetuning with PEFT in this [blog post](https://www.philschmid.de/fine-tune-flan-t5-peft).

### Quantization

Quantization is another method for reducing the memory requirements of a model by representing the data in a lower precision. It can be combined with PEFT methods to make it even easier to train and load LLMs for inference.

* Learn how to finetune [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) with QLoRA and the [TRL](https://huggingface.co/docs/trl/index) library on a 16GB GPU in the [Finetune LLMs on your own consumer hardware using tools from PyTorch and Hugging Face ecosystem](https://pytorch.org/blog/finetune-llms/) blog post.
* Learn how to finetune an [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) model for multilingual automatic speech recognition with LoRA and 8-bit quantization in this [notebook](https://colab.research.google.com/drive/1DOkD_5OUjFa0r5Ik3SgywJLJtEo2qLxO?usp=sharing) (see this [notebook](https://colab.research.google.com/drive/1vhF8yueFqha3Y3CpTHN6q9EVcII9EYzs?usp=sharing) instead for an example of streaming a dataset).
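
To see why lower precision matters, here is a back-of-the-envelope sketch of the memory needed just to hold the weights of a roughly 7B-parameter model, such as Llama-2-7b, at different bit widths. It ignores activations, the LoRA parameters and their optimizer states, and the small per-block overhead (e.g. scaling constants) that real 4-bit formats add.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory in GB (1e9 bytes) needed to store the weights alone at a given precision."""
    return num_params * bits_per_param / 8 / 1e9

num_params = 7e9  # approximate size of a 7B model
for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_memory_gb(num_params, bits):5.1f} GB")
# fp32 ~28 GB, fp16/bf16 ~14 GB, int8 ~7 GB, 4-bit ~3.5 GB
```

At 16-bit precision the weights alone nearly fill a 16GB GPU, while at 4 bits there is headroom left for activations and the trainable LoRA parameters, which is roughly why QLoRA makes such setups feasible.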

### Save compute and storage

PEFT can help you save storage by avoiding full finetuning of models on each downstream task or dataset. In many cases, you're only finetuning a very small fraction of a model's parameters and each checkpoint is only a few MBs in size (instead of GBs). These smaller PEFT adapters demonstrate performance comparable to a fully finetuned model. If you have many datasets, you can save a lot of storage with a PEFT model and not have to worry about catastrophic forgetting or overfitting the backbone or base model.
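
The "few MBs" figure follows from the parameter count: an adapter checkpoint stores only the trainable parameters, so its size is roughly parameters times bytes per parameter. This sketch reuses the 3,686,400 trainable parameters from the Quickstart example and assumes the weights are saved in either float32 or bfloat16; file-format metadata is ignored.

```python
def size_mib(num_params: int, bytes_per_param: int) -> float:
    """Approximate file size in MiB (2**20 bytes), ignoring file metadata."""
    return num_params * bytes_per_param / 2**20

lora_params = 3_686_400    # trainable parameters from the Quickstart example
all_params = 3_089_625_088  # total parameters reported in the same example

print(f"adapter, float32:  {size_mib(lora_params, 4):.2f} MiB")  # 14.06 MiB
print(f"adapter, bfloat16: {size_mib(lora_params, 2):.2f} MiB")  # 7.03 MiB
print(f"full model, bfloat16: {size_mib(all_params, 2) / 1024:.2f} GiB")  # 5.75 GiB
```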

## PEFT integrations

PEFT is widely supported across the Hugging Face ecosystem because of the massive efficiency it brings to training and inference.

### Diffusers

The iterative diffusion process consumes a lot of memory which can make it difficult to train. PEFT can help reduce the memory requirements and reduce the storage size of the final model checkpoint. For example, consider the memory required for training a Stable Diffusion model with LoRA on an A100 80GB GPU with more than 64GB of CPU RAM. The final model checkpoint size is only 8.8MB!

|   Model         | Full Finetuning | PEFT-LoRA  | PEFT-LoRA with Gradient Checkpointing  |
| --------- | ---- | ---- | ---- |
| CompVis/stable-diffusion-v1-4 | 27.5GB GPU / 3.97GB CPU | 15.5GB GPU / 3.84GB CPU | 8.12GB GPU / 3.77GB CPU | 

> [!TIP]
> Take a look at the [examples/lora_dreambooth/train_dreambooth.py](examples/lora_dreambooth/train_dreambooth.py) training script to try training your own Stable Diffusion model with LoRA, and play around with the [smangrul/peft-lora-sd-dreambooth](https://huggingface.co/spaces/smangrul/peft-lora-sd-dreambooth) Space which is running on a T4 instance. Learn more about the PEFT integration in Diffusers in this [tutorial](https://huggingface.co/docs/peft/main/en/tutorial/peft_integrations#diffusers).

### Transformers

PEFT is directly integrated with [Transformers](https://huggingface.co/docs/transformers/main/en/peft). After loading a model, call `add_adapter` to add a new PEFT adapter to the model:

```python
from peft import LoraConfig
model = ...  # transformers model
peft_config = LoraConfig(...)
model.add_adapter(peft_config, adapter_name="lora_1")
```

To load a trained PEFT adapter, call `load_adapter`:

```python
model = ...  # transformers model
model.load_adapter("path/to/adapter", adapter_name="lora_1")
```

And to switch between different adapters, call `set_adapter`:

```python
model.set_adapter("lora_2")  # "lora_2" must have been added or loaded beforehand
```

The Transformers integration doesn't include all the functionality offered in PEFT, such as methods for merging the adapter into the base model.
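
In PEFT itself, merging is exposed through methods such as `merge_and_unload()`. The idea behind merging a LoRA adapter is simple: with the default scaling, the adapted layer computes `W x + (lora_alpha / r) * B A x`, so folding the product into the base weight, `W' = W + (lora_alpha / r) * B A`, gives a single linear layer with identical outputs and no extra inference cost. The stdlib-only sketch below illustrates that identity on tiny, arbitrary matrices; it is not PEFT's implementation.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]

# Toy layer: d_out = 3, d_in = 2, LoRA rank r = 1 (values are arbitrary).
W = [[1, 0], [0, 1], [2, -1]]  # base weight, d_out x d_in
A = [[1, 2]]                   # LoRA down-projection, r x d_in
B = [[1], [0], [-1]]           # LoRA up-projection, d_out x r
r, lora_alpha = 1, 2
scaling = Fraction(lora_alpha, r)  # exact arithmetic so the comparison below is exact

x = [3, -1]

# Unmerged forward pass: base output plus the scaled low-rank update.
base_out = matvec(W, x)
lora_out = matvec(B, matvec(A, x))
unmerged = [b + scaling * l for b, l in zip(base_out, lora_out)]

# Merged forward pass: W' = W + scaling * (B @ A), then one matrix-vector product.
BA = matmul(B, A)
W_merged = [[w + scaling * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]
merged = matvec(W_merged, x)

print([float(v) for v in merged])  # [5.0, -1.0, 5.0]
print(unmerged == merged)          # True
```

After merging, the adapter can no longer be switched off or swapped without unmerging, which is the trade-off for removing the extra low-rank computation at inference time.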

### Accelerate

[Accelerate](https://huggingface.co/docs/accelerate/index) is a library for distributed training and inference on various training setups and hardware (GPUs, TPUs, Apple Silicon, etc.). PEFT models work with Accelerate out of the box, making it convenient to train very large models or run inference with them on consumer hardware with limited resources.

### TRL

PEFT can also be applied to training LLMs with RLHF components such as the ranker and policy. Get started by reading:

* [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) with PEFT and the [TRL](https://huggingface.co/docs/trl/index) library to learn more about the Direct Preference Optimization (DPO) method and how to apply it to an LLM.
* [Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU](https://huggingface.co/blog/trl-peft) with PEFT and the [TRL](https://huggingface.co/docs/trl/index) library, and then try out the [gpt2-sentiment_peft.ipynb](https://github.com/huggingface/trl/blob/main/examples/notebooks/gpt2-sentiment.ipynb) notebook to optimize GPT2 to generate positive movie reviews.
* [StackLLaMA: A hands-on guide to train LLaMA with RLHF](https://huggingface.co/blog/stackllama) with PEFT, and then try out the [stack_llama/scripts](https://github.com/huggingface/trl/tree/main/examples/research_projects/stack_llama/scripts) for supervised finetuning, reward modeling, and RL finetuning.

## Model support

Use this [Space](https://stevhliu-peft-methods.hf.space) or check out the [docs](https://huggingface.co/docs/peft/main/en/index) to find which models officially support a PEFT method out of the box. Even if you don't see a model listed, you can manually configure the model config to enable PEFT for a model. Read the [New transformers architecture](https://huggingface.co/docs/peft/main/en/developer_guides/custom_models#new-transformers-architectures) guide to learn how.
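
When configuring a model manually, the main setting to provide is usually `target_modules`, the names of the layers that receive adapters. As we understand PEFT's behavior, a list of names matches any module whose full dotted name equals an entry or ends with `.` plus that entry, while a single string is treated as a regular expression that must match the full name. The sketch below mimics that rule with a hypothetical helper `matches_target` on made-up module names; it is a simplification, not PEFT's code.

```python
import re

def matches_target(module_name: str, target_modules) -> bool:
    """Hypothetical helper mimicking how target_modules selects layers (simplified)."""
    if isinstance(target_modules, str):
        # A string is treated as a regex that must match the whole module name.
        return re.fullmatch(target_modules, module_name) is not None
    # A list matches exact names or dotted-name suffixes.
    return any(module_name == t or module_name.endswith(f".{t}") for t in target_modules)

# Illustrative module names of one decoder layer (made up for this example).
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]

print([n for n in names if matches_target(n, ["q_proj", "v_proj"])])
print([n for n in names if matches_target(n, r".*\.mlp\..*")])
```

Because list entries match whole name components, an entry like `"proj"` would not select `q_proj`; listing the exact layer names (inspect them with `model.named_modules()`) avoids surprises.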

## Contribute

If you would like to contribute to PEFT, please check out our [contribution guide](https://huggingface.co/docs/peft/developer_guides/contributing).

## Citing 🤗 PEFT

To use 🤗 PEFT in your publication, please cite it by using the following BibTeX entry.

```bibtex
@Misc{peft,
  title =        {{PEFT}: State-of-the-art Parameter-Efficient Fine-Tuning methods},
  author =       {Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul and Benjamin Bossan and Marian Tietz},
  howpublished = {\url{https://github.com/huggingface/peft}},
  year =         {2022}
}
```


================================================
FILE: docker/README.md
================================================
# PEFT Docker images

Here we store all PEFT Docker images used in our testing infrastructure. We currently use Python 3.11 for all our images.

- `peft-cpu`: PEFT compiled on CPU with all other HF libraries installed from the main branch
- `peft-gpu`: PEFT compiled for NVIDIA GPUs with all other HF libraries installed from the main branch


================================================
FILE: docker/peft-cpu/Dockerfile
================================================
# Builds CPU docker image of PyTorch
# Uses multi-staged approach to reduce size
# Stage 1
# Use base conda image to reduce time
FROM continuumio/miniconda3:latest AS compile-image
# Specify py version
ENV PYTHON_VERSION=3.11
# Install apt libs - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN apt-get update && \
    apt-get install -y curl git wget git-lfs ffmpeg libsndfile1-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN git lfs install

# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip
RUN python3 -m pip install --no-cache-dir --upgrade pip

# Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
# We don't install pytorch here yet since CUDA isn't available
# instead we use the direct torch wheel
ENV PATH=/opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Activate the conda env and install transformers + accelerate from source
RUN source activate peft && \
    python3 -m pip install --no-cache-dir \
    librosa \
    "soundfile>=0.12.1" \
    scipy \
    git+https://github.com/huggingface/transformers \
    git+https://github.com/huggingface/accelerate \
    peft[test]@git+https://github.com/huggingface/peft

# Install apt libs
RUN apt-get update && \
    apt-get install -y curl git wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN echo "source activate peft" >> ~/.profile

# Activate the virtualenv
CMD ["/bin/bash"]


================================================
FILE: docker/peft-gpu/Dockerfile
================================================
# Builds GPU docker image of PyTorch
# Uses multi-staged approach to reduce size
# Stage 1
# Use base conda image to reduce time
FROM continuumio/miniconda3:latest AS compile-image
# Specify py version
ENV PYTHON_VERSION=3.11
# Install apt libs - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
# Install audio-related libraries
RUN apt-get update && \
    apt-get install -y curl git wget git-lfs ffmpeg libsndfile1-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN git lfs install

# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip

# Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
# We don't install pytorch here yet since CUDA isn't available
# instead we use the direct torch wheel
ENV PATH=/opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]

# Stage 2
FROM nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 AS build-image
COPY --from=compile-image /opt/conda /opt/conda
ENV PATH=/opt/conda/bin:$PATH

# Install apt libs
RUN apt-get update && \
    apt-get install -y curl git wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]

RUN conda run -n peft pip install --no-cache-dir bitsandbytes optimum

# GPTQmodel doesn't find torch with build isolation, so install it with --no-build-isolation
#
# Note: we are hard-coding CUDA_ARCH_LIST here since `gptqmodel` requires either nvidia-smi
# or CUDA_ARCH_LIST for compute capability information. Since the docker build is unlikely
# to have compute hardware available we use the information from the CI runner (which hosts
# an NVIDIA L4). So we fix the compute capability to 8.9. In the future we might extend this
# to a list of compute capabilities (separated by ;).
RUN CUDA_ARCH_LIST=8.9 conda run -n peft pip install --no-build-isolation gptqmodel

RUN \
    # Add eetq for quantization testing; needs to run without build isolation since the setup
    # script directly imports torch from the environment which would fail with isolation.
    conda run -n peft pip install --no-build-isolation git+https://github.com/NetEase-FuXi/EETQ.git

RUN \
    conda run -n peft pip install --no-build-isolation "transformer_engine[pytorch]"

# Activate the conda env and install transformers + accelerate from source
RUN conda run -n peft pip install -U --no-cache-dir \
        librosa \
        "soundfile>=0.12.1" \
        scipy \
        torchao \
        "fbgemm-gpu-genai>=1.2.0" \
        git+https://github.com/huggingface/transformers \
        git+https://github.com/huggingface/accelerate \
        peft[test]@git+https://github.com/huggingface/peft \
        # Add aqlm for quantization testing
        "aqlm[gpu]>=1.0.2" \
        # Add HQQ for quantization testing
        hqq \
        deepspeed

RUN conda run -n peft pip freeze | grep transformers

RUN echo "source activate peft" >> ~/.profile

# Activate the virtualenv
CMD ["/bin/bash"]


================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
SOURCEDIR     = source
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

================================================
FILE: docs/README.md
================================================
<!---
Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Generating the documentation

To generate the documentation, you first have to build it. Several packages are necessary to build the docs; you can
install them with the following command at the root of the code repository:

```bash
pip install -e ".[docs]"
```

Then you need to install our special tool that builds the documentation:

```bash
pip install git+https://github.com/huggingface/doc-builder
```

---
**NOTE**

You only need to generate the documentation to inspect it locally (if you're planning changes and want to
check how they look before committing, for instance). You don't have to commit the built documentation.

---

## Building the documentation

Once you have set up the `doc-builder` and additional packages, you can generate the documentation by
typing the following command:

```bash
doc-builder build peft docs/source/ --build_dir ~/tmp/test-build
```

You can adapt the `--build_dir` to set any temporary folder you prefer. This command will create it and generate
the MDX files that will be rendered as the documentation on the main website. You can inspect them in your favorite
Markdown editor.

## Previewing the documentation

To preview the docs, first install the `watchdog` module with:

```bash
pip install watchdog
```

Then run the following command:

```bash
doc-builder preview {package_name} {path_to_docs}
```

For example:

```bash
doc-builder preview peft docs/source
```

The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR: a bot will add a comment with a link to the documentation built with your changes.

---
**NOTE**

The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` and restart the `preview` command (press `ctrl-c` to stop it and call `doc-builder preview ...` again).

---

## Adding a new element to the navigation bar

Accepted files are Markdown (.md or .mdx).

Create a file with its extension and put it in the source directory. You can then link it to the toc-tree by putting
the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/peft/blob/main/docs/source/_toctree.yml) file.

## Renaming section headers and moving sections

It helps to keep the old links working when renaming a section header and/or moving sections from one document to another. This is because the old links are likely to be used in issues, forums, and social media, and it makes for a much better user experience if users reading those months later can still easily navigate to the originally intended information.

Therefore, we simply keep a little map of moved sections at the end of the document where the original section was. The key is to preserve the original anchor.

So if you renamed a section from "Section A" to "Section B", you can add the following at the end of the file:

```
Sections that were moved:

[ <a href="#section-b">Section A</a><a id="section-a"></a> ]
```
and of course, if you moved it to another file, then:

```
Sections that were moved:

[ <a href="../new-file#section-b">Section A</a><a id="section-a"></a> ]
```

Use the relative style to link to the new file so that the versioned docs continue to work.
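If you maintain many such entries, a small helper can generate them. This is a hypothetical sketch, not part of doc-builder: the `anchor` function only approximates how heading anchors are derived (lowercase, punctuation dropped, spaces turned into hyphens), and doc-builder's real rules may differ for punctuation or non-ASCII titles.

```python
import re


def anchor(title: str) -> str:
    """Approximate a heading anchor: lowercase, drop punctuation, spaces to hyphens."""
    slug = re.sub(r"[^\w\s-]", "", title.lower()).strip()
    return re.sub(r"\s+", "-", slug)


def moved_section_entry(old_title: str, new_title: str, new_file: str = "") -> str:
    """Build the entry that keeps the old anchor working after a rename or move.

    `new_file` should be a relative path such as "../new-file" so versioned docs keep working.
    """
    href = f"{new_file}#{anchor(new_title)}"
    return f'[ <a href="{href}">{old_title}</a><a id="{anchor(old_title)}"></a> ]'


print(moved_section_entry("Section A", "Section B"))
# [ <a href="#section-b">Section A</a><a id="section-a"></a> ]
print(moved_section_entry("Section A", "Section B", new_file="../new-file"))
# [ <a href="../new-file#section-b">Section A</a><a id="section-a"></a> ]
```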


## Writing Documentation - Specification

The `huggingface/peft` documentation follows the
[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style for docstrings,
although we can write them directly in Markdown.

### Adding a new tutorial

Adding a new tutorial or section is done in two steps:

- Add a new file under `./source`. This file can either be ReStructuredText (.rst) or Markdown (.md).
- Link that file in `./source/_toctree.yml` on the correct toc-tree.

Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so
depending on the intended targets (beginners, more advanced users, or researchers) it should go into sections two, three, or
four.

### Writing source documentation

Values that should be put in `code` should be surrounded by backticks: \`like so\`. Note that argument names
and objects like True, None, or any strings should usually be put in `code`.

When mentioning a class, function, or method, it is recommended to use our syntax for internal links so that our tool
adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or 
function to be in the main package.

If you want to create a link to some internal class or function, you need to
provide its path. For instance: \[\`utils.gather\`\]. This will be converted into a link with
`utils.gather` in the description. To get rid of the path and only keep the name of the object you are
linking to in the description, add a ~: \[\`~utils.gather\`\] will generate a link with `gather` in the description.

The same works for methods, so you can either use \[\`XXXClass.method\`\] or \[\`~XXXClass.method\`\].
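To make the rule concrete, here is a toy Python model of how the visible link text is chosen. It only mirrors the behavior described above; it is not doc-builder's implementation, which may handle more cases.

```python
import re

# Matches the internal-link syntax described above, e.g. [`utils.gather`] or [`~utils.gather`].
LINK_PATTERN = re.compile(r"\[`(~?)([\w.]+)`\]")


def link_text(reference: str) -> str:
    """Return the text a reader would see for an internal link (toy model)."""
    match = LINK_PATTERN.fullmatch(reference)
    if match is None:
        raise ValueError(f"not an internal link: {reference!r}")
    tilde, path = match.groups()
    # With a leading ~, only the last component of the dotted path is kept.
    return path.rsplit(".", 1)[-1] if tilde else path


print(link_text("[`utils.gather`]"))      # utils.gather
print(link_text("[`~utils.gather`]"))     # gather
print(link_text("[`~XXXClass.method`]"))  # method
```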

#### Defining arguments in a method

Arguments should be defined with the `Args:` (or `Arguments:` or `Parameters:`) prefix, followed by a line return and
an indentation. The argument should be followed by its type, with its shape if it is a tensor, a colon, and its
description:

```
    Args:
        n_layers (`int`): The number of layers of the model.
```

If the description is too long to fit in one line (more than 119 characters in total), another indentation is necessary 
before writing the description after the argument.

Finally, to maintain uniformity, if any *one* description is too long to fit on one line, the
rest of the parameters should follow suit and have an indentation before their description.

Here's an example showcasing everything so far:

```
    Args:
        gradient_accumulation_steps (`int`, *optional*, defaults to 1):
            The number of steps that should pass before gradients are accumulated. A number > 1 should be combined
            with `Accelerator.accumulate`.
        cpu (`bool`, *optional*):
            Whether or not to force the script to execute on CPU. Will ignore available GPUs if set to `True` and
            force the execution on one process only.
```

For optional arguments or arguments with defaults we follow the following syntax: imagine we have a function with the
following signature:

```
def my_function(x: str = None, a: float = 1):
```

then its documentation should look like this:

```
    Args:
        x (`str`, *optional*):
            This argument controls ... and has a description longer than 119 chars.
        a (`float`, *optional*, defaults to 1):
            This argument is used to ... and has a description longer than 119 chars.
```

Note that we always omit the "defaults to \`None\`" when None is the default for any argument. Also note that even
if the first line describing your argument type and its default gets long, you can't break it into several lines. You can
however write as many lines as you want in the indented description (see the examples above).
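Putting these rules together, here is one way the `my_function` signature above could be documented in real code. The descriptions are placeholders we made up for illustration; the last lines check the 119-character limit mentioned earlier.

```python
from typing import Optional


def my_function(x: Optional[str] = None, a: float = 1):
    """
    Toy function whose docstring follows the conventions above.

    Args:
        x (`str`, *optional*):
            An optional string. Its default is `None`, so no "defaults to" is written.
        a (`float`, *optional*, defaults to 1):
            A scaling factor. This description is indented to match the one above it.
    """
    return x, a


# No docstring line should exceed the 119-character limit.
too_long = [line for line in my_function.__doc__.splitlines() if len(line) > 119]
print(too_long)  # []
```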

#### Writing a multi-line code block

Multi-line code blocks can be useful for displaying examples. They are done between two lines of three backticks as usual in Markdown:


````
```python
# first line of code
# second line
# etc
```
````

#### Writing a return block

The return block should be introduced with the `Returns:` prefix, followed by a line return and an indentation.
The first line should be the type of the return, followed by a line return. No need to indent further for the elements
building the return.

Here's an example of a single value return:

```
    Returns:
        `List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
```

Here's an example of a tuple return, comprising several objects:

```
    Returns:
        `tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
        - **loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` --
          Total loss is the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
        - **prediction_scores** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
          Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
```

## Styling the docstring

We have an automatic script running with the `make style` command that will make sure that:
- the docstrings fully take advantage of the line width
- all code examples are formatted using black, like the code of the Transformers library

This script may have some weird failures if you make a syntax mistake or if you uncover a bug. Therefore, it's
recommended to commit your changes before running `make style`, so you can revert the changes done by that script
easily.

## Writing documentation examples

The syntax for example docstrings can look as follows:

````
    Example:

    ```python
    >>> import time
    >>> from accelerate import Accelerator
    >>> accelerator = Accelerator()
    >>> if accelerator.is_main_process:
    ...     time.sleep(2)
    ... else:
    ...     print("I'm waiting for the main process to finish its sleep...")
    >>> accelerator.wait_for_everyone()
    >>> # Should print on every process at the same time
    >>> print("Everyone is here")
    ```
````

The docstring should give a minimal, clear example of how the respective function 
is to be used in inference and also include the expected (ideally sensible)
output.
Often, readers will try out the example before even going through the function 
or class definitions. Therefore, it is of utmost importance that the example 
works as expected.
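One way to make sure an example works is to run it. The sketch below uses Python's standard-library `doctest` module on a toy function; it is an illustration, not the docs tooling used by this repository. Note that the stdlib parser would treat a closing code fence placed directly after the expected output as part of that output, so the toy docstring leaves the fences out.

```python
import doctest


def add_one(x: int) -> int:
    """
    Adds one to `x`.

    Args:
        x (`int`): The input value.

    Returns:
        `int`: The input plus one.

    Example:

    >>> add_one(41)
    42
    """
    return x + 1


def count_example_failures(func) -> int:
    """Run the `>>>` examples in `func`'s docstring and return the number of failures."""
    runner = doctest.DocTestRunner(verbose=False)
    for test in doctest.DocTestFinder().find(func, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False).failed


print(count_example_failures(add_one))  # 0
```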


================================================
FILE: docs/source/_config.py
================================================
# docstyle-ignore
INSTALL_CONTENT = """
# PEFT installation
! pip install peft accelerate transformers
# To install from source instead of the last release, comment the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/peft.git
"""


================================================
FILE: docs/source/_toctree.yml
================================================
- title: Get started
  sections:
  - local: index
    title: 🤗 PEFT
  - local: quicktour
    title: Quicktour
  - local: install
    title: Installation

- title: Tutorial
  sections:
  - local: tutorial/peft_model_config
    title: Configurations and models
  - local: tutorial/peft_integrations
    title: Integrations

- title: PEFT method guides
  sections:
  - local: task_guides/prompt_based_methods
    title: Prompt-based methods
  - local: task_guides/lora_based_methods
    title: LoRA methods
  - local: task_guides/ia3
    title: IA3

- title: Developer guides
  sections:
  - local: developer_guides/model_merging
    title: Model merging
  - local: developer_guides/quantization
    title: Quantization
  - local: developer_guides/lora
    title: LoRA
  - local: developer_guides/custom_models
    title: Custom models
  - local: developer_guides/low_level_api
    title: Adapter injection
  - local: developer_guides/mixed_models
    title: Mixed adapter types
  - local: developer_guides/torch_compile
    title: torch.compile
  - local: developer_guides/contributing
    title: Contribute to PEFT
  - local: developer_guides/troubleshooting
    title: Troubleshooting
  - local: developer_guides/checkpoint
    title: PEFT checkpoint format

- title: 🤗 Accelerate integrations
  sections:
  - local: accelerate/deepspeed
    title: DeepSpeed
  - local: accelerate/fsdp
    title: Fully Sharded Data Parallel

- title: Conceptual guides
  sections:
  - local: conceptual_guides/adapter
    title: Adapters
  - local: conceptual_guides/prompting
    title: Soft prompts
  - local: conceptual_guides/ia3
    title: IA3
  - local: conceptual_guides/oft
    title: OFT/BOFT

- sections:
  - sections:
    - local: package_reference/auto_class
      title: AutoPeftModel
    - local: package_reference/peft_model
      title: PEFT model
    - local: package_reference/peft_types
      title: PEFT types
    - local: package_reference/config
      title: Configuration
    - local: package_reference/tuners
      title: Tuner
    title: Main classes
  - sections:
    - local: package_reference/adalora
      title: AdaLoRA
    - local: package_reference/ia3
      title: IA3
    - local: package_reference/llama_adapter
      title: Llama-Adapter
    - local: package_reference/loha
      title: LoHa
    - local: package_reference/lokr
      title: LoKr
    - local: package_reference/lora
      title: LoRA
    - local: package_reference/osf
      title: OSF
    - local: package_reference/xlora
      title: X-LoRA
    - local: package_reference/adapter_utils
      title: LyCORIS
    - local: package_reference/multitask_prompt_tuning
      title: Multitask Prompt Tuning
    - local: package_reference/oft
      title: OFT
    - local: package_reference/boft
      title: BOFT
    - local: package_reference/psoft
      title: PSOFT
    - local: package_reference/poly
      title: Polytropon
    - local: package_reference/p_tuning
      title: P-tuning
    - local: package_reference/prefix_tuning
      title: Prefix tuning
    - local: package_reference/cartridges
      title: Cartridges
    - local: package_reference/prompt_tuning
      title: Prompt tuning
    - local: package_reference/layernorm_tuning
      title: Layernorm tuning
    - local: package_reference/vera
      title: VeRA
    - local: package_reference/pvera
      title: PVeRA
    - local: package_reference/fourierft
      title: FourierFT
    - local: package_reference/gralora
      title: GraLoRA
    - local: package_reference/vblora
      title: VB-LoRA
    - local: package_reference/hra
      title: HRA
    - local: package_reference/cpt
      title: CPT
    - local: package_reference/trainable_tokens
      title: Trainable Tokens
    - local: package_reference/randlora
      title: RandLora
    - local: package_reference/shira
      title: SHiRA
    - local: package_reference/c3a
      title: C3A
    - local: package_reference/miss
      title: MiSS
    - local: package_reference/road
      title: RoAd
    - local: package_reference/waveft
      title: WaveFT
    - local: package_reference/delora
      title: DeLoRA
    - local: package_reference/lily
      title: Lily
    - local: package_reference/peanut
      title: PEANuT

    title: Adapters
  - sections:
    - local: package_reference/merge_utils
      title: Model merge
    - local: package_reference/helpers
      title: Helpers
    - local: package_reference/hotswap
      title: Hotswapping adapters
    - local: package_reference/functional
      title: Functions for PEFT integration
    - local: package_reference/lora_conversion
      title: Converting non-LoRA adapters to LoRA
    title: Utilities
  title: API reference


================================================
FILE: docs/source/accelerate/deepspeed.md
================================================
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# DeepSpeed

[DeepSpeed](https://www.deepspeed.ai/) is a library designed for speed and scale for distributed training of large models with billions of parameters. At its core is the Zero Redundancy Optimizer (ZeRO) that shards optimizer states (ZeRO-1), gradients (ZeRO-2), and parameters (ZeRO-3) across data parallel processes. This drastically reduces memory usage, allowing you to scale your training to billion parameter models. To unlock even more memory efficiency, ZeRO-Offload reduces GPU compute and memory by leveraging CPU resources during optimization.

Both of these features are supported in 🤗 Accelerate, and you can use them with 🤗 PEFT. 
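To put the three ZeRO stages in numbers, the sketch below follows the per-GPU memory accounting of the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and K = 12 for the fp32 master weights, momentum, and variance. It is a rough estimate that only covers these model states (no activations or buffers) and assumes full fine-tuning; with LoRA, only the adapter parameters have gradients and optimizer states, so those terms shrink considerably.

```python
def model_state_gb(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Per-GPU memory in GB for model states (weights, gradients, optimizer states)."""
    weights = 2 * num_params  # fp16/bf16 parameters
    grads = 2 * num_params  # fp16/bf16 gradients
    optim = k * num_params  # fp32 master weights + Adam momentum + variance
    if stage == 0:  # plain data parallelism: everything is replicated
        per_gpu = weights + grads + optim
    elif stage == 1:  # optimizer states sharded
        per_gpu = weights + grads + optim / num_gpus
    elif stage == 2:  # optimizer states + gradients sharded
        per_gpu = weights + (grads + optim) / num_gpus
    elif stage == 3:  # optimizer states + gradients + parameters sharded
        per_gpu = (weights + grads + optim) / num_gpus
    else:
        raise ValueError("stage must be 0, 1, 2, or 3")
    return per_gpu / 1e9


for stage in range(4):
    print(stage, model_state_gb(7.5e9, num_gpus=64, stage=stage))
```

For the paper's example of a 7.5B-parameter model on 64 GPUs, this gives 120 GB per GPU without ZeRO and roughly 31.4, 16.6, and 1.9 GB per GPU for stages 1, 2, and 3.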

## Compatibility with `bitsandbytes` quantization + LoRA

Below is a table that summarizes the compatibility between PEFT's LoRA, [`bitsandbytes`](https://github.com/TimDettmers/bitsandbytes) library and DeepSpeed Zero stages with respect to fine-tuning. DeepSpeed Zero-1 and 2 will have no effect at inference as stage 1 shards the optimizer states and stage 2 shards the optimizer states and gradients:

| DeepSpeed stage   | Is compatible? |
|---|---|
| Zero-1 |  🟢 |
| Zero-2   |  🟢 |
| Zero-3  |  🟢 |

For DeepSpeed Stage 3 + QLoRA, please refer to the section [Use PEFT QLoRA and DeepSpeed with ZeRO3 for finetuning large models on multiple GPUs](#use-peft-qlora-and-deepspeed-with-zero3-for-finetuning-large-models-on-multiple-gpus) below.

To confirm these observations, we ran the SFT (supervised fine-tuning) [official example scripts](https://github.com/huggingface/trl/tree/main/examples) of the [Transformers Reinforcement Learning (TRL) library](https://github.com/huggingface/trl) using QLoRA + PEFT and the accelerate configs available [here](https://github.com/huggingface/trl/tree/main/examples/accelerate_configs). We ran these experiments on 2x NVIDIA T4 GPUs.

# Use PEFT and DeepSpeed with ZeRO3 for finetuning large models on multiple devices and multiple nodes

This section of the guide will help you learn how to use our DeepSpeed [training script](https://github.com/huggingface/peft/blob/main/examples/sft/train.py) for performing SFT. You'll configure the script to do SFT (supervised fine-tuning) of the Llama-70B model with LoRA and ZeRO-3 on 8xH100 80GB GPUs on a single machine. You can configure it to scale to multiple machines by changing the accelerate config.

## Configuration

Start by running the following command to [create a DeepSpeed configuration file](https://huggingface.co/docs/accelerate/quicktour#launching-your-distributed-script) with 🤗 Accelerate. The `--config_file` flag allows you to save the configuration file to a specific location, otherwise it is saved as a `default_config.yaml` file in the 🤗 Accelerate cache.

The configuration file is used to set the default options when you launch the training script.

```bash
accelerate config --config_file deepspeed_config.yaml
```

You'll be asked a few questions about your setup and will configure the following arguments. In this example, you'll use ZeRO-3, so make sure you pick those options.

```
`zero_stage`: [0] Disabled, [1] optimizer state partitioning, [2] optimizer+gradient state partitioning and [3] optimizer+gradient+parameter partitioning
`gradient_accumulation_steps`: Number of training steps to accumulate gradients before averaging and applying them. Pass the same value as you would pass via the command-line argument, otherwise you will encounter a mismatch error.
`gradient_clipping`: Enable gradient clipping with value. Don't set this, as you will be passing it via command-line arguments.
`offload_optimizer_device`: [none] Disable optimizer offloading, [cpu] offload optimizer to CPU, [nvme] offload optimizer to NVMe SSD. Only applicable with ZeRO >= Stage-2. Set this to `none` as we don't want to enable offloading.
`offload_param_device`: [none] Disable parameter offloading, [cpu] offload parameters to CPU, [nvme] offload parameters to NVMe SSD. Only applicable with ZeRO Stage-3. Set this to `none` as we don't want to enable offloading.
`zero3_init_flag`: Decides whether to enable `deepspeed.zero.Init` for constructing massive models. Only applicable with ZeRO Stage-3. Set this to `True`.
`zero3_save_16bit_model`: Decides whether to save 16-bit model weights when using ZeRO Stage-3. Set this to `True`.
`mixed_precision`: `no` for FP32 training, `fp16` for FP16 mixed-precision training and `bf16` for BF16 mixed-precision training. Set this to `bf16`.
```

Once this is done, the resulting config should look like the following. You can also find it in the configs folder at [deepspeed_config.yaml](https://github.com/huggingface/peft/blob/main/examples/sft/configs/deepspeed_config.yaml):

```yml
compute_environment: LOCAL_MACHINE                                                                                                                                           
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  gradient_accumulation_steps: 4
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

## Launch command

The launch command is available at [run_peft_deepspeed.sh](https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_deepspeed.sh) and it is also shown below:
```bash
accelerate launch --config_file "configs/deepspeed_config.yaml"  train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-70b-hf" \
--dataset_name "smangrul/ultrachat-10k-chatml" \
--chat_template_format "chatml" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_steps 0 \
--max_grad_norm 1.0 \
--output_dir "llama-sft-lora-deepspeed" \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 4 \
--gradient_checkpointing True \
--use_reentrant False \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization False
```

Notice that we are using LoRA with rank=8 and alpha=16, targeting all linear layers. We pass the DeepSpeed config file and fine-tune the 70B Llama model on a subset of the ultrachat dataset.
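To make the launch arguments concrete, here is a small illustrative calculation of how much data each optimizer step sees, plus the LoRA scaling factor. It assumes `num_processes: 8` from the config above and that packing fills every sequence to `max_seq_len`, so the token count is an upper bound. `lora_alpha / r` is PEFT's default LoRA scaling; rsLoRA uses `lora_alpha / sqrt(r)` instead.

```python
import math


def tokens_per_optimizer_step(num_processes, per_device_batch_size, grad_accum_steps, max_seq_len):
    """Sequences and (upper-bound) tokens per optimizer step, assuming fully packed sequences."""
    sequences = num_processes * per_device_batch_size * grad_accum_steps
    return sequences, sequences * max_seq_len


def lora_scaling(r, lora_alpha, use_rslora=False):
    """Factor applied to the LoRA update: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


# ZeRO-3 + LoRA launch above: 8 GPUs, batch 8 per device, accumulation 4, 2048 tokens
print(tokens_per_optimizer_step(8, 8, 4, 2048))  # (256, 524288)
# QLoRA + ZeRO-3 launch in the next section: 2 GPUs, batch 2 per device, accumulation 2
print(tokens_per_optimizer_step(2, 2, 2, 2048))  # (8, 16384)
print(lora_scaling(r=8, lora_alpha=16))  # 2.0
```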

## The important parts

Let's dive a little deeper into the script so you can see what's going on, and understand how it works.

The first thing to know is that the script uses DeepSpeed for distributed training because the DeepSpeed config has been passed. The [`~trl.SFTTrainer`] class handles all the heavy lifting of creating the PEFT model using the `peft_config` that is passed. After that, when you call `trainer.train()`, [`~trl.SFTTrainer`] internally uses 🤗 Accelerate to prepare the model and optimizer with the DeepSpeed config, creating the DeepSpeed engine which is then trained. The main code snippet is below:

```python
from trl import SFTTrainer

# `model`, `tokenizer`, `training_args`, `train_dataset`, `eval_dataset`, and `peft_config`
# are created earlier in train.py
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
)
trainer.accelerator.print(f"{trainer.model}")

# train, optionally resuming from a checkpoint
checkpoint = None
if training_args.resume_from_checkpoint is not None:
    checkpoint = training_args.resume_from_checkpoint
trainer.train(resume_from_checkpoint=checkpoint)

# save the final model
trainer.save_model()
```

## Memory usage

In the above example, the memory consumed per GPU is 64 GB (80%) as seen in the screenshot below:

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/peft_deepspeed_mem_usage.png"/>
</div>
<small>GPU memory usage for the training run</small>

## More resources
You can also refer to the blog post [Falcon 180B Finetuning using 🤗 PEFT and DeepSpeed](https://medium.com/@sourabmangrulkar/falcon-180b-finetuning-using-peft-and-deepspeed-b92643091d99) on how to fine-tune the Falcon 180B model on 16 A100 GPUs across 2 machines.


# Use PEFT QLoRA and DeepSpeed with ZeRO3 for finetuning large models on multiple GPUs

In this section, we will look at how to use QLoRA and DeepSpeed Stage-3 for fine-tuning a 70B Llama model on 2x40GB GPUs.
For this, we first need `bitsandbytes>=0.43.3`, `accelerate>=1.0.1`, `transformers>4.44.2`, `trl>0.11.4` and `peft>0.13.0`. We need to set `zero3_init_flag` to true when using Accelerate config. Below is the config which can be found at [deepspeed_config_z3_qlora.yaml](https://github.com/huggingface/peft/blob/main/examples/sft/configs/deepspeed_config_z3_qlora.yaml):

```yml
compute_environment: LOCAL_MACHINE                                                                                                                                           
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

The launch command is given below and is also available at [run_peft_qlora_deepspeed_stage3.sh](https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_qlora_deepspeed_stage3.sh):
```bash
accelerate launch --config_file "configs/deepspeed_config_z3_qlora.yaml"  train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-70b-hf" \
--dataset_name "smangrul/ultrachat-10k-chatml" \
--chat_template_format "chatml" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_steps 0 \
--max_grad_norm 1.0 \
--output_dir "llama-sft-qlora-dsz3" \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 2 \
--gradient_checkpointing True \
--use_reentrant True \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization True \
--use_nested_quant True \
--bnb_4bit_compute_dtype "bfloat16" \
--bnb_4bit_quant_storage_dtype "bfloat16"
```

Notice the new argument `bnb_4bit_quant_storage_dtype`, which denotes the data type used for packing the 4-bit parameters. For example, when it is set to `bfloat16`, **16/4 = 4** 4-bit params are packed together into each storage element after quantization.
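The packing arithmetic is simply the bit width of the storage dtype divided by the 4-bit width of each quantized value. The snippet below is an illustrative sketch; the dtype table is our own and is not an exhaustive list of what bitsandbytes accepts.

```python
# Bits per element for a few candidate storage dtypes (illustrative, not exhaustive).
STORAGE_BITS = {"uint8": 8, "float16": 16, "bfloat16": 16, "float32": 32}


def params_per_storage_element(storage_dtype: str, quant_bits: int = 4) -> int:
    """Number of quantized values packed into one element of `storage_dtype`."""
    return STORAGE_BITS[storage_dtype] // quant_bits


for name in STORAGE_BITS:
    print(name, params_per_storage_element(name))
```

With `bfloat16` storage this gives 4 values per element, `uint8` gives 2, and `float32` gives 8.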

In terms of training code, the important code changes are: 

```diff
...

bnb_config = BitsAndBytesConfig(
    load_in_4bit=args.use_4bit_quantization,
    bnb_4bit_quant_type=args.bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=args.use_nested_quant,
+   bnb_4bit_quant_storage=quant_storage_dtype,
)

...

model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    quantization_config=bnb_config,
    trust_remote_code=True,
    attn_implementation="flash_attention_2" if args.use_flash_attn else "eager",
+   dtype=quant_storage_dtype or torch.float32,
)
```

Notice that the `dtype` passed to `AutoModelForCausalLM` is the same as the `bnb_4bit_quant_storage` data type. That's it. Everything else is handled by Trainer and TRL.

## Memory usage

In the above example, the memory consumed per GPU is **36.6 GB**. Therefore, what took 8X80GB GPUs with DeepSpeed Stage 3+LoRA and a couple of 80GB GPUs with DDP+QLoRA now requires 2X40GB GPUs. This makes finetuning of large models more accessible.

# Use PEFT and DeepSpeed with ZeRO3 and CPU Offloading for finetuning large models on a single GPU
This section of the guide will help you learn how to use our DeepSpeed [training script](https://github.com/huggingface/peft/blob/main/examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py). You'll configure the script to train a large model for conditional generation with ZeRO-3 and CPU offloading.

> [!TIP]
> 💡 To help you get started, check out our example training scripts for [causal language modeling](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_accelerate_ds_zero3_offload.py) and [conditional generation](https://github.com/huggingface/peft/blob/main/examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py). You can adapt these scripts for your own applications or even use them out of the box if your task is similar to the one in the scripts.

## Configuration

Start by running the following command to [create a DeepSpeed configuration file](https://huggingface.co/docs/accelerate/quicktour#launching-your-distributed-script) with 🤗 Accelerate. The `--config_file` flag lets you save the configuration file to a specific location; otherwise, it is saved as a `default_config.yaml` file in the 🤗 Accelerate cache.

The configuration file is used to set the default options when you launch the training script.

```bash
accelerate config --config_file ds_zero3_cpu.yaml
```

You'll be asked a few questions about your setup, and your answers configure the following arguments. In this example, you'll use ZeRO-3 with CPU offloading, so make sure you pick those options.

```
`zero_stage`: [0] Disabled, [1] optimizer state partitioning, [2] optimizer+gradient state partitioning and [3] optimizer+gradient+parameter partitioning
`gradient_accumulation_steps`: Number of training steps to accumulate gradients before averaging and applying them.
`gradient_clipping`: Enable gradient clipping with the given value.
`offload_optimizer_device`: [none] Disable optimizer offloading, [cpu] offload optimizer to CPU, [nvme] offload optimizer to NVMe SSD. Only applicable with ZeRO >= Stage-2.
`offload_param_device`: [none] Disable parameter offloading, [cpu] offload parameters to CPU, [nvme] offload parameters to NVMe SSD. Only applicable with ZeRO Stage-3.
`zero3_init_flag`: Decides whether to enable `deepspeed.zero.Init` for constructing massive models. Only applicable with ZeRO Stage-3.
`zero3_save_16bit_model`: Decides whether to save 16-bit model weights when using ZeRO Stage-3.
`mixed_precision`: `no` for FP32 training, `fp16` for FP16 mixed-precision training and `bf16` for BF16 mixed-precision training.
```

An example [configuration file](https://github.com/huggingface/peft/blob/main/examples/conditional_generation/accelerate_ds_zero3_cpu_offload_config.yaml) might look like the following. The most important thing to notice is that `zero_stage` is set to `3`, and `offload_optimizer_device` and `offload_param_device` are both set to `cpu`.

```yml
compute_environment: LOCAL_MACHINE
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: false
```
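If you generate or edit this file programmatically, a quick sanity check can catch a missing ZeRO-3 or offload setting before you launch. Below is a minimal sketch. It assumes the YAML has already been loaded into a dict, for example with PyYAML's `yaml.safe_load`. The function name is hypothetical.

```python
def zero3_offload_problems(cfg: dict) -> list:
    """Return human-readable problems; an empty list means ZeRO-3 with CPU offload is configured."""
    problems = []
    if cfg.get("distributed_type") != "DEEPSPEED":
        problems.append("distributed_type should be DEEPSPEED")
    ds = cfg.get("deepspeed_config", {})
    if ds.get("zero_stage") != 3:
        problems.append("deepspeed_config.zero_stage should be 3")
    for key in ("offload_optimizer_device", "offload_param_device"):
        if ds.get(key) != "cpu":
            problems.append(f"deepspeed_config.{key} should be 'cpu'")
    return problems


# The relevant subset of the example config above, written as a dict.
example_cfg = {
    "distributed_type": "DEEPSPEED",
    "deepspeed_config": {"zero_stage": 3, "offload_optimizer_device": "cpu", "offload_param_device": "cpu"},
}
print(zero3_offload_problems(example_cfg))  # []
```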

## The important parts

Let's dive a little deeper into the script so you can see what's going on, and understand how it works.

Within the [`main`](https://github.com/huggingface/peft/blob/2822398fbe896f25d4dac5e468624dc5fd65a51b/examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py#L103) function, the script creates an [`~accelerate.Accelerator`] instance to initialize everything required for distributed training.

> [!TIP]
> 💡 Feel free to change the model and dataset inside the `main` function. If your dataset format is different from the one in the script, you may also need to write your own preprocessing function.

The script also creates a configuration for the 🤗 PEFT method you're using, which in this case is LoRA. The [`LoraConfig`] specifies the task type and important parameters such as the dimension of the low-rank matrices, the scaling factor for the matrices, and the dropout probability of the LoRA layers. If you want to use a different 🤗 PEFT method, make sure you replace `LoraConfig` with the appropriate [class](../package_reference/tuners).

```diff
 def main():
+    accelerator = Accelerator()
     model_name_or_path = "facebook/bart-large"
     dataset_name = "twitter_complaints"
+    peft_config = LoraConfig(
         task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
     )
```

Throughout the script, you'll see the [`~accelerate.Accelerator.main_process_first`] and [`~accelerate.Accelerator.wait_for_everyone`] functions which help control and synchronize when processes are executed.

The [`get_peft_model`] function takes a base model and the `peft_config` you prepared earlier to create a [`PeftModel`]:

```diff
  model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
+ model = get_peft_model(model, peft_config)
```

Pass all the relevant training objects to 🤗 Accelerate's [`~accelerate.Accelerator.prepare`] which makes sure everything is ready for training:

```py
model, train_dataloader, eval_dataloader, test_dataloader, optimizer, lr_scheduler = accelerator.prepare(
    model, train_dataloader, eval_dataloader, test_dataloader, optimizer, lr_scheduler
)
```

The next bit of code checks whether the `Accelerator` uses the DeepSpeed plugin and, if it does, whether ZeRO-3 is enabled. This flag is used when calling `generate` during inference to keep the GPUs synchronized when the model parameters are sharded:

```py
is_ds_zero_3 = False
if getattr(accelerator.state, "deepspeed_plugin", None):
    is_ds_zero_3 = accelerator.state.deepspeed_plugin.zero_stage == 3
```
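One common way to consume this flag is the `synced_gpus` argument of 🤗 Transformers' `generate()`. Below is a minimal sketch that uses stub objects in place of `accelerator.state`, so it runs without DeepSpeed. The helper names and `max_new_tokens=10` are hypothetical, and this is not necessarily how the example script builds its generation arguments.

```python
from types import SimpleNamespace


def uses_ds_zero3(state) -> bool:
    """Same check as above: a DeepSpeed plugin must exist and run ZeRO stage 3."""
    plugin = getattr(state, "deepspeed_plugin", None)
    return bool(plugin) and plugin.zero_stage == 3


def build_gen_kwargs(state, max_new_tokens: int = 10) -> dict:
    # When `synced_gpus` is True, every rank keeps running the generation loop until all
    # ranks finish. A rank that stops early therefore still joins the collective parameter
    # gathers that ZeRO-3 performs on each forward pass, instead of leaving the others hanging.
    return {"max_new_tokens": max_new_tokens, "synced_gpus": uses_ds_zero3(state)}


# Stub objects standing in for `accelerator.state`.
zero3_state = SimpleNamespace(deepspeed_plugin=SimpleNamespace(zero_stage=3))
ddp_state = SimpleNamespace()  # no DeepSpeed plugin at all
print(build_gen_kwargs(zero3_state))  # {'max_new_tokens': 10, 'synced_gpus': True}
print(build_gen_kwargs(ddp_state))    # {'max_new_tokens': 10, 'synced_gpus': False}
```

In a real loop, the resulting dict would be unpacked into `generate(**batch, **gen_kwargs)` on the unwrapped model.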

Inside the training loop, the usual `loss.backward()` is replaced by 🤗 Accelerate's [`~accelerate.Accelerator.backward`] which uses the correct `backward()` method based on your configuration:

```diff
  for epoch in range(num_epochs):
      with TorchTracemalloc() as tracemalloc:
          model.train()
          total_loss = 0
          for step, batch in enumerate(tqdm(train_dataloader)):
              outputs = model(**batch)
              loss = outputs.loss
              total_loss += loss.detach().float()
+             accelerator.backward(loss)
              optimizer.step()
              lr_scheduler.step()
              optimizer.zero_grad()
```

That's all! The rest of the script handles the training loop and evaluation, and even pushes the model to the Hub for you.

## Train

Run the following command to launch the training script. Earlier, you saved the configuration file to `ds_zero3_cpu.yaml`, so you'll need to pass the path to the launcher with the `--config_file` argument like this:

```bash
accelerate launch --config_file ds_zero3_cpu.yaml examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py
```

You'll see some output logs that track memory usage during training, and once it's completed, the script returns the accuracy and compares the predictions to the labels:

```
GPU Memory before entering the train : 1916
GPU Memory consumed at the end of the train (end-begin): 66
GPU Peak Memory consumed during the train (max-begin): 7488
GPU Total Peak Memory consumed during the train (max): 9404
CPU Memory before entering the train : 19411
CPU Memory consumed at the end of the train (end-begin): 0
CPU Peak Memory consumed during the train (max-begin): 0
CPU Total Peak Memory consumed during the train (max): 19411
epoch=4: train_ppl=tensor(1.0705, device='cuda:0') train_epoch_loss=tensor(0.0681, device='cuda:0')
100%|████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:27<00:00,  3.92s/it]
GPU Memory before entering the eval : 1982
GPU Memory consumed at the end of the eval (end-begin): -66
GPU Peak Memory consumed during the eval (max-begin): 672
GPU Total Peak Memory consumed during the eval (max): 2654
CPU Memory before entering the eval : 19411
CPU Memory consumed at the end of the eval (end-begin): 0
CPU Peak Memory consumed during the eval (max-begin): 0
CPU Total Peak Memory consumed during the eval (max): 19411
accuracy=100.0
eval_preds[:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']
dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']
```
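Each phase reports four figures that are related by simple differences. "(end-begin)" is what remains allocated after the phase, and a negative value means memory was freed. "(max-begin)" is the extra memory needed at the peak, and "(max)" is the absolute peak. Below is a minimal sketch that reproduces the GPU lines from three readings per phase. The class is hypothetical and is not the script's `TorchTracemalloc`. The log does not state its unit, which is presumably megabytes.

```python
from dataclasses import dataclass


@dataclass
class PhaseMemory:
    """Hypothetical summary of one phase (train or eval), in the same unit as the logs."""

    begin: int  # memory in use when the phase starts
    end: int    # memory in use when the phase ends
    peak: int   # highest memory in use during the phase

    @property
    def consumed(self) -> int:
        """The "(end-begin)" line: memory still held at the end of the phase."""
        return self.end - self.begin

    @property
    def peak_delta(self) -> int:
        """The "(max-begin)" line: extra memory needed at the peak of the phase."""
        return self.peak - self.begin


# GPU readings taken from the log above.
train = PhaseMemory(begin=1916, end=1916 + 66, peak=9404)
evaluation = PhaseMemory(begin=1982, end=1982 - 66, peak=2654)
print(train.consumed, train.peak_delta)            # 66 7488
print(evaluation.consumed, evaluation.peak_delta)  # -66 672
```

Note that the eval phase starts exactly where the train phase ended (1916 + 66 = 1982), which is why the two phases chain together in the log.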

# Caveats
1. Merging when using PEFT and DeepSpeed is currently unsupported and will raise an error.
2. When using CPU offloading, the main benefit of PEFT is realized in CPU RAM rather than in GPU memory. That benefit comes from shrinking the optimizer states and gradients down to the size of the adapter weights, so there are no corresponding GPU memory savings.
3. Using DeepSpeed Stage 3 and QLoRA with CPU offloading leads to higher GPU memory usage than with CPU offloading disabled.

> [!TIP]
> 💡 When you have code that requires merging (and unmerging) weights, try gathering the parameters manually with DeepSpeed ZeRO-3 beforehand:
>
> ```python
> import deepspeed
>
> is_ds_zero_3 = ... # check if ZeRO-3
>
> with deepspeed.zero.GatheredParameters(list(model.parameters()), enabled=is_ds_zero_3):
>     model.merge_adapter()
>     # do whatever is needed, then unmerge in the same context if unmerging is required
>     ...
>     model.unmerge_adapter()
> ```


================================================
FILE: docs/source/accelerate/fsdp.md
================================================
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Fully Sharded Data Parallel

[Fully sharded data parallel](https://pytorch.org/docs/stable/fsdp.html) (FSDP) was developed for distributed training of large pretrained models with up to 1T parameters. FSDP achieves this by sharding the model parameters, gradients, and optimizer states across data-parallel processes. It can also offload sharded model parameters to the CPU. The memory efficiency afforded by FSDP allows you to scale training to larger batch or model sizes.

Both of these features (sharding and CPU offloading) are supported in 🤗 Accelerate, and you can use them with 🤗 PEFT.

# Use PEFT and FSDP
This section of the guide will help you learn how to use our [training script](https://github.com/huggingface/peft/blob/main/examples/sft/train.py) for supervised fine-tuning (SFT). You'll configure the script to do SFT of a Llama-70B model with LoRA and FSDP on 8xH100 80GB GPUs on a single machine. You can scale it to multiple machines by changing the Accelerate config.

## Configuration

Start by running the following command to [create an FSDP configuration file](https://huggingface.co/docs/accelerate/quicktour#launching-your-distributed-script) with 🤗 Accelerate. The `--config_file` flag lets you save the configuration file to a specific location; otherwise, it is saved as a `default_config.yaml` file in the 🤗 Accelerate cache.

The configuration file is used to set the default options when you launch the training script.

```bash
accelerate config --config_file fsdp_config.yaml
```

You'll be asked a few questions about your setup. For this example, answer the questionnaire as shown in the image below.
<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/fsdp-peft-config.png"/>
</div>
<small>Creating Accelerate's config to use FSDP</small>

Once this is done, the resulting config should look like the one below. You can find it in the configs folder at [fsdp_config.yaml](https://github.com/huggingface/peft/blob/main/examples/sft/configs/fsdp_config.yaml):

```yml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```

## Launch command

The launch command is available at [run_peft_fsdp.sh](https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_fsdp.sh) and it is also shown below:
```bash
accelerate launch --config_file "configs/fsdp_config.yaml"  train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-70b-hf" \
--dataset_name "smangrul/ultrachat-10k-chatml" \
--chat_template_format "chatml" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_steps 0 \
--max_grad_norm 1.0 \
--output_dir "llama-sft-lora-fsdp" \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 4 \
--gradient_checkpointing True \
--use_reentrant False \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization False
```

Notice that we are using LoRA with rank=8 and alpha=16, targeting all linear layers. We are passing the FSDP config file and finetuning the 70B Llama model on a subset of the [ultrachat dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k).

## The important parts

Let's dive a little deeper into the script so you can see what's going on, and understand how it works.

The first thing to know is that the script uses FSDP for distributed training because the FSDP config has been passed. The [`~trl.SFTTrainer`] class handles all the heavy lifting of creating the PEFT model from the PEFT config that is passed. After that, when you call `trainer.train()`, Trainer internally uses 🤗 Accelerate to prepare the model, optimizer, and trainer with the FSDP config. This creates an FSDP-wrapped model that is then trained. The main code snippet is below:

```python
# trainer
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
)
trainer.accelerator.print(f"{trainer.model}")
if model_args.use_peft_lora:
    # handle PEFT+FSDP case
    trainer.model.print_trainable_parameters()
    if getattr(trainer.accelerator.state, "fsdp_plugin", None):
        from peft.utils.other import fsdp_auto_wrap_policy

        fsdp_plugin = trainer.accelerator.state.fsdp_plugin
        fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(trainer.model)

# train
checkpoint = None
if training_args.resume_from_checkpoint is not None:
    checkpoint = training_args.resume_from_checkpoint
trainer.train(resume_from_checkpoint=checkpoint)

# saving final model
if trainer.is_fsdp_enabled:
    trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT")
trainer.save_model()
```


One main thing to note when using FSDP with PEFT is that `use_orig_params` currently needs to be `False` to realize GPU memory savings. Because `use_orig_params=False`, the FSDP auto wrap policy needs to change so that trainable and non-trainable parameters are wrapped separately. The code snippet below does this with the PEFT utility function `fsdp_auto_wrap_policy`:

```python
if getattr(trainer.accelerator.state, "fsdp_plugin", None):
    from peft.utils.other import fsdp_auto_wrap_policy

    fsdp_plugin = trainer.accelerator.state.fsdp_plugin
    fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(trainer.model)
```
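Why must the two kinds of parameters be wrapped separately? With `use_orig_params=False`, FSDP flattens each wrapped unit into a single flat parameter. All tensors in that flat parameter must share the same `requires_grad` value, but a LoRA layer mixes a frozen base weight with trainable adapter weights. The toy model below illustrates that constraint. It is a sketch, not how `fsdp_auto_wrap_policy` is implemented, and the parameter names are illustrative.

```python
def mixes_trainability(unit) -> bool:
    """True if one wrap unit holds both trainable and frozen parameters.

    With use_orig_params=False, such a unit cannot be flattened into one flat parameter.
    """
    return len({requires_grad for _, requires_grad in unit}) > 1


def split_by_trainability(unit):
    """Separate one unit into frozen and trainable groups that can each be wrapped on their own."""
    frozen = [name for name, requires_grad in unit if not requires_grad]
    trainable = [name for name, requires_grad in unit if requires_grad]
    return frozen, trainable


# Hypothetical parameters of one LoRA-wrapped projection, as (name, requires_grad) pairs.
q_proj = [
    ("q_proj.base_layer.weight", False),
    ("q_proj.lora_A.default.weight", True),
    ("q_proj.lora_B.default.weight", True),
]
print(mixes_trainability(q_proj))  # True
frozen, trainable = split_by_trainability(q_proj)
print(frozen)     # ['q_proj.base_layer.weight']
print(trainable)  # ['q_proj.lora_A.default.weight', 'q_proj.lora_B.default.weight']
```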

## Memory usage

In the above example, the memory consumed per GPU is 72-80 GB (90-98%), as seen in the screenshot below. The slight increase in GPU memory at the end occurs when the model is saved with the `FULL_STATE_DICT` state dict type instead of `SHARDED_STATE_DICT`. This is done so that the adapter weights can be loaded normally with the `from_pretrained` method during inference:

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/peft_fsdp_mem_usage.png"/>
</div>
<small>GPU memory usage for the training run</small>

# Use PEFT QLoRA and FSDP for finetuning large models on multiple GPUs

In this section, we will look at how to use QLoRA and FSDP to finetune a 70B Llama model on 2x24GB GPUs. [Answer.AI](https://www.answer.ai/), in collaboration with bitsandbytes and Hugging Face 🤗, open-sourced code enabling the usage of FSDP+QLoRA and explained the whole process in their insightful blog post [You can now train a 70b language model at home](https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html). This is now integrated into the Hugging Face ecosystem.

For this, we first need `bitsandbytes>=0.43.3`, `accelerate>=1.0.1`, `transformers>4.44.2`, `trl>0.11.4` and `peft>0.13.0`. When using the Accelerate config, we need to set `fsdp_cpu_ram_efficient_loading=true`, `fsdp_use_orig_params=false` and `fsdp_offload_params=true` (CPU offloading). When not using the Accelerate launcher, you can alternatively set the environment variable `export FSDP_CPU_RAM_EFFICIENT_LOADING=true`. Here, we will use the Accelerate config below, which can be found at [fsdp_config_qlora.yaml](https://github.com/huggingface/peft/blob/main/examples/sft/configs/fsdp_config_qlora.yaml):

```yml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: true
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
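Three FSDP keys matter most for this setup, and they are easy to get wrong when you reuse the earlier LoRA config. Below is a minimal sketch that flags any mismatch. It assumes the YAML has already been loaded into a dict, and the helper names are hypothetical. The memory section below also reports a run with CPU offloading disabled, so treat `fsdp_offload_params: true` as the setting for the 2x24GB case.

```python
# The settings the paragraph above calls for when combining FSDP with QLoRA.
REQUIRED_FSDP_QLORA = {
    "fsdp_cpu_ram_efficient_loading": True,
    "fsdp_use_orig_params": False,
    "fsdp_offload_params": True,
}


def fsdp_qlora_mismatches(cfg: dict) -> dict:
    """Map each required key that doesn't match to (expected, actual)."""
    fsdp = cfg.get("fsdp_config", {})
    return {
        key: (expected, fsdp.get(key))
        for key, expected in REQUIRED_FSDP_QLORA.items()
        if fsdp.get(key) != expected
    }


# Subsets of the QLoRA config above and of the earlier LoRA config.
qlora_cfg = {"fsdp_config": {"fsdp_cpu_ram_efficient_loading": True, "fsdp_offload_params": True, "fsdp_use_orig_params": False}}
lora_cfg = {"fsdp_config": {"fsdp_cpu_ram_efficient_loading": True, "fsdp_offload_params": False, "fsdp_use_orig_params": False}}
print(fsdp_qlora_mismatches(qlora_cfg))  # {}
print(fsdp_qlora_mismatches(lora_cfg))   # {'fsdp_offload_params': (True, False)}
```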

The launch command is shown below and is also available at [run_peft_qlora_fsdp.sh](https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_qlora_fsdp.sh):
```bash
accelerate launch --config_file "configs/fsdp_config_qlora.yaml"  train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-70b-hf" \
--dataset_name "smangrul/ultrachat-10k-chatml" \
--chat_template_format "chatml" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_steps 0 \
--max_grad_norm 1.0 \
--output_dir "llama-sft-qlora-fsdp" \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 2 \
--gradient_checkpointing True \
--use_reentrant True \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization True \
--use_nested_quant True \
--bnb_4bit_compute_dtype "bfloat16" \
--bnb_4bit_quant_storage_dtype "bfloat16"
```

Notice the new argument being passed, `bnb_4bit_quant_storage_dtype`, which denotes the data type for packing the 4-bit parameters. For example, when it is set to `bfloat16`, **16/4 = 4** 4-bit params are packed together post quantization. When using mixed precision training with `bfloat16`, `bnb_4bit_quant_storage_dtype` can be either `bfloat16` for pure `bfloat16` finetuning, or `float32` for automatic mixed precision (this consumes more GPU memory). When using mixed precision training with `float16`, `bnb_4bit_quant_storage_dtype` should be set to `float32` for stable automatic mixed precision training.

In terms of training code, the important code changes are: 

```diff
...

bnb_config = BitsAndBytesConfig(
    load_in_4bit=args.use_4bit_quantization,
    bnb_4bit_quant_type=args.bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=args.use_nested_quant,
+   bnb_4bit_quant_storage=quant_storage_dtype,
)

...

model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    quantization_config=bnb_config,
    trust_remote_code=True,
    attn_implementation="flash_attention_2" if args.use_flash_attn else "eager",
+   dtype=quant_storage_dtype or torch.float32,
)
```

Notice that the `dtype` passed to `AutoModelForCausalLM` is the same as the `bnb_4bit_quant_storage` data type. That's it. Everything else is handled by Trainer and TRL.

## Memory usage

In the above example, the memory consumed per GPU is **19.6 GB**, while CPU RAM usage is around **107 GB**. When CPU offloading is disabled, the GPU memory usage is **35.6 GB/GPU**. Therefore, what took 16X80GB GPUs for full finetuning, 8X80GB GPUs with FSDP+LoRA, and a couple of 80GB GPUs with DDP+QLoRA now requires 2X24GB GPUs. This makes finetuning of large models more accessible.

## More resources
You can also refer to the [llama-recipes](https://github.com/facebookresearch/llama-recipes/?tab=readme-ov-file#fine-tuning) repo and the [Getting started with Llama](https://llama.meta.com/get-started/#fine-tuning) guide on how to finetune using FSDP and PEFT.

## Caveats
1. Merging when using PEFT and FSDP is currently unsupported and will raise an error.
2. Passing the `modules_to_save` config parameter is untested at present.
3. GPU memory savings when using CPU offloading are untested at present.
4. When using FSDP+QLoRA, `paged_adamw_8bit` currently results in an error when saving a checkpoint.
5. DoRA training with FSDP should work (albeit at lower speed than LoRA). If combined with bitsandbytes (QDoRA), 4-bit quantization should also work, but 8-bit quantization has known issues and is not recommended.


================================================
FILE: docs/source/conceptual_guides/adapter.md
================================================
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Adapters

Adapter-based methods add extra trainable parameters after the attention and fully-connected layers of a frozen pretrained model to reduce memory usage and speed up training. The method varies depending on the adapter: it could simply be an extra added layer, or it could express the weight updates ∆W as a low-rank decomposition of the weight matrix. Either way, the adapters are typically small but demonstrate comparable performance to a fully finetuned model and enable training larger models with fewer resources.

This guide will give you a brief overview of the adapter methods supported by PEFT (if you're interested in learning more details about a specific method, take a look at the linked paper).

## Low-Rank Adaptation (LoRA)

> [!TIP]
> LoRA is one of the most popular PEFT methods and a good starting point if you're just getting started with PEFT. It was originally developed for large language models but it is a tremendously popular training method for diffusion models because of its efficiency and effectiveness.

As mentioned briefly earlier, [LoRA](https://hf.co/papers/2106.09685) is a technique that accelerates finetuning large models while consuming less memory.

LoRA represents the weight updates ∆W with two smaller matrices (called *update matrices*) through low-rank decomposition. These new matrices can be trained to adapt to the new data while keeping the overall number of parameters low. The original weight matrix remains frozen and doesn't receive any further updates. To produce the final results, the original and extra adapted weights are combined. You could also merge the adapter weights with the base model to eliminate inference latency.

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_animated.gif"/>
</div>
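PEFT's LoRA layers scale the low-rank update by `lora_alpha / r` (when rsLoRA is not enabled). Below is a minimal pure-Python sketch with hypothetical tiny matrices. It shows that adding the scaled low-rank path at runtime and merging it into the weight once give the same output, which is why merging removes the extra inference latency.

```python
def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def scale(a, s):
    return [[s * x for x in row] for row in a]


W0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen 2x3 weight
B = [[0.5], [-1.0]]                        # 2x1 update matrix
A = [[1.0, 2.0, 0.0]]                      # 1x3 update matrix (rank r = 1)
alpha, r = 2.0, 1
x = [[1.0], [2.0], [3.0]]                  # input as a 3x1 column

# Unmerged: frozen path plus scaled low-rank path, h = W0 x + (alpha / r) * B (A x).
h_unmerged = add(matmul(W0, x), scale(matmul(B, matmul(A, x)), alpha / r))

# Merged: fold the update into the weight once, W = W0 + (alpha / r) * B A, then h = W x.
W_merged = add(W0, scale(matmul(B, A), alpha / r))
h_merged = matmul(W_merged, x)

print(h_unmerged, h_merged)  # [[12.0], [-11.0]] [[12.0], [-11.0]]
```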

This approach has a number of advantages:

* LoRA makes finetuning more efficient by drastically reducing the number of trainable parameters.
* The original pretrained weights are kept frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.
* LoRA is orthogonal to other parameter-efficient methods and can be combined with many of them.
* Performance of models finetuned using LoRA is comparable to the performance of fully finetuned models.

In principle, LoRA can be applied to any subset of weight matrices in a neural network to reduce the number of trainable parameters. However, for simplicity and further parameter efficiency, LoRA is typically only applied to the attention blocks in Transformer models. The resulting number of trainable parameters in a LoRA model depends on the size of the update matrices, which is determined mainly by the rank `r` and the shape of the original weight matrix.

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora.png"/>
</div>
<small><a href="https://hf.co/papers/2309.14859">Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation</a></small>
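For one weight of shape `d_out x d_in`, LoRA trains B (`d_out x r`) and A (`r x d_in`), which adds `r * (d_out + d_in)` parameters instead of `d_out * d_in`. Below is a minimal sketch of that count. The 4096x4096 layer size is a hypothetical example, not a specific model.

```python
def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight: B is d_out x r and A is r x d_in."""
    return d_out * r + r * d_in


d_out = d_in = 4096  # hypothetical square projection
for r in (4, 8, 16):
    added = lora_trainable_params(d_out, d_in, r)
    print(f"r={r}: {added} trainable params ({added / (d_out * d_in):.2%} of the frozen weight)")
```

The count grows linearly in `r` and in the layer's dimensions, so doubling the rank doubles the adapter size, while the frozen weight stays the same.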

## Mixture of LoRA Experts (X-LoRA)

[X-LoRA](https://huggingface.co/papers/2402.07148) is a mixture of experts method for LoRA that uses dense or sparse gating to dynamically activate LoRA experts. The LoRA experts and the base model are frozen during training, so only the gating layers are trained, which keeps the parameter count low. In particular, the gating layers output scalings that (depending on the config) are granular at the layer and token level. During inference, X-LoRA dynamically activates LoRA adapters to recall knowledge and mix them effectively.

The graphic below demonstrates how the scalings change for each token across different prompts. It highlights how different adapters are activated as generation progresses and the sequence creates new context.

![Token-by-token scalings](https://github.com/EricLBuehler/xlora/raw/master/res/token_by_token_scalings.gif)

For each step, X-LoRA requires the base model to be run twice. The first run gets hidden states without any LoRA adapters. Those hidden states are then used to calculate scalings, which are applied to the LoRA adapters for a second run of the model. The output of the second run is the result of the model step.

Ultimately, X-LoRA allows the model to reflect upon its knowledge because of the dual forward pass scheme, and dynamically reconfigure the architecture.

## Low-Rank Hadamard Product (LoHa)

Low-rank decomposition can impact performance because the weight updates are limited to the low-rank space, which can constrain a model's expressiveness. However, you don't necessarily want to use a larger rank because it increases the number of trainable parameters. To address this, [LoHa](https://huggingface.co/papers/2108.06098) (a method originally developed for computer vision) was applied to diffusion models where the ability to generate diverse images is an important consideration. LoHa should also work with general model types, but the embedding layers aren't currently implemented in PEFT.

LoHa uses the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) (element-wise product) instead of the matrix product. ∆W is represented by four smaller matrices instead of the two used in LoRA. Each pair of low-rank matrices is multiplied together, and the two resulting products are combined with the Hadamard product. As a result, ∆W can have the same number of trainable parameters but a higher rank and expressivity.
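Written out for a single weight of shape d_out × d_in with each pair using rank r (the notation is assumed here and follows the description above), the update and its rank bound are:

$$
\Delta W = (B_1 A_1) \odot (B_2 A_2), \qquad B_1, B_2 \in \mathbb{R}^{d_{out} \times r}, \quad A_1, A_2 \in \mathbb{R}^{r \times d_{in}}
$$

$$
\operatorname{rank}(\Delta W) \le \operatorname{rank}(B_1 A_1)\,\operatorname{rank}(B_2 A_2) \le r^2
$$

LoHa therefore trains 2r(d_out + d_in) parameters. That is the same count as LoRA with rank 2r, whose update has rank at most 2r. For r > 2, the Hadamard bound of r² exceeds 2r, and that is where the extra expressivity comes from.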

## Low-Rank Kronecker Product (LoKr)

[LoKr](https://hf.co/papers/2309.14859) is very similar to LoRA and LoHa, and it is also mainly applied to diffusion models, though you could also use it with other model types. LoKr replaces the matrix product with the [Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product). The Kronecker product decomposition creates a block matrix which preserves the rank of the original weight matrix. Another benefit of the Kronecker product is that it can be vectorized by stacking the matrix columns. This can speed up the process because you avoid fully reconstructing ∆W.
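The vectorization trick rests on the identity (A ⊗ B) vec(X) = vec(B X Aᵀ), where vec stacks the columns of a matrix. In other words, a Kronecker-structured ∆W can be applied to an input without ever building ∆W. Below is a minimal pure-Python check with small hypothetical matrices. It is a sketch of the identity, not PEFT's LoKr code.

```python
def kron(a, b):
    """Kronecker product of matrices given as lists of rows."""
    return [
        [a[i][j] * b[p][q] for j in range(len(a[0])) for q in range(len(b[0]))]
        for i in range(len(a))
        for p in range(len(b))
    ]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def vec(m):
    """Stack the columns of m into one flat list (column-major order)."""
    return [m[i][j] for j in range(len(m[0])) for i in range(len(m))]


A = [[1, 2], [3, 4]]
B = [[0, 1], [2, 3]]
X = [[1, 0], [2, 1]]

# Expensive route: materialize the full 4x4 Kronecker product, then multiply.
full = [sum(w * v for w, v in zip(row, vec(X))) for row in kron(A, B)]
# Cheap route: two small matrix products, never forming A ⊗ B.
cheap = vec(matmul(matmul(B, X), transpose(A)))
print(full, cheap)  # [4, 14, 10, 36] [4, 14, 10, 36]
```

For factors of size p×q and s×t, the expensive route touches a (ps)×(qt) matrix, while the cheap route only works with the two small factors.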

## Orthogonal Finetuning (OFT)

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/oft.png"/>
</div>
<small><a href="https://hf.co/papers/2306.07280">Controlling Text-to-Image Diffusion by Orthogonal Finetuning</a></small>

[OFT](https://hf.co/papers/2306.07280) is a method that primarily focuses on preserving a pretrained model's generative performance in the finetuned model. It tries to maintain the same cosine similarity (hyperspherical energy) between all pairwise neurons in a layer because this better captures the semantic information among neurons. This means OFT is better at preserving the subject and is better suited for controllable generation (similar to [ControlNet](https://huggingface.co/docs/diffusers/using-diffusers/controlnet)).

OFT preserves the hyperspherical energy by learning an orthogonal transformation for neurons that keeps the cosine similarity between them unchanged. In practice, this means taking the matrix product of an orthogonal matrix with the pretrained weight matrix. However, to be parameter-efficient, the orthogonal matrix is represented as a block-diagonal matrix with `r` blocks. Whereas LoRA reduces the number of trainable parameters with low-rank structures, OFT reduces the number of trainable parameters with a sparse block-diagonal matrix structure.
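
The NumPy sketch below illustrates the idea. It builds each block with the Cayley transform, which maps a skew-symmetric matrix S to the orthogonal matrix (I − S)(I + S)⁻¹. It then checks two things: the block-diagonal matrix is orthogonal, and multiplying it into the weights leaves all pairwise cosine similarities between neurons unchanged. The sketch assumes neurons are the columns of W and that the orthogonal matrix is applied from the left. PEFT's internal layout may differ.

```py
import numpy as np


def cayley(skew):
    """Map a skew-symmetric matrix S to the orthogonal matrix (I - S)(I + S)^-1."""
    eye = np.eye(skew.shape[0])
    return (eye - skew) @ np.linalg.inv(eye + skew)


def block_diagonal_orthogonal(num_blocks, block_size, rng):
    """Build a block-diagonal orthogonal matrix from `num_blocks` Cayley-parameterized blocks."""
    d = num_blocks * block_size
    R = np.zeros((d, d))
    for i in range(num_blocks):
        M = rng.standard_normal((block_size, block_size))
        S = M - M.T  # skew-symmetric: only block_size * (block_size - 1) / 2 free values
        block = slice(i * block_size, (i + 1) * block_size)
        R[block, block] = cayley(S)
    return R


def pairwise_cosine(W):
    """Cosine similarity between every pair of columns (neurons) of W."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    return Wn.T @ Wn


rng = np.random.default_rng(0)
num_blocks, block_size = 4, 8
d, n_neurons = num_blocks * block_size, 16

W = rng.standard_normal((d, n_neurons))  # frozen pretrained weights, one neuron per column
R = block_diagonal_orthogonal(num_blocks, block_size, rng)
W_new = R @ W  # OFT-style update: multiply the pretrained weights by an orthogonal matrix

print(np.allclose(R.T @ R, np.eye(d)))
print(np.allclose(pairwise_cosine(W), pairwise_cosine(W_new)))
print("trainable:", num_blocks * block_size * (block_size - 1) // 2, "vs dense orthogonal:", d * (d - 1) // 2)
```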

## Orthogonal Butterfly (BOFT)

[BOFT](https://hf.co/papers/2311.06243) is an improved orthogonal finetuning method that focuses on preserving a pretrained model's generative capabilities while being significantly more parameter-efficient than standard OFT. Like OFT, BOFT maintains the same cosine similarity (hyperspherical energy) between all pairwise neurons in a layer by applying an orthogonal transformation to the pretrained weight matrix, ensuring the semantic relationships among neurons are preserved.

Instead of using a block-diagonal orthogonal matrix, BOFT factorizes the orthogonal transformation into a product of **sparse butterfly matrices** (originally introduced in the [Cooley–Tukey FFT](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm)). Unlike OFT's block-diagonal rotations, which only mix inputs within each block, the butterfly structure guarantees that every input can influence every output, producing a **dense connectivity** with just `O(d log d)` parameters. This factorization preserves expressivity while drastically reducing the parameter count compared to OFT (at the expense of computation time).
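
The following NumPy sketch shows the connectivity argument in a simplified form. Each butterfly factor rotates pairs of coordinates (i, i + stride) with 2×2 Givens rotations. Chaining log₂(d) factors with strides 1, 2, 4, ... gives a dense orthogonal matrix from only (d/2)·log₂(d) angles. This is an assumption-laden simplification: BOFT itself uses larger orthogonal blocks (`boft_block_size`) and a configurable number of factors (`boft_n_butterfly_factor`).

```py
import numpy as np


def butterfly_factor(d, stride, angles):
    """One butterfly stage: rotate each pair (i, i + stride) with a 2x2 Givens rotation."""
    B = np.zeros((d, d))
    k = 0
    for start in range(0, d, 2 * stride):
        for i in range(start, start + stride):
            j = i + stride
            c, s = np.cos(angles[k]), np.sin(angles[k])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            k += 1
    return B


rng = np.random.default_rng(0)
d = 8
n_stages = int(np.log2(d))  # log2(d) stages, each with d / 2 rotation angles

R = np.eye(d)
factors = []
for stage in range(n_stages):
    angles = rng.uniform(0.1, 1.4, size=d // 2)  # keep angles away from 0 and pi/2
    F = butterfly_factor(d, 2**stage, angles)
    factors.append(F)
    R = F @ R

print("single factor nonzeros:", np.count_nonzero(factors[0]))  # 2 per row: sparse
print("product nonzeros:", np.count_nonzero(R), "of", d * d)  # every input reaches every output
print("orthogonal:", np.allclose(R.T @ R, np.eye(d)))
print("parameters:", n_stages * d // 2)  # grows as O(d log d)
```

Each factor alone only mixes pairs of coordinates. Because each stage pairs indices that differ in one more bit, the product connects every input to every output.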

In practice, BOFT multiplies each pretrained weight matrix by a sequence of butterfly-structured orthogonal factors, enabling efficient and expressive neuron rotations. This makes BOFT well-suited for controllable generation and tasks where maintaining the pretrained model's subject representation is critical, while also scaling to larger models with lower memory and compute overhead.

## Adaptive Low-Rank Adaptation (AdaLoRA)

[AdaLoRA](https://hf.co/papers/2303.10512) manages the parameter budget introduced by LoRA. It allocates more parameters (in other words, a higher rank `r`) to important weight matrices that are better adapted for a task, and it prunes less important ones. The rank is controlled by a method similar to singular value decomposition (SVD): ∆W is parameterized with two orthogonal matrices and a diagonal matrix that contains singular values. This parameterization avoids iteratively applying SVD, which is computationally expensive. Based on this method, the rank of ∆W is adjusted according to an importance score. ∆W is divided into triplets (a singular value together with its corresponding left and right singular vectors), and each triplet is scored according to its contribution to model performance. Triplets with low importance scores are pruned, and triplets with high importance scores are kept for finetuning.
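
The following NumPy sketch shows the triplet view: ∆W = P diag(λ) Q, and pruning a triplet means zeroing its singular value, which lowers the rank of ∆W. The importance scores here are fixed, hypothetical numbers. AdaLoRA derives them from training signals, and it also encourages P and Q to be orthogonal, which this random example does not enforce.

```py
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 16, 12, 6

# SVD-like parameterization: delta_W = P @ diag(lam) @ Q
P = rng.standard_normal((d_out, r))  # left "singular vectors"
lam = rng.standard_normal(r)  # "singular values"
Q = rng.standard_normal((r, d_in))  # right "singular vectors"


def delta_w(P, lam, Q):
    return P @ np.diag(lam) @ Q


# Hypothetical importance score for each triplet (P[:, i], lam[i], Q[i, :])
importance = np.array([0.9, 0.1, 0.7, 0.05, 0.4, 0.2])
keep = 3

mask = np.zeros(r)
mask[np.argsort(importance)[-keep:]] = 1.0  # keep the `keep` most important triplets
lam_pruned = lam * mask  # pruning a triplet = zeroing its singular value

rank_before = np.linalg.matrix_rank(delta_w(P, lam, Q))
rank_after = np.linalg.matrix_rank(delta_w(P, lam_pruned, Q))
kept = sorted(np.flatnonzero(mask).tolist())
print("rank before:", rank_before, "rank after:", rank_after, "kept triplets:", kept)
```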

Training with AdaLoRA has three phases: the init phase, the budgeting phase, and the final phase. In the init phase, no budgeting is applied, so the ranks are not touched. During the budgeting phase, the process described above is applied and the rank is redistributed according to a budget, giving more rank to important adapters and less rank to less important ones. Once the final phase is reached, budgeting has ended and the ranks have been redistributed, but training may continue for a while with the redistributed ranks to further improve performance.
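
The three phases can be sketched as a budget schedule over training steps. This standalone function is loosely modeled on the cubic decay described in the AdaLoRA paper. Its argument names echo `AdaLoraConfig`'s `tinit`, `tfinal`, and `total_step`, but it is an illustration, not PEFT's implementation. The budget values, meaning the total rank summed over adapted matrices, are hypothetical.

```py
def budget_schedule(step, init_budget, target_budget, tinit, tfinal, total_step):
    """Return (phase, total rank budget) for a training step."""
    if step <= tinit:
        return "init", init_budget  # no budgeting yet: ranks are untouched
    if step > total_step - tfinal:
        return "final", target_budget  # budgeting finished: keep training with the final ranks
    # Budgeting phase: shrink the budget from init_budget to target_budget with a cubic decay
    progress = (step - tinit) / (total_step - tfinal - tinit)
    budget = target_budget + (init_budget - target_budget) * (1 - progress) ** 3
    return "budgeting", int(budget)


# Hypothetical setup: 12 adapted matrices, starting at rank 12 (budget 144), target rank 8 (budget 96)
steps = list(range(0, 1001, 100))
schedule = [budget_schedule(s, 144, 96, tinit=200, tfinal=200, total_step=1000) for s in steps]
for step, (phase, budget) in zip(steps, schedule):
    print(step, phase, budget)
```

The cubic shape removes most of the excess rank early in the budgeting phase and then changes the budget only slowly as it approaches the target.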

## Llama-Adapter

[Llama-Adapter](https://hf.co/papers/2303.16199) is a method for adapting Llama into an instruction-following model. To help adapt the model for instruction-following, the adapter is trained with a 52K instruction-output dataset.

A set of learnable adaption prompts is prefixed to the input instruction tokens. These prompts are inserted into the upper layers of the model because it is better to learn with the higher-level semantics of the pretrained model. The adaption prompts prefixed to the input guide the model to generate a contextual response.

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/llama-adapter.png"/>
</div>
<small><a href="https://hf.co/papers/2303.16199">LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention</a></small>

To avoid adding noise to the tokens, the adapter uses zero-initialized attention. On top of this, the adapter adds a learnable gating factor (initialized with zeros) to progressively add information to the model during training. This prevents overwhelming the model's pretrained knowledge with the newly learned instructions.
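
The gating can be sketched in NumPy for a single attention head. A separate softmax is computed over the adaption prompts, and its output is scaled by a gate that starts at zero, so the pretrained attention output is initially unchanged. The sketch simplifies several things: one head, no causal mask, and a scalar gate. Random tensors stand in for real queries, keys, and values.

```py
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention_with_adapter(q, k, v, adapter_k, adapter_v, gate):
    """Single-head attention over the tokens plus a gated attention over the adaption prompts."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    base = softmax(q @ k.T * scale) @ v  # ordinary attention over the real tokens
    adapter = softmax(q @ adapter_k.T * scale) @ adapter_v  # separate softmax over the prompts
    return base + gate * adapter  # a zero gate leaves the base output untouched


rng = np.random.default_rng(0)
seq_len, n_prompts, dim = 5, 3, 8
q, k, v = rng.standard_normal((3, seq_len, dim))
adapter_k, adapter_v = rng.standard_normal((2, n_prompts, dim))

base_only = softmax(q @ k.T / np.sqrt(dim)) @ v
out_at_init = attention_with_adapter(q, k, v, adapter_k, adapter_v, gate=0.0)
out_trained = attention_with_adapter(q, k, v, adapter_k, adapter_v, gate=0.5)

print(np.allclose(out_at_init, base_only))  # zero gate: pretrained behavior is preserved
print(np.allclose(out_trained, base_only))  # nonzero gate: adapter information flows in
```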

## Householder Reflection Adaptation (HRA)

[HRA](https://huggingface.co/papers/2405.17484) provides a new perspective connecting LoRA to OFT, which means it can harness the advantages of both strategies: it reduces parameters and computation costs while penalizing the loss of pretraining knowledge.

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/hra.png"/>
</div>
<small><a href="https://huggingface.co/papers/2405.17484">Bridging The Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation</a></small>

HRA constructs a chain of `r` trainable Householder reflections (HRs). Because each Householder reflection matrix is orthogonal and the product of orthogonal matrices is also orthogonal, HRA satisfies the theoretical guarantee of Orthogonal Finetuning (OFT). At the same time, HRA can be viewed as a low-rank fine-tuning adapter by rewriting the formula: the product of `r` reflections differs from the identity by a matrix of rank at most `r`.
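
The following NumPy sketch checks both views. It assumes the reflections multiply the pretrained weight from the input side (W H), which may differ from PEFT's exact layout. The product H of `r` reflections Hᵢ = I − 2uᵢuᵢᵀ is orthogonal (the OFT view). It also satisfies W H = W + W(H − I), where H − I has rank at most `r` (the LoRA view).

```py
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4

U = rng.standard_normal((d, r))
U /= np.linalg.norm(U, axis=0, keepdims=True)  # unit-norm reflection vectors u_1..u_r

# Chain of r Householder reflections H_i = I - 2 u_i u_i^T
H = np.eye(d)
for i in range(r):
    u = U[:, i : i + 1]  # column vector of shape (d, 1)
    H = H @ (np.eye(d) - 2.0 * u @ u.T)

W = rng.standard_normal((8, d))  # frozen pretrained weight (8 outputs, d inputs)
W_adapted = W @ H  # orthogonal (OFT-style) view

# Low-rank (LoRA-style) view: W @ H = W + W @ (H - I)
delta = W @ (H - np.eye(d))
rank_delta = np.linalg.matrix_rank(H - np.eye(d))

print("orthogonal:", np.allclose(H.T @ H, np.eye(d)))
print("rank of H - I:", rank_delta, "<= r =", r)
print("same result:", np.allclose(W_adapted, W + delta))
```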

The higher `r`, the more trainable parameters, resulting in a larger model capacity and potentially better performance. In addition, because of the chain structure, the orthogonality of the HR planes affects the capacity and regularity of HRA. To achieve a trade-off between model capacity and regularity, an orthogonality regularizer on the HR planes is added to the loss function. The weight \\(\lambda\\) controls the strength of the regularizer.
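
One common form of such a regularizer is λ‖UᵀU − I‖²_F, where the columns of U are the unit-normalized reflection vectors. This sketch assumes that form; the exact formula and the value of λ used by HRA may differ. The penalty is zero when the HR planes are mutually orthogonal and grows as they overlap.

```py
import numpy as np


def orthogonality_penalty(U, lam):
    """lam * ||Un^T Un - I||_F^2, where Un holds the unit-normalized reflection vectors as columns."""
    Un = U / np.linalg.norm(U, axis=0, keepdims=True)
    gram = Un.T @ Un
    return lam * np.sum((gram - np.eye(U.shape[1])) ** 2)


rng = np.random.default_rng(0)
d, r = 16, 4

U_random = rng.standard_normal((d, r))
U_orthonormal, _ = np.linalg.qr(U_random)  # columns are mutually orthogonal

penalty_random = orthogonality_penalty(U_random, lam=1e-4)  # > 0: planes overlap
penalty_orthogonal = orthogonality_penalty(U_orthonormal, lam=1e-4)  # ~0: orthogonal planes
print(penalty_random, penalty_orthogonal)
```

During training, this term would be added to the task loss. A larger λ pushes the planes toward orthogonality (more regular), and a smaller λ leaves more freedom (more capacity).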

## Bone

Bone was deprecated and removed in PEFT v0.19.0 in favor of [MiSS](https://huggingface.co/papers/2409.15371) (new version of paper: "MiSS: Balancing LoRA Performance and Efficiency with Simple Shard Sharing"). If you already have a Bone checkpoint, you can use `/scripts/convert-bone-to-miss.py` to convert it into a MiSS checkpoint and proceed with training using MiSS.

## MiSS
[MiSS](https://huggingface.co/papers/2409.15371) (Matrix Shard Sharing) is a novel parameter-efficient fine-tuning method designed to address the trade-off between adaptability and efficiency in large language models. The core of MiSS is a simple shard-sharing mechanism. It achieves low-rank adaptation by decomposing a weight matrix into multiple fragments and then using a shared, trainable "common fragment." The final low-rank update matrix is constructed by replicating these shared, partitioned shards. MiSS adopts a low-rank structure, requires only a single trainable matrix, and introduces an update mechanism distinct from LoRA, achieving an excellent balance between performance and efficiency.

<small><a href="https://huggingface.co/papers/2409.15371">MiSS: Balancing LoRA Performance and Efficiency with Simple Shard Sharing</a></small>

Intuitively, the single trainable matrix in MiSS has the same shape as `lora_B`. For the same `r`, MiSS therefore has `in_features * r` fewer trainable parameters than LoRA, because it has no counterpart to `lora_A`.
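
Here is a minimal NumPy sketch of the shard-sharing idea as described above. It assumes the shared matrix is stored as `(r, out_features)` (the transpose of `lora_B`'s layout) and tiled along the input dimension. This illustrates the description, not necessarily PEFT's exact MiSS layout. Tiling the shard and multiplying gives the same result as first summing the input's shards and then multiplying by the small matrix once.

```py
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features, r = 12, 8, 4
assert in_features % r == 0  # this sketch assumes the input splits evenly into shards

# The single trainable matrix, stored as (r, out_features) here
D = rng.standard_normal((r, out_features))

x = rng.standard_normal((2, in_features))  # a batch of 2 inputs

# Explicit update: replicate the shared shard along the input dimension
delta_w_T = np.tile(D, (in_features // r, 1))  # (in_features, out_features)
y_explicit = x @ delta_w_T

# Equivalent cheaper form: sum the input's shards, then multiply by D once
x_shards = x.reshape(x.shape[0], in_features // r, r)
y_shared = x_shards.sum(axis=1) @ D

miss_params = D.size
lora_params = r * in_features + out_features * r  # lora_A (r x in) + lora_B (out x r)
print(np.allclose(y_explicit, y_shared))
print("MiSS:", miss_params, "LoRA:", lora_params, "difference:", lora_params - miss_params)
```

The difference equals `in_features * r`. When `in_features == out_features`, MiSS uses exactly half as many parameters as LoRA at the same `r`.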

Note: Bat's `r` (b) is special and requires the weight W to satisfy the conditions `in_features % r == 0` and `out_features % r == 0`. Additionally, when `in_features == out_features` and MiSS-r equals LoRA-r, MiSS has only half as many trainable parameters as LoRA.

Although the nonlinear updates of Bat bring some performance improvements, they also increase computational overhead. Bat's main purpose is to provide researchers with a direction for improvement, so we recommend fine-tuning with the standard MiSS method instead.


================================================
FILE: docs/source/conceptual_guides/ia3.md
================================================
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# IA3 

This conceptual guide gives a brief overview of [IA3](https://huggingface.co/papers/2205.05638), a parameter-efficient fine-tuning technique that is
intended to improve over [LoRA](./lora).

To make fine-tuning more efficient, IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations)
rescales inner activations with learned vectors. These learned vectors are injected into the attention and feedforward modules
of a typical transformer-based architecture. They are the only trainable parameters during fine-tuning; the original
weights remain frozen. Learning vectors (as opposed to learning low-rank updates to a weight matrix, as in LoRA)
keeps the number of trainable parameters much smaller.

Being similar to LoRA, IA3 carries many of the same advantages: 

* IA3 makes fine-tuning more efficient by drastically reducing the number of trainable parameters. (For T0, an IA3 model only has about 0.01% trainable parameters, while even LoRA has > 0.1%)
* The original pre-trained weights are kept frozen, which means you can have multiple lightweight and portable IA3 models for various downstream tasks built on top of them.
* Performance of models fine-tuned using IA3 is comparable to the performance of fully fine-tuned models.
* IA3 does not add any inference latency because adapter weights can be merged with the base model.

In principle, IA3 can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
parameters. Following the authors' implementation, IA3 weights are added to the key, value and feedforward layers
of a Transformer model. Specifically, IA3 weights rescale the outputs of the key and value layers and the input of the second feedforward layer
in each transformer block.
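
The following NumPy sketch shows both kinds of rescaling and why they add no inference latency. Scaling a layer's output by a vector l equals scaling the rows of its weight, and scaling a layer's input equals scaling its columns, so the vectors can be folded into the frozen weights. The layer sizes and the ReLU are illustrative assumptions, and random values stand in for trained IA3 vectors.

```py
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32

W_k = rng.standard_normal((d_model, d_model))  # frozen key projection
W_up = rng.standard_normal((d_ff, d_model))  # frozen first feedforward layer
W_down = rng.standard_normal((d_model, d_ff))  # frozen second feedforward layer

l_k = rng.standard_normal(d_model)  # learned IA3 vector for the key output
l_ff = rng.standard_normal(d_ff)  # learned IA3 vector for the second feedforward input

x = rng.standard_normal(d_model)

# Training time: rescale activations element-wise
k = l_k * (W_k @ x)  # output of the key layer is rescaled
h = np.maximum(W_up @ x, 0.0)  # ReLU used here only for illustration
ff_out = W_down @ (l_ff * h)  # input of the second feedforward layer is rescaled

# Merging: fold the same vectors into the frozen weights
W_k_merged = l_k[:, None] * W_k  # diag(l_k) @ W_k scales rows
W_down_merged = W_down * l_ff[None, :]  # W_down @ diag(l_ff) scales columns

print(np.allclose(k, W_k_merged @ x))
print(np.allclose(ff_out, W_down_merged @ h))
print("trainable IA3 values:", l_k.size + l_ff.size)
```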

Given the target layers for injecting IA3 parameters, the number of trainable parameters
can be determined based on the size of the weight matrices.
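
Concretely, each rescaled activation needs one learned value per feature: the output size of each key and value layer, and the input size of each second feedforward layer. The following arithmetic sketch uses hypothetical decoder dimensions.

```py
def ia3_trainable_params(num_layers, k_out, v_out, ff_in):
    """One learned value per rescaled feature: key/value outputs and second-FFN inputs."""
    return num_layers * (k_out + v_out + ff_in)


# Hypothetical decoder: 32 blocks, 4096-dim key/value outputs, 11008-dim FFN hidden size
total = ia3_trainable_params(num_layers=32, k_out=4096, v_out=4096, ff_in=11008)
print(total)  # 32 * (4096 + 4096 + 11008) = 614400
```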


## Common IA3 parameters in PEFT

As with other methods supported by PEFT, to fine-tune a model using IA3, you need to:

1. Instantiate a base model.
2. Create a configuration (`IA3Config`) where you define IA3-specific parameters.
3. Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`.
4. Train the `PeftModel` as you normally would train the base model.

`IA3Config` allows you to control how IA3 is applied to the base model through the following parameters:

- `target_modules`: The modules (for example, attention blocks) to which the IA3 vectors are applied.
- `feedforward_modules`: The list of modules in `target_modules` to be treated as feedforward layers. While learned vectors are multiplied with
the output activation for attention blocks, the vectors are multiplied with the input for classic feedforward layers. Note that `feedforward_modules` must be a subset of `target_modules`.
- `modules_to_save`: List of modules apart from IA3 layers to be set as trainable and saved in the final checkpoint. These typically include the model's custom head that is randomly initialized for the fine-tuning task.

## Example Usage

For the task of sequence classification, one can initialize the IA3 config for a Llama model as follows:

```py
from peft import IA3Config, TaskType

peft_config = IA3Config(
    task_type=TaskType.SEQ_CLS, target_modules=["k_proj", "v_proj", "down_proj"], feedforward_modules=["down_proj"]
)
```

================================================
FILE: docs/source/conceptual_guides/oft.md
================================================
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Orthogonal Finetuning (OFT and BOFT) 

This conceptual guide gives a brief overview of [OFT](https://huggingface.co/papers/2306.07280), [OFTv2](https://huggingface.co/papers/2506.19847) and [BOFT](https://huggingface.co/papers/2311.06243), parameter-efficient fine-tuning techniques that use an orthogonal matrix to multiplicatively transform the pretrained weight matrices.

To achieve efficient fine-tuning, OFT represents the weight updates with an orthogonal transformation. The orthogonal transformation is parameterized by an orthogonal matrix that multiplies the pretrained weight matrix. These new matrices can be trained to adapt to the new data while keeping the overall number of changes low. The original weight matrix remains frozen and doesn't receive any further adjustments. To produce the final results, the original and the adapted weights are multiplied together.

Orthogonal Butterfly (BOFT) generalizes OFT with butterfly factorization and further improves its parameter efficiency and finetuning flexibility. In short, OFT can be viewed as a special case of BOFT. Unlike LoRA, which uses additive low-rank weight updates, BOFT uses multiplicative orthogonal weight updates. The comparison is shown below.

<div class="flex justify-center">
    <img src="https://raw.githubusercontent.com/wy1iu/butterfly-oft/main/assets/BOFT_comparison.png"/>
</div>


BOFT has some advantages compared to LoRA: 

* BOFT proposes a simple yet generic way to finetune pretrained models to downstream tasks, yielding better preservation of pretraining knowledge and better parameter efficiency.
* Through the orthogonality, BOFT introduces a structural constraint, i.e., keeping the [hyperspherical energy](https://huggingface.co/papers/1805.09298) unchanged during finetuning. This can effectively reduce the forgetting of pretraining knowledge.
* BOFT uses the butterfly factorization to efficiently parameterize the orthogonal matrix, which yields a compact yet expressive learning space (i.e., hypothesis class).
* The sparse matrix decomposition in BOFT brings in additional inductive biases that are beneficial to generalization.

In principle, BOFT can be applied to any subset of weight matrices in a neural network to reduce the number of trainable parameters. Given the target layers for injecting BOFT parameters, the number of trainable parameters can be determined based on the size of the weight matrices.

## Merge OFT/BOFT weights into the base model

Similar to LoRA, the weights learned by OFT/BOFT can be integrated into the pretrained weight matrices using the `merge_and_unload()` method. This method merges the adapter weights with the base model, which allows you to use the newly merged model as a standalone model.

<div class="flex justify-center">
    <img src="https://raw.githubusercontent.com/wy1iu/butterfly-oft/main/assets/boft_merge.png"/>
</div>

This works because during training, the orthogonal weight matrix (R in the diagram above) and the pretrained weight matrices are separate. But once training is complete, these weights can actually be merged (multiplied) into a new weight matrix that is equivalent.

## Utils for OFT / BOFT

### Common OFT / BOFT parameters in PEFT

As with other methods supported by PEFT, to fine-tune a model using OFT or BOFT, you need to:

1. Instantiate a base model.
2. Create a configuration (`OFTConfig` or `BOFTConfig`) where you define OFT/BOFT-specific parameters.
3. Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`.
4. Train the `PeftModel` as you normally would train the base model.


### OFT-specific parameters

`OFTConfig` allows you to control how OFT is applied to the base model through the following parameters:

- `r`: OFT rank, the number of OFT blocks per injected layer. A **bigger** `r` results in sparser update matrices with **fewer** trainable parameters. **Note**: You can only specify either `r` or `oft_block_size`, but not both simultaneously, because `r` × `oft_block_size` = layer dimension. For simplicity, the user specifies either `r` or `oft_block_size`, and the other one is inferred. The default is `r = 0`; users are advised to set `oft_block_size` instead for better clarity.
- `oft_block_size`: OFT block size across different layers. A **bigger** `oft_block_size` results in denser update matrices with **more** trainable parameters. **Note**: Choose `oft_block_size` so that the layer's input dimension (`in_features`) is divisible by it, e.g., 4, 8, 16. You can only specify either `r` or `oft_block_size`, but not both simultaneously, because `r` × `oft_block_size` = layer dimension. For simplicity, the user specifies either `r` or `oft_block_size`, and the other one is inferred. The default is `oft_block_size = 32`.
- `use_cayley_neumann`: Specifies whether to use the Cayley-Neumann parameterization (efficient but approximate) or the vanilla Cayley parameterization (exact but computationally expensive because of the matrix inverse). We recommend setting it to `True` for better efficiency, but performance may be slightly worse because of the approximation error. Please test both settings (`True` and `False`) depending on your needs. Default is `False`.
- `module_dropout`: The multiplicative dropout probability. During training, OFT blocks are randomly set to the identity, similar to the dropout layer in LoRA.
- `bias`: Specify if the `bias` parameters should be trained. Can be `"none"`, `"all"` or `"oft_only"`.
- `target_modules`: The modules (for example, attention blocks) to inject the OFT matrices into.
- `modules_to_save`: List of modules apart from OFT matrices to be set as trainable and saved in the final checkpoint. These typically include the model's custom head that is randomly initialized for the fine-tuning task.
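
The relationship between `r` and `oft_block_size` is simple arithmetic. The helper below is hypothetical; PEFT's own validation and parameter storage may differ. It infers the missing value from `r` × `oft_block_size` = `in_features`. It also counts trainable values under the assumption that each block is a skew-symmetric matrix with b(b − 1)/2 free entries, which shows why bigger blocks mean more parameters.

```py
def resolve_oft_blocks(in_features, r=0, oft_block_size=0):
    """Hypothetical helper: infer the missing value from r * oft_block_size == in_features."""
    if (r == 0) == (oft_block_size == 0):
        raise ValueError("Specify exactly one of `r` or `oft_block_size`.")
    if oft_block_size:
        if in_features % oft_block_size != 0:
            raise ValueError(f"in_features={in_features} is not divisible by oft_block_size={oft_block_size}")
        return in_features // oft_block_size, oft_block_size
    if in_features % r != 0:
        raise ValueError(f"in_features={in_features} is not divisible by r={r}")
    return r, in_features // r


def oft_trainable_params(num_blocks, block_size):
    """Assumes a skew-symmetric parameterization: block_size * (block_size - 1) / 2 values per block."""
    return num_blocks * block_size * (block_size - 1) // 2


small_blocks = resolve_oft_blocks(4096, oft_block_size=32)  # (128, 32)
large_blocks = resolve_oft_blocks(4096, r=64)  # (64, 64)
print(small_blocks, oft_trainable_params(*small_blocks))
print(large_blocks, oft_trainable_params(*large_blocks))
```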

### BOFT-specific parameters

`BOFTConfig` allows you to control how BOFT is applied to the base model through the following parameters:

- `boft_block_size`: The BOFT matrix block size across different layers, expressed as an `int`. A **bigger** `boft_block_size` results in denser update matrices with **more** trainable parameters. **Note**: Choose `boft_block_size` so that most layers' input dimensions (`in_features`) are divisible by it, e.g., 4, 8, 16. Also, specify exactly one of `boft_block_size` or `boft_block_num`. Do not set both, and do not leave both at 0, because `boft_block_size` × `boft_block_num` must equal the layer's input dimension.
- `boft_block_num`: The number of BOFT matrix blocks across different layers, expressed as an `int`. A **bigger** `boft_block_num` results in sparser update matrices with **fewer** trainable parameters. **Note**: Choose `boft_block_num` so that most layers' input dimensions (`in_features`) are divisible by it, e.g., 4, 8, 16. Also, specify exactly one of `boft_block_size` or `boft_block_num`. Do not set both, and do not leave both at 0, because `boft_block_size` × `boft_block_num` must equal the layer's input dimension.
- `boft_n_butterfly_factor`: The number of butterfly factors. **Note**: For `boft_n_butterfly_factor=1`, BOFT is the same as vanilla OFT. For `boft_n_butterfly_factor=2`, the effective block size of OFT doubles and the number of blocks is halved.
- `bias`: Specify if the `bias` parameters should be trained. Can be `"none"`, `"all"` or `"boft_only"`.
- `boft_dropout`: Specify the probability of multiplicative dropout.
- `target_modules`: The modules (for example, attention blocks) to inject the OFT/BOFT matrices into.
- `modules_to_save`: List of modules apart from OFT/BOFT matrices to be set as trainable and saved in the final checkpoint. These typically include the model's custom head that is randomly initialized for the fine-tuning task.



## OFT Example Usage

To use OFT for quantized fine-tuning with [TRL](https://github.com/huggingface/trl) (for example, `SFT`, `PPO`, or `DPO` fine-tuning), follow this outline:

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer
from peft import OFTConfig

use_quantization = True  # set to False to load the model without quantization

bnb_config = None
if use_quantization:
    # 4-bit NF4 quantization with bfloat16 compute and storage
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_storage=torch.bfloat16,
    )

# Replace "model_name" with the Hub ID or local path of your model
model = AutoModelForCausalLM.from_pretrained(
    "model_name",
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained("model_name")

# Configure OFT
peft_config = OFTConfig(
    oft_block_size=32,
    use_cayley_neumann=True,
    target_modules="all-linear",
    bias="none",
    task_type="CAUSAL_LM",
)

# `ds`, `training_arguments` and `collator` are placeholders for your own dataset,
# TRL training configuration (for example `trl.SFTConfig`) and data collator
trainer = SFTTrainer(
    model=model,
    train_dataset=ds["train"],
    peft_config=peft_config,
    processing_class=tokenizer,
    args=training_arguments,
    data_collator=collator,
)

trainer.train()
```


## BOFT Example Usage

Take a look at the following step-by-step guides on how to finetune a model with BOFT for various downstream tasks:
- [Dreambooth finetuning with BOFT](https://github.com/huggingface/peft/blob/main/examples/boft_dreambooth/boft_dreambooth.md)
- [Controllable generation finetuning with BOFT (ControlNet)](https://github.com/huggingface/peft/blob/main/examples/boft_controlnet/boft_controlnet.md)

For the task of image classification, one can initialize the BOFT config for a DinoV2 model as follows:

```py
import transformers
from peft import BOFTConfig, get_peft_model

config = BOFTConfig(
    boft_block_size=4,
    boft_n_butterfly_factor=2,
    target_modules=["query", "value", "key", "output.dense", "mlp.fc1", "mlp.fc2"],
    boft_dropout=0.1,
    bias="boft_only",
    modules_to_save=["classifier"],
)

model = transformers.Dinov2ForImageClassification.from_pretrained(
    "facebook/dinov2-large",
    num_labels=100,
)

boft_model = get_peft_model(model, config)
```


================================================
FILE: docs/source/conceptual_guides/prompting.md
================================================
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Soft prompts

Training large pretrained language models is very time-consuming and compute-intensive. As they continue to grow in size, there is increasing interest in more efficient training methods such as *prompting*. Prompting primes a frozen pretrained model for a specific downstream task by including a text prompt that describes the task or even demonstrates an example of the task. With prompting, you can avoid fully training a separate model for each downstream task, and use the same frozen pretrained model instead. This is a lot easier because you can use the same model for several different tasks, and it is significantly more efficient to train and store a smaller set of prompt parameters than to train all the model's parameters.

There are two categories of prompting methods:

- hard prompts are manually handcrafted text prompts with discrete input tokens; the downside is that creating a good prompt requires a lot of effort
- soft prompts are learnable tensors concatenated with the input embeddings that can be optimized for a dataset; the downside is that they aren't human readable because these "virtual tokens" don't correspond to the embeddings of real words

This conceptual guide provides a brief overview of the soft prompt methods included in 🤗 PEFT: prompt tuning, prefix tuning, P-tuning, multitask prompt tuning, and context-aware prompt tuning.

## Prompt tuning

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/prompt-tuning.png"/>
</div>
<small>Only train and store a significantly smaller set of task-specific prompt parameters <a href="https://hf.co/papers/2104.08691">(image source)</a>.</small>

[Prompt tuning](https://hf.co/papers/2104.08691) was developed for text classification tasks on T5 models, and all downstream tasks are cast as a text generation task. For example, sequence classification usually assigns a single class label to a sequence of text. By casting it as a text generation task, the tokens that make up the class label are *generated*. Prompts are added to the input as a series of tokens. Typically, the model parameters are fixed, which means the prompt token embeddings are also fixed, because they are looked up from the model's own embedding layer.

The key idea behind prompt tuning is that prompt tokens have their own parameters that are updated independently. This means you can keep the pretrained model's parameters frozen and only update the prompt token embeddings. The results are comparable to the traditional method of training the entire model, and prompt tuning becomes more competitive as model size increases.
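
The forward pass can be sketched in NumPy. A small trainable matrix of virtual-token embeddings is prepended to the frozen token embeddings, and the attention mask is extended to match. The sizes and token IDs are made up for illustration. In PEFT, the number of virtual tokens is set with `num_virtual_tokens` in `PromptTuningConfig`.

```py
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden, n_virtual = 100, 16, 4

embedding_table = rng.standard_normal((vocab_size, hidden))  # frozen, part of the pretrained model
soft_prompt = rng.standard_normal((n_virtual, hidden)) * 0.01  # the only trainable parameters

input_ids = np.array([[5, 17, 42], [8, 8, 3]])  # batch of 2 sequences, 3 tokens each
attention_mask = np.ones_like(input_ids)

token_embeds = embedding_table[input_ids]  # (batch, seq, hidden)
batch = input_ids.shape[0]
prompt_embeds = np.broadcast_to(soft_prompt, (batch, n_virtual, hidden))

# Prepend the virtual tokens and extend the attention mask so the model attends to them
inputs_embeds = np.concatenate([prompt_embeds, token_embeds], axis=1)
attention_mask = np.concatenate(
    [np.ones((batch, n_virtual), dtype=attention_mask.dtype), attention_mask], axis=1
)

print(inputs_embeds.shape)  # (2, 7, 16)
print(attention_mask.shape)  # (2, 7)
print("trainable:", soft_prompt.size, "frozen embedding params:", embedding_table.size)
```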

Take a look at [Prompt tuning for causal language modeling](../task_guides/clm-prompt-tuning) for a step-by-step guide on how to train a model with prompt tuning.

## Prefix tuning

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/prefix-tuning.png"/>
</div>
<small>Optimize the prefix parameters for each task <a href="https://hf.co/papers/2101.00190">(image source)</a>.</small>

[Prefix tuning](https://hf.co/papers/2101.00190) was designed for natural language generation (NLG) tasks on GPT models. It is very similar to prompt tuning; prefix tuning also prepends a sequence of task-specific vectors to the input that can be trained and updated while keeping the rest of the pretrained model's parameters frozen. 

The main difference is that the prefix parameters are inserted in **all** of the model layers, whereas prompt tuning only adds the prompt parameters to the model input embeddings. The prefix parameters are also optimized through a separate feed-forward network (FFN) instead of being trained directly, because training the soft prompts directly causes instability and hurts performance. The FFN is discarded after training, and only the resulting prefixes are kept.
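
The following NumPy sketch shows this reparameterization. A small prefix embedding passes through a two-layer tanh MLP that outputs a key prefix and a value prefix for every layer. After training, the prefixes are computed once and the MLP is dropped. All sizes are illustrative assumptions. In PEFT, this projection is enabled with the `prefix_projection` option of `PrefixTuningConfig`.

```py
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_prefix, hidden, small_dim, mlp_hidden = 2, 5, 16, 8, 32

# Trainable during training: a small prefix embedding plus a two-layer MLP (the FFN)
prefix_embedding = rng.standard_normal((n_prefix, small_dim))
W1 = rng.standard_normal((small_dim, mlp_hidden)) * 0.1
W2 = rng.standard_normal((mlp_hidden, n_layers * 2 * hidden)) * 0.1


def reparameterize(prefix_embedding):
    """Map the small prefix through the FFN to key/value prefixes for every layer."""
    h = np.tanh(prefix_embedding @ W1)
    out = h @ W2  # (n_prefix, n_layers * 2 * hidden)
    return out.reshape(n_prefix, n_layers, 2, hidden).transpose(1, 2, 0, 3)


# After training: compute the prefixes once, then discard the FFN
past_key_values = reparameterize(prefix_embedding)  # (n_layers, key/value, n_prefix, hidden)
del W1, W2

print(past_key_values.shape)  # (2, 2, 5, 16)
```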

The authors found that prefix tuning achieves performance comparable to fully finetuning a model while training 1000x fewer parameters, and that it performs even better in low-data settings.

Take a look at [Prefix tuning for conditional generation](../task_guides/seq2seq-prefix-tuning) for a step-by-step guide on how to train a model with prefix tuning.

## P-tuning

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/p-tuning.png"/>
</div>
<small>Prompt tokens can be inserted anywhere in the input sequence, and they are optimized by a prompt encoder <a href="https://hf.co/papers/2103.10385">(image source)</a>.</small>

[P-tuning](https://hf.co/papers/2103.10385) is designed for natural language understanding (NLU) tasks and all language models.
It is another variation of a soft prompt method; P-tuning also adds a trainable embedding tensor that can be optimized to find better prompts, and it uses a prompt encoder (a bidirectional long short-term memory (LSTM) network) to optimize the prompt parameters. Unlike prefix tuning though:

- the prompt tokens can be inserted anywhere in the input sequence, and it isn't restricted to only the beginning
- the prompt tokens are only added to the input instead of adding them to every layer of the model
- introducing *anchor* tokens can improve performance because they indicate characteristics of a component in the input sequence

The results suggest that P-tuning is more efficient than manually crafting prompts, and it enables GPT-like models to compete with BERT-like models on NLU tasks.

Take a look at [P-tuning for sequence classification](../task_guides/ptuning-seq-classification) for a step-by-step guide on how to train a model with P-tuning.

## Multitask prompt tuning

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/mpt.png"/>
</div>
<small><a href="https://hf.co/papers/2303.02861">Multitask prompt tuning enables parameter-efficient transfer learning</a>.</small>

[Multitask prompt tuning (MPT)](https://hf.co/papers/2303.02861) learns a single prompt from data for multiple task types that can be shared for different target tasks. Other existing approaches learn a separate soft prompt for each task that needs to be retrieved or aggregated for adaptation to target tasks. MPT consists of two stages:

1. source training - for each task, its soft prompt is decomposed into task-specific vectors. The task-specific vectors are multiplied together (as an outer product) to form a rank-one matrix W, and the Hadamard product between W and a shared prompt matrix P generates a task-specific prompt matrix. The task-specific prompts are distilled into a single prompt matrix that is shared across all tasks. This prompt is trained with multitask training.
2. target adaptation - to adapt the single prompt for a target task, a target prompt is initialized and expressed as the Hadamard product of the shared prompt matrix and the task-specific low-rank prompt matrix.
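
The decomposition can be sketched in a few lines of NumPy. A task-specific prompt is the shared prompt P multiplied element-wise by the rank-one matrix u vᵀ, so each task only needs two vectors on top of the shared matrix. The names `u_task` and `v_task` and all sizes are illustrative.

```py
import numpy as np

rng = np.random.default_rng(0)
n_virtual, hidden = 10, 16

shared_prompt = rng.standard_normal((n_virtual, hidden))  # P, shared across tasks


def task_prompt(shared_prompt, u, v):
    """Task-specific prompt: Hadamard product of P with the rank-one matrix u v^T."""
    W = np.outer(u, v)  # (n_virtual, hidden), rank one
    return shared_prompt * W


u_task = rng.standard_normal(n_virtual)
v_task = rng.standard_normal(hidden)
prompt = task_prompt(shared_prompt, u_task, v_task)

print(prompt.shape)  # (10, 16)
print(np.linalg.matrix_rank(np.outer(u_task, v_task)))  # 1
print("per-task params:", u_task.size + v_task.size, "vs full prompt:", prompt.size)
```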

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/mpt-decomposition.png"/>
</div>
<small><a href="https://hf.co/papers/2303.02861">Prompt decomposition</a>.</small>


## Context-Aware Prompt Tuning (CPT)

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/cpt.png"/>
</div>
<small>CPT optimizing only specific token embeddings while keeping the rest of the model frozen <a href="https://huggingface.co/papers/2410.17222">(image source)</a>.</small>

[Context-Aware Prompt Tuning (CPT)](https://huggingface.co/papers/2410.17222) is designed to enhance few-shot classification by refining only context embeddings. 
This approach combines ideas from In-Context Learning (ICL), Prompt Tuning (PT), and adversarial optimization, focusing on making model adaptation both parameter-efficient and effective.
In CPT, only specific context token embeddings are optimized, while the rest of the model remains frozen. 
To prevent overfitting and maintain stability, CPT uses controlled perturbations to limit the allowed changes to context embeddings within a defined range. 
Additionally, to address recency bias (the tendency to prioritize examples near the end of the context over earlier ones), CPT applies a decay loss factor.
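
One way to keep perturbations within a defined range is to project them back onto a norm ball after each update. This NumPy sketch assumes a per-token L2 ball of radius `epsilon` (a hypothetical name). CPT's actual projection and bounds may differ.

```py
import numpy as np


def project_perturbation(delta, epsilon):
    """Rescale each token's perturbation so its L2 norm is at most epsilon."""
    norms = np.linalg.norm(delta, axis=-1, keepdims=True)
    factor = np.minimum(1.0, epsilon / np.maximum(norms, 1e-12))
    return delta * factor


rng = np.random.default_rng(0)
context_embeds = rng.standard_normal((6, 16))  # frozen embeddings of the context tokens
delta = rng.standard_normal((6, 16))  # learned change to those embeddings
epsilon = 0.5

delta = project_perturbation(delta, epsilon)  # would be applied after each optimizer step
tuned_embeds = context_embeds + delta

print(np.linalg.norm(delta, axis=-1).max() <= epsilon + 1e-9)
```

Perturbations already inside the ball are left unchanged, so the projection only intervenes when an update would push an embedding too far.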

Take a look at the [example](https://github.com/huggingface/peft/blob/main/examples/cpt_finetuning/README.md) for a step-by-step guide on how to train a model with CPT.


================================================
FILE: docs/source/developer_guides/checkpoint.md
================================================
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# PEFT checkpoint format

This document describes how PEFT's checkpoint files are structured and how to convert between the PEFT format and other formats.

## PEFT files

PEFT (parameter-efficient fine-tuning) methods only update a small subset of a model's parameters rather than all of them. This is nice because checkpoint files can generally be much smaller than the original model files and are easier to store and share. However, this also means that to load a PEFT model, you need to have the original model available as well.

When you call [`~PeftModel.save_pretrained`] on a PEFT model, the PEFT model saves three files, described below:

1. `adapter_model.safetensors` or `adapter_model.bin`

By default, the model is saved in the `safetensors` format, a secure alternative to the `bin` format, which is known to be susceptible to [security vulnerabilities](https://huggingface.co/docs/hub/security-pickle) because it uses the pickle utility under the hood. Both formats store the same `state_dict` though, and are interchangeable.

The `state_dict` only contains the parameters of the adapter module, not the base model. To illustrate the difference in size, a normal BERT model requires ~420MB of disk space, whereas an IA³ adapter on top of this BERT model only requires ~260KB.

2. `adapter_config.json`

The `adapter_config.json` file contains the configuration of the adapter module, which is necessary to load the model. Below is an example of an `adapter_config.json` for an IA³ adapter with standard settings applied to a BERT model:

```json
{
  "auto_mapping": {
    "base_model_class": "BertModel",
    "parent_library": "transformers.models.bert.modeling_bert"
  },
  "base_model_name_or_path": "bert-base-uncased",
  "fan_in_fan_out": false,
  "feedforward_modules": [
    "output.dense"
  ],
  "inference_mode": true,
  "init_ia3_weights": true,
  "modules_to_save": null,
  "peft_type": "IA3",
  "revision": null,
  "target_modules": [
    "key",
    "value",
    "output.dense"
  ],
  "task_type": null
}
```

The configuration file contains:

- the adapter module type stored, `"peft_type": "IA3"`
- information about the base model like `"base_model_name_or_path": "bert-base-uncased"`
- the revision of the model (if any), `"revision": null`

If the base model is not a pretrained Transformers model, the latter two entries will be `null`. Other than that, the settings are all related to the specific IA³ adapter that was used to fine-tune the model.

3. `README.md`

The generated `README.md` is the model card of a PEFT model and contains a few pre-filled entries. The intent of this is to make it easier to share the model with others and to provide some basic information about the model. This file is not needed to load the model.

## Convert to PEFT format

When converting from another format to the PEFT format, we require both the `adapter_model.safetensors` (or `adapter_model.bin`) file and the `adapter_config.json` file.

### adapter_model

For the model weights, it is important to use the correct mapping from parameter name to value for PEFT to load the file. Getting this mapping right is an exercise in checking the implementation details, as there is no generally agreed upon format for PEFT adapters.

Fortunately, figuring out this mapping is not overly complicated for common base cases. Let's look at a concrete example, the [`LoraLayer`](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py):

```python
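# showing only part of the code
from typing import Any, Optional

import torch
from torch import nn

from peft.tuners.tuners_utils import BaseTunerLayer
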
class LoraLayer(BaseTunerLayer):
    # All names of layers that may contain (trainable) adapter weights
    adapter_layer_names = ("lora_A", "lora_B", "lora_embedding_A", "lora_embedding_B")
    # All names of other parameters that may contain adapter-related parameters
    other_param_names = ("r", "lora_alpha", "scaling", "lora_dropout")

    def __init__(self, base_layer: nn.Module, **kwargs) -> None:
        self.base_layer = base_layer
        self.r = {}
        self.lora_alpha = {}
        self.scaling = {}
        self.lora_dropout = nn.ModuleDict({})
        self.lora_A = nn.ModuleDict({})
        self.lora_B = nn.ModuleDict({})
        # For Embedding layer
        self.lora_embedding_A = nn.ParameterDict({})
        self.lora_embedding_B = nn.ParameterDict({})
        # Mark the weight as unmerged
        self._disable_adapters = False
        self.merged_adapters = []
        self.use_dora: dict[str, bool] = {}
        self.lora_magnitude_vector: Optional[torch.nn.ParameterDict] = None  # for DoRA
        self._caches: dict[str, Any] = {}
        self.kwargs = kwargs
```

In the `__init__` code used by all `LoraLayer` classes in PEFT, several attributes are set up, but only a few are relevant for the checkpoint file: `lora_A`, `lora_B`, `lora_embedding_A`, and `lora_embedding_B`. These are listed in the class attribute `adapter_layer_names` and contain the learnable parameters, so they must be included in the checkpoint file. All the other attributes, like the rank `r`, are derived from the `adapter_config.json` and must be included there (unless the default value is used).

Let's check the `state_dict` of a PEFT LoRA model applied to BERT. When printing the first five keys using the default LoRA settings (the remaining keys are the same, just with different layer numbers), we get:

- `base_model.model.encoder.layer.0.attention.self.query.lora_A.weight` 
- `base_model.model.encoder.layer.0.attention.self.query.lora_B.weight` 
- `base_model.model.encoder.layer.0.attention.self.value.lora_A.weight` 
- `base_model.model.encoder.layer.0.attention.self.value.lora_B.weight` 
- `base_model.model.encoder.layer.1.attention.self.query.lora_A.weight`
- etc.

Let's break this down:

- By default, for BERT models, LoRA is applied to the `query` and `value` layers of the attention module. This is why you see `attention.self.query` and `attention.self.value` in the key names for each layer.
- LoRA decomposes the weights into two low-rank matrices, `lora_A` and `lora_B`. This is where `lora_A` and `lora_B` come from in the key names.
- These LoRA matrices are implemented as `nn.Linear` layers, so the parameters are stored in the `.weight` attribute (`lora_A.weight`, `lora_B.weight`).
- By default, LoRA isn't applied to BERT's embedding layer, so there are _no entries_ for `lora_embedding_A` and `lora_embedding_B`.
- The keys of the `state_dict` always start with `"base_model.model."`. The reason is that, in PEFT, we wrap the base model inside a tuner-specific model (`LoraModel` in this case), which itself is wrapped in a general PEFT model (`PeftModel`). For this reason, these two prefixes are added to the keys. When converting to the PEFT format, it is required to add these prefixes.

> [!TIP]
> This last point is not true for prompt learning techniques like prompt tuning. There, the extra embeddings are directly stored in the `state_dict` without any prefixes added to the keys.
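When converting from another format, adding these prefixes is usually a simple renaming step. The sketch below assumes a hypothetical external LoRA checkpoint that already uses the base model's module names and the `lora_A`/`lora_B` naming, so only the `base_model.model.` prefix is missing. Strings stand in for the tensor values, and the helper name `add_peft_prefix` is made up for illustration. As noted in the tip, this renaming does not apply to prompt-based methods.

```python
PEFT_PREFIX = "base_model.model."


def add_peft_prefix(state_dict: dict) -> dict:
    """Return a copy of state_dict with the PEFT prefix added to every key that lacks it."""
    converted = {}
    for key, value in state_dict.items():
        new_key = key if key.startswith(PEFT_PREFIX) else PEFT_PREFIX + key
        converted[new_key] = value
    return converted


# Hypothetical keys from an external LoRA checkpoint for BERT
external_state_dict = {
    "encoder.layer.0.attention.self.query.lora_A.weight": "tensor_A",
    "encoder.layer.0.attention.self.query.lora_B.weight": "tensor_B",
}

converted = add_peft_prefix(external_state_dict)
for key in converted:
    print(key)
# base_model.model.encoder.layer.0.attention.self.query.lora_A.weight
# base_model.model.encoder.layer.0.attention.self.query.lora_B.weight
```

With real tensors as values, the converted dict could then be written with `safetensors.torch.save_file(converted, "adapter_model.safetensors")`.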

When inspecting the parameter names in the loaded model, you might be surprised to find that they look a bit different, e.g. `base_model.model.encoder.layer.0.attention.self.query.lora_A.default.weight`. The difference is the *`.default`* part in the second to last segment. This part exists because PEFT generally allows the addition of multiple adapters at once (using an `nn.ModuleDict` or `nn.ParameterDict` to store them). For example, if you add another adapter called "other", the key for that adapter would be `base_model.model.encoder.layer.0.attention.self.query.lora_A.other.weight`.

When you call [`~PeftModel.save_pretrained`], the adapter name is stripped from the keys. The reason is that the adapter name is not an important part of the model architecture; it is just an arbitrary name. When loading the adapter, you could choose a totally different name, and the model would still work the same way. This is why the adapter name is not stored in the checkpoint file.

> [!TIP]
> If you call `save_pretrained("some/path")` and the adapter name is not `"default"`, the adapter is stored in a sub-directory with the same name as the adapter. So if the name is "other", it would be stored inside of `some/path/other`.

In some circumstances, deciding which values to add to the checkpoint file can become a bit more complicated. For example, in PEFT, DoRA is implemented as a special case of LoRA. If you want to convert a DoRA model to PEFT, you should create a LoRA checkpoint with extra entries for DoRA. You can see this in the `__init__` of the previous `LoraLayer` code:

```python
self.lora_magnitude_vector: Optional[torch.nn.ParameterDict] = None  # for DoRA
```

This indicates that there is an optional extra parameter per layer for DoRA.

### adapter_config

All the other information needed to load a PEFT model is contained in the `adapter_config.json` file. Let's check this file for a LoRA model applied to BERT:

```json
{
  "alpha_pattern": {},
  "auto_mapping": {
    "base_model_class": "BertModel",
    "parent_library": "transformers.models.bert.modeling_bert"
  },
  "base_model_name_or_path": "bert-base-uncased",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 8,
  "lora_dropout": 0.0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "query",
    "value"
  ],
  "task_type": null,
  "use_dora": false,
  "use_rslora": false
}
```

This contains a lot of entries, and at first glance, it could feel overwhelming to figure out all the right values to put in there. However, most of the entries are not necessary to load the model. This is either because they use the default values and don't need to be added, or because they only affect the initialization of the LoRA weights, which is irrelevant when it comes to loading the model. If you don't know what a specific parameter does, e.g. `"use_rslora"`, don't add it, and you should be fine. Also note that as more options are added, this file will get more entries in the future, but it should remain backward compatible.

At the minimum, you should include the following entries:

```json
{
  "target_modules": ["query", "value"],
  "peft_type": "LORA"
}
```

However, adding as many entries as possible, like the rank `r` or the `base_model_name_or_path` (if it's a Transformers model), is recommended. This information can help others understand the model better and share it more easily. To see which keys and values are expected, take a look at the [config.py](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/config.py) file in the PEFT source code (this example is the config file for LoRA).

## Model storage

In some circumstances, you might want to store the whole PEFT model, including the base weights. This can be necessary if, for instance, the base model is not available to the users trying to load the PEFT model. You can either merge the adapter weights into the base weights first or convert the PEFT model into a Transformers model.

### Merge the weights

The most straightforward way to store the whole PEFT model is to merge the adapter weights into the base weights:

```python
merged_model = model.merge_and_unload()
merged_model.save_pretrained(...)
```

There are some disadvantages to this approach, though:

- Once [`~LoraModel.merge_and_unload`] is called, you get a basic model without any PEFT-specific functionality. This means you can't use any of the PEFT-specific methods anymore.
- You cannot unmerge the weights, load multiple adapters at once, disable the adapter, etc.
- Not all PEFT methods support merging weights.
- Some PEFT methods may generally allow merging, but not with specific settings (e.g. when using certain quantization techniques).
- The whole model will be much larger than the PEFT model, as it will contain all the base weights as well.

But inference with a merged model should be a bit faster.

### Convert to a Transformers model

If the base model is a Transformers model, another way to save the whole model is a somewhat hacky approach: load the saved PEFT model directly into a Transformers model, which injects the PEFT weights into the base model, and then "trick" Transformers into believing that the model is not a PEFT model so that the whole model gets saved. This only works with LoRA, because other adapter types are not implemented in Transformers.

```python
from transformers import AutoModel

model = ...  # the PEFT model
...
# after you finish training the model, save it in a temporary location
temp_location = "path/to/temp_location"  # placeholder
model.save_pretrained(temp_location)
# now load this model directly into a Transformers model, without the PEFT wrapper;
# the PEFT weights are directly injected into the base model
model_loaded = AutoModel.from_pretrained(temp_location)
# now make the loaded model believe that it is _not_ a PEFT model
model_loaded._hf_peft_config_loaded = False
# now when we save it, it will save the whole model
final_location = "path/to/final_location"  # placeholder
model_loaded.save_pretrained(final_location)
# or upload it to the Hugging Face Hub (push_to_hub expects a repository ID)
model_loaded.push_to_hub("your-username/your-model-name")  # placeholder repo ID
```



================================================
FILE: docs/source/developer_guides/contributing.md
================================================
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Contribute to PEFT

We are happy to accept contributions to PEFT. If you plan to contribute, please read this to make the process as smooth as possible.

## Installation

For code contributions to PEFT, you should choose the ["source"](../install#source) installation method.

If you are new to creating a pull request, follow the [Creating a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) guide by GitHub.

## Tests and code quality checks

Regardless of the contribution type (unless it’s only about the docs), you should run tests and code quality checks before creating a PR to ensure your contribution doesn’t break anything and follows the project standards.

We provide a Makefile to execute the necessary tests. Run the following command to execute the unit tests:

```sh
make test
```

Run one of the following to either only check or check and fix code quality and style:

```sh
make quality  # just check
make style  # check and fix
```

You can also set up [`pre-commit`](https://pre-commit.com/) to run these fixes
automatically as Git commit hooks.

```bash
$ pip install pre-commit
$ pre-commit install
```

Running all the tests can take a while, so during development it can be more efficient to only [run tests specific to your change](https://docs.pytest.org/en/6.2.x/usage.html#specifying-tests-selecting-tests), e.g. via:

```sh
pytest tests/<test-file-name> -k <name-of-test>
```

This should finish much quicker and allow for faster iteration.

If your change is specific to a hardware setting (e.g., it requires CUDA), take a look at [tests/test_gpu_examples.py](https://github.com/huggingface/peft/blob/1c1c7fdaa6e6abaa53939b865dee1eded82ad032/tests/test_gpu_examples.py) and [tests/test_common_gpu.py](https://github.com/huggingface/peft/blob/1c1c7fdaa6e6abaa53939b865dee1eded82ad032/tests/test_common_gpu.py) to see if it makes sense to add tests there. If your change could have an effect on saving and loading models, please run the tests with the `--regression` flag to trigger regression tests.

It can happen that while you’re working on your PR, the underlying code base changes due to other changes being merged. If that happens – especially when there is a merge conflict – please update your branch with the latest changes. This can be a merge or a rebase, and we'll squash and merge the PR once it’s ready. If possible, avoid force pushes to make reviews easier.

## PR description

When opening a PR, please provide a clear description of the change you're proposing. If it relates to other issues or PRs, please reference them. A good description not only helps the reviewers review your code better and faster, it can also serve later as a basis for the commit message, which helps with the long-term maintenance of the project.

If your code makes some non-trivial changes, it may also be a good idea to add comments to the code to explain those changes. For example, if you had to iterate on your implementation multiple times because the most obvious way didn’t work, it’s a good indication that a code comment is needed.

## Bugfixes

Please give a description of the circumstances that led to the bug. If there is an existing issue, please link to it (e.g., “Resolves #12345”).

Ideally when a bugfix is provided, it should be accompanied by a test for the bug. The test should fail with the current code and pass with the bugfix. Add a comment to the test that references the issue or PR. Without a test, it is more difficult to prevent regressions in the future.

## Documentation improvements

We are happy to have fixes for broken links and missing or unclear documentation. Taking care of examples, making
sure that they are up-to-date and running fine in this fast moving environment is also highly appreciated.

Please refrain from sending pull requests that *only* correct typos, as these generally create more work
than they save. Such changes are better combined with more substantial fixes (such as fixing broken links or
extending/updating documentation).

## Add a new fine-tuning method

New parameter-efficient fine-tuning methods are developed all the time. If you would like to add a new and promising method to PEFT, please follow these steps.

1. Before you start to implement the new method, please open a [GitHub issue](https://github.com/huggingface/peft/issues) with your proposal. This way, the maintainers can give you some early feedback.
2. Please add a link to the source (usually a paper) of the method. The paper should be in a final state to avoid changing requirements during development (e.g. due to reviewer feedback).
3. When implementing the method, it makes sense to look for existing implementations as a guide. Moreover, when you structure your code, please take inspiration from the other PEFT methods. For example, if your method is similar to LoRA, it makes sense to structure your code similarly or even reuse some functions or classes where it makes sense (some code duplication is okay, but don’t overdo it).
4. Ideally, in addition to the implementation of the new method, there should also be
   - [examples](https://github.com/huggingface/peft/tree/main/examples) (notebooks, scripts)
   - [documentation](https://github.com/huggingface/peft/tree/main/docs/source)
   - [extensive test suite](https://github.com/huggingface/peft/tree/main/tests) that proves the method correctly integrates with PEFT
   - [experimental setup](https://github.com/huggingface/peft/tree/main/method_comparison#creating-new-experiments) to run benchmarks
5. Once you have something that seems to be working, don’t hesitate to create a draft PR even if it’s not in a mergeable state yet. The maintainers are happy to give you feedback and guidance along the way.

## Add other features

It is best if you first open an issue on GitHub with a proposal to add the new feature. This way, you can discuss with the maintainers if it makes sense to add the feature before spending too much time on implementing it.

New features should generally be accompanied by tests and documentation or examples. Without the latter, users will have a hard time discovering your cool new feature.

Changes to the code should be implemented in a backward-compatible way. For example, existing code should continue to work the same way after the feature is merged.


================================================
FILE: docs/source/developer_guides/custom_models.md
================================================
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Custom models

Some fine-tuning techniques, such as prompt tuning, are specific to language models. That means in 🤗 PEFT, it is
assumed a 🤗 Transformers model is being used. However, other fine-tuning techniques - like
[LoRA](../conceptual_guides/lora) - are not restricted to specific model types.

In this guide, we will see how LoRA can be applied to a multilayer perceptron, a computer vision model from the [timm](https://huggingface.co/docs/timm/index) library, or a new 🤗 Transformers architecture.

## Multilayer perceptron

Let's assume that we want to fine-tune a multilayer perceptron with LoRA. Here is the definition:

```python
from torch import nn


class MLP(nn.Module):
    def __init__(self, num_units_hidden=2000):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Linear(20, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, 2),
            nn.LogSoftmax(dim=-1),
        )

    def forward(self, X):
        return self.seq(X)
```

This is a straightforward multilayer perceptron with an input layer, a hidden layer, and an output layer.

> [!TIP]
> For this toy example, we choose an exceedingly large number of hidden units to highlight the efficiency gains
> from PEFT, but those gains are in line with more realistic examples.

There are a few linear layers in this model that could be tuned with LoRA. When working with common 🤗 Transformers
models, PEFT will know which layers to apply LoRA to, but in this case, it is up to us as a user to choose the layers.
To determine the names of the layers to tune:

```python
print([(n, type(m)) for n, m in MLP().named_modules()])
```

This should print:

```
[('', __main__.MLP),
 ('seq', torch.nn.modules.container.Sequential),
 ('seq.0', torch.nn.modules.linear.Linear),
 ('seq.1', torch.nn.modules.activation.ReLU),
 ('seq.2', torch.nn.modules.linear.Linear),
 ('seq.3', torch.nn.modules.activation.ReLU),
 ('seq.4', torch.nn.modules.linear.Linear),
 ('seq.5', torch.nn.modules.activation.LogSoftmax)]
```

Let's say we want to apply LoRA to the input layer and to the hidden layer, those are `'seq.0'` and `'seq.2'`. Moreover,
let's assume we want to update the output layer without LoRA, that would be `'seq.4'`. The corresponding config would
be:

```python
from peft import LoraConfig

config = LoraConfig(
    target_modules=["seq.0", "seq.2"],
    modules_to_save=["seq.4"],
)
```

With that, we can create our PEFT model and check the fraction of parameters trained:

```python
from peft import get_peft_model

model = MLP()
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 56,164 || all params: 4,100,164 || trainable%: 1.369798866581922
```

Finally, we can use any training framework we like, or write our own fit loop, to train the `peft_model`.

For a complete example, check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/multilayer_perceptron/multilayer_perceptron_lora.ipynb).

## timm models

The [timm](https://huggingface.co/docs/timm/index) library contains a large number of pretrained computer vision models.
Those can also be fine-tuned with PEFT. Let's check out how this works in practice.

To start, ensure that timm is installed in the Python environment:

```bash
python -m pip install -U timm
```

Next we load a timm model for an image classification task:

```python
import timm

num_classes = ...
model_id = "timm/poolformer_m36.sail_in1k"
model = timm.create_model(model_id, pretrained=True, num_classes=num_classes)
```

Again, we need to make a decision about what layers to apply LoRA to. Since LoRA supports 2D conv layers, and since
those are a major building block of this model, we should apply LoRA to the 2D conv layers. To identify the names of
those layers, let's look at all the layer names:

```python
print([(n, type(m)) for n, m in model.named_modules()])
```

This prints a very long list; we only show the first and last few entries:

```
[('', timm.models.metaformer.MetaFormer),
 ('stem', timm.models.metaformer.Stem),
 ('stem.conv', torch.nn.modules.conv.Conv2d),
 ('stem.norm', torch.nn.modules.linear.Identity),
 ('stages', torch.nn.modules.container.Sequential),
 ('stages.0', timm.models.metaformer.MetaFormerStage),
 ('stages.0.downsample', torch.nn.modules.linear.Identity),
 ('stages.0.blocks', torch.nn.modules.container.Sequential),
 ('stages.0.blocks.0', timm.models.metaformer.MetaFormerBlock),
 ('stages.0.blocks.0.norm1', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.0.token_mixer', timm.models.metaformer.Pooling),
 ('stages.0.blocks.0.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
 ('stages.0.blocks.0.drop_path1', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.layer_scale1', timm.models.metaformer.Scale),
 ('stages.0.blocks.0.res_scale1', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.norm2', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.0.mlp', timm.layers.mlp.Mlp),
 ('stages.0.blocks.0.mlp.fc1', torch.nn.modules.conv.Conv2d),
 ('stages.0.blocks.0.mlp.act', torch.nn.modules.activation.GELU),
 ('stages.0.blocks.0.mlp.drop1', torch.nn.modules.dropout.Dropout),
 ('stages.0.blocks.0.mlp.norm', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.mlp.fc2', torch.nn.modules.conv.Conv2d),
 ('stages.0.blocks.0.mlp.drop2', torch.nn.modules.dropout.Dropout),
 ('stages.0.blocks.0.drop_path2', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.0.layer_scale2', timm.models.metaformer.Scale),
 ('stages.0.blocks.0.res_scale2', torch.nn.modules.linear.Identity),
 ('stages.0.blocks.1', timm.models.metaformer.MetaFormerBlock),
 ('stages.0.blocks.1.norm1', timm.layers.norm.GroupNorm1),
 ('stages.0.blocks.1.token_mixer', timm.models.metaformer.Pooling),
 ('stages.0.blocks.1.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
 ...
 ('head.global_pool.flatten', torch.nn.modules.linear.Identity),
 ('head.norm', timm.layers.norm.LayerNorm2d),
 ('head.flatten', torch.nn.modules.flatten.Flatten),
 ('head.drop', torch.nn.modules.linear.Identity),
 ('head.fc', torch.nn.modules.linear.Linear)]
```

Upon closer inspection, we see that the 2D conv layers have names such as `"stages.0.blocks.0.mlp.fc1"` and
`"stages.0.blocks.0.mlp.fc2"`. How can we match those layer names specifically? You can write a [regular
expression](https://docs.python.org/3/library/re.html) to match the layer names. For our case, the regex
`r".*\.mlp\.fc\d"` should do the job.
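You can test such a regex in plain Python before creating the config. When `target_modules` is a string, PEFT (to our understanding) requires the regex to match the full module name, so the sketch uses `re.fullmatch` on a few names taken from the list above.

```python
import re

pattern = r".*\.mlp\.fc\d"
names = [
    "stages.0.blocks.0.mlp.fc1",
    "stages.0.blocks.0.mlp.fc2",
    "stages.0.blocks.0.mlp.act",
    "stem.conv",
    "head.fc",
]

# Keep only the names that the whole pattern matches
matched = [name for name in names if re.fullmatch(pattern, name)]
print(matched)  # ['stages.0.blocks.0.mlp.fc1', 'stages.0.blocks.0.mlp.fc2']
```

Note that `head.fc` is not matched, which is why it is handled separately via `modules_to_save`.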

Furthermore, as in the first example, we should ensure that the output layer, in this case the classification head, is
also updated. Looking at the end of the list printed above, we can see that it's named `'head.fc'`. With that in mind,
here is our LoRA config:

```python
config = LoraConfig(target_modules=r".*\.mlp\.fc\d", modules_to_save=["head.fc"])
```

Then we only need to create the PEFT model by passing our base model and the config to `get_peft_model`:

```python
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 1,064,454 || all params: 56,467,974 || trainable%: 1.88505789139876
```

This shows us that we only need to train less than 2% of all parameters, which is a huge efficiency gain.

For a complete example, check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/image_classification/image_classification_timm_peft_lora.ipynb).

## New transformers architectures

When new popular transformers architectures are released, we do our best to quickly add them to PEFT. If you come across a transformers model that is not supported out of the box, don't worry, it will most likely still work if the config is set correctly. Specifically, you have to identify the layers that should be adapted and set them correctly when initializing the corresponding config class, e.g. `LoraConfig`. Here are some tips to help with this.

As a first step, it is a good idea to check the existing models for inspiration. You can find them inside of [constants.py](https://github.com/huggingface/peft/blob/main/src/peft/utils/constants.py) in the PEFT repository. Often, you'll find a similar architecture that uses the same names. For example, if the new model architecture is a variation of the "mistral" model and you want to apply LoRA, you can see that the entry for "mistral" in `TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING` contains `["q_proj", "v_proj"]`. This tells you that for "mistral" models, the `target_modules` for LoRA should be `["q_proj", "v_proj"]`:

```python
from peft import LoraConfig, get_peft_model

my_mistral_model = ...
config = LoraConfig(
    target_modules=["q_proj", "v_proj"],
    ...,  # other LoRA arguments
)
peft_model = get_peft_model(my_mistral_model, config)
```

If that doesn't help, check the existing modules in your model architecture with the `named_modules` method and try to identify the attention layers, especially the key, query, and value layers. Those will often have names such as `c_attn`, `query`, `q_proj`, etc. The key layer is not always adapted, and ideally, you should check whether including it results in better performance.
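The sketch below automates this search on a hypothetical toy model. It lists all `nn.Linear` layers whose names end in a typical attention projection name. The helper and its list of names are illustrative assumptions. Some architectures use other layer classes (GPT-2's `c_attn`, for example, is not an `nn.Linear`), so treat the output as a starting point.

```python
from torch import nn


class ToyAttention(nn.Module):
    # Hypothetical attention block of a "new" architecture
    def __init__(self, dim: int = 32):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.o_proj = nn.Linear(dim, dim)


class ToyModel(nn.Module):
    def __init__(self, dim: int = 32, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList([ToyAttention(dim) for _ in range(num_layers)])
        self.fc = nn.Linear(dim, dim)


ATTENTION_NAMES = ("query", "key", "value", "q_proj", "k_proj", "v_proj")


def find_attention_layers(model: nn.Module, names=ATTENTION_NAMES) -> list:
    """List nn.Linear modules whose last name segment looks like an attention projection."""
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and name.split(".")[-1] in names
    ]


candidates = find_attention_layers(ToyModel())
print(candidates)
# ['layers.0.q_proj', 'layers.0.k_proj', 'layers.0.v_proj', 'layers.1.q_proj', 'layers.1.k_proj', 'layers.1.v_proj']

# The distinct suffixes are what you would pass as target_modules
print(sorted({name.split(".")[-1] for name in candidates}))  # ['k_proj', 'q_proj', 'v_proj']
```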

Additionally, linear layers are common targets to be adapted (e.g. in the [QLoRA paper](https://huggingface.co/papers/2305.14314), the authors suggest adapting them as well). Their names will often contain the strings `fc` or `dense`.

If you want to add a new model to PEFT, please create an entry in [constants.py](https://github.com/huggingface/peft/blob/main/src/peft/utils/constants.py) and open a pull request on the [repository](https://github.com/huggingface/peft/pulls). Don't forget to update the [README](https://github.com/huggingface/peft#models-support-matrix) as well.

## Verify parameters and layers

You can verify whether you've correctly applied a PEFT method to your model in a few ways.

* Check the fraction of parameters that are trainable with the [`~PeftModel.print_trainable_parameters`] method. If this number is lower or higher than expected, check the model `repr` by printing the model. This shows the names of all the layer types in the model. Ensure that only the intended target layers are replaced by the adapter layers. For example, if LoRA is applied to `nn.Linear` layers, then you should only see `lora.Linear` layers being used.

```py
peft_model.print_trainable_parameters()
```

* Another way you can view the adapted layers is to use the `targeted_module_names` attribute to list the name of each module that was adapted.

```python
print(peft_model.targeted_module_names)
```
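If you want to double-check these numbers by hand, the sketch below computes roughly what `print_trainable_parameters` reports (trainable parameters, total parameters, and the trainable percentage) and counts the layer types in a model. It ignores special cases such as quantized weights, and the small frozen-base-plus-adapter model is a hypothetical example rather than a real PEFT model.

```python
from collections import Counter

import torch.nn as nn


def trainable_fraction(model: nn.Module):
    """Return (trainable, total, percent) parameter counts for a model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total, 100 * trainable / total


def layer_type_counts(model: nn.Module) -> Counter:
    """Count how often each module class appears, similar to scanning the model repr."""
    return Counter(type(m).__name__ for m in model.modules())


# hypothetical example: a frozen base layer next to a rank-2 pair of adapter matrices
base = nn.Linear(10, 10)  # 100 weights + 10 biases = 110 parameters
for p in base.parameters():
    p.requires_grad = False
model = nn.ModuleDict({
    "base": base,
    "lora_A": nn.Linear(10, 2, bias=False),  # 20 parameters
    "lora_B": nn.Linear(2, 10, bias=False),  # 20 parameters
})

print(trainable_fraction(model))  # trainable=40, total=150, about 26.7%
print(layer_type_counts(model))
```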

## Unsupported module types

Methods like LoRA only work if the target modules are supported by PEFT. For example, it's possible to apply LoRA to `nn.Linear` and `nn.Conv2d` layers, but not to `nn.LSTM`. If a layer class you want to apply PEFT to is not supported, you can:

- define a custom mapping to dynamically dispatch custom modules in LoRA
- open an [issue](https://github.com/huggingface/peft/issues) to request the feature; if demand for this module type is sufficiently high, the maintainers will implement it or guide you on how to implement it yourself
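To find out whether a layer you want to target falls into this category before calling `get_peft_model`, you can look at the class of each matched module. The sketch below assumes a hypothetical LSTM-based `Tagger` model, and `ASSUMED_SUPPORTED` is only an illustrative subset of the layer types LoRA handles; the authoritative list lives in the PEFT source (`src/peft/tuners/lora/layer.py`). Matching is simplified to exact leaf-name equality.

```python
import torch.nn as nn

# assumption: an illustrative subset of layer types LoRA can adapt, not PEFT's full list
ASSUMED_SUPPORTED = (nn.Linear, nn.Embedding, nn.Conv1d, nn.Conv2d, nn.Conv3d)


def unsupported_targets(model: nn.Module, target_modules):
    """Map each targeted module name to its class name if that class is not in ASSUMED_SUPPORTED."""
    result = {}
    for name, module in model.named_modules():
        leaf = name.rsplit(".", 1)[-1]
        if leaf in target_modules and not isinstance(module, ASSUMED_SUPPORTED):
            result[name] = type(module).__name__
    return result


class Tagger(nn.Module):
    """Hypothetical LSTM-based sequence tagger."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(100, 16)
        self.lstm = nn.LSTM(16, 32, batch_first=True)
        self.head = nn.Linear(32, 5)


print(unsupported_targets(Tagger(), ["lstm", "head"]))  # {'lstm': 'LSTM'}
```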

### Experimental support for dynamic dispatch of custom modules in LoRA

> [!WARNING]
> This feature is experimental and subject to change, depending on its reception by the community. We will introduce a public and stable API if there is significant demand for it.

PEFT supports an experimental API for custom module types for LoRA. Let's assume you have a LoRA implementation for LSTMs. Normally, you would not be able to tell PEFT to use it, even if it would theoretically work with PEFT. However, this is possible with dynamic dispatch of custom layers.

The experimental API currently looks like this:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model
from peft.tuners.lora.layer import LoraLayer


class MyLoraLSTMLayer(nn.Module, LoraLayer):
    ...  # your LoRA implementation for nn.LSTM

base_model = ...  # load the base model that uses LSTMs

# add the LSTM layer names to target_modules (plus any other LoRA arguments)
config = LoraConfig(target_modules=["lstm"])
# define a mapping from base layer type to LoRA layer type
custom_module_mapping = {nn.LSTM: MyLoraLSTMLayer}
# register the new mapping
config._register_custom_module(custom_module_mapping)
# after registration, create the PEFT model
peft_model = get_peft_model(base_model, config)
# do training
```

> [!TIP]
> When you call [`get_peft_model`], you will see a warning because PEFT does not recognize the targeted module type. In this case, you can ignore this warning.

By supplying a custom mapping, PEFT first checks the base model's layers against the custom mapping and dispatches to the custom LoRA layer type if there is a match. If there is no match, PEFT checks the built-in LoRA layer types for a match.

Therefore, this feature can also be used to override existing dispatch logic, e.g. if you want to use your own LoRA layer for `nn.Linear` instead of using the one provided by PEFT.
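The lookup order can be pictured with the conceptual sketch below. It is not PEFT's internal code: the three placeholder classes and `BUILTIN_MAPPING` are hypothetical, and the real built-in dispatch covers many more layer types. What it illustrates is that a custom mapping is consulted first with `isinstance` checks, so an entry for `nn.Linear` overrides the built-in choice.

```python
import torch.nn as nn


class CustomLoraLSTM:  # hypothetical user-provided LoRA layer for nn.LSTM
    pass


class CustomLoraLinear:  # hypothetical user override for nn.Linear
    pass


class BuiltinLoraLinear:  # stands in for PEFT's own LoRA Linear layer
    pass


# assumption: a heavily simplified version of the built-in table
BUILTIN_MAPPING = {nn.Linear: BuiltinLoraLinear}


def resolve_lora_class(base_layer: nn.Module, custom_mapping: dict):
    """Return the LoRA class for base_layer, checking the custom mapping before the built-in one."""
    for base_type, lora_cls in custom_mapping.items():
        if isinstance(base_layer, base_type):
            return lora_cls
    for base_type, lora_cls in BUILTIN_MAPPING.items():
        if isinstance(base_layer, base_type):
            return lora_cls
    return None  # no match: in PEFT, this target would be reported as unsupported


custom = {nn.LSTM: CustomLoraLSTM, nn.Linear: CustomLoraLinear}
print(resolve_lora_class(nn.LSTM(4, 4), custom).__name__)  # CustomLoraLSTM
print(resolve_lora_class(nn.Linear(4, 4), custom).__name__)  # CustomLoraLinear (override)
print(resolve_lora_class(nn.Linear(4, 4), {}).__name__)  # BuiltinLoraLinear
```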

When creating your custom LoRA module, please follow the same rules as the [existing LoRA modules](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py). Some important constraints to consider:

- The custom module should inherit from `nn.Module` and `peft.tuners.lora.layer.LoraLayer`.
- The `__init__` method of the custom module should have the positional arguments `base_layer` and `adapter_name`. After this, there are additional `**kwargs` that you are free to use or ignore.
- The learnable parameters should be stored in an `nn.ModuleDict` or `nn.ParameterDict`, where the key corresponds to the name of the specific adapter (remember that a model can have more than one adapter at a time).
- The name of these learnable parameter attributes should start with `"lora_"`, e.g. `self.lora_new_param = ...`.
- Some methods are optional, e.g. you only need to implement `merge` and `unmerge` if you want to support weight merging.
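The toy layer below is a sketch of the storage and naming conventions from this list: positional `base_layer` and `adapter_name` arguments, per-adapter `nn.ModuleDict` entries, and attribute names starting with `"lora_"`. To keep it self-contained it deliberately does **not** inherit from `peft.tuners.lora.layer.LoraLayer`, so it cannot be registered with PEFT as-is; `ToyLoraLinear`, `update_layer`, and `active_adapters` here are hypothetical names chosen for illustration. The prefix matters because PEFT appears to select adapter weights for saving and training by the `lora_` substring, as the last lines of the example mimic.

```python
import torch
import torch.nn as nn


class ToyLoraLinear(nn.Module):
    """Illustrative layer following the conventions above (NOT a drop-in PEFT layer)."""

    def __init__(self, base_layer: nn.Linear, adapter_name: str, r: int = 4, lora_alpha: int = 8, **kwargs):
        super().__init__()
        self.base_layer = base_layer
        # one entry per adapter name; learnable attributes start with "lora_"
        self.lora_A = nn.ModuleDict()
        self.lora_B = nn.ModuleDict()
        self.scaling = {}
        self.active_adapters = [adapter_name]
        self.update_layer(adapter_name, r, lora_alpha)

    def update_layer(self, adapter_name: str, r: int, lora_alpha: int):
        """Create the low-rank matrices and scaling factor for one adapter."""
        self.lora_A[adapter_name] = nn.Linear(self.base_layer.in_features, r, bias=False)
        self.lora_B[adapter_name] = nn.Linear(r, self.base_layer.out_features, bias=False)
        nn.init.zeros_(self.lora_B[adapter_name].weight)  # the update starts as a no-op
        self.scaling[adapter_name] = lora_alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        result = self.base_layer(x)
        for name in self.active_adapters:
            result = result + self.lora_B[name](self.lora_A[name](x)) * self.scaling[name]
        return result


layer = ToyLoraLinear(nn.Linear(8, 8), "default")
x = torch.randn(2, 8)
print(torch.allclose(layer(x), layer.base_layer(x)))  # True, because lora_B starts at zero
# adapter weights can be picked out of the state dict by their "lora_" prefix
print([k for k in layer.state_dict() if "lora_" in k])
# ['lora_A.default.weight', 'lora_B.default.weight']
```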

Currently, the information about the custom module does not persist when you save the model. When loading the model, you have to register the custom modules again.

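```python
from peft import LoraConfig, PeftModel

# saving works as always and includes the parameters of the custom modules
peft_model.save_pretrained("path/to/model")  # hypothetical path

# loading the model later:
base_model = ...
# load the LoRA config that you saved earlier
config = LoraConfig.from_pretrained("path/to/model")
# register the custom modules again, the same way as before saving
custom_module_mapping = {nn.LSTM: MyLoraLSTMLayer}
config._register_custom_module(custom_module_mapping)
# pass the config instance explicitly so that the custom mapping is used
peft_model = PeftModel.from_pretrained(base_model, "path/to/model", config=config)
```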

Download .txt

gitextract___y2fwgs/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug-report.yml
│   │   └── feature-request.yml
│   ├── dependabot.yml
│   ├── workflows/
│   │   ├── build_docker_images.yml
│   │   ├── build_documentation.yml
│   │   ├── build_pr_documentation.yml
│   │   ├── deploy_method_comparison_app.yml
│   │   ├── integrations_tests.yml
│   │   ├── nightly.yml
│   │   ├── stale.yml
│   │   ├── test-docker-build.yml
│   │   ├── tests-main.yml
│   │   ├── tests.yml
│   │   ├── torch_compile_tests.yml
│   │   ├── trufflehog.yml
│   │   ├── upload_pr_documentation.yml
│   │   └── zizmor.yaml
│   └── zizmor.yml
├── .gitignore
├── .pre-commit-config.yaml
├── LICENSE
├── Makefile
├── README.md
├── docker/
│   ├── README.md
│   ├── peft-cpu/
│   │   └── Dockerfile
│   └── peft-gpu/
│       └── Dockerfile
├── docs/
│   ├── Makefile
│   ├── README.md
│   └── source/
│       ├── _config.py
│       ├── _toctree.yml
│       ├── accelerate/
│       │   ├── deepspeed.md
│       │   └── fsdp.md
│       ├── conceptual_guides/
│       │   ├── adapter.md
│       │   ├── ia3.md
│       │   ├── oft.md
│       │   └── prompting.md
│       ├── developer_guides/
│       │   ├── checkpoint.md
│       │   ├── contributing.md
│       │   ├── custom_models.md
│       │   ├── lora.md
│       │   ├── low_level_api.md
│       │   ├── mixed_models.md
│       │   ├── model_merging.md
│       │   ├── quantization.md
│       │   ├── torch_compile.md
│       │   └── troubleshooting.md
│       ├── index.md
│       ├── install.md
│       ├── package_reference/
│       │   ├── adalora.md
│       │   ├── adapter_utils.md
│       │   ├── auto_class.md
│       │   ├── boft.md
│       │   ├── c3a.md
│       │   ├── cartridges.md
│       │   ├── config.md
│       │   ├── cpt.md
│       │   ├── delora.md
│       │   ├── fourierft.md
│       │   ├── functional.md
│       │   ├── gralora.md
│       │   ├── helpers.md
│       │   ├── hotswap.md
│       │   ├── hra.md
│       │   ├── ia3.md
│       │   ├── layernorm_tuning.md
│       │   ├── lily.md
│       │   ├── llama_adapter.md
│       │   ├── loha.md
│       │   ├── lokr.md
│       │   ├── lora.md
│       │   ├── lora_conversion.md
│       │   ├── merge_utils.md
│       │   ├── miss.md
│       │   ├── multitask_prompt_tuning.md
│       │   ├── oft.md
│       │   ├── osf.md
│       │   ├── p_tuning.md
│       │   ├── peanut.md
│       │   ├── peft_model.md
│       │   ├── peft_types.md
│       │   ├── poly.md
│       │   ├── prefix_tuning.md
│       │   ├── prompt_tuning.md
│       │   ├── psoft.md
│       │   ├── pvera.md
│       │   ├── randlora.md
│       │   ├── road.md
│       │   ├── shira.md
│       │   ├── trainable_tokens.md
│       │   ├── tuners.md
│       │   ├── vblora.md
│       │   ├── vera.md
│       │   ├── waveft.md
│       │   └── xlora.md
│       ├── quicktour.md
│       ├── task_guides/
│       │   ├── ia3.md
│       │   ├── lora_based_methods.md
│       │   └── prompt_based_methods.md
│       └── tutorial/
│           ├── peft_integrations.md
│           └── peft_model_config.md
├── examples/
│   ├── alora_finetuning/
│   │   ├── README.md
│   │   └── alora_finetuning.py
│   ├── arrow_multitask/
│   │   ├── arrow_phi3_mini.py
│   │   └── requirements.txt
│   ├── bdlora_finetuning/
│   │   ├── README.md
│   │   ├── bdlora_peft_demo.ipynb
│   │   ├── chat.py
│   │   └── vllm_server.bash
│   ├── boft_controlnet/
│   │   ├── __init__.py
│   │   ├── boft_controlnet.md
│   │   ├── eval.py
│   │   ├── eval.sh
│   │   ├── requirements.txt
│   │   ├── test_controlnet.py
│   │   ├── test_controlnet.sh
│   │   ├── train_controlnet.py
│   │   ├── train_controlnet.sh
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── args_loader.py
│   │       ├── dataset.py
│   │       ├── light_controlnet.py
│   │       ├── pipeline_controlnet.py
│   │       ├── tracemalloc.py
│   │       └── unet_2d_condition.py
│   ├── boft_dreambooth/
│   │   ├── .gitignore
│   │   ├── __init__.py
│   │   ├── boft_dreambooth.md
│   │   ├── dreambooth_inference.ipynb
│   │   ├── requirements.txt
│   │   ├── train_dreambooth.py
│   │   ├── train_dreambooth.sh
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── args_loader.py
│   │       ├── dataset.py
│   │       └── tracemalloc.py
│   ├── cartridge_self_study/
│   │   ├── README.md
│   │   ├── arxiv_synthesize.py
│   │   ├── arxiv_train.py
│   │   ├── requirements.txt
│   │   ├── synthesize.py
│   │   └── train_distill.py
│   ├── causal_language_modeling/
│   │   ├── accelerate_ds_zero3_cpu_offload_config.yaml
│   │   ├── peft_ln_tuning_clm.ipynb
│   │   ├── peft_lora_clm_accelerate_ds_zero3_offload.py
│   │   ├── peft_lora_clm_with_additional_tokens.ipynb
│   │   ├── peft_prefix_tuning_clm.ipynb
│   │   ├── peft_prompt_tuning_clm.ipynb
│   │   └── requirements.txt
│   ├── conditional_generation/
│   │   ├── accelerate_ds_zero3_cpu_offload_config.yaml
│   │   ├── multitask_prompt_tuning.ipynb
│   │   ├── peft_adalora_seq2seq.py
│   │   ├── peft_ia3_seq2seq.ipynb
│   │   ├── peft_lora_seq2seq.ipynb
│   │   ├── peft_lora_seq2seq_accelerate_ds_zero3_offload.py
│   │   ├── peft_lora_seq2seq_accelerate_fsdp.py
│   │   ├── peft_prefix_tuning_seq2seq.ipynb
│   │   ├── peft_prompt_tuning_seq2seq.ipynb
│   │   ├── peft_prompt_tuning_seq2seq_with_generate.ipynb
│   │   └── requirements.txt
│   ├── corda_finetuning/
│   │   ├── README.md
│   │   ├── corda_finetuning.py
│   │   ├── datautils.py
│   │   └── preprocess.py
│   ├── cpt_finetuning/
│   │   ├── README.md
│   │   └── cpt_train_and_inference.ipynb
│   ├── delora_finetuning/
│   │   ├── README.md
│   │   └── delora_finetuning.py
│   ├── dna_language_models/
│   │   └── dna_lm.ipynb
│   ├── dora_finetuning/
│   │   ├── QDoRA_finetuning.ipynb
│   │   ├── README.md
│   │   ├── dora-caching.py
│   │   └── dora_finetuning.py
│   ├── ephemeral_gpu_offloading/
│   │   └── load_with_dora.py
│   ├── eva_finetuning/
│   │   ├── README.md
│   │   ├── eva_finetuning.py
│   │   ├── eva_finetuning_multi_accelerator.py
│   │   └── utils.py
│   ├── evaluation/
│   │   └── lora-lm-eval.ipynb
│   ├── feature_extraction/
│   │   ├── peft_lora_embedding_semantic_search.py
│   │   ├── peft_lora_embedding_semantic_similarity_inference.ipynb
│   │   └── requirements.txt
│   ├── fp4_finetuning/
│   │   └── finetune_fp4_opt_bnb_peft.py
│   ├── gralora_finetuning/
│   │   ├── README.md
│   │   └── gralora_finetuning.py
│   ├── hra_dreambooth/
│   │   ├── README.md
│   │   ├── dreambooth_inference.ipynb
│   │   ├── requirements.txt
│   │   ├── train_dreambooth.py
│   │   ├── train_dreambooth.sh
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── args_loader.py
│   │       ├── dataset.py
│   │       └── tracemalloc.py
│   ├── image_classification/
│   │   ├── README.md
│   │   ├── image_classification_peft_lora.ipynb
│   │   └── image_classification_timm_peft_lora.ipynb
│   ├── int8_training/
│   │   ├── Finetune_flan_t5_large_bnb_peft.ipynb
│   │   ├── Finetune_opt_bnb_peft.ipynb
│   │   ├── config.yaml
│   │   ├── fine_tune_blip2_int8.py
│   │   ├── peft_adalora_whisper_large_training.py
│   │   ├── peft_bnb_whisper_large_v2_training.ipynb
│   │   ├── requirements.txt
│   │   └── run_adalora_whisper_int8.sh
│   ├── lily_finetuning/
│   │   ├── README.md
│   │   └── lily_finetuning.py
│   ├── loftq_finetuning/
│   │   ├── LoftQ_weight_replacement.ipynb
│   │   ├── README.md
│   │   ├── int8_correction.py
│   │   ├── quantize_save_load.py
│   │   └── train_gsm8k_llama.py
│   ├── lora_dreambooth/
│   │   ├── colab_notebook.ipynb
│   │   ├── convert_kohya_ss_sd_lora_to_peft.py
│   │   ├── convert_peft_sd_lora_to_kohya_ss.py
│   │   ├── lora_dreambooth_inference.ipynb
│   │   ├── requirements.txt
│   │   └── train_dreambooth.py
│   ├── lora_finetuning_transformer_engine/
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   ├── lora_finetuning_te.py
│   │   └── requirements.txt
│   ├── lora_ga_finetuning/
│   │   ├── README.md
│   │   └── lora_ga_finetuning.py
│   ├── lorafa_finetune/
│   │   ├── README.md
│   │   └── lorafa_finetuning.py
│   ├── miss_finetuning/
│   │   ├── README.md
│   │   └── miss_finetuning.py
│   ├── multi_adapter_examples/
│   │   ├── Lora_Merging.ipynb
│   │   ├── PEFT_Multi_LoRA_Inference.ipynb
│   │   └── multi_adapter_weighted_inference_diffusers.ipynb
│   ├── multilayer_perceptron/
│   │   ├── README.md
│   │   └── multilayer_perceptron_lora.ipynb
│   ├── oft_dreambooth/
│   │   ├── oft_dreambooth_inference.ipynb
│   │   └── train_dreambooth.py
│   ├── olora_finetuning/
│   │   ├── README.md
│   │   └── olora_finetuning.py
│   ├── orthogonal_subspace_learning/
│   │   ├── README.md
│   │   ├── osf_continual_learning.py
│   │   └── utils.py
│   ├── peanut_finetuning/
│   │   ├── README.md
│   │   └── peanut_finetuning.py
│   ├── pissa_finetuning/
│   │   ├── README.md
│   │   ├── pissa_finetuning.py
│   │   └── preprocess.py
│   ├── poly/
│   │   └── peft_poly_seq2seq_with_generate.ipynb
│   ├── psoft_finetuning/
│   │   ├── README.md
│   │   └── psoft_finetuning.py
│   ├── pvera/
│   │   ├── README.md
│   │   └── confidence_interval_generation.py
│   ├── qalora_finetuning/
│   │   ├── README.md
│   │   └── qalora_gptq_finetuning.py
│   ├── randlora_finetuning/
│   │   ├── README.md
│   │   ├── qrandlora_finetuning.ipynb
│   │   └── randlora_finetuning.py
│   ├── road_finetuning/
│   │   ├── README.md
│   │   └── road_finetuning.py
│   ├── semantic_segmentation/
│   │   ├── README.md
│   │   └── semantic_segmentation_peft_lora.ipynb
│   ├── sequence_classification/
│   │   ├── C3A.ipynb
│   │   ├── FourierFT.ipynb
│   │   ├── IA3.ipynb
│   │   ├── LoRA-torchao-8bit-dynamic-activation.ipynb
│   │   ├── LoRA-torchao-8bit.ipynb
│   │   ├── LoRA.ipynb
│   │   ├── P_Tuning.ipynb
│   │   ├── Prompt_Tuning.ipynb
│   │   ├── VBLoRA.ipynb
│   │   ├── VeRA.ipynb
│   │   ├── peft_no_lora_accelerate.py
│   │   ├── prefix_tuning.ipynb
│   │   └── requirements.txt
│   ├── sft/
│   │   ├── README.md
│   │   ├── configs/
│   │   │   ├── deepspeed_config.yaml
│   │   │   ├── deepspeed_config_z3_qlora.yaml
│   │   │   ├── fsdp_config.yaml
│   │   │   └── fsdp_config_qlora.yaml
│   │   ├── requirements.txt
│   │   ├── requirements_colab.txt
│   │   ├── requirements_xpu.txt
│   │   ├── run_peft.sh
│   │   ├── run_peft_deepspeed.sh
│   │   ├── run_peft_fsdp.sh
│   │   ├── run_peft_fsdp_gptq.sh
│   │   ├── run_peft_multigpu.sh
│   │   ├── run_peft_qlora_deepspeed_stage3.sh
│   │   ├── run_peft_qlora_fsdp.sh
│   │   ├── run_unsloth_peft.sh
│   │   ├── train.py
│   │   └── utils.py
│   ├── shira_finetuning/
│   │   ├── README.md
│   │   └── shira_finetuning.py
│   ├── stable_diffusion/
│   │   ├── convert_sd_adapter_to_peft.py
│   │   ├── inc_flux_lora_hpu.py
│   │   └── train_dreambooth.py
│   ├── token_classification/
│   │   ├── peft_lora_ner.ipynb
│   │   ├── peft_lora_token_cls.ipynb
│   │   └── requirements.txt
│   ├── waveft_finetuning/
│   │   ├── README.md
│   │   └── waveft_finetuning.py
│   └── xlora/
│       ├── README.md
│       └── xlora_inference_mistralrs.py
├── method_comparison/
│   ├── MetaMathQA/
│   │   ├── Makefile
│   │   ├── README.md
│   │   ├── data.py
│   │   ├── default_training_params.json
│   │   ├── experiments/
│   │   │   ├── adalora/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       └── adapter_config.json
│   │   │   ├── adaptionprompt/
│   │   │   │   └── llama-3.2-3B-lr_0.0005/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── boft/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── bone/
│   │   │   │   ├── llama-3.2-3B-bat/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── c3a/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── delora/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── fourierft/
│   │   │   │   ├── llama-3.2-3B-default/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-n_frequency-5000/
│   │   │   │       └── adapter_config.json
│   │   │   ├── full-finetuning/
│   │   │   │   └── llama-3.2-3B-lr_0.00001/
│   │   │   │       └── training_params.json
│   │   │   ├── gralora/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── ia3/
│   │   │   │   ├── llama-3.2-3B-default/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-lr_0.001/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── lily/
│   │   │   │   ├── llama-3.2-3B-rank140-mlp-a2-b2-s8.0/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-rank896-a2-b2-s2.0/
│   │   │   │       └── adapter_config.json
│   │   │   ├── ln_tuning/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── loha/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       └── adapter_config.json
│   │   │   ├── lokr/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       └── adapter_config.json
│   │   │   ├── lora/
│   │   │   │   ├── llama-3.2-3B-rank10-target-mlp/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-rank14-target-mlp-bdlora/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-rank32/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-rank32-dora/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-rank32-lorafa/
│   │   │   │   │   ├── adapter_config.json
│   │   │   │   │   └── training_params.json
│   │   │   │   ├── llama-3.2-3B-rank64/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-rank64-rslora/
│   │   │   │       └── adapter_config.json
│   │   │   ├── miss/
│   │   │   │   ├── llama-3.2-3B-bat/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-default/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-mini/
│   │   │   │       └── adapter_config.json
│   │   │   ├── oft/
│   │   │   │   └── llama-3.2-3B-rank32/
│   │   │   │       └── adapter_config.json
│   │   │   ├── osf/
│   │   │   │   └── llama-3.2-3B-rank128/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── peanut/
│   │   │   │   ├── llama-3.2-3B-rank1-relu-depth0-s32.0/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-rank32-relu-depth0-s2.0/
│   │   │   │       └── adapter_config.json
│   │   │   ├── prefixtuning/
│   │   │   │   └── llama-3.2-3B-lr_0.001/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── prompt_tuning/
│   │   │   │   ├── llama-3.2-3B-default/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   ├── llama-3.2-3B-lr_0.001/
│   │   │   │   │   ├── adapter_config.json
│   │   │   │   │   └── training_params.json
│   │   │   │   └── llama-3.2-3B-sample_vocab-lr_0.001/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── psoft/
│   │   │   │   ├── llama-3.2-3B-default/
│   │   │   │   │   └── adapter_config.json
│   │   │   │   └── llama-3.2-3B-fast/
│   │   │   │       └── adapter_config.json
│   │   │   ├── ptuning/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── pvera/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── randlora/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── road/
│   │   │   │   └── llama-3.2-3B-lr_0.001/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── shira/
│   │   │   │   └── llama-3.2-3B-lr_0.0003-random_seed_42/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── trainable_tokens/
│   │   │   │   └── llama-3.2-3B-sos+eos/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   ├── vblora/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       └── adapter_config.json
│   │   │   ├── vera/
│   │   │   │   └── llama-3.2-3B-default/
│   │   │   │       ├── adapter_config.json
│   │   │   │       └── training_params.json
│   │   │   └── waveft/
│   │   │       └── llama-3.2-3B-n_frequency-5000/
│   │   │           └── adapter_config.json
│   │   ├── requirements.txt
│   │   ├── results/
│   │   │   ├── .gitkeep
│   │   │   ├── adalora--llama-3.2-3B-rank32.json
│   │   │   ├── adaptionprompt--llama-3.2-3B-lr_0.0005.json
│   │   │   ├── boft--llama-3.2-3B-default.json
│   │   │   ├── bone--llama-3.2-3B-bat.json
│   │   │   ├── bone--llama-3.2-3B-default.json
│   │   │   ├── c3a--llama-3.2-3B-default.json
│   │   │   ├── delora--llama-3.2-3B-rank32.json
│   │   │   ├── fourierft--llama-3.2-3B-default.json
│   │   │   ├── fourierft--llama-3.2-3B-n_frequency-5000.json
│   │   │   ├── full-finetuning--llama-3.2-3B-lr_0.00001.json
│   │   │   ├── gralora--llama-3.2-3B-rank32.json
│   │   │   ├── ia3--llama-3.2-3B-default.json
│   │   │   ├── ia3--llama-3.2-3B-lr_0.001.json
│   │   │   ├── ln_tuning--llama-3.2-3B-default.json
│   │   │   ├── loha--llama-3.2-3B-rank32.json
│   │   │   ├── lokr--llama-3.2-3B-rank32.json
│   │   │   ├── lora--llama-3.2-3B-rank10-target-mlp.json
│   │   │   ├── lora--llama-3.2-3B-rank14-target-mlp-bdlora.json
│   │   │   ├── lora--llama-3.2-3B-rank32-dora.json
│   │   │   ├── lora--llama-3.2-3B-rank32-lorafa.json
│   │   │   ├── lora--llama-3.2-3B-rank32.json
│   │   │   ├── lora--llama-3.2-3B-rank64-rslora.json
│   │   │   ├── lora--llama-3.2-3B-rank64.json
│   │   │   ├── miss--llama-3.2-3B-bat.json
│   │   │   ├── miss--llama-3.2-3B-default.json
│   │   │   ├── miss--llama-3.2-3B-mini.json
│   │   │   ├── oft--llama-3.2-3B-rank32.json
│   │   │   ├── osf--llama-3.2-3B-rank128.json
│   │   │   ├── prefixtuning--llama-3.2-3B-lr_0.001.json
│   │   │   ├── prompt_tuning--llama-3.2-3B-default.json
│   │   │   ├── prompt_tuning--llama-3.2-3B-lr_0.001.json
│   │   │   ├── prompt_tuning--llama-3.2-3B-sample_vocab-lr_0.001.json
│   │   │   ├── ptuning--llama-3.2-3B-default.json
│   │   │   ├── randlora--llama-3.2-3B-default.json
│   │   │   ├── road--llama-3.2-3B-lr_0.001.json
│   │   │   ├── shira--llama-3.2-3B-lr_0.0003-random_seed_42.json
│   │   │   ├── trainable_tokens--llama-3.2-3B-sos+eos.json
│   │   │   ├── vblora--llama-3.2-3B-default.json
│   │   │   ├── vera--llama-3.2-3B-default.json
│   │   │   └── waveft--llama-3.2-3B-n_frequency-5000.json
│   │   ├── run.py
│   │   └── utils.py
│   ├── README.md
│   ├── __init__.py
│   ├── app.py
│   ├── processing.py
│   ├── requirements-app.txt
│   ├── sanitizer.py
│   ├── test_sanitizer.py
│   └── text_generation_benchmark/
│       ├── README.md
│       ├── cancelled_results/
│       │   └── .gitkeep
│       ├── configs/
│       │   └── prompts.json
│       ├── data.py
│       ├── default_benchmark_params.json
│       ├── experiments/
│       │   └── lora/
│       │       └── lora_r8/
│       │           └── adapter_config.json
│       ├── results/
│       │   └── .gitkeep
│       ├── run.py
│       ├── run_base.py
│       ├── temporary_results/
│       │   └── .gitkeep
│       └── utils.py
├── pyproject.toml
├── requirements.txt
├── scripts/
│   ├── ci_clean_cache.py
│   ├── convert-bone-to-miss.py
│   ├── evaluate-lora-conversion.py
│   ├── launch_notebook_mp.py
│   ├── log_reports.py
│   ├── stale.py
│   └── train_memory.py
├── setup.py
├── src/
│   └── peft/
│       ├── __init__.py
│       ├── auto.py
│       ├── config.py
│       ├── functional.py
│       ├── helpers.py
│       ├── import_utils.py
│       ├── mapping.py
│       ├── mapping_func.py
│       ├── mixed_model.py
│       ├── optimizers/
│       │   ├── __init__.py
│       │   ├── lorafa.py
│       │   └── loraplus.py
│       ├── peft_model.py
│       ├── py.typed
│       ├── tuners/
│       │   ├── __init__.py
│       │   ├── _buffer_dict.py
│       │   ├── adalora/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── gptq.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── adaption_prompt/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   ├── model.py
│       │   │   └── utils.py
│       │   ├── boft/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── fbd/
│       │   │   │   ├── __init__.py
│       │   │   │   ├── fbd_cuda.cpp
│       │   │   │   └── fbd_cuda_kernel.cu
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── c3a/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   ├── model.py
│       │   │   └── utils.py
│       │   ├── cartridge/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── model.py
│       │   │   └── utils.py
│       │   ├── cpt/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   └── model.py
│       │   ├── delora/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── fourierft/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── gralora/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── hra/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── ia3/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── lily/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── ln_tuning/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── loha/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── lokr/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── lora/
│       │   │   ├── __init__.py
│       │   │   ├── aqlm.py
│       │   │   ├── arrow.py
│       │   │   ├── awq.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── conversion.py
│       │   │   ├── corda.py
│       │   │   ├── dora.py
│       │   │   ├── eetq.py
│       │   │   ├── eva.py
│       │   │   ├── gptq.py
│       │   │   ├── hqq.py
│       │   │   ├── inc.py
│       │   │   ├── intruders.py
│       │   │   ├── layer.py
│       │   │   ├── loraga.py
│       │   │   ├── model.py
│       │   │   ├── te.py
│       │   │   ├── torchao.py
│       │   │   ├── tp_layer.py
│       │   │   └── variants.py
│       │   ├── lycoris_utils.py
│       │   ├── miss/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── mixed/
│       │   │   ├── __init__.py
│       │   │   └── model.py
│       │   ├── multitask_prompt_tuning/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   └── model.py
│       │   ├── oft/
│       │   │   ├── __init__.py
│       │   │   ├── aqlm.py
│       │   │   ├── awq.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── eetq.py
│       │   │   ├── gptq.py
│       │   │   ├── hqq.py
│       │   │   ├── inc.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── osf/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   ├── model.py
│       │   │   └── utils.py
│       │   ├── p_tuning/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   └── model.py
│       │   ├── peanut/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── poly/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   ├── model.py
│       │   │   └── router.py
│       │   ├── prefix_tuning/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   └── model.py
│       │   ├── prompt_tuning/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   └── model.py
│       │   ├── psoft/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── pvera/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── randlora/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── road/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── shira/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   ├── mask_functions.py
│       │   │   └── model.py
│       │   ├── trainable_tokens/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── tuners_utils.py
│       │   ├── vblora/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── vera/
│       │   │   ├── __init__.py
│       │   │   ├── bnb.py
│       │   │   ├── config.py
│       │   │   ├── layer.py
│       │   │   └── model.py
│       │   ├── waveft/
│       │   │   ├── __init__.py
│       │   │   ├── config.py
│       │   │   ├── constants.py
│       │   │   ├── layer.py
│       │   │   ├── model.py
│       │   │   ├── wavelet.py
│       │   │   └── waverec2d.py
│       │   └── xlora/
│       │       ├── __init__.py
│       │       ├── classifier.py
│       │       ├── config.py
│       │       ├── layer.py
│       │       └── model.py
│       └── utils/
│           ├── __init__.py
│           ├── constants.py
│           ├── hotswap.py
│           ├── incremental_pca.py
│           ├── integrations.py
│           ├── loftq_utils.py
│           ├── merge_utils.py
│           ├── other.py
│           ├── peft_types.py
│           ├── save_and_load.py
│           └── warning.py
└── tests/
    ├── __init__.py
    ├── conftest.py
    ├── regression/
    │   ├── __init__.py
    │   └── test_regression.py
    ├── test_adaption_prompt.py
    ├── test_arrow.py
    ├── test_auto.py
    ├── test_boft.py
    ├── test_bufferdict.py
    ├── test_cartridge.py
    ├── test_common_gpu.py
    ├── test_config.py
    ├── test_cpt.py
    ├── test_custom_models.py
    ├── test_decoder_models.py
    ├── test_encoder_decoder_models.py
    ├── test_feature_extraction_models.py
    ├── test_gptqmodel.py
    ├── test_gpu_examples.py
    ├── test_helpers.py
    ├── test_hub_features.py
    ├── test_incremental_pca.py
    ├── test_initialization.py
    ├── test_integrations.py
    ├── test_lora_conversion.py
    ├── test_lora_ga.py
    ├── test_lora_intruders.py
    ├── test_lora_megatron.py
    ├── test_lora_variants.py
    ├── test_lorafa.py
    ├── test_loraplus.py
    ├── test_low_level_api.py
    ├── test_mapping.py
    ├── test_mixed.py
    ├── test_multitask_prompt_tuning.py
    ├── test_osf.py
    ├── test_other.py
    ├── test_poly.py
    ├── test_pvera.py
    ├── test_randlora.py
    ├── test_seq_classifier.py
    ├── test_shira.py
    ├── test_stablediffusion.py
    ├── test_target_parameters.py
    ├── test_torch_compile.py
    ├── test_trainable_tokens.py
    ├── test_tuners_utils.py
    ├── test_vblora.py
    ├── test_vera.py
    ├── test_vision_models.py
    ├── test_xlora.py
    ├── testing_common.py
    ├── testing_utils.py
    └── training/
        ├── adapters.py
        ├── deepspeed_config.yaml
        ├── fsdp2_config.yaml
        ├── fsdp_config.yaml
        ├── lora_tp.py
        ├── tp_config.yaml
        └── training.py

SYMBOL INDEX (295 symbols across 63 files)

FILE: examples/alora_finetuning/alora_finetuning.py
  function train_model (line 17) | def train_model(
  function model_inference (line 162) | def model_inference(model_path: str, adapter_path: str, prompt: str = No...

FILE: examples/arrow_multitask/arrow_phi3_mini.py
  function parse_args (line 99) | def parse_args():
  function read_test_dataset (line 120) | def read_test_dataset(ds_name):
  function extract_input_content (line 139) | def extract_input_content(ds_name, row):
  function create_multi_choice_options (line 152) | def create_multi_choice_options(row, ds_name):
  function extract_multi_choice_target_index (line 172) | def extract_multi_choice_target_index(row, ds_name):
  function set_seed (line 185) | def set_seed(seed: int):
  function compute_loglike_loss (line 195) | def compute_loglike_loss(logits, labels, reduction="none"):
  function evaluate_on_multi_choice_batched (line 220) | def evaluate_on_multi_choice_batched(

FILE: examples/bdlora_finetuning/chat.py
  function chat (line 22) | def chat(

FILE: examples/boft_controlnet/eval.py
  function count_txt_files (line 50) | def count_txt_files(directory):
  function plot_kpts (line 56) | def plot_kpts(image, kpts, color="g"):
  function generate_landmark2d (line 86) | def generate_landmark2d(dataset, input_dir, pred_lmk_dir, gt_lmk_dir, vi...
  function landmark_comparison (line 141) | def landmark_comparison(val_dataset, lmk_dir, gt_lmk_dir):
  function main (line 165) | def main(args):

FILE: examples/boft_controlnet/test_controlnet.py
  function main (line 52) | def main(args):

FILE: examples/boft_controlnet/train_controlnet.py
  function save_adaptor (line 68) | def save_adaptor(accelerator, output_dir, nets_dict):
  function main (line 87) | def main(args):

FILE: examples/boft_controlnet/utils/args_loader.py
  function get_full_repo_name (line 9) | def get_full_repo_name(model_id: str, organization: Optional[str] = None...
  function import_model_class_from_model_name_or_path (line 19) | def import_model_class_from_model_name_or_path(pretrained_model_name_or_...
  function parse_args (line 41) | def parse_args(input_args=None):

FILE: examples/boft_controlnet/utils/dataset.py
  function image_grid (line 13) | def image_grid(imgs, rows, cols):
  function log_validation (line 24) | def log_validation(val_dataset, text_encoder, unet, controlnet, args, ac...
  function make_dataset (line 82) | def make_dataset(args, tokenizer, accelerator, split="train"):
  function collate_fn (line 194) | def collate_fn(examples):

FILE: examples/boft_controlnet/utils/light_controlnet.py
  class ControlNetOutput (line 36) | class ControlNetOutput(BaseOutput):
  class ControlNetConditioningEmbedding (line 41) | class ControlNetConditioningEmbedding(nn.Module):
    method __init__ (line 51) | def __init__(
    method forward (line 73) | def forward(self, conditioning):
  class ControlNetModel (line 86) | class ControlNetModel(ModelMixin, ConfigMixin):
    method __init__ (line 90) | def __init__(
    method attn_processors (line 107) | def attn_processors(self) -> dict[str, AttentionProcessor]:
    method set_attn_processor (line 131) | def set_attn_processor(self, processor: Union[AttentionProcessor, dict...
    method set_default_attn_processor (line 162) | def set_default_attn_processor(self):
    method set_attention_slice (line 169) | def set_attention_slice(self, slice_size):
    method _set_gradient_checkpointing (line 234) | def _set_gradient_checkpointing(self, module, value=False):
    method forward (line 238) | def forward(
  function zero_module (line 260) | def zero_module(module):

FILE: examples/boft_controlnet/utils/pipeline_controlnet.py
  class LightControlNetPipelineOutput (line 33) | class LightControlNetPipelineOutput(BaseOutput):
  class LightControlNetPipeline (line 50) | class LightControlNetPipeline(StableDiffusionControlNetPipeline):
    method check_inputs (line 53) | def check_inputs(
    method __call__ (line 166) | def __call__(

FILE: examples/boft_controlnet/utils/tracemalloc.py
  function b2mb (line 9) | def b2mb(x):
  class TorchTracemalloc (line 14) | class TorchTracemalloc:
    method __enter__ (line 15) | def __enter__(self):
    method cpu_mem_used (line 31) | def cpu_mem_used(self):
    method peak_monitor_func (line 35) | def peak_monitor_func(self):
    method __exit__ (line 47) | def __exit__(self, *exc):
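
The `b2mb` helper and `TorchTracemalloc` context manager recur across several example scripts to report memory use. The following is a minimal sketch of the general pattern, not the scripts' code. It assumes `b2mb` converts bytes to whole MiB. The hypothetical `PyTracemalloc` class replaces the GPU memory counters that a `Torch`-prefixed tracker would presumably use with Python's stdlib `tracemalloc`. It therefore measures only the Python heap and runs without a GPU.

```python
import gc
import tracemalloc


def b2mb(x: int) -> int:
    """Convert a byte count to whole mebibytes (2**20 bytes)."""
    return int(x / 2**20)


class PyTracemalloc:
    """Hypothetical CPU-only analogue of a peak-memory context manager.

    Uses the stdlib tracemalloc module (Python heap only) instead of CUDA
    memory counters, so it runs anywhere.
    """

    def __enter__(self):
        gc.collect()  # drop garbage so the baseline is not inflated
        tracemalloc.start()
        self.begin, _ = tracemalloc.get_traced_memory()
        return self

    def __exit__(self, *exc):
        end, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        self.used = b2mb(end - self.begin)     # memory still held at exit
        self.peaked = b2mb(peak - self.begin)  # high-water mark above the baseline
        return False  # never suppress exceptions raised inside the block


with PyTracemalloc() as tm:
    buf = bytearray(8 * 2**20)  # allocate roughly 8 MiB and keep it alive
print(f"used={tm.used} MiB, peak={tm.peaked} MiB")
```

Because the peak is read before tracing stops, `peaked` is always at least `used`. This is the property a training loop typically logs next to the steady-state figure.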

FILE: examples/boft_controlnet/utils/unet_2d_condition.py
  class UNet2DConditionOutput (line 27) | class UNet2DConditionOutput(BaseOutput):
  class UNet2DConditionNewModel (line 37) | class UNet2DConditionNewModel(UNet2DConditionModel):
    method forward (line 38) | def forward(

FILE: examples/boft_dreambooth/train_dreambooth.py
  function save_adaptor (line 70) | def save_adaptor(accelerator, step, unet, text_encoder, args):
  function main (line 83) | def main(args):

FILE: examples/boft_dreambooth/utils/args_loader.py
  function import_model_class_from_model_name_or_path (line 10) | def import_model_class_from_model_name_or_path(pretrained_model_name_or_...
  function get_full_repo_name (line 30) | def get_full_repo_name(model_id: str, organization: Optional[str] = None...
  function parse_args (line 40) | def parse_args(input_args=None):

FILE: examples/boft_dreambooth/utils/dataset.py
  class DreamBoothDataset (line 9) | class DreamBoothDataset(Dataset):
    method __init__ (line 15) | def __init__(
    method __len__ (line 57) | def __len__(self):
    method __getitem__ (line 60) | def __getitem__(self, index):
  function collate_fn (line 90) | def collate_fn(examples, with_prior_preservation=False):
  class PromptDataset (line 112) | class PromptDataset(Dataset):
    method __init__ (line 115) | def __init__(self, prompt, num_samples):
    method __len__ (line 119) | def __len__(self):
    method __getitem__ (line 122) | def __getitem__(self, index):

FILE: examples/boft_dreambooth/utils/tracemalloc.py
  function b2mb (line 9) | def b2mb(x):
  class TorchTracemalloc (line 14) | class TorchTracemalloc:
    method __enter__ (line 15) | def __enter__(self):
    method cpu_mem_used (line 31) | def cpu_mem_used(self):
    method peak_monitor_func (line 35) | def peak_monitor_func(self):
    method __exit__ (line 47) | def __exit__(self, *exc):

FILE: examples/cartridge_self_study/arxiv_synthesize.py
  function main (line 22) | def main():

FILE: examples/cartridge_self_study/arxiv_train.py
  function main (line 26) | def main():

FILE: examples/cartridge_self_study/synthesize.py
  function synthesize_self_study_jsonl (line 54) | def synthesize_self_study_jsonl(
  function _synthesize_vllm (line 116) | def _synthesize_vllm(
  function _synthesize_hf (line 209) | def _synthesize_hf(
  function main (line 294) | def main():

FILE: examples/cartridge_self_study/train_distill.py
  class DistillJsonlDataset (line 28) | class DistillJsonlDataset(Dataset):
    method __init__ (line 29) | def __init__(self, path: str | Path):
    method __len__ (line 36) | def __len__(self) -> int:
    method __getitem__ (line 39) | def __getitem__(self, idx: int):
  class DistillationCollator (line 48) | class DistillationCollator:
    method __init__ (line 49) | def __init__(self, tokenizer):
    method __call__ (line 52) | def __call__(self, features):
  class DistillationTrainer (line 67) | class DistillationTrainer(Trainer):
    method __init__ (line 68) | def __init__(self, *args, top_k: int = 20, teacher_temperature: float ...
    method compute_loss (line 73) | def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
  function main (line 131) | def main():

FILE: examples/causal_language_modeling/peft_lora_clm_accelerate_ds_zero3_offload.py
  function levenshtein_distance (line 23) | def levenshtein_distance(str1, str2):
  function get_closest_label (line 44) | def get_closest_label(eval_pred, classes):
  function b2mb (line 56) | def b2mb(x):
  class TorchTracemalloc (line 61) | class TorchTracemalloc:
    method __enter__ (line 62) | def __enter__(self):
    method cpu_mem_used (line 78) | def cpu_mem_used(self):
    method peak_monitor_func (line 82) | def peak_monitor_func(self):
    method __exit__ (line 94) | def __exit__(self, *exc):
  function main (line 110) | def main():
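
The `levenshtein_distance` and `get_closest_label` helpers listed above map a free-form generation onto the nearest class label by edit distance. The following is a minimal stdlib sketch of that idea, written for illustration. It assumes unit-cost insertions, deletions, and substitutions, and it is not the example script's actual code.

```python
import sys


def levenshtein_distance(str1: str, str2: str) -> int:
    """Edit distance (insert/delete/substitute, each cost 1) with a single-row DP table."""
    if str1 == str2:
        return 0
    # row[j] holds the distance between the current prefix of str1 and str2[:j]
    row = list(range(len(str2) + 1))
    for i in range(1, len(str1) + 1):
        prev_diag = row[0]  # distance for (i-1, j-1)
        row[0] = i
        for j in range(1, len(str2) + 1):
            above = row[j]  # distance for (i-1, j)
            if str1[i - 1] == str2[j - 1]:
                row[j] = prev_diag
            else:
                row[j] = 1 + min(prev_diag, above, row[j - 1])
            prev_diag = above
    return row[-1]


def get_closest_label(eval_pred: str, classes: list[str]) -> str:
    """Return the class label with the smallest edit distance (first one wins ties)."""
    best_label, best_dist = None, sys.maxsize
    for label in classes:
        dist = levenshtein_distance(eval_pred.strip(), label)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


print(levenshtein_distance("kitten", "sitting"))  # 3
print(get_closest_label(" complaint ", ["complaint", "no complaint"]))  # complaint
```

Keeping only one DP row reduces memory from O(len(str1) * len(str2)) to O(len(str2)). This is enough because each cell depends only on the previous row and the cell to its left.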

FILE: examples/conditional_generation/peft_adalora_seq2seq.py
  function preprocess_function (line 68) | def preprocess_function(examples):

FILE: examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py
  function levenshtein_distance (line 17) | def levenshtein_distance(str1, str2):
  function get_closest_label (line 38) | def get_closest_label(eval_pred, classes):
  function b2mb (line 50) | def b2mb(x):
  class TorchTracemalloc (line 55) | class TorchTracemalloc:
    method __enter__ (line 56) | def __enter__(self):
    method cpu_mem_used (line 72) | def cpu_mem_used(self):
    method peak_monitor_func (line 76) | def peak_monitor_func(self):
    method __exit__ (line 88) | def __exit__(self, *exc):
  function main (line 104) | def main():

FILE: examples/conditional_generation/peft_lora_seq2seq_accelerate_fsdp.py
  function main (line 14) | def main():

FILE: examples/corda_finetuning/corda_finetuning.py
  function get_nb_trainable_parameters (line 38) | def get_nb_trainable_parameters(model) -> tuple[int, int]:
  class TrainingArguments (line 65) | class TrainingArguments(transformers.TrainingArguments):
  function safe_save_model_for_hf_trainer (line 89) | def safe_save_model_for_hf_trainer(trainer: transformers.Trainer, output...
  function smart_tokenizer_and_embedding_resize (line 98) | def smart_tokenizer_and_embedding_resize(
  function _tokenize_fn (line 121) | def _tokenize_fn(strings: Sequence[str], tokenizer: transformers.PreTrai...
  function preprocess (line 145) | def preprocess(
  class DataCollatorForSupervisedDataset (line 164) | class DataCollatorForSupervisedDataset:
    method __call__ (line 169) | def __call__(self, instances: Sequence[dict]) -> dict[str, torch.Tensor]:
  function train_tokenize_function (line 184) | def train_tokenize_function(examples, tokenizer, query, response):
  function train (line 198) | def train():

FILE: examples/corda_finetuning/datautils.py
  function set_seed (line 30) | def set_seed(seed):
  function sample_train_loaders (line 35) | def sample_train_loaders(name, tokenizer, nsamples=128, seed=0, seqlen=2...
  function get_redpajama_train (line 66) | def get_redpajama_train(tokenizer, percent=10, seed=3, batch_size=128, m...
  function get_english_quote (line 80) | def get_english_quote(dataset_name, tokenizer):
  function get_qat_dataset (line 86) | def get_qat_dataset(name, tokenizer, data_percent):
  function get_calib_data (line 106) | def get_calib_data(name, tokenizer, model_id, nsamples, seqlen=2048, see...
  function get_eval_loaders (line 209) | def get_eval_loaders(name, tokenizer):

FILE: examples/corda_finetuning/preprocess.py
  function run_model (line 30) | def run_model(model, calib_loader):
  function main (line 37) | def main(args):

FILE: examples/delora_finetuning/delora_finetuning.py
  function train_model (line 17) | def train_model(

FILE: examples/dora_finetuning/dora-caching.py
  function timeit (line 39) | def timeit(logs):
  function run_benchmark (line 47) | def run_benchmark(model, num_runs):
  function main (line 66) | def main(model_id, num_runs):

FILE: examples/dora_finetuning/dora_finetuning.py
  function train_model (line 17) | def train_model(

FILE: examples/ephemeral_gpu_offloading/load_with_dora.py
  function main (line 51) | def main():

FILE: examples/eva_finetuning/utils.py
  class TokenizerMetaMath (line 19) | class TokenizerMetaMath:
    method format_prompt (line 30) | def format_prompt(self, query):
    method __init__ (line 37) | def __init__(self, tokenizer_path):
    method __call__ (line 40) | def __call__(self, examples):
    method _tokenize_fn (line 45) | def _tokenize_fn(self, prompts, completions):
  class DataCollator (line 56) | class DataCollator:
    method __init__ (line 57) | def __init__(self, eos_token_id, max_length=None):
    method __call__ (line 61) | def __call__(self, batch):

FILE: examples/feature_extraction/peft_lora_embedding_semantic_search.py
  function parse_args (line 42) | def parse_args():
  function save_model_hook (line 154) | def save_model_hook(models, weights, output_dir):
  function load_model_hook (line 161) | def load_model_hook(models, input_dir):
  class AutoModelForSentenceEmbedding (line 169) | class AutoModelForSentenceEmbedding(nn.Module):
    method __init__ (line 170) | def __init__(self, model_name, tokenizer, normalize=True):
    method forward (line 179) | def forward(self, **kwargs):
    method mean_pooling (line 187) | def mean_pooling(self, model_output, attention_mask):
    method __getattr__ (line 192) | def __getattr__(self, name: str):
  function get_cosing_embeddings (line 202) | def get_cosing_embeddings(query_embs, product_embs):
  function get_loss (line 206) | def get_loss(cosine_score, labels):
  function main (line 210) | def main():

FILE: examples/fp4_finetuning/finetune_fp4_opt_bnb_peft.py
  class CastOutputToFloat (line 80) | class CastOutputToFloat(nn.Sequential):
    method forward (line 81) | def forward(self, x):
  function print_trainable_parameters (line 93) | def print_trainable_parameters(model):

FILE: examples/gralora_finetuning/gralora_finetuning.py
  function train_model (line 17) | def train_model(

FILE: examples/hra_dreambooth/train_dreambooth.py
  function save_adaptor (line 70) | def save_adaptor(accelerator, step, unet, text_encoder, args):
  function main (line 83) | def main(args):

FILE: examples/hra_dreambooth/utils/args_loader.py
  function import_model_class_from_model_name_or_path (line 12) | def import_model_class_from_model_name_or_path(pretrained_model_name_or_...
  function get_full_repo_name (line 32) | def get_full_repo_name(model_id: str, organization: Optional[str] = None...
  function parse_args (line 42) | def parse_args(input_args=None):

FILE: examples/hra_dreambooth/utils/dataset.py
  class DreamBoothDataset (line 11) | class DreamBoothDataset(Dataset):
    method __init__ (line 17) | def __init__(
    method __len__ (line 59) | def __len__(self):
    method __getitem__ (line 62) | def __getitem__(self, index):
  function collate_fn (line 92) | def collate_fn(examples, with_prior_preservation=False):
  class PromptDataset (line 114) | class PromptDataset(Dataset):
    method __init__ (line 117) | def __init__(self, prompt, num_samples):
    method __len__ (line 121) | def __len__(self):
    method __getitem__ (line 124) | def __getitem__(self, index):

FILE: examples/hra_dreambooth/utils/tracemalloc.py
  function b2mb (line 11) | def b2mb(x):
  class TorchTracemalloc (line 16) | class TorchTracemalloc:
    method __enter__ (line 17) | def __enter__(self):
    method cpu_mem_used (line 33) | def cpu_mem_used(self):
    method peak_monitor_func (line 37) | def peak_monitor_func(self):
    method __exit__ (line 49) | def __exit__(self, *exc):

FILE: examples/int8_training/fine_tune_blip2_int8.py
  class ImageCaptioningDataset (line 44) | class ImageCaptioningDataset(Dataset):
    method __init__ (line 45) | def __init__(self, dataset, processor):
    method __len__ (line 49) | def __len__(self):
    method __getitem__ (line 52) | def __getitem__(self, idx):
  function collator (line 61) | def collator(batch):

FILE: examples/int8_training/peft_adalora_whisper_large_training.py
  function parse_args (line 49) | def parse_args():
  function load_streaming_dataset (line 280) | def load_streaming_dataset(dataset_name, dataset_config_name, split, **k...
  function prepare_dataset_wrapper (line 296) | def prepare_dataset_wrapper(do_lower_case, do_remove_punctuation, proces...
  function save_model_hook (line 322) | def save_model_hook(models, weights, output_dir):
  function load_model_hook (line 329) | def load_model_hook(models, input_dir):
  class DataCollatorSpeechSeq2SeqWithPadding (line 337) | class DataCollatorSpeechSeq2SeqWithPadding:
    method __call__ (line 340) | def __call__(self, features: list[dict[str, Union[list[int], torch.Ten...
  function get_audio_length_processor (line 364) | def get_audio_length_processor(max_input_length):
  function evaluation_loop (line 371) | def evaluation_loop(model, eval_dataloader, processor, normalizer, metri...
  function main (line 423) | def main():

FILE: examples/lily_finetuning/lily_finetuning.py
  function train_model (line 17) | def train_model(

FILE: examples/loftq_finetuning/int8_correction.py
  class MyLinear8bitLt (line 68) | class MyLinear8bitLt(peft.tuners.lora.bnb.Linear8bitLt):
    method forward (line 69) | def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor:
  function get_logits (line 204) | def get_logits(model, inputs):
  function mse (line 213) | def mse(a, b, attention_mask=None):
  function get_model (line 225) | def get_model(*args, **kwargs):

FILE: examples/loftq_finetuning/quantize_save_load.py
  class Shell (line 30) | class Shell(nn.Module):
    method __init__ (line 31) | def __init__(self, weight, bias=None):
  function unwrap_model (line 38) | def unwrap_model(model, sub_module_name=".base_layer"):
  function print_model (line 59) | def print_model(model, name):
  function arg_parse (line 78) | def arg_parse():
  function quantize_and_save (line 121) | def quantize_and_save():

FILE: examples/loftq_finetuning/train_gsm8k_llama.py
  function parse_args (line 62) | def parse_args():
  function main (line 297) | def main():
  function extract_answer_number (line 817) | def extract_answer_number(sentence: str) -> float:
  function compute_accuracy (line 841) | def compute_accuracy(pred: list, gold: list):
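
`extract_answer_number` and `compute_accuracy` in the GSM8K training script turn generated text into numeric answers and score them against references. The following is a minimal sketch, assuming the final answer is the last number in the generated text. The real script's parsing rules may differ, and the regex is an illustrative choice.

```python
import re

# Optional minus sign, digits, optional decimal part (assumed number format)
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def extract_answer_number(sentence: str) -> float:
    """Return the last number in a generated answer, or inf if none is found.

    Assumption: thousands separators ("1,200") are stripped, and the final
    answer is the last number that appears in the text.
    """
    matches = NUMBER_RE.findall(sentence.replace(",", ""))
    return float(matches[-1]) if matches else float("inf")


def compute_accuracy(pred: list, gold: list) -> float:
    """Fraction of positions where the prediction exactly equals the reference."""
    if len(pred) != len(gold):
        raise ValueError("pred and gold must have the same length")
    if not pred:
        return 0.0
    # Exact float equality; a tolerance may be preferable for non-integer answers
    return sum(p == g for p, g in zip(pred, gold)) / len(pred)


preds = [extract_answer_number(s) for s in ["So she pays $1,200 in total.", "The answer is 7.5", "no idea"]]
print(preds)  # [1200.0, 7.5, inf]
print(compute_accuracy(preds, [1200.0, 8.0, 3.0]))
```

Returning `inf` for unparseable output guarantees the example is scored as wrong without special-casing it in `compute_accuracy`.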

FILE: examples/lora_dreambooth/convert_kohya_ss_sd_lora_to_peft.py
  class LoRAInfo (line 25) | class LoRAInfo:
    method peft_state_dict (line 33) | def peft_state_dict(self) -> dict[str, torch.Tensor]:
  function construct_peft_loraconfig (line 39) | def construct_peft_loraconfig(info: dict[str, LoRAInfo]) -> LoraConfig:
  function combine_peft_state_dict (line 78) | def combine_peft_state_dict(info: dict[str, LoRAInfo]) -> dict[str, torc...

FILE: examples/lora_dreambooth/convert_peft_sd_lora_to_kohya_ss.py
  function get_module_kohya_state_dict (line 19) | def get_module_kohya_state_dict(

FILE: examples/lora_dreambooth/train_dreambooth.py
  function import_model_class_from_model_name_or_path (line 53) | def import_model_class_from_model_name_or_path(pretrained_model_name_or_...
  function parse_args (line 73) | def parse_args(input_args=None):
  function b2mb (line 406) | def b2mb(x):
  class TorchTracemalloc (line 411) | class TorchTracemalloc:
    method __enter__ (line 412) | def __enter__(self):
    method cpu_mem_used (line 428) | def cpu_mem_used(self):
    method peak_monitor_func (line 432) | def peak_monitor_func(self):
    method __exit__ (line 444) | def __exit__(self, *exc):
  class DreamBoothDataset (line 460) | class DreamBoothDataset(Dataset):
    method __init__ (line 466) | def __init__(
    method __len__ (line 508) | def __len__(self):
    method __getitem__ (line 511) | def __getitem__(self, index):
  function collate_fn (line 541) | def collate_fn(examples, with_prior_preservation=False):
  class PromptDataset (line 563) | class PromptDataset(Dataset):
    method __init__ (line 566) | def __init__(self, prompt, num_samples):
    method __len__ (line 570) | def __len__(self):
    method __getitem__ (line 573) | def __getitem__(self, index):
  function main (line 580) | def main(args):

FILE: examples/lora_finetuning_transformer_engine/lora_finetuning_te.py
  function parse_args (line 46) | def parse_args():
  function set_seed (line 101) | def set_seed(seed: int):
  function build_synthetic_sequences (line 110) | def build_synthetic_sequences(num_samples: int, min_len: int, max_len: i...
  function ss_char_to_label (line 119) | def ss_char_to_label(char: str) -> int:
  function tokenize_and_align_labels (line 124) | def tokenize_and_align_labels(sequences, label_strings, tokenizer, max_l...
  function load_parquet_dataset (line 156) | def load_parquet_dataset(train_path: str, val_path: str, tokenizer, max_...
  function compute_metrics (line 174) | def compute_metrics(eval_pred):
  function residue_to_ss_char (line 185) | def residue_to_ss_char(aa: str) -> str:
  function sequence_to_synthetic_labels (line 194) | def sequence_to_synthetic_labels(sequence: str) -> str:
  function make_synthetic_dataset (line 199) | def make_synthetic_dataset(
  function main (line 227) | def main():

FILE: examples/lora_ga_finetuning/lora_ga_finetuning.py
  function parse_args (line 34) | def parse_args():
  function prepare_dataset (line 94) | def prepare_dataset(dataset_name, dataset_config, tokenizer, max_length):
  function main (line 116) | def main():

FILE: examples/lorafa_finetune/lorafa_finetuning.py
  function train_model (line 33) | def train_model(

FILE: examples/miss_finetuning/miss_finetuning.py
  class ScriptArguments (line 28) | class ScriptArguments(SFTConfig):

FILE: examples/oft_dreambooth/train_dreambooth.py
  function import_model_class_from_model_name_or_path (line 54) | def import_model_class_from_model_name_or_path(pretrained_model_name_or_...
  function parse_args (line 74) | def parse_args(input_args=None):
  function b2mb (line 416) | def b2mb(x):
  class TorchTracemalloc (line 421) | class TorchTracemalloc:
    method __enter__ (line 422) | def __enter__(self):
    method cpu_mem_used (line 438) | def cpu_mem_used(self):
    method peak_monitor_func (line 442) | def peak_monitor_func(self):
    method __exit__ (line 454) | def __exit__(self, *exc):
  class DreamBoothDataset (line 470) | class DreamBoothDataset(Dataset):
    method __init__ (line 476) | def __init__(
    method __len__ (line 518) | def __len__(self):
    method __getitem__ (line 521) | def __getitem__(self, index):
  function collate_fn (line 551) | def collate_fn(examples, with_prior_preservation=False):
  class PromptDataset (line 573) | class PromptDataset(Dataset):
    method __init__ (line 576) | def __init__(self, prompt, num_samples):
    method __len__ (line 580) | def __len__(self):
    method __getitem__ (line 583) | def __getitem__(self, index):
  function main (line 590) | def main(args):

FILE: examples/olora_finetuning/olora_finetuning.py
  function train (line 30) | def train(
  function generate_prompt (line 145) | def generate_prompt(example):

FILE: examples/orthogonal_subspace_learning/osf_continual_learning.py
  function compute_accuracy_scienceqa (line 51) | def compute_accuracy_scienceqa(model, eval_dataset, tokenizer, data_coll...
  function compute_accuracy_numglue (line 99) | def compute_accuracy_numglue(model, eval_dataset, tokenizer, data_collat...
  function compute_accuracy_fomc (line 142) | def compute_accuracy_fomc(model, eval_dataset, tokenizer, data_collator):
  function evaluate_model (line 190) | def evaluate_model(model, eval_dataset, data_collator, tokenizer, task_n...
  function train_with_osf (line 218) | def train_with_osf(
  function train_full_finetuning (line 383) | def train_full_finetuning(
  function print_results_comparison (line 511) | def print_results_comparison(osf_history, full_history):
  function main (line 597) | def main():

FILE: examples/orthogonal_subspace_learning/utils.py
  function load_scienceqa (line 20) | def load_scienceqa(num_train=1000, num_eval=200, seed=42):
  function load_numglue (line 42) | def load_numglue(num_train=1000, num_eval=200, seed=42):
  function load_fomc (line 94) | def load_fomc(num_train=1000, num_eval=200, seed=42):
  function format_scienceqa_for_llama (line 119) | def format_scienceqa_for_llama(examples, tokenizer, max_length=512):
  function format_numglue_for_llama (line 180) | def format_numglue_for_llama(examples, tokenizer, max_length=512):
  function format_fomc_for_llama (line 224) | def format_fomc_for_llama(examples, tokenizer, max_length=512):
  class DataCollatorForCompletionOnly (line 269) | class DataCollatorForCompletionOnly:
    method __init__ (line 272) | def __init__(self, tokenizer, max_length=512):
    method __call__ (line 276) | def __call__(self, features):

FILE: examples/peanut_finetuning/peanut_finetuning.py
  function train_model (line 17) | def train_model(

FILE: examples/pissa_finetuning/pissa_finetuning.py
  class ScriptArguments (line 28) | class ScriptArguments(SFTConfig):

FILE: examples/psoft_finetuning/psoft_finetuning.py
  class ScriptArguments (line 28) | class ScriptArguments(SFTConfig):
  function _dtype_from_bits (line 87) | def _dtype_from_bits(bits: str) -> torch.dtype:
  function main (line 98) | def main():

FILE: examples/pvera/confidence_interval_generation.py
  function mean_confidence_interval (line 67) | def mean_confidence_interval(data, confidence=0.95):

FILE: examples/qalora_finetuning/qalora_gptq_finetuning.py
  function load_or_quantize_model (line 24) | def load_or_quantize_model(
  function tokenize_and_preprocess (line 109) | def tokenize_and_preprocess(examples, tokenizer, max_length: int = 128):
  function train_model (line 132) | def train_model(

FILE: examples/randlora_finetuning/randlora_finetuning.py
  function train_model (line 18) | def train_model(

FILE: examples/road_finetuning/road_finetuning.py
  function train_model (line 31) | def train_model(

Condensed preview of 709 files, each showing path, character count, and a content snippet (full structured content: 16,357K chars).

[
  {
    "path": ".github/ISSUE_TEMPLATE/bug-report.yml",
    "chars": 1805,
    "preview": "name: \"\\U0001F41B Bug Report\"\ndescription: Submit a bug report to help us improve the library\nbody:\n  - type: textarea\n "
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature-request.yml",
    "chars": 614,
    "preview": "name: \"\\U0001F680 Feature request\"\ndescription: Submit a proposal/request for a new feature\nlabels: [ \"feature\" ]\nbody:\n"
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 302,
    "preview": "version: 2\nupdates:\n  - package-ecosystem: \"github-actions\"\n    directory: \"/\"\n    schedule:\n      interval: \"monthly\"\n "
  },
  {
    "path": ".github/workflows/build_docker_images.yml",
    "chars": 2827,
    "preview": "name: Build Docker images (scheduled)\n\non:\n  workflow_dispatch:\n  workflow_call:\n  schedule:\n    - cron: \"0 1 * * *\"\n\nco"
  },
  {
    "path": ".github/workflows/build_documentation.yml",
    "chars": 537,
    "preview": "name: Build documentation\n\non:\n  push:\n    branches:\n      - main\n      - doc-builder*\n      - v*-release\n\npermissions: "
  },
  {
    "path": ".github/workflows/build_pr_documentation.yml",
    "chars": 519,
    "preview": "name: Build PR Documentation\n\non:\n  pull_request:\n\nconcurrency:\n  group: ${{ github.workflow }}-${{ github.head_ref || g"
  },
  {
    "path": ".github/workflows/deploy_method_comparison_app.yml",
    "chars": 1343,
    "preview": "name: Deploy \"method_comparison\" Gradio to Spaces\n\non:\n  push:\n    branches: [ main ]\n    paths:\n      - \"method_compari"
  },
  {
    "path": ".github/workflows/integrations_tests.yml",
    "chars": 3063,
    "preview": "name: integration tests\n\non:\n  workflow_dispatch:\n    inputs:\n      branch:\n        description: 'Branch to test on'\n   "
  },
  {
    "path": ".github/workflows/nightly.yml",
    "chars": 4111,
    "preview": "name: Self-hosted runner with slow tests (scheduled)\n\non:\n  workflow_dispatch:\n  schedule:\n    - cron: \"0 2 * * *\"\n\nenv:"
  },
  {
    "path": ".github/workflows/stale.yml",
    "chars": 770,
    "preview": "name: Stale Bot\n\non:\n  schedule:\n    - cron: \"0 15 * * *\"\n\npermissions: {}\n\njobs:\n  close_stale_issues:\n    name: Close "
  },
  {
    "path": ".github/workflows/test-docker-build.yml",
    "chars": 2253,
    "preview": "name: Test Docker images (on PR)\n\non:\n  pull_request:\n    paths:\n      # Run only when DockerFile files are modified\n   "
  },
  {
    "path": ".github/workflows/tests-main.yml",
    "chars": 2728,
    "preview": "name: tests on transformers main\n\non:\n  push:\n    branches: [main]\n    paths-ignore:\n        - 'docs/**'\n\npermissions: {"
  },
  {
    "path": ".github/workflows/tests.yml",
    "chars": 5984,
    "preview": "name: tests\n\non:\n  push:\n    branches: [main]\n    paths-ignore:\n      - 'docs/**'\n  pull_request:\n    paths-ignore:\n    "
  },
  {
    "path": ".github/workflows/torch_compile_tests.yml",
    "chars": 1913,
    "preview": "name: torch compile tests\n\non:\n  workflow_dispatch:\n    inputs:\n      branch:\n        description: 'Branch to test on'\n "
  },
  {
    "path": ".github/workflows/trufflehog.yml",
    "chars": 403,
    "preview": "on:\n  push:\n\nname: Secret Leaks\n\npermissions: {}\n\njobs:\n  trufflehog:\n    runs-on: ubuntu-latest\n    steps:\n    - name: "
  },
  {
    "path": ".github/workflows/upload_pr_documentation.yml",
    "chars": 439,
    "preview": "name: Upload PR Documentation\n\non:\n  workflow_run:\n    workflows: [\"Build PR Documentation\"]\n    types:\n      - complete"
  },
  {
    "path": ".github/workflows/zizmor.yaml",
    "chars": 605,
    "preview": "name: CI security linting\n\non:\n  push:\n    branches: [\"main\"]\n  pull_request:\n    branches: [\"*\"]\n    paths:\n      - '.g"
  },
  {
    "path": ".github/zizmor.yml",
    "chars": 967,
    "preview": "rules:\n  dangerous-triggers:\n    ignore:\n      # this workflow is only triggered after maintainer approval\n      - uploa"
  },
  {
    "path": ".gitignore",
    "chars": 2010,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 303,
    "preview": "repos:\n  - repo: https://github.com/astral-sh/ruff-pre-commit\n    rev: v0.12.8\n    hooks:\n      - id: ruff\n        args:"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "Makefile",
    "chars": 4653,
    "preview": ".PHONY: quality style test docs\n\ncheck_dirs := src tests examples docs scripts docker\n\n# Check that source code meets qu"
  },
  {
    "path": "README.md",
    "chars": 12208,
    "preview": "<!---\nCopyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "docker/README.md",
    "chars": 327,
    "preview": "# PEFT Docker images\n\nHere we store all PEFT Docker images used in our testing infrastructure. We use python 3.11 for no"
  },
  {
    "path": "docker/peft-cpu/Dockerfile",
    "chars": 1689,
    "preview": "# Builds GPU docker image of PyTorch\n# Uses multi-staged approach to reduce size\n# Stage 1\n# Use base conda image to red"
  },
  {
    "path": "docker/peft-gpu/Dockerfile",
    "chars": 3144,
    "preview": "# Builds GPU docker image of PyTorch\n# Uses multi-staged approach to reduce size\n# Stage 1\n# Use base conda image to red"
  },
  {
    "path": "docs/Makefile",
    "chars": 585,
    "preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line.\nSPHINXOPTS    =\nSPHI"
  },
  {
    "path": "docs/README.md",
    "chars": 10429,
    "preview": "<!---\nCopyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "docs/source/_config.py",
    "chars": 280,
    "preview": "# docstyle-ignore\nINSTALL_CONTENT = \"\"\"\n# PEFT installation\n! pip install peft accelerate transformers\n# To install from"
  },
  {
    "path": "docs/source/_toctree.yml",
    "chars": 4703,
    "preview": "- title: Get started\n  sections:\n  - local: index\n    title: 🤗 PEFT\n  - local: quicktour\n    title: Quicktour\n  - local:"
  },
  {
    "path": "docs/source/accelerate/deepspeed.md",
    "chars": 22579,
    "preview": "<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not "
  },
  {
    "path": "docs/source/accelerate/fsdp.md",
    "chars": 13379,
    "preview": "<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not "
  },
  {
    "path": "docs/source/conceptual_guides/adapter.md",
    "chars": 16016,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/conceptual_guides/ia3.md",
    "chars": 4135,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/conceptual_guides/oft.md",
    "chars": 10772,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/conceptual_guides/prompting.md",
    "chars": 8684,
    "preview": "<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not "
  },
  {
    "path": "docs/source/developer_guides/checkpoint.md",
    "chars": 13944,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/developer_guides/contributing.md",
    "chars": 7204,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/developer_guides/custom_models.md",
    "chars": 15659,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/developer_guides/lora.md",
    "chars": 52974,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/developer_guides/low_level_api.md",
    "chars": 6578,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/developer_guides/mixed_models.md",
    "chars": 2779,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/developer_guides/model_merging.md",
    "chars": 8496,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/developer_guides/quantization.md",
    "chars": 17681,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/developer_guides/torch_compile.md",
    "chars": 3593,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/developer_guides/troubleshooting.md",
    "chars": 25276,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/index.md",
    "chars": 3720,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/install.md",
    "chars": 1581,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/adalora.md",
    "chars": 2769,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/adapter_utils.md",
    "chars": 1272,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/auto_class.md",
    "chars": 1684,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/boft.md",
    "chars": 2638,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/c3a.md",
    "chars": 2953,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/cartridges.md",
    "chars": 4054,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/config.md",
    "chars": 844,
    "preview": "<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not "
  },
  {
    "path": "docs/source/package_reference/cpt.md",
    "chars": 3033,
    "preview": "<!-- Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lic"
  },
  {
    "path": "docs/source/package_reference/delora.md",
    "chars": 3106,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/fourierft.md",
    "chars": 2674,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/functional.md",
    "chars": 1186,
    "preview": "<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not "
  },
  {
    "path": "docs/source/package_reference/gralora.md",
    "chars": 2583,
    "preview": "# GraLoRA\n\n[**Granular Low-Rank Adaptation (GraLoRA)**](https://huggingface.co/papers/2505.20355) is a PEFT method desig"
  },
  {
    "path": "docs/source/package_reference/helpers.md",
    "chars": 724,
    "preview": "<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not "
  },
  {
    "path": "docs/source/package_reference/hotswap.md",
    "chars": 3079,
    "preview": "<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not "
  },
  {
    "path": "docs/source/package_reference/hra.md",
    "chars": 2799,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/ia3.md",
    "chars": 2600,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/layernorm_tuning.md",
    "chars": 2832,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/lily.md",
    "chars": 3337,
    "preview": "<!--Copyright 2026 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/llama_adapter.md",
    "chars": 2486,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/loha.md",
    "chars": 2223,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/lokr.md",
    "chars": 1181,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/lora.md",
    "chars": 5559,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/lora_conversion.md",
    "chars": 11391,
    "preview": "<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not "
  },
  {
    "path": "docs/source/package_reference/merge_utils.md",
    "chars": 1213,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/miss.md",
    "chars": 2792,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/multitask_prompt_tuning.md",
    "chars": 2264,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/oft.md",
    "chars": 2546,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/osf.md",
    "chars": 10380,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/p_tuning.md",
    "chars": 2191,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/peanut.md",
    "chars": 3474,
    "preview": "<!--Copyright 2026 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/peft_model.md",
    "chars": 1671,
    "preview": "<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not "
  },
  {
    "path": "docs/source/package_reference/peft_types.md",
    "chars": 971,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/poly.md",
    "chars": 4677,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/prefix_tuning.md",
    "chars": 3528,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/prompt_tuning.md",
    "chars": 2390,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/psoft.md",
    "chars": 6974,
    "preview": "<!--Copyright 2026 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/pvera.md",
    "chars": 3838,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/randlora.md",
    "chars": 6128,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/road.md",
    "chars": 2389,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/shira.md",
    "chars": 3315,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/trainable_tokens.md",
    "chars": 2938,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/tuners.md",
    "chars": 1296,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/vblora.md",
    "chars": 3563,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/vera.md",
    "chars": 3344,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/waveft.md",
    "chars": 2494,
    "preview": "<!--Copyright 2025 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/package_reference/xlora.md",
    "chars": 6181,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/quicktour.md",
    "chars": 7855,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/task_guides/ia3.md",
    "chars": 9787,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/task_guides/lora_based_methods.md",
    "chars": 15447,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/task_guides/prompt_based_methods.md",
    "chars": 13812,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/tutorial/peft_integrations.md",
    "chars": 7233,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "docs/source/tutorial/peft_model_config.md",
    "chars": 8259,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "examples/alora_finetuning/README.md",
    "chars": 3030,
    "preview": "# Activated LoRA (aLoRA)\n\n## Introduction\nActivated LoRA (aLoRA) is an adapter that selectively activates its weights on"
  },
  {
    "path": "examples/alora_finetuning/alora_finetuning.py",
    "chars": 9566,
    "preview": "import os\n\nimport torch\nfrom datasets import load_dataset\nfrom transformers import (\n    AutoModelForCausalLM,\n    AutoT"
  },
  {
    "path": "examples/arrow_multitask/arrow_phi3_mini.py",
    "chars": 14358,
    "preview": "# Copyright 2025-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/arrow_multitask/requirements.txt",
    "chars": 76,
    "preview": "torch\ntransformers\naccelerate\ndatasets\nscikit-learn\ntqdm\nnumpy\nbitsandbytes\n"
  },
  {
    "path": "examples/bdlora_finetuning/README.md",
    "chars": 1481,
    "preview": "# BD-LoRA Finetuning\n\nBlock-Diagonal LoRA (BD-LoRA) is a LoRA variant in which some LoRA factors are constrained to be b"
  },
  {
    "path": "examples/bdlora_finetuning/bdlora_peft_demo.ipynb",
    "chars": 17032,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"e2b69494\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Block-Diagon"
  },
  {
    "path": "examples/bdlora_finetuning/chat.py",
    "chars": 1908,
    "preview": "# Copyright 2025-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/bdlora_finetuning/vllm_server.bash",
    "chars": 351,
    "preview": "#!/bin/bash\n\nADAPTER_PATH=\"example_bd_lora_adapter\"\n\npython -m vllm.entrypoints.openai.api_server \\\n    --model meta-lla"
  },
  {
    "path": "examples/boft_controlnet/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "examples/boft_controlnet/boft_controlnet.md",
    "chars": 7205,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "examples/boft_controlnet/eval.py",
    "chars": 7308,
    "preview": "# Copyright 2023-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/boft_controlnet/eval.sh",
    "chars": 833,
    "preview": "PEFT_TYPE=\"boft\"\nBLOCK_NUM=8\nBLOCK_SIZE=0\nN_BUTTERFLY_FACTOR=1\nITER_NUM=50000\n\nexport RUN_NAME=\"${PEFT_TYPE}_${BLOCK_NUM"
  },
  {
    "path": "examples/boft_controlnet/requirements.txt",
    "chars": 219,
    "preview": "datasets==2.16.1\ndiffusers==0.34.0\ntransformers==4.54.0\naccelerate==1.9.0\nwandb==0.16.1\nscikit-image==0.22.0\nopencv-pyth"
  },
  {
    "path": "examples/boft_controlnet/test_controlnet.py",
    "chars": 4507,
    "preview": "# Copyright 2023-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/boft_controlnet/test_controlnet.sh",
    "chars": 859,
    "preview": "PEFT_TYPE=\"boft\"\nBLOCK_NUM=8\nBLOCK_SIZE=0\nN_BUTTERFLY_FACTOR=1\nITER_NUM=50000\n\nexport RUN_NAME=\"${PEFT_TYPE}_${BLOCK_NUM"
  },
  {
    "path": "examples/boft_controlnet/train_controlnet.py",
    "chars": 22927,
    "preview": "#!/usr/bin/env python\n# Copyright 2023-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version"
  },
  {
    "path": "examples/boft_controlnet/train_controlnet.sh",
    "chars": 1286,
    "preview": "PEFT_TYPE=\"boft\"\nBLOCK_NUM=8\nBLOCK_SIZE=0\nN_BUTTERFLY_FACTOR=1\n\nexport DATASET_NAME=\"oftverse/control-celeba-hq\"\nexport "
  },
  {
    "path": "examples/boft_controlnet/utils/__init__.py",
    "chars": 1,
    "preview": "\n"
  },
  {
    "path": "examples/boft_controlnet/utils/args_loader.py",
    "chars": 17824,
    "preview": "import argparse\nimport os\nfrom typing import Optional\n\nfrom huggingface_hub import HfFolder, whoami\nfrom transformers im"
  },
  {
    "path": "examples/boft_controlnet/utils/dataset.py",
    "chars": 7735,
    "preview": "import random\n\nimport numpy as np\nimport torch\nimport wandb\nfrom datasets import load_dataset\nfrom diffusers import DDIM"
  },
  {
    "path": "examples/boft_controlnet/utils/light_controlnet.py",
    "chars": 10906,
    "preview": "# Copyright 2023 The HuggingFace Team. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lic"
  },
  {
    "path": "examples/boft_controlnet/utils/pipeline_controlnet.py",
    "chars": 22640,
    "preview": "# Copyright 2023 The HuggingFace Team. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lic"
  },
  {
    "path": "examples/boft_controlnet/utils/tracemalloc.py",
    "chars": 2025,
    "preview": "import gc\nimport threading\n\nimport psutil\nimport torch\n\n\n# Converting Bytes to Megabytes\ndef b2mb(x):\n    return int(x /"
  },
  {
    "path": "examples/boft_controlnet/utils/unet_2d_condition.py",
    "chars": 13382,
    "preview": "# Copyright 2023 The HuggingFace Team. All rights reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lic"
  },
  {
    "path": "examples/boft_dreambooth/.gitignore",
    "chars": 6,
    "preview": "data/\n"
  },
  {
    "path": "examples/boft_dreambooth/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "examples/boft_dreambooth/boft_dreambooth.md",
    "chars": 8784,
    "preview": "<!--Copyright 2023 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "examples/boft_dreambooth/dreambooth_inference.ipynb",
    "chars": 5666,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"acab479f\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "examples/boft_dreambooth/requirements.txt",
    "chars": 174,
    "preview": "transformers==4.54.0\naccelerate==1.9.0\nevaluate\ntqdm\ndatasets==4.0.0\ndiffusers==0.34.0\nPillow\nhuggingface_hub\nsafetensor"
  },
  {
    "path": "examples/boft_dreambooth/train_dreambooth.py",
    "chars": 28218,
    "preview": "#!/usr/bin/env python\n# Copyright 2023-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version"
  },
  {
    "path": "examples/boft_dreambooth/train_dreambooth.sh",
    "chars": 8044,
    "preview": "IDX=$1\nPROMPT_IDX=$((IDX % 25))\nCLASS_IDX=$((IDX % 30))\n\n# Define the UNIQUE_TOKEN, CLASS_TOKENs, and SUBJECT_NAMES\nUNIQ"
  },
  {
    "path": "examples/boft_dreambooth/utils/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "examples/boft_dreambooth/utils/args_loader.py",
    "chars": 14137,
    "preview": "import argparse\nimport os\nimport warnings\nfrom typing import Optional\n\nfrom huggingface_hub import HfFolder, whoami\nfrom"
  },
  {
    "path": "examples/boft_dreambooth/utils/dataset.py",
    "chars": 4410,
    "preview": "from pathlib import Path\n\nimport torch\nfrom PIL import Image\nfrom torch.utils.data import Dataset\nfrom torchvision impor"
  },
  {
    "path": "examples/boft_dreambooth/utils/tracemalloc.py",
    "chars": 2025,
    "preview": "import gc\nimport threading\n\nimport psutil\nimport torch\n\n\n# Converting Bytes to Megabytes\ndef b2mb(x):\n    return int(x /"
  },
  {
    "path": "examples/cartridge_self_study/README.md",
    "chars": 3324,
    "preview": "# CARTRIDGE self-study distillation (example)\n\nThis folder shows an **example** workflow for training a `CARTRIDGE` adap"
  },
  {
    "path": "examples/cartridge_self_study/arxiv_synthesize.py",
    "chars": 3423,
    "preview": "# Copyright 2025-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/cartridge_self_study/arxiv_train.py",
    "chars": 4212,
    "preview": "# Copyright 2025-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/cartridge_self_study/requirements.txt",
    "chars": 29,
    "preview": "torch\ntransformers\npeft\nvllm\n"
  },
  {
    "path": "examples/cartridge_self_study/synthesize.py",
    "chars": 12515,
    "preview": "# Copyright 2025-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/cartridge_self_study/train_distill.py",
    "chars": 8737,
    "preview": "# Copyright 2025-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/causal_language_modeling/accelerate_ds_zero3_cpu_offload_config.yaml",
    "chars": 506,
    "preview": "compute_environment: LOCAL_MACHINE\ndeepspeed_config:\n  gradient_accumulation_steps: 1\n  gradient_clipping: 1.0\n  offload"
  },
  {
    "path": "examples/causal_language_modeling/peft_ln_tuning_clm.ipynb",
    "chars": 51574,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"71fbfca2\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "examples/causal_language_modeling/peft_lora_clm_accelerate_ds_zero3_offload.py",
    "chars": 16046,
    "preview": "import gc\nimport os\nimport sys\nimport threading\n\nimport psutil\nimport torch\nfrom accelerate import Accelerator\nfrom data"
  },
  {
    "path": "examples/causal_language_modeling/peft_lora_clm_with_additional_tokens.ipynb",
    "chars": 39242,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"5f239612-620e-4430-8685-9fdc6b179b41\",\n   \"metadata\": {},\n   \"so"
  },
  {
    "path": "examples/causal_language_modeling/peft_prefix_tuning_clm.ipynb",
    "chars": 53703,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"71fbfca2\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "examples/causal_language_modeling/peft_prompt_tuning_clm.ipynb",
    "chars": 50024,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"71fbfca2\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "examples/causal_language_modeling/requirements.txt",
    "chars": 84,
    "preview": "transformers<4.54.0\naccelerate\nevaluate\ndeepspeed\ntqdm\ndataclass-csv\ndatasets==3.6.0"
  },
  {
    "path": "examples/conditional_generation/accelerate_ds_zero3_cpu_offload_config.yaml",
    "chars": 506,
    "preview": "compute_environment: LOCAL_MACHINE\ndeepspeed_config:\n  gradient_accumulation_steps: 1\n  gradient_clipping: 1.0\n  offload"
  },
  {
    "path": "examples/conditional_generation/multitask_prompt_tuning.ipynb",
    "chars": 13195,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"58ff91ca-ce92-43d0-ae8b-4e9e89e193f6\",\n "
  },
  {
    "path": "examples/conditional_generation/peft_adalora_seq2seq.py",
    "chars": 5739,
    "preview": "import os\n\nimport torch\nfrom datasets import load_dataset\nfrom torch.utils.data import DataLoader\nfrom tqdm import tqdm\n"
  },
  {
    "path": "examples/conditional_generation/peft_ia3_seq2seq.ipynb",
    "chars": 93603,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"0c152fc8\",\n   \"metadata\": {\n    \"id\": \"5"
  },
  {
    "path": "examples/conditional_generation/peft_lora_seq2seq.ipynb",
    "chars": 12520,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"5f93b7d1\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py",
    "chars": 13573,
    "preview": "import gc\nimport os\nimport sys\nimport threading\n\nimport psutil\nimport torch\nfrom accelerate import Accelerator\nfrom data"
  },
  {
    "path": "examples/conditional_generation/peft_lora_seq2seq_accelerate_fsdp.py",
    "chars": 5713,
    "preview": "import os\n\nimport torch\nfrom accelerate import Accelerator\nfrom datasets import load_dataset\nfrom torch.utils.data impor"
  },
  {
    "path": "examples/conditional_generation/peft_prefix_tuning_seq2seq.ipynb",
    "chars": 13299,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"5f93b7d1\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "examples/conditional_generation/peft_prompt_tuning_seq2seq.ipynb",
    "chars": 22821,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"5f93b7d1\",\n   \"metadata\": {\n    \"Execute"
  },
  {
    "path": "examples/conditional_generation/peft_prompt_tuning_seq2seq_with_generate.ipynb",
    "chars": 24614,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"5f93b7d1\",\n   \"metadata\": {\n    \"Execute"
  },
  {
    "path": "examples/conditional_generation/requirements.txt",
    "chars": 81,
    "preview": "transformers\naccelerate\nevaluate\ndeepspeed\ntqdm\ndatasets\nsafetensors\nscikit-learn"
  },
  {
    "path": "examples/corda_finetuning/README.md",
    "chars": 12849,
    "preview": "# CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuni"
  },
  {
    "path": "examples/corda_finetuning/corda_finetuning.py",
    "chars": 10343,
    "preview": "# Copyright 2024-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/corda_finetuning/datautils.py",
    "chars": 9049,
    "preview": "# Copyright 2024-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/corda_finetuning/preprocess.py",
    "chars": 4627,
    "preview": "# Copyright 2024-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/cpt_finetuning/README.md",
    "chars": 4867,
    "preview": "\n# Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods\n## Introduction ([Paper](https://"
  },
  {
    "path": "examples/cpt_finetuning/cpt_train_and_inference.ipynb",
    "chars": 78438,
    "preview": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"source\": [\n        \"# CPT Training and Inference\\n\",\n        "
  },
  {
    "path": "examples/delora_finetuning/README.md",
    "chars": 4361,
    "preview": "# DeLoRA: Decoupled Low-Rank Adaptation \n\n## Introduction\n[DeLoRA](https://huggingface.co/papers/2503.18225) tackles fin"
  },
  {
    "path": "examples/delora_finetuning/delora_finetuning.py",
    "chars": 6688,
    "preview": "# This script is based on examples/randlora_finetuning/randlora_finetuning.py\nimport os\n\nimport torch\nfrom datasets impo"
  },
  {
    "path": "examples/dna_language_models/dna_lm.ipynb",
    "chars": 108057,
    "preview": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"id\": \"db4dc272-88fe-47ad-98fd-b94d4f840dca\",\n      \"metadata\""
  },
  {
    "path": "examples/dora_finetuning/QDoRA_finetuning.ipynb",
    "chars": 264661,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"CV_gQs58bsvM\"\n   },\n   \"source\": [\n    \"# Fine"
  },
  {
    "path": "examples/dora_finetuning/README.md",
    "chars": 4374,
    "preview": "# DoRA: Weight-Decomposed Low-Rank Adaptation\n\n![dora](https://i.ytimg.com/vi/m7KQdGSr0Dg/maxresdefault.jpg)\n\n\n## Introd"
  },
  {
    "path": "examples/dora_finetuning/dora-caching.py",
    "chars": 4789,
    "preview": "# Copyright 2025-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/dora_finetuning/dora_finetuning.py",
    "chars": 7595,
    "preview": "import os\n\nimport torch\nfrom datasets import load_dataset\nfrom transformers import (\n    AutoModelForCausalLM,\n    AutoT"
  },
  {
    "path": "examples/ephemeral_gpu_offloading/load_with_dora.py",
    "chars": 4124,
    "preview": "# Copyright 2024-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/eva_finetuning/README.md",
    "chars": 9850,
    "preview": "# EVA: Explained Variance Adaptation\n## Introduction ([Paper](https://huggingface.co/papers/2410.07170), [code](https://"
  },
  {
    "path": "examples/eva_finetuning/eva_finetuning.py",
    "chars": 2893,
    "preview": "# Copyright 2024-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/eva_finetuning/eva_finetuning_multi_accelerator.py",
    "chars": 4000,
    "preview": "# Copyright 2024-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/eva_finetuning/utils.py",
    "chars": 3513,
    "preview": "# Copyright 2024-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/evaluation/lora-lm-eval.ipynb",
    "chars": 131516,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"qAkXdLL2D25p\"\n   },\n   \"source\": [\n    \"## Pef"
  },
  {
    "path": "examples/feature_extraction/peft_lora_embedding_semantic_search.py",
    "chars": 20972,
    "preview": "# Copyright 2023-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  },
  {
    "path": "examples/feature_extraction/peft_lora_embedding_semantic_similarity_inference.ipynb",
    "chars": 173821,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"id\": \"3e7b6247\",\n   \"metadata\": {},\n   \"outputs\":"
  },
  {
    "path": "examples/feature_extraction/requirements.txt",
    "chars": 96,
    "preview": "peft\naccelerate\ntransformers\ndatasets==2.18.0\nevaluate\nhnswlib\npandas\ntqdm\nhuggingface_hub\nwandb"
  },
  {
    "path": "examples/fp4_finetuning/finetune_fp4_opt_bnb_peft.py",
    "chars": 6655,
    "preview": "import os\n\nimport torch\nimport torch.nn as nn\nimport transformers\nfrom datasets import load_dataset\nfrom transformers im"
  },
  {
    "path": "examples/gralora_finetuning/README.md",
    "chars": 2807,
    "preview": "# GraLoRA: Granular Low-Rank Adaptation\n\n![GraLoRA Overview](https://github.com/SqueezeBits/GraLoRA/raw/main/figure/gral"
  },
  {
    "path": "examples/gralora_finetuning/gralora_finetuning.py",
    "chars": 6819,
    "preview": "# This script is based on examples/dora_finetuning/dora_finetuning.py\nimport os\n\nimport torch\nfrom datasets import load_"
  },
  {
    "path": "examples/hra_dreambooth/README.md",
    "chars": 4570,
    "preview": "<!--Copyright 2024 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "examples/hra_dreambooth/dreambooth_inference.ipynb",
    "chars": 1276616,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 19,\n   \"id\": \"acab479f\",\n   \"metadata\": {},\n   \"outputs\""
  },
  {
    "path": "examples/hra_dreambooth/requirements.txt",
    "chars": 156,
    "preview": "transformers==4.55.0\naccelerate==1.9.0\nevaluate\ntqdm\ndatasets==4.0.0\ndiffusers==0.34.0\nPillow\nhuggingface_hub\nsafetensor"
  },
  {
    "path": "examples/hra_dreambooth/train_dreambooth.py",
    "chars": 27872,
    "preview": "#!/usr/bin/env python\n# Copyright 2024-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version"
  },
  {
    "path": "examples/hra_dreambooth/train_dreambooth.sh",
    "chars": 7738,
    "preview": "\nCLASS_IDX=$1\n\n# Define the UNIQUE_TOKEN, CLASS_TOKENs, and SUBJECT_NAMES\nUNIQUE_TOKEN=\"qwe\"\n\nSUBJECT_NAMES=(\n    \"backp"
  },
  {
    "path": "examples/hra_dreambooth/utils/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "examples/hra_dreambooth/utils/args_loader.py",
    "chars": 14359,
    "preview": "# adapted from [peft's boft_dreambooth](https://github.com/huggingface/peft/tree/main/examples/boft_dreambooth)\n\nimport "
  },
  {
    "path": "examples/hra_dreambooth/utils/dataset.py",
    "chars": 4523,
    "preview": "# adapted from [peft's boft_dreambooth](https://github.com/huggingface/peft/tree/main/examples/boft_dreambooth)\n\nfrom pa"
  },
  {
    "path": "examples/hra_dreambooth/utils/tracemalloc.py",
    "chars": 2138,
    "preview": "# adapted from [peft's boft_dreambooth](https://github.com/huggingface/peft/tree/main/examples/boft_dreambooth)\n\nimport "
  },
  {
    "path": "examples/image_classification/README.md",
    "chars": 1470,
    "preview": "# Fine-tuning for image classification using LoRA and 🤗 PEFT\n\n## Vision Transformer model from transformers\n\n[![Open In "
  },
  {
    "path": "examples/image_classification/image_classification_peft_lora.ipynb",
    "chars": 590750,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"71GTxOD71mEn\"\n   },\n   \"source\": [\n    \"## Int"
  },
  {
    "path": "examples/image_classification/image_classification_timm_peft_lora.ipynb",
    "chars": 825943,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"4ef57047\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Using PEFT w"
  },
  {
    "path": "examples/int8_training/Finetune_flan_t5_large_bnb_peft.ipynb",
    "chars": 254126,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"id\": \"lw1cWgq-DI5k\",\n   \"metadata\": {\n    \"id\": \"lw1cWgq-DI5k\"\n   },\n "
  },
  {
    "path": "examples/int8_training/Finetune_opt_bnb_peft.ipynb",
    "chars": 285598,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"id\": \"WE5GJ6s7y0Xo\"\n   },\n   \"source\": [\n    \"## Fin"
  },
  {
    "path": "examples/int8_training/config.yaml",
    "chars": 373,
    "preview": "compute_environment: LOCAL_MACHINE\ndebug: false\ndistributed_type: MULTI_XPU\ndowncast_bf16: 'no'\nenable_cpu_affinity: fal"
  },
  {
    "path": "examples/int8_training/fine_tune_blip2_int8.py",
    "chars": 3545,
    "preview": "# Copyright 2023-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n"
  }
]

// ... and 509 more files (not shown)

About this extraction

This page contains the full source code of the huggingface/peft GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 709 files (20.5 MB), approximately 4.0M tokens, and a symbol index with 295 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract.
