Full Code of Lightning-AI/litgpt for AI

Repository: Lightning-AI/litgpt
Branch: main
Commit: 162ad9bee317
Files: 233
Total size: 1.8 MB

Directory structure:
gitextract_ctr2cg_x/

├── .devcontainer/
│   ├── Dockerfile
│   └── devcontainer.json
├── .github/
│   ├── CODEOWNERS
│   ├── ISSUE_TEMPLATE/
│   │   ├── ask-a-question.md
│   │   ├── bug-report.yaml
│   │   └── feature-request.md
│   ├── dependabot.yml
│   └── workflows/
│       ├── check-links.yml
│       ├── cpu-tests.yml
│       ├── mkdocs-deploy.yml
│       └── publish-pkg.yml
├── .gitignore
├── .lightning/
│   └── workflows/
│       └── tests.yaml
├── .pre-commit-config.yaml
├── CITATION.cff
├── LICENSE
├── README.md
├── config_hub/
│   ├── finetune/
│   │   ├── README.md
│   │   ├── falcon-7b/
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── gemma-2b/
│   │   │   ├── full.yaml
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── gemma-7b/
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── gemma2-2b/
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── gemma2-9b/
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── llama-2-7b/
│   │   │   ├── full.yaml
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── llama-3-8b/
│   │   │   ├── full.yaml
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── llama-3.1-8b/
│   │   │   ├── full.yaml
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── llama-3.2-1B/
│   │   │   ├── full.yaml
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── llama-3.2-3B/
│   │   │   ├── full.yaml
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── mistral-7b/
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── mistral-7b-v0.2/
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── phi-2/
│   │   │   ├── full.yaml
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── phi-3/
│   │   │   ├── full.yaml
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   ├── stablelm-base-alpha-3b/
│   │   │   ├── full.yaml
│   │   │   ├── lora.yaml
│   │   │   └── qlora.yaml
│   │   └── tiny-llama/
│   │       ├── full.yaml
│   │       ├── lora.yaml
│   │       └── qlora.yaml
│   └── pretrain/
│       ├── debug.yaml
│       ├── microllama.yaml
│       ├── tinyllama.yaml
│       └── tinystories.yaml
├── extensions/
│   ├── thunder/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── pretrain.py
│   │   ├── strategies/
│   │   │   ├── __init__.py
│   │   │   ├── thunder_ddp.py
│   │   │   └── thunder_fsdp.py
│   │   └── unsloth/
│   │       ├── __init__.py
│   │       ├── executor.py
│   │       └── kernels/
│   │           ├── __init__.py
│   │           ├── cross_entropy_loss.py
│   │           ├── rope_embedding.py
│   │           ├── swiglu.py
│   │           └── utils.py
│   └── xla/
│       ├── README.md
│       ├── __init__
│       ├── finetune/
│       │   ├── __init__
│       │   └── adapter.py
│       ├── generate/
│       │   ├── __init__
│       │   ├── adapter.py
│       │   └── base.py
│       ├── scripts/
│       │   ├── __init__
│       │   └── prepare_alpaca.py
│       └── utils.py
├── litgpt/
│   ├── __init__.py
│   ├── __main__.py
│   ├── adapter.py
│   ├── adapter_v2.py
│   ├── api.py
│   ├── args.py
│   ├── chat/
│   │   ├── __init__.py
│   │   └── base.py
│   ├── config.py
│   ├── constants.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── alpaca.py
│   │   ├── alpaca_2k.py
│   │   ├── alpaca_gpt4.py
│   │   ├── base.py
│   │   ├── deita.py
│   │   ├── flan.py
│   │   ├── json_data.py
│   │   ├── lima.py
│   │   ├── lit_data.py
│   │   ├── longform.py
│   │   ├── microllama.py
│   │   ├── openwebtext.py
│   │   ├── prepare_slimpajama.py
│   │   ├── prepare_starcoder.py
│   │   ├── text_files.py
│   │   ├── tinyllama.py
│   │   └── tinystories.py
│   ├── deploy/
│   │   ├── __init__.py
│   │   └── serve.py
│   ├── eval/
│   │   └── evaluate.py
│   ├── finetune/
│   │   ├── __init__.py
│   │   ├── adapter.py
│   │   ├── adapter_v2.py
│   │   ├── full.py
│   │   ├── lora.py
│   │   └── lora_legacy.py
│   ├── generate/
│   │   ├── __init__.py
│   │   ├── adapter.py
│   │   ├── adapter_v2.py
│   │   ├── base.py
│   │   ├── full.py
│   │   ├── sequentially.py
│   │   ├── speculative_decoding.py
│   │   └── tp.py
│   ├── lora.py
│   ├── model.py
│   ├── parser_config.py
│   ├── pretrain.py
│   ├── prompts.py
│   ├── scripts/
│   │   ├── __init__.py
│   │   ├── convert_hf_checkpoint.py
│   │   ├── convert_lit_checkpoint.py
│   │   ├── convert_pretrained_checkpoint.py
│   │   ├── download.py
│   │   └── merge_lora.py
│   ├── tokenizer.py
│   ├── types.py
│   └── utils.py
├── pyproject.toml
├── tests/
│   ├── conftest.py
│   ├── convert/
│   │   ├── __init__.py
│   │   ├── test_hf_checkpoint.py
│   │   ├── test_lit_checkpoint.py
│   │   └── test_pretrained_checkpoint.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── _fixtures/
│   │   │   ├── alpaca.json
│   │   │   ├── dolly.json
│   │   │   ├── longform_train.json
│   │   │   └── longform_val.json
│   │   ├── test_alpaca.py
│   │   ├── test_base.py
│   │   ├── test_deita.py
│   │   ├── test_json.py
│   │   ├── test_lit_data.py
│   │   ├── test_longform.py
│   │   ├── test_openwebtext.py
│   │   ├── test_textfiles.py
│   │   ├── test_tinyllama.py
│   │   └── test_tinystories.py
│   ├── ext_thunder/
│   │   ├── __init__.py
│   │   ├── test_thunder_distributed.py
│   │   ├── test_thunder_networks.py
│   │   ├── test_thunder_pretrain.py
│   │   └── test_unsloth_executor.py
│   ├── generate/
│   │   ├── __init__.py
│   │   ├── test_adapter.py
│   │   ├── test_main.py
│   │   ├── test_sequentially.py
│   │   ├── test_tp.py
│   │   └── utils.py
│   ├── test_adapter.py
│   ├── test_adapter_v2.py
│   ├── test_api.py
│   ├── test_args.py
│   ├── test_batch.py
│   ├── test_chat.py
│   ├── test_ci.py
│   ├── test_cli.py
│   ├── test_config.py
│   ├── test_config_hub.py
│   ├── test_deepseek_moe.py
│   ├── test_distributed.py
│   ├── test_evaluate.py
│   ├── test_full.py
│   ├── test_generate_speculatively.py
│   ├── test_lora.py
│   ├── test_merge_lora.py
│   ├── test_model.py
│   ├── test_multihead_latent_attention.py
│   ├── test_pretrain.py
│   ├── test_prompts.py
│   ├── test_readme.py
│   ├── test_rope.py
│   ├── test_serve.py
│   ├── test_tokenizer.py
│   ├── test_trainer_support.py
│   ├── test_types.py
│   ├── test_utils.py
│   └── test_yarn.py
└── tutorials/
    ├── 0_to_litgpt.md
    ├── convert_hf_checkpoint.md
    ├── convert_lit_models.md
    ├── deploy.md
    ├── developer-docs/
    │   ├── README.md
    │   ├── adding-models.md
    │   └── python-api.md
    ├── download_model_weights.md
    ├── evaluation.md
    ├── examples/
    │   └── ptl-trainer/
    │       ├── README.md
    │       ├── litgpt_ptl_medium.py
    │       └── litgpt_ptl_small.py
    ├── finetune.md
    ├── finetune_adapter.md
    ├── finetune_full.md
    ├── finetune_lora.md
    ├── full_finetune_example.py
    ├── inference.md
    ├── mkdocs.yml
    ├── oom.md
    ├── prepare_dataset.md
    ├── pretrain.md
    ├── pretrain_tinyllama.md
    ├── python-api.md
    ├── quantize.md
    └── resource-tables.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .devcontainer/Dockerfile
================================================
# See here for image contents: https://github.com/devcontainers/images/blob/main/src/python/.devcontainer/Dockerfile

# [Choice] Python version (use -bookworm or -bullseye variants on local arm64/Apple Silicon): 3, 3.12, 3.11, 3.10, 3.9, 3.8, 3-bookworm, 3.12-bookworm, 3.11-bookworm, 3.10-bookworm, 3.9-bookworm, 3.8-bookworm, 3-bullseye, 3.12-bullseye, 3.11-bullseye, 3.10-bullseye, 3.9-bullseye, 3.8-bullseye, 3-buster, 3.12-buster, 3.11-buster, 3.10-buster, 3.9-buster, 3.8-buster
ARG VARIANT=3-bookworm
FROM mcr.microsoft.com/devcontainers/python:1-${VARIANT}

# Temporary: Upgrade python packages due to https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-40897
# They are installed by the base image (python) which does not have the patch.
RUN python3 -m pip install --upgrade pip setuptools


================================================
FILE: .devcontainer/devcontainer.json
================================================
// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.194.0/containers/python-3
{
  "name": "Python 3 (litgpt)",
  "build": {
    "dockerfile": "Dockerfile",
    "context": "..",
    "args": {
      "VARIANT": "3.11-bookworm"
    }
  },
  "runArgs": [
    // Enable GPU passthrough, requires WSL2 on Windows
    //"--gpus=all",
    // One of the following options is required for torch multiprocessing
    //"--ipc=host",
    //"--shm-size=4gb",
  ],
  // Features to add to the dev container. More info: https://containers.dev/features.
  "features": {
    "ghcr.io/devcontainers/features/git:1": {},
    "ghcr.io/devcontainers/features/git-lfs:1": {},
    //"ghcr.io/devcontainers/features/nvidia-cuda:1": {},
    "ghcr.io/devcontainers-extra/features/actionlint:1": {},
    "ghcr.io/devcontainers-extra/features/pre-commit:2": {},
    "ghcr.io/dhoeric/features/act:1": {},
    "ghcr.io/devcontainers/features/docker-in-docker:2": {
      "version": "latest",
      "moby": true
    }
  },
  // Set *default* container specific settings.json values on container create.
  "customizations": {
    "vscode": {
      "settings": {
        "editor.tabSize": 4,
        "editor.renderWhitespace": "all",
        "editor.formatOnSave": true,
        "editor.rulers": [120],
        "files.exclude": {
          "**/__pycache__": true
        },
        "python.pythonPath": "/usr/local/bin/python",
        "python.defaultInterpreterPath": "/usr/local/bin/python",
        "python.languageServer": "Pylance",
        "python.analysis.autoImportCompletions": true,
        "python.analysis.completeFunctionParens": true,
        "python.analysis.autoSearchPaths": true,
        "python.testing.pytestArgs": ["tests"],
        "python.testing.unittestEnabled": false,
        "python.testing.pytestEnabled": true,
        "code-eol.highlightNonDefault": true,
        "code-eol.highlightExtraWhitespace": true,
        "autoDocstring.docstringFormat": "google-notypes",
        "autoDocstring.guessTypes": true,
        "autoDocstring.generateDocstringOnEnter": true,
        "autoDocstring.startOnNewLine": true,
        "telemetry.telemetryLevel": "off",
        "[python]": {
          "editor.formatOnSave": true,
          "editor.defaultFormatter": "charliermarsh.ruff",
          "editor.codeActionsOnSave": {
            "source.organizeImports": "always",
            "source.fixAll": "always"
          }
        }
      },
      // Add the IDs of extensions you want installed when the container is created.
      "extensions": [
        "ms-python.python",
        "ms-python.vscode-pylance",
        "ms-toolsai.jupyter",
        "GitHub.copilot",
        "GitHub.copilot-chat",
        "github.vscode-github-actions",
        "SanjulaGanepola.github-local-actions",
        "charliermarsh.ruff",
        "esbenp.prettier-vscode",
        "ms-vscode.test-adapter-converter",
        "njqdev.vscode-python-typehint",
        "KevinRose.vsc-python-indent",
        "medo64.render-crlf",
        "shardulm94.trailing-spaces",
        "nhoizey.gremlins",
        "wayou.vscode-todo-highlight",
        "Gruntfuggly.todo-tree",
        "njpwerner.autodocstring",
        "rodolphebarbanneau.python-docstring-highlighter",
        "mechatroner.rainbow-csv",
        "uctakeoff.vscode-counter",
        "bierner.github-markdown-preview",
        "yahyabatulu.vscode-markdown-alert",
        "ms-vscode-remote.vscode-remote-extensionpack",
        "ms-azuretools.vscode-docker",
        "redhat.vscode-yaml"
      ]
    }
  },
  // Use 'forwardPorts' to make a list of ports inside the container available locally.
  // "forwardPorts": [],
  // Use 'postCreateCommand' to run commands after the container is created.
  "postCreateCommand": "pre-commit install && pip install '.[extra,compiler,test]' -U",
  // Comment out to connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
  "remoteUser": "vscode"
}


================================================
FILE: .github/CODEOWNERS
================================================
* @lantiga @t-vi @lianakoleva @KaelanDt @k223kim @andyland
/README.md                           @williamfalcon @lantiga @lianakoleva


================================================
FILE: .github/ISSUE_TEMPLATE/ask-a-question.md
================================================
---
name: Ask a Question
about: Ask and answer questions related to LitGPT
title: ''
labels: question

---

Please describe your question here.


================================================
FILE: .github/ISSUE_TEMPLATE/bug-report.yaml
================================================
name: Bug Report
description: Report errors related to LitGPT
title: "Description"
labels: bug
body:
  - type: markdown
    attributes:
      value: |
        Thank you for taking the time to report an issue. Please fill out the details below to help us resolve it.

  - type: textarea
    id: bug_description
    attributes:
      label: Bug description
      description: A description of the issue.
      placeholder: |
        Please provide a description of what the bug or issue is.
    validations:
      required: true

  - type: input
    attributes:
      label: Reproduced in studio
      description: >
        Create a new Lightning Studio with code that reproduces the issue and share the link.
        Also include all the relevant files and data required to reproduce the shared issue.
        In case the code does not crash, please add assert statements to show the actual and expected output.
        A simple guide on how to create such a studio can be found [here](https://www.youtube.com/watch?v=YcW-2Zt_bFg&ab_channel=LightningAI).
      placeholder: https://lightning.ai/...
    validations:
      required: false

  - type: dropdown
    id: operating_system
    attributes:
      label: What operating system are you using?
      description: If applicable, please select the operating system where you experienced this issue.
      options:
        - "Unknown"
        - "macOS"
        - "Linux"
        - "Windows"
    validations:
      required: true

  - type: textarea
    id: version
    attributes:
      label: LitGPT Version
      description: |
        Please provide details about your LitGPT version by running the following code in your terminal:
        ```
        pip show litgpt | grep Version:
        ```
    validations:
      required: false


================================================
FILE: .github/ISSUE_TEMPLATE/feature-request.md
================================================
---
name: Suggest a Feature
about: Propose a new feature or enhancement
title: ''
labels: enhancement

---

Please describe the feature or enhancement along with the intended use case.


================================================
FILE: .github/dependabot.yml
================================================
# Basic dependabot.yml file with
# minimum configuration for two package managers

version: 2
updates:
  # Enable version updates for python
  - package-ecosystem: "pip"
    # Look for a `requirements` in the `root` directory
    directory: "/"
    # Check for updates once a month
    schedule:
      interval: "monthly"
    # Labels on pull requests for version updates only
    labels:
      - "dependencies"
    pull-request-branch-name:
      # Separate sections of the branch name with a hyphen
      # for example, `dependabot-npm_and_yarn-next_js-acorn-6.4.1`
      separator: "-"
    # Allow up to 3 open pull requests for pip dependencies
    open-pull-requests-limit: 3

  # Enable version updates for GitHub Actions
  - package-ecosystem: "github-actions"
    directory: "/"
    # Check for updates once a week
    schedule:
      interval: "weekly"
    # Labels on pull requests for version updates only
    labels:
      - "CI / actions"
    pull-request-branch-name:
      # Separate sections of the branch name with a hyphen
      # for example, `dependabot-npm_and_yarn-next_js-acorn-6.4.1`
      separator: "-"
    # Allow up to 1 open pull request for GitHub Actions
    open-pull-requests-limit: 1
    groups:
      GHA-updates:
        patterns:
          - "*"


================================================
FILE: .github/workflows/check-links.yml
================================================
name: Check hyperlinks

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v6

      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install "mistune<3.1"  # a newer version is incompatible with nbconvert
          pip install pytest pytest-check-links

      - name: Check links
        run: |
          pytest --check-links README.md --check-links-ignore "http*"
          pytest --check-links tutorials --check-links-ignore "http*"


================================================
FILE: .github/workflows/cpu-tests.yml
================================================
name: CPU tests

on:
  push:
    branches: [main]
  pull_request_target:
    branches: [main]
    types: [opened, reopened, ready_for_review, labeled, synchronize]
  pull_request: {} # todo
  workflow_dispatch: {}

# lock down all permissions by default
permissions:
  contents: read # needed to check out code
  checks: write # needed for test results
  pull-requests: read # needed for PR metadata
  actions: read # needed to use actions
  security-events: none
  statuses: write # needed to update commit status

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref }}
  cancel-in-progress: ${{ startsWith(github.event_name, 'pull_request') }}

defaults:
  run:
    shell: bash

env:
  HF_HOME: .cache-HF # Define HF_HOME for caching
  TRANSFORMERS_CACHE: .cache-HF/transformers
  DATASETS_CACHE: .cache-HF/datasets
  HF_DATASETS_CACHE: .cache-HF/datasets
  TORCH_URL: "https://download.pytorch.org/whl/cpu/"

jobs:
  testing-imports:
    runs-on: ${{ matrix.os }}
    if: github.event_name != 'pull_request_target'
    strategy:
      fail-fast: false
      matrix:
        os: ["ubuntu-22.04", "ubuntu-24.04", "macOS-14", "windows-2022"]
        python-version: ["3.10"]
    timeout-minutes: 10
    steps:
      - name: Checkout generic
        uses: actions/checkout@v6
      - uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install minimal dependencies
        run: |
          pip install . -U --extra-index-url="${TORCH_URL}"
          pip list

      - name: Testing package imports
        # make sure all modules are still importable with only the minimal dependencies available
        run: |
          modules=$(
            find litgpt -type f -name "*.py" | \
            sed 's/\.py$//' | sed 's/\//./g' | \
            sed 's/.__init__//g' | xargs -I {} echo "import {};"
          )
          echo "$modules"
          python -c "$modules"

  pytester:
    # Route PRs based on contributor type to avoid duplicate runs:
    # - Collaborators: use pull_request (tests workflow changes from PR)
    # - External forks: use pull_request_target (uses trusted workflow from main)
    # - Always run for push to main and workflow_dispatch
    if: |
      (github.event_name == 'pull_request' && contains('OWNER,MEMBER,COLLABORATOR', github.event.pull_request.author_association)) ||
      (github.event_name == 'pull_request_target' && !contains('OWNER,MEMBER,COLLABORATOR', github.event.pull_request.author_association)) ||
      (github.event_name != 'pull_request' && github.event_name != 'pull_request_target')
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: ["ubuntu-22.04"]
        python-version: ["3.10", "3.11", "3.12", "3.13"]
        requires: ["latest"]
        include:
          - { os: "ubuntu-22.04", python-version: "3.10", requires: "oldest" }
          - { os: "windows-2022", python-version: "3.10", requires: "latest" }
          - { os: "macOS-14", python-version: "3.10", requires: "latest" }
    timeout-minutes: 35
    steps:
      - name: Checkout generic
        uses: actions/checkout@v6
        if: github.event_name != 'pull_request_target'
      - name: Checkout for `pull_request_target`
        uses: actions/checkout@v6
        if: github.event_name == 'pull_request_target'
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: actions/setup-python@v6
        with:
          python-version: ${{ matrix.python-version }}
          cache-dependency-path: pyproject.toml
          cache: "pip"

      # Add caching for HF models and tokenizers
      - name: HF cache
        uses: actions/cache@v5
        continue-on-error: true
        with:
          path: .cache-HF
          key: hf-cache_${{ runner.os }}-py${{ matrix.python-version }}
          restore-keys: |
            hf-cache_${{ runner.os }}-py${{ matrix.python-version }}
            hf-cache_${{ runner.os }}-
            hf-cache_

      - name: Set min. dependencies
        if: matrix.requires == 'oldest'
        run: |
          pip install 'lightning-utilities[cli]>=0.15.1'
          python -m lightning_utilities.cli requirements set-oldest --req_files=pyproject.toml
      - name: Install dependencies
        run: |
          pip install '.[extra,compiler,test]' -U --upgrade-strategy eager --extra-index-url="${TORCH_URL}"
          pip list

      - name: Run tests
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: pytest -v litgpt/ tests/ --timeout=180 --durations=100

      - name: Show cache
        run: |
          pip install -q py-tree
          python -m py_tree -d 1 .cache-HF

  testing-guardian:
    runs-on: ubuntu-latest
    needs: [pytester, testing-imports]
    if: |
      (github.event_name == 'pull_request_target' && !contains('OWNER,MEMBER,COLLABORATOR', github.event.pull_request.author_association)) ||
      (github.event_name == 'pull_request' && contains('OWNER,MEMBER,COLLABORATOR', github.event.pull_request.author_association))
    steps:
      - run: echo "${{ needs.pytester.result }}"
      - name: failing...
        if: needs.pytester.result == 'failure'
        run: exit 1
      - name: cancelled or skipped...
        if: contains(fromJSON('["cancelled", "skipped"]'), needs.pytester.result)
        timeout-minutes: 1
        run: sleep 90
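
The `if:` condition on the `pytester` job in the workflow above routes each event to exactly one trigger. A minimal Python sketch of that decision follows. The function names are hypothetical and not part of the repository, and set membership stands in for the workflow's substring test against the string `'OWNER,MEMBER,COLLABORATOR'`, which should give the same answer for association values such as `CONTRIBUTOR` or `NONE`.

```python
# Hypothetical helper names; this only mirrors the boolean logic of the `if:` expression.
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def is_trusted(author_association: str) -> bool:
    """Return True when the PR author is an owner, member, or collaborator."""
    return author_association in TRUSTED_ASSOCIATIONS


def pytester_should_run(event_name: str, author_association: str = "") -> bool:
    """Decide whether the test job runs for a given event."""
    if event_name == "pull_request":
        # Trusted contributors are tested through the plain pull_request trigger.
        return is_trusted(author_association)
    if event_name == "pull_request_target":
        # Everyone else is tested through pull_request_target.
        return not is_trusted(author_association)
    # Pushes to main and manual dispatches always run.
    return True


if __name__ == "__main__":
    for event, assoc in [
        ("pull_request", "MEMBER"),
        ("pull_request_target", "MEMBER"),
        ("pull_request", "CONTRIBUTOR"),
        ("pull_request_target", "CONTRIBUTOR"),
        ("push", ""),
    ]:
        print(event, assoc or "-", pytester_should_run(event, assoc))
```

Under this reading, each pull request triggers the test job once: through `pull_request` for collaborators and through `pull_request_target` for external forks.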


================================================
FILE: .github/workflows/mkdocs-deploy.yml
================================================
name: Deploy MkDocs

on:
  push:
    branches: [main]

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-24.04
    steps:
      # Step 1: Checkout the repository
      - uses: actions/checkout@v6

      # Step 2: Set up Python
      - uses: actions/setup-python@v6
        with:
          python-version: "3.x"
          cache: "pip"

      # Step 3: Install MkDocs and dependencies
      - run: pip install mkdocs mkdocs-material mkdocs-pagetree-plugin
      # Step 4: Deploy to GitHub Pages
      - run: |
          mkdir -p gh-pages/docs
          cp -r tutorials/* gh-pages/docs
          cd gh-pages
          mv docs/mkdocs.yml mkdocs.yml
          echo "{{ pagetree }}" > docs/index.md
          mkdocs gh-deploy --force


================================================
FILE: .github/workflows/publish-pkg.yml
================================================
# To create a release, create a tag and push it to GitHub:
#git tag -a "v0.0.1-beta" -m "beta version testing"
#git push --tags
# https://dev.to/iamtekson/publish-package-to-pypi-and-release-new-version-using-github-actions-108k
name: Publish LitGPT to PyPI

on:
  push:
    tags:
      - "v*"
jobs:
  build-n-publish:
    name: Build and publish to PyPI
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/litgpt
    permissions:
      id-token: write

    steps:
      - name: Checkout source
        uses: actions/checkout@v6

      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: "3.x"
          cache: "pip"

      - name: Build source and wheel distributions
        run: |
          python -m pip install --upgrade build twine
          pip install importlib_metadata==7.2.1
          python -m build
          twine check --strict dist/*
      - name: Publish distribution to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          user: __token__
          password: ${{ secrets.PYPI_API_TOKEN }}


================================================
FILE: .gitignore
================================================
.ipynb_checkpoints/
__pycache__
.idea
.DS_Store
*.egg-info
build
dist
.venv
.venv/
.vscode
uv.lock

# data
data
datasets
!litgpt/data
!tests/data
checkpoints
out
wandb
events.out.tfevents*

# test artifacts from tests/test_readme.py
**/custom_finetuning_dataset.json
client.py
**/custom_texts/


================================================
FILE: .lightning/workflows/tests.yaml
================================================
trigger:
  push:
    branches: ["main"]
  pull_request:
    branches: ["main"]

image: "pytorchlightning/lightning-thunder:ubuntu24.04-cuda12.8.1-cudnn-fe1.15.0-py3.12-pt_2.8.0-dev"
machine: "L4_X_2"
interruptible: "true"
timeout: "45" # minutes
parametrize:
  matrix:
    dependency: ["", "compiler"]
  include: []
  exclude: []

env:
  SKIP_WITH_CI: "1" # skip single tests with CI
  NCCL_DEBUG: "INFO"
  CUBLAS_WORKSPACE_CONFIG: ":4096:8"
  NCCL_IGNORE_DISABLED_P2P: "1"
  TORCH_VERSION: "2.8.0"
  RUN_ONLY_CUDA_TESTS: "1" # run CUDA tests only

run: |
  whereis nvidia
  nvidia-smi
  python --version
  pip --version
  pip list
  set -ex

  echo "Install uv and create virtual environment"
  curl -LsSf https://astral.sh/uv/install.sh | sh
  [ -f "$HOME/.local/bin/env" ] && . "$HOME/.local/bin/env"
  export PATH="$HOME/.local/bin:$PATH"
  uv venv .venv --system-site-packages
  . .venv/bin/activate
  hash -r

  uv pip install -q '.[extra,test]' "torch==${TORCH_VERSION}" cffi -U

  if [ "${dependency}" == "compiler" ]; then
    uv pip uninstall torchvision torchaudio
    uv pip install -q '.[compiler,extra,test]' "torch==${TORCH_VERSION}"
    python -c "from thunder.executors import nvfuser_available ; assert nvfuser_available(), 'nvFuser is missing!'"
    python -c "from thunder.executors.triton_utils import triton_version ; assert triton_version() is not None, 'triton is missing!'"
  fi

  uv pip list
  python -c "import torch ; gpus = torch.cuda.device_count() ; assert gpus >= 2, f'GPU: {gpus}'"
  python -c "from torch import __version__ as ver ; assert str(ver).split('+')[0] == '${TORCH_VERSION}', f'PyTorch: installed {ver} but expected ${TORCH_VERSION}'"

  pytest -v --durations=100

  wget https://raw.githubusercontent.com/Lightning-AI/utilities/main/scripts/run_standalone_tests.sh
  PL_RUN_STANDALONE_TESTS=1 bash run_standalone_tests.sh "tests"

  if [ "${dependency}" == "compiler" ]; then
    uv pip uninstall lightning-thunder transformers
    # install thunder from source so that thunder.tests is available
    uv pip install -U "lightning-thunder[test] @ git+https://github.com/Lightning-AI/lightning-thunder.git" "torch==${TORCH_VERSION}"
    # Pin transformers to match thunder's test_networks.py requirements
    # See: https://github.com/Lightning-AI/lightning-thunder/blob/main/requirements/test.txt
    # Get transformers version from thunder requirements
    TRANSFORMERS_VERSION=$(curl -fsSL https://raw.githubusercontent.com/Lightning-AI/lightning-thunder/main/requirements/test.txt \
      | grep '^transformers==' \
      | cut -d'=' -f3 \
      | cut -d'#' -f1 \
      | xargs)
    if [ -z "${TRANSFORMERS_VERSION}" ]; then
      echo "Error: Could not determine transformers version from lightning-thunder requirements"
      exit 1
    fi
    uv pip install transformers==${TRANSFORMERS_VERSION}
    # without env var, it filters out all tests
    RUN_ONLY_CUDA_TESTS=0 pytest tests/ext_thunder/test_thunder_networks.py -v
  fi
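
The `grep | cut | xargs` pipeline above pulls the pinned `transformers` version out of a requirements file. A rough Python equivalent is sketched below. `pinned_version` is a hypothetical helper, and the sample text is made up for illustration rather than taken from the real requirements file. Unlike the shell pipeline, it returns only the first matching line.

```python
from typing import Optional


def pinned_version(requirements_text: str, package: str = "transformers") -> Optional[str]:
    """Return the version pinned with `==` for `package`, or None if no such line exists."""
    prefix = f"{package}=="
    for line in requirements_text.splitlines():
        if line.startswith(prefix):
            # Drop any trailing comment, then trim surrounding whitespace.
            return line[len(prefix):].split("#", 1)[0].strip()
    return None


if __name__ == "__main__":
    # Illustrative input only; the version numbers are placeholders.
    sample = "numpy==1.0.0\ntransformers==1.2.3  # pinned for tests\n"
    version = pinned_version(sample)
    if version is None:
        raise SystemExit("Error: could not determine transformers version")
    print(version)
```

Returning `None` when no line matches corresponds to the empty-string check in the script, which aborts instead of installing an unpinned version.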


================================================
FILE: .pre-commit-config.yaml
================================================
# Copyright The Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

default_language_version:
  python: python3

ci:
  autofix_prs: true
  autoupdate_commit_msg: "[pre-commit.ci] pre-commit suggestions"
  autoupdate_schedule: quarterly
  # submodules: true

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
        exclude: README.md
      - id: check-yaml
      - id: check-toml
      #- id: check-docstring-first
      #- id: check-executables-have-shebangs
      - id: check-case-conflict
      - id: check-added-large-files
        args: ["--maxkb=250", "--enforce-all"]
      - id: detect-private-key

  - repo: https://github.com/codespell-project/codespell
    rev: v2.4.1
    hooks:
      - id: codespell
        additional_dependencies: [tomli]
        args: ["--write-changes"]
        exclude: pyproject.toml

  #- repo: https://github.com/crate-ci/typos
  #  rev: dictgen-v0.3.1
  #  hooks:
  #    - id: typos
  #      args: [] # empty so that fixes are not written
  #      exclude: pyproject.toml

  #- repo: https://github.com/executablebooks/mdformat
  #  rev: 0.7.21
  #  hooks:
  #    - id: mdformat
  #      args: ["--number"]
  #      additional_dependencies:
  #        - mdformat-gfm
  #        - mdformat-black
  #        - mdformat_frontmatter

  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v3.1.0
    hooks:
      - id: prettier
        files: \.(json|yml|yaml|toml)
        # https://prettier.io/docs/en/options.html#print-width
        args: ["--print-width=140"]

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.14.10
    hooks:
      - id: ruff
        args: ["--fix"]
      - id: ruff-format
      - id: ruff

  - repo: https://github.com/tox-dev/pyproject-fmt
    rev: v2.11.1
    hooks:
      - id: pyproject-fmt
        additional_dependencies: [tox]
  - repo: https://github.com/abravalheri/validate-pyproject
    rev: v0.24.1
    hooks:
      - id: validate-pyproject


================================================
FILE: CITATION.cff
================================================
cff-version: 1.2.0
message: "If you use this software, you can cite it as shown below."
title: "LitGPT"
abstract: "20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale."
date-released: 2023-03-22
authors:
  - name: "The Lightning AI team"
license: "Apache-2.0"
url: "https://github.com/Lightning-AI/litgpt"


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [2023] Lightning AI

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
<div align="center">


# ⚡ LitGPT

**20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.**

<pre>
✅ From scratch implementations      ✅ No abstractions         ✅ Beginner friendly
   ✅ Flash attention                   ✅ FSDP                    ✅ LoRA, QLoRA, Adapter
✅ Reduce GPU memory (fp4/8/16/32)   ✅ 1-1000+ GPUs/TPUs       ✅ 20+ LLMs         
</pre>


---


![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pytorch-lightning)
![cpu-tests](https://github.com/lightning-AI/lit-stablelm/actions/workflows/cpu-tests.yml/badge.svg) [![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/Lightning-AI/lit-stablelm/blob/master/LICENSE) [![Discord](https://img.shields.io/discord/1077906959069626439)](https://discord.gg/VptPCZkGNa)

<p align="center">
  <a href="#quick-start">Quick start</a> •
  <a href="#choose-from-20-llms">Models</a> •
  <a href="#finetune-an-llm">Finetune</a> •
  <a href="#deploy-an-llm">Deploy</a> •
  <a href="#all-workflows">All workflows</a> •
  <a href="#state-of-the-art-features">Features</a> •
  <a href="#training-recipes">Recipes (YAML)</a> •
  <a href="https://lightning.ai/">Lightning AI</a> •
    <a href="#tutorials">Tutorials</a>
</p>

&nbsp;

<a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-quick-start">
  <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/get-started-badge.svg" height="36px" alt="Get started"/>
</a>

&nbsp;

</div>

# Looking for GPUs?
Over 340,000 developers use [Lightning Cloud](https://lightning.ai/?utm_source=litgpt_readme&utm_medium=referral&utm_campaign=litgpt_readme) - purpose-built for PyTorch and PyTorch Lightning. 
- [GPUs](https://lightning.ai/pricing?utm_source=litgpt_readme&utm_medium=referral&utm_campaign=litgpt_readme) from $0.19.   
- [Clusters](https://lightning.ai/clusters?utm_source=litgpt_readme&utm_medium=referral&utm_campaign=litgpt_readme): frontier-grade training/inference clusters.   
- [AI Studio (vibe train)](https://lightning.ai/studios?utm_source=litgpt_readme&utm_medium=referral&utm_campaign=litgpt_readme): workspaces where AI helps you debug, tune and vibe train.
- [AI Studio (vibe deploy)](https://lightning.ai/studios?utm_source=litgpt_readme&utm_medium=referral&utm_campaign=litgpt_readme): workspaces where AI helps you optimize and deploy models.
- [Notebooks](https://lightning.ai/notebooks?utm_source=litgpt_readme&utm_medium=referral&utm_campaign=litgpt_readme): Persistent GPU workspaces where AI helps you code and analyze.
- [Inference](https://lightning.ai/deploy?utm_source=litgpt_readme&utm_medium=referral&utm_campaign=litgpt_readme): Deploy models as inference APIs.

# Finetune, pretrain, and inference LLMs Lightning fast ⚡⚡
Every LLM is implemented from scratch with **no abstractions** and **full control**, making them blazing fast, minimal, and performant at enterprise scale.

✅ **Enterprise ready -** Apache 2.0 for unlimited enterprise use.</br>
✅ **Developer friendly -** Easy debugging with no abstraction layers and single file implementations.</br>
✅ **Optimized performance -** Models designed to maximize performance, reduce costs, and speed up training.</br>
✅ **Proven recipes -** Highly-optimized training/finetuning recipes tested at enterprise scale.</br>

&nbsp;

# Quick start
Install LitGPT:
```
pip install 'litgpt[extra]'
```

Load and use any of the [20+ LLMs](#choose-from-20-llms):
```python
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
text = llm.generate("Fix the spelling: Every fall, the familly goes to the mountains.")
print(text)
# Corrected Sentence: Every fall, the family goes to the mountains.
```

&nbsp;

✅ Optimized for fast inference</br>
✅ Quantization</br>
✅ Runs on low-memory GPUs</br>
✅ No layers of internal abstractions</br>
✅ Optimized for production scale</br>

<details>
  <summary>Advanced install options</summary>

Install from source:

```bash
git clone https://github.com/Lightning-AI/litgpt
cd litgpt
# if using uv
uv sync --all-extras
# if using pip
pip install -e ".[extra,compiler,test]"
```
</details>

[Explore the full Python API docs](tutorials/python-api.md).

&nbsp;

---
# Choose from 20+ LLMs
Every model is written from scratch to maximize performance and remove layers of abstraction:

| Model | Model size | Author | Reference |
|----|----|----|----|
| Llama 3, 3.1, 3.2, 3.3 | 1B, 3B, 8B, 70B, 405B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3)                                           |
| Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950)                                       |
| CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma)                                     |
| Gemma 2 | 2B, 9B, 27B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf)  |
| Phi 4 | 14B | Microsoft Research | [Abdin et al. 2024](https://arxiv.org/abs/2412.08905)                                                                            |
| Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwen2.5/)                                               |
| Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186)                                          |
| R1 Distill Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf)                                                                                 |
| ... | ... | ... | ...   |

<details>
  <summary>See full list of 20+ LLMs</summary>

&nbsp;

#### All models

| Model | Model size | Author | Reference |
|----|----|----|----|
| CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma)                                                                 |
| Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950)                                                                   |
| Falcon | 7B, 40B, 180B | TII UAE | [TII 2023](https://falconllm.tii.ae)                                                                                              |
| Falcon 3 | 1B, 3B, 7B, 10B | TII UAE | [TII 2024](https://huggingface.co/blog/falcon3)                                                                                              |
| FreeWilly2 (Stable Beluga 2) | 70B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stable-beluga-large-instruction-fine-tuned-models)                 |
| Function Calling Llama 2 | 7B | Trelis | [Trelis et al. 2023](https://huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v2)                                  |
| Gemma | 2B, 7B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf)                                       |
| Gemma 2 | 2B, 9B, 27B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf)                                  |
| Gemma 3 | 1B, 4B, 12B, 27B | Google | [Google Team, Google Deepmind](https://arxiv.org/pdf/2503.19786)                                  |
| Llama 2 | 7B, 13B, 70B | Meta AI | [Touvron et al. 2023](https://arxiv.org/abs/2307.09288)                                                                           |
| Llama 3.1 | 8B, 70B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3)                                                                                 |
| Llama 3.2 | 1B, 3B | Meta AI | [Meta AI 2024](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/)                                           |
| Llama 3.3 | 70B | Meta AI | [Meta AI 2024](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)                                                                                 |
| Mathstral | 7B | Mistral AI | [Mistral AI 2024](https://mistral.ai/news/mathstral/)                                                                                  |
| MicroLlama | 300M | Ken Wang | [MicroLlama repo](https://github.com/keeeeenw/MicroLlama)                                                                             |
| Mixtral MoE | 8x7B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/)                                                                     |
| Mistral | 7B, 123B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/)                                                                  |
| Mixtral MoE | 8x22B | Mistral AI | [Mistral AI 2024](https://mistral.ai/news/mixtral-8x22b/)                                                                         |
| OLMo | 1B, 7B | Allen Institute for AI (AI2) | [Groeneveld et al. 2024](https://aclanthology.org/2024.acl-long.841/)    |
| OpenLLaMA | 3B, 7B, 13B | OpenLM Research | [Geng & Liu 2023](https://github.com/openlm-research/open_llama)                                                         |
| Phi 1.5 & 2 | 1.3B, 2.7B | Microsoft Research  | [Li et al. 2023](https://arxiv.org/abs/2309.05463)                                                                  |
| Phi 3 | 3.8B | Microsoft Research | [Abdin et al. 2024](https://arxiv.org/abs/2404.14219)                                                                            |
| Phi 4 | 14B | Microsoft Research | [Abdin et al. 2024](https://arxiv.org/abs/2412.08905)                                                                            |
| Phi 4 Mini Instruct | 3.8B | Microsoft Research | [Microsoft 2025](https://arxiv.org/abs/2503.01743)                                           |
| Phi 4 Mini Reasoning | 3.8B | Microsoft Research | [Xu, Peng et al. 2025](https://arxiv.org/abs/2504.21233)                                           |
| Phi 4 Reasoning | 14B | Microsoft Research | [Abdin et al. 2025](https://arxiv.org/abs/2504.21318)                                           |
| Phi 4 Reasoning Plus | 14B | Microsoft Research | [Abdin et al. 2025](https://arxiv.org/abs/2504.21318)                                           |
| Platypus | 7B, 13B, 70B |  Lee et al. | [Lee, Hunter, and Ruiz 2023](https://arxiv.org/abs/2308.07317)                                                               |
| Pythia | {14,31,70,160,410}M, {1,1.4,2.8,6.9,12}B | EleutherAI | [Biderman et al. 2023](https://arxiv.org/abs/2304.01373)                                            |
| Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwen2.5/)                                               |
| Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186)                                          |
| Qwen2.5 1M (Long Context) | 7B, 14B | Alibaba Group | [Qwen Team 2025](https://qwenlm.github.io/blog/qwen2.5-1m/)                                          |
| Qwen2.5 Math | 1.5B, 7B, 72B | Alibaba Group | [An, Yang et al. 2024](https://arxiv.org/abs/2409.12122)                                          |
| QwQ | 32B | Alibaba Group | [Qwen Team 2025](https://qwenlm.github.io/blog/qwq-32b/)                                                                         |
| QwQ-Preview | 32B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwq-32b-preview/)                                                                         |
| Qwen3 | 0.6B, 1.7B, 4B{Hybrid, Thinking-2507, Instruct-2507}, 8B, 14B, 32B | Alibaba Group | [Qwen Team 2025](https://arxiv.org/abs/2505.09388/)                                                                         |
| Qwen3 MoE | 30B{Hybrid, Thinking-2507, Instruct-2507}, 235B{Hybrid, Thinking-2507, Instruct-2507} | Alibaba Group | [Qwen Team 2025](https://arxiv.org/abs/2505.09388/)                                                                         |
| R1 Distill Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf)                                                                                 |
| SmolLM2 | 135M, 360M, 1.7B | Hugging Face | [Hugging Face 2024](https://github.com/huggingface/smollm)                                                               |
| Salamandra | 2B, 7B | Barcelona Supercomputing Centre | [BSC-LTC 2024](https://github.com/BSC-LTC/salamandra)                                                                         |
| StableCode | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                                  |
| StableLM  | 3B, 7B | Stability AI | [Stability AI 2023](https://github.com/Stability-AI/StableLM)                                                                    |
| StableLM Zephyr | 3B | Stability AI | [Stability AI 2023](https://stability.ai/blog/stablecode-llm-generative-ai-coding)                                             |
| TinyLlama | 1.1B | Zhang et al. | [Zhang et al. 2023](https://github.com/jzhang38/TinyLlama)                                                                         |


**Tip**: You can list all available models by running the `litgpt download list` command.


</details>

&nbsp;

---

# Workflows

<p align="center">
  <a href="#finetune-an-llm">Finetune</a> •
  <a href="#pretrain-an-llm">Pretrain</a> •
  <a href="#continue-pretraining-an-llm">Continued pretraining</a> •
    <a href="#evaluate-an-llm">Evaluate</a> •
    <a href="#deploy-an-llm">Deploy</a> •
    <a href="#test-an-llm">Test</a>
</p>

&nbsp;

Use the command line interface to run advanced workflows such as pretraining or finetuning on your own data.


## All workflows
After installing LitGPT, select the model and workflow to run (finetune, pretrain, evaluate, deploy, etc.):

```bash
# litgpt [action] [model]
litgpt  serve     meta-llama/Llama-3.2-3B-Instruct
litgpt  finetune  meta-llama/Llama-3.2-3B-Instruct
litgpt  pretrain  meta-llama/Llama-3.2-3B-Instruct
litgpt  chat      meta-llama/Llama-3.2-3B-Instruct
litgpt  evaluate  meta-llama/Llama-3.2-3B-Instruct
```

&nbsp;

----

## Finetune an LLM

<div align="center">
<a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-finetune">
  <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/run-on-studio.svg" height="36px" alt="Run on Studios"/>
</a>
</div>

&nbsp;

Finetuning is the process of taking a pretrained AI model and further training it on a smaller, specialized dataset tailored to a specific task or application.


&nbsp;

```bash
# 0) Set up your dataset
curl -L https://huggingface.co/datasets/ksaw008/finance_alpaca/resolve/main/finance_alpaca.json -o my_custom_dataset.json

# 1) Finetune a model (auto downloads weights)
litgpt finetune microsoft/phi-2 \
  --data JSON \
  --data.json_path my_custom_dataset.json \
  --data.val_split_fraction 0.1 \
  --out_dir out/custom-model

# 2) Test the model
litgpt chat out/custom-model/final

# 3) Deploy the model
litgpt serve out/custom-model/final
```

[Read the full finetuning docs](tutorials/finetune.md)
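
The `--data JSON` option above reads a local JSON file. The following is a minimal sketch of producing such a file from Python. It assumes the Alpaca-style layout of a list of records with `instruction`, optional `input`, and `output` keys; check the dataset preparation tutorial for the authoritative format. The `write_dataset` helper and the sample record are hypothetical and only illustrate the idea.

```python
import json
from pathlib import Path

# Hypothetical sample record, assuming an Alpaca-style schema:
# "instruction" and "output" are required, "input" is optional.
records = [
    {
        "instruction": "Explain what a stock dividend is.",
        "input": "",
        "output": "A stock dividend is a payment to shareholders made in additional shares.",
    },
]

REQUIRED_KEYS = {"instruction", "output"}


def write_dataset(records, path):
    """Check that every record has the assumed required keys, then write a JSON list."""
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record {index} is missing keys: {sorted(missing)}")
    path = Path(path)
    path.write_text(json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


# Usage: write_dataset(records, "my_custom_dataset.json")
```

The resulting file can then be passed to `--data.json_path` in place of the downloaded dataset.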

&nbsp;

----

## Deploy an LLM

<div align="center">
<a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-serve">
  <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/deploy-on-studios.svg" height="36px" alt="Deploy on Studios"/>
</a>
</div>

&nbsp;

Deploy a pretrained or finetuned LLM to use it in real-world applications. Deploying automatically sets up a web server that a website or app can access.

```bash
# deploy an out-of-the-box LLM
litgpt serve microsoft/phi-2

# deploy your own trained model
litgpt serve path/to/microsoft/phi-2/checkpoint
```

<details>
  <summary>Show code to query server:</summary>

&nbsp;

Test the server in a separate terminal and integrate the model API into your AI product:
```python
# Query the server (in a separate Python session)
import requests
response = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"prompt": "Fix typos in the following sentence: Example input"}
)
print(response.json()["output"])
```
</details>

[Read the full deploy docs](tutorials/deploy.md).
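
When integrating the server into an application, it usually helps to set a timeout and to check the shape of the response. The sketch below does this with the Python standard library only. It assumes the same `/predict` URL and the `prompt` / `output` JSON fields shown in the example above; the helper names (`build_request`, `extract_output`, `query`) and the 60-second timeout are hypothetical choices, not part of LitGPT.

```python
import json
import urllib.request

# Assumption: same endpoint as in the README example above.
PREDICT_URL = "http://127.0.0.1:8000/predict"


def build_request(prompt, url=PREDICT_URL):
    """Build a POST request carrying the prompt as a JSON body."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_output(payload):
    """Return the generated text, failing loudly if the field is absent."""
    if "output" not in payload:
        raise KeyError(f"unexpected response keys: {sorted(payload)}")
    return payload["output"]


def query(prompt, url=PREDICT_URL, timeout=60.0):
    """Send the prompt to a running server and return its output text."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as response:
        return extract_output(json.load(response))


# Usage (requires `litgpt serve ...` running in another terminal):
# print(query("Fix typos in the following sentence: Example input"))
```

Connection failures and HTTP error statuses surface as `urllib.error.URLError` and `urllib.error.HTTPError`, which the calling code can catch and retry as needed.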

&nbsp;

----

## Evaluate an LLM
Evaluate an LLM on standard benchmarks (MMLU, TruthfulQA, etc.) to measure how well it understands and generates text, for example how it would do in college-level chemistry or coding.

```bash
litgpt evaluate microsoft/phi-2 --tasks 'truthfulqa_mc2,mmlu'
```

[Read the full evaluation docs](tutorials/evaluation.md).

&nbsp;

----

## Test an LLM

<div align="center">
<a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-chat">
  <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/run-on-studio.svg" height="36px" alt="Run on Studios"/>
</a>
</div>

&nbsp;

Test how well the model works via an interactive chat. Use the `chat` command to chat, extract embeddings, etc.

Here's an example showing how to use the Phi-2 LLM:
```bash
litgpt chat microsoft/phi-2

>> Prompt: What do Llamas eat?
```

<details>
  <summary>Full code:</summary>

&nbsp;

```bash
# 1) List all supported LLMs
litgpt download list

# 2) Use a model (auto downloads weights)
litgpt chat microsoft/phi-2

>> Prompt: What do Llamas eat?
```

The download of certain models requires an additional access token. You can read more about this in the [download](tutorials/download_model_weights.md#specific-models-and-access-tokens) documentation.

</details>

[Read the full chat docs](tutorials/inference.md).

&nbsp;

----

## Pretrain an LLM

<div align="center">
<a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-pretrain">
  <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/run-on-studio.svg" height="36px" alt="Run on Studios"/>
</a>
</div>

&nbsp;

Pretraining is the process of teaching an AI model by exposing it to a large amount of data before it is fine-tuned for specific tasks.

<details>
  <summary>Show code:</summary>

&nbsp;

```bash
mkdir -p custom_texts
curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt

# 1) Download a tokenizer
litgpt download EleutherAI/pythia-160m \
  --tokenizer_only True

# 2) Pretrain the model
litgpt pretrain EleutherAI/pythia-160m \
  --tokenizer_dir EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts/" \
  --train.max_tokens 10_000_000 \
  --out_dir out/custom-model

# 3) Test the model
litgpt chat out/custom-model/final
```
</details>

[Read the full pretraining docs](tutorials/pretrain.md)
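
Before choosing a value for `--train.max_tokens`, it can be useful to get a rough idea of how it compares with the size of your text files. The sketch below estimates this under loud assumptions: it treats roughly four characters as one token (a crude rule of thumb, not a property of the Pythia tokenizer), it only looks at `*.txt` files, and it assumes `max_tokens` is the total number of tokens seen during training. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: crude heuristic, real tokenizers will give different counts.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(data_dir, chars_per_token=CHARS_PER_TOKEN):
    """Roughly estimate the token count of all .txt files in a directory."""
    total_chars = 0
    for path in sorted(Path(data_dir).glob("*.txt")):
        total_chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return int(total_chars / chars_per_token)


def estimated_passes(data_dir, max_tokens, chars_per_token=CHARS_PER_TOKEN):
    """Estimate how many passes over the data a given token budget implies."""
    tokens = estimate_tokens(data_dir, chars_per_token)
    if tokens == 0:
        raise ValueError(f"no text found in {data_dir}")
    return max_tokens / tokens


# Usage: print(estimated_passes("custom_texts/", 10_000_000))
```

A result far above 1 suggests the model would see the same text many times, so a smaller budget or more data may be worth considering. For an exact count, tokenize the files with the downloaded tokenizer instead.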

&nbsp;

----

## Continue pretraining an LLM

<div align="center">
<a target="_blank" href="https://lightning.ai/lightning-ai/studios/litgpt-continue-pretraining">
  <img src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/run-on-studio.svg" height="36px" alt="Run on Studios"/>
</a>
</div>

&nbsp;

Continued pretraining is another way of finetuning that specializes an already pretrained model by training on custom data:

<details>
  <summary>Show code:</summary>

&nbsp;

```bash
mkdir -p custom_texts
curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt

# 1) Continue pretraining a model (auto downloads weights)
litgpt pretrain EleutherAI/pythia-160m \
  --tokenizer_dir EleutherAI/pythia-160m \
  --initial_checkpoint_dir EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts/" \
  --train.max_tokens 10_000_000 \
  --out_dir out/custom-model

# 2) Test the model
litgpt chat out/custom-model/final
```

</details>

[Read the full continued pretraining docs](tutorials/pretrain.md#continued-pretraining-on-custom-data)

&nbsp;

----

# State-of-the-art features

✅ State-of-the-art optimizations: Flash Attention v2, multi-GPU support via fully-sharded data parallelism, [optional CPU offloading](tutorials/oom.md#do-sharding-across-multiple-gpus), and [TPU and XLA support](extensions/xla).</br>
✅ [Pretrain](tutorials/pretrain.md), [finetune](tutorials/finetune.md), and [deploy](tutorials/inference.md)</br>
✅ Reduce compute requirements with low-precision settings: FP16, BF16, and FP16/FP32 mixed.</br>
✅ Lower memory requirements with [quantization](tutorials/quantize.md): 4-bit floats, 8-bit integers, and double quantization.</br>
✅ [Configuration files](config_hub) for great out-of-the-box performance.</br>
✅ Parameter-efficient finetuning: [LoRA](tutorials/finetune_lora.md), [QLoRA](tutorials/finetune_lora.md), [Adapter](tutorials/finetune_adapter.md), and [Adapter v2](tutorials/finetune_adapter.md).</br>
✅ [Exporting](tutorials/convert_lit_models.md) to other popular model weight formats.</br>
✅ Many popular datasets for [pretraining](tutorials/pretrain.md) and [finetuning](tutorials/prepare_dataset.md), and [support for custom datasets](tutorials/prepare_dataset.md#preparing-custom-datasets-for-instruction-finetuning).</br>
✅ Readable and easy-to-modify code to experiment with the latest research ideas.</br>

&nbsp;

---

# Training recipes

LitGPT comes with validated recipes (YAML configs) to train models under different conditions. We've generated these recipes based on the parameters we found to perform best under each training condition.

Browse all training recipes [here](config_hub).

### Example

```bash
litgpt finetune \
  --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml
```
<details>
  <summary>✅ Use configs to customize training</summary>

Configs let you customize training down to granular parameters such as:

```yaml
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-llama2-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

...
```
</details>

<details>
  <summary>✅ Example: LoRA finetuning config</summary>

&nbsp;

```yaml
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-llama2-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.05
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4
    download_dir: data/alpaca2k

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:

  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 4

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: float, default: 0.0003)
  learning_rate: 0.0002

  #   (type: float, default: 0.02)
  weight_decay: 0.0

  #   (type: float, default: 0.9)
  beta1: 0.9

  #   (type: float, default: 0.95)
  beta2: 0.95

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:

  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

# The name of the logger to send metrics to. (type: Literal['wandb', 'tensorboard', 'csv'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337
```
</details>
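
To make the `lora_r` and `lora_alpha` settings in the LoRA config more concrete, here is a minimal sketch of the arithmetic from the LoRA paper: the low-rank update is scaled by `alpha / r`, and each adapted weight matrix gains `r * (in_features + out_features)` trainable parameters. The function names and the 4096 x 4096 layer shape are hypothetical and chosen only for illustration; this is not LitGPT's implementation.

```python
def lora_scaling(lora_alpha: int, lora_r: int) -> float:
    """Scaling factor applied to the low-rank update (alpha / r)."""
    return lora_alpha / lora_r


def lora_param_count(in_features: int, out_features: int, lora_r: int) -> int:
    """Trainable parameters added by one LoRA pair: A (r x in) and B (out x r)."""
    return lora_r * (in_features + out_features)


# Values taken from the example config: lora_r=32, lora_alpha=16.
scaling = lora_scaling(lora_alpha=16, lora_r=32)

# Hypothetical 4096 x 4096 projection layer, used only for illustration.
added = lora_param_count(4096, 4096, lora_r=32)
full = 4096 * 4096

print(f"scaling={scaling}, LoRA params={added}, fraction of full={added / full:.4f}")
```

Lowering `lora_r` shrinks the number of added parameters linearly, which is why a smaller rank reduces memory use.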

<details>
  <summary>✅ Override any parameter in the CLI:</summary>

```bash
litgpt finetune \
  --config https://raw.githubusercontent.com/Lightning-AI/litgpt/main/config_hub/finetune/llama-2-7b/lora.yaml \
  --lora_r 4
```
</details>

&nbsp;

----

# Project highlights

LitGPT powers many great AI projects, initiatives, challenges and of course enterprises. Please submit a pull request to be considered for a feature.

<details>
  <summary>📊 SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling</summary>

The [Samba](https://github.com/microsoft/Samba) project by researchers at Microsoft is built on top of the LitGPT code base and combines state space models with sliding window attention, which outperforms pure state space models.

</details>

<details>
  <summary>🏆 NeurIPS 2023 Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day</summary>

The LitGPT repository was the official starter kit for the [NeurIPS 2023 LLM Efficiency Challenge](https://llm-efficiency-challenge.github.io), which is a competition focused on finetuning an existing non-instruction tuned LLM for 24 hours on a single GPU.

</details>

<details>
  <summary>🦙 TinyLlama: An Open-Source Small Language Model</summary>


LitGPT powered the [TinyLlama project](https://github.com/jzhang38/TinyLlama) and [TinyLlama: An Open-Source Small Language Model](https://arxiv.org/abs/2401.02385) research paper.

</details>

<details>
  <summary>🍪 MicroLlama: MicroLlama-300M</summary>

[MicroLlama](https://github.com/keeeeenw/MicroLlama) is a 300M Llama model pretrained on 50B tokens, powered by TinyLlama and LitGPT.
</details>

<details>
  <summary>🔬 Pre-training Small Base LMs with Fewer Tokens</summary>

The research paper ["Pre-training Small Base LMs with Fewer Tokens"](https://arxiv.org/abs/2404.08634), which utilizes LitGPT, develops smaller base language models by inheriting a few transformer blocks from larger models and training on a tiny fraction of the data used by the larger models. It demonstrates that these smaller models can perform comparably to larger models despite using significantly less training data and resources.

</details>

&nbsp;

----

# Community

We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.

- [Request a feature](https://github.com/Lightning-AI/litgpt/issues)
- [Submit your first contribution](https://lightning.ai/pages/community/tutorial/how-to-contribute-to-litgpt/)
- [Join our Discord](https://discord.gg/VptPCZkGNa)

&nbsp;

# Tutorials

🚀 [Get started](tutorials/0_to_litgpt.md)</br>
⚡️ [Finetuning, incl. LoRA, QLoRA, and Adapters](tutorials/finetune.md)</br>
🤖 [Pretraining](tutorials/pretrain.md)</br>
💬 [Model evaluation](tutorials/evaluation.md)</br>
📘 [Supported and custom datasets](tutorials/prepare_dataset.md)</br>
🧹 [Quantization](tutorials/quantize.md)</br>
🤯 [Tips for dealing with out-of-memory (OOM) errors](tutorials/oom.md)</br>
🧑🏽‍💻 [Using cloud TPUs](extensions/xla)</br>

&nbsp;

----

### Acknowledgments

This implementation builds on [Lit-LLaMA](https://github.com/lightning-AI/lit-llama) and [nanoGPT](https://github.com/karpathy/nanoGPT), and it's **powered by [Lightning Fabric](https://lightning.ai/docs/fabric/stable/) ⚡**.

- [@karpathy](https://github.com/karpathy) for [nanoGPT](https://github.com/karpathy/nanoGPT)
- [@EleutherAI](https://github.com/EleutherAI) for [GPT-NeoX](https://github.com/EleutherAI/gpt-neox) and the [Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness)
- [@TimDettmers](https://github.com/TimDettmers) for [bitsandbytes](https://github.com/TimDettmers/bitsandbytes)
- [@Microsoft](https://github.com/microsoft) for [LoRA](https://github.com/microsoft/LoRA)
- [@tridao](https://github.com/tridao) for [Flash Attention 2](https://github.com/Dao-AILab/flash-attention)

### License

LitGPT is released under the [Apache 2.0](https://github.com/Lightning-AI/litgpt/blob/main/LICENSE) license.

### Citation

If you use LitGPT in your research, please cite the following work:

```bibtex
@misc{litgpt-2023,
  author       = {Lightning AI},
  title        = {LitGPT},
  howpublished = {\url{https://github.com/Lightning-AI/litgpt}},
  year         = {2023},
}
```

&nbsp;


================================================
FILE: config_hub/finetune/README.md
================================================
## Config files

The table below lists the performance you can expect from the provided config files. Note that you can achieve lower memory consumption by lowering the micro batch size as needed. In addition, you can lower the rank (`lora_r`) in the LoRA configuration files and disable LoRA for certain layers (for example, by setting `lora_projection` and other LoRA layer-specific parameters to `false`).
For more information on lowering the memory requirements, see the [Dealing with out-of-memory (OOM) errors](../../tutorials/oom.md) tutorial.
The "Cost" column refers to the on-demand compute cost on [Lightning AI Studios where these benchmarks were executed](https://lightning.ai/lightning-ai/studios/automated-benchmarks-for-litgpt).
All experiments were conducted using bfloat-16 precision on the Alpaca2k dataset. The "Multitask score" refers to [MMLU](https://arxiv.org/abs/2009.03300).

&nbsp;

| Config                            | Model                  | Epochs | Max seq length | Micro batch size | Machine | Training runtime | Cost | Peak memory | Validation loss | Validation perplexity | Multitask score (MMLU) |
| --------------------------------- | ---------------------- | ------ | -------------- | ---------------- | ------- | ---------------- | ---- | ----------- | --------------- | --------------------- | --------------- |
| falcon-7b/lora.yaml               | falcon-7b              | 4      | 512            | 1                | 1xA10G  | 24.84 min        | $0.7 | 16.69 GB    | 0.945           | 2.573                 | 26.2%           |
| falcon-7b/lora.yaml               | falcon-7b              | 4      | 512            | 1                | 4xA10G  | 24.94 min        | $2.0 | 16.69 GB    | 0.945           | 2.573                 | 26.4%           |
| falcon-7b/qlora.yaml              | falcon-7b              | 4      | 512            | 1                | 1xA10G  | 50.85 min        | $1.5 | 9.44 GB     | 0.993           | 2.699                 | 26.3%           |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| gemma-2b/full.yaml                | gemma-2b               | 1      | 512            | 1                | 4xA10G  | 14.06 min        | $1.1 | 17.43 GB    | 1.021           | 2.777                 | 32.4%           |
| gemma-2b/lora.yaml                | gemma-2b               | 2      | 512            | 2                | 1xA10G  | 9.41 min         | $0.3 | 12.62 GB    | 0.981           | 2.666                 | 34.4%           |
| gemma-2b/lora.yaml                | gemma-2b               | 2      | 512            | 2                | 4xA10G  | 9.41 min         | $0.8 | 12.62 GB    | 0.981           | 2.667                 | 34.0%           |
| gemma-2b/qlora.yaml               | gemma-2b               | 2      | 512            | 2                | 1xA10G  | 12.91 min        | $0.4 | 11.58 GB    | 1.085           | 2.959                 | 36.4%           |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| gemma-7b/lora.yaml                | gemma-7b               | 2      | 512            | 1                | 1xA10G  | OOM              | OOM  | OOM         | OOM             | OOM                   |                 |
| gemma-7b/lora.yaml                | gemma-7b               | 2      | 512            | 1                | 4xA10G  | OOM              | OOM  | OOM         | OOM             | OOM                   |                 |
| gemma-7b/qlora.yaml               | gemma-7b               | 2      | 512            | 1                | 1xA10G  | 43.58 min        | $1.3 | 17.18 GB    | 0.973           | 2.646                 | 62.45%          |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| gemma2-2b/lora.yaml               | gemma-2b               | 2      | 512            | 2                | 1xA10G  | 11.96 min        | $0.4 | 14.31 GB    | 0.951           | 2.589                 | 23.84%          |
| gemma2-2b/qlora.yaml              | gemma-2b               | 2      | 512            | 2                | 1xA10G  | 16.06 min        | $0.5 | 13.52 GB    | 0.983           | 2.673                 | 24.12%          |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| gemma2-9b/lora.yaml               | gemma-2-9b             | 2      | 512            | 1                | 1xA10G  | OOM              | OOM  | OOM         | OOM             | OOM                   |                 |
| gemma2-9b/lora.yaml               | gemma-2-9b             | 2      | 512            | 1                | 4xA10G  | OOM              | OOM  | OOM         | OOM             | OOM                   |                 |
| gemma2-9b/qlora.yaml              | gemma-2-9b             | 2      | 512            | 1                | 1xA10G  | 50.01 min        | $4.0 | 20.92 GB    | 0.852           | 2.345                 | 24.2%           |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| llama-2-7b/full.yaml              | llama-2-7b             | 1      | 512            | 4                | 4xA10G  | OOM              | OOM  | OOM         | OOM             | OOM                   |                 |
| llama-2-7b/lora.yaml              | llama-2-7b             | 4      | 512            | 2                | 1xA10G  | 32.82 min        | $1.0 | 19.77 GB    | 0.802           | 2.230                 | 40.3%           |
| llama-2-7b/lora.yaml              | llama-2-7b             | 4      | 512            | 2                | 4xA10G  | 32.83 min        | $2.6 | 19.77 GB    | 0.802           | 2.229                 | 40.2%           |
| llama-2-7b/qlora.yaml             | llama-2-7b             | 4      | 512            | 2                | 1xA10G  | 45.67 min        | $1.4 | 13.68 GB    | 0.814           | 2.258                 | 38.6%           |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| llama-3-8b/full.yaml              | llama-3-8b             | 1      | 512            | 4                | 4xA10G  | OOM              | OOM  | OOM         | OOM             | OOM                   |                 |
| llama-3-8b/lora.yaml              | llama-3-8b             | 2      | 512            | 1                | 1xA10G  | 14.79 min        | $0.4 | 19.73 GB    | 0.888           | 2.431                 | 62.4%           |
| llama-3-8b/lora.yaml              | llama-3-8b             | 2      | 512            | 1                | 4xA10G  | 14.88 min        | $1.2 | 19.73 GB    | 0.889           | 2.432                 | 62.5%           |
| llama-3-8b/qlora.yaml             | llama-3-8b             | 2      | 512            | 2                | 1xA10G  | 22.24 min        | $0.7 | 17.41 GB    | 0.939           | 2.558                 | 62.2%           |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| llama-3.1-8b/full.yaml            | llama-3.1-8b           | 1      | 512            | 4                | 1xA10G  | OOM              | OOM  | OOM         | OOM             | OOM                   | OOM             |
| llama-3.1-8b/lora.yaml            | llama-3.1-8b           | 2      | 512            | 1                | 1xA10G  | 13.36 min        | $1.1 | 19.73 GB    | 0.878           | 2.406                 | xx.xx           |
| llama-3.1-8b/qlora.yaml           | llama-3.1-8b           | 2      | 512            | 2                | 1xA10G  | 21.81 min        | $0.7 | 17.41 GB    | 0.928           | 2.529                 | xx.xx           |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| llama-3.2-1b/full.yaml            | llama-3.2-1b           | 1      | 512            | 4                | 1xA10G  |  2.01 min        | $0.1 |  8.70 GB    | 1.442           | 4.229                 | 38.21%          |
| llama-3.2-1b/lora.yaml            | llama-3.2-1b           | 2      | 512            | 1                | 1xA10G  |  4.17 min        | $0.4 |  4.49 GB    | 1.114           | 3.046                 | 36.87%          |
| llama-3.2-1b/qlora.yaml           | llama-3.2-1b           | 2      | 512            | 2                | 1xA10G  |  6.20 min        | $0.6 |  5.53 GB    | 1.201           | 3.322                 | 36.49%          |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| llama-3.2-3b/full.yaml            | llama-3.2-3b           | 1      | 512            | 4                | 1xA10G  |  4.71 min        | $0.4 | 16.51 GB    | 1.255           | 3.509                 | 54.69%          |
| llama-3.2-3b/lora.yaml            | llama-3.2-3b           | 2      | 512            | 1                | 1xA10G  |  8.31 min        | $0.8 |  9.67 GB    | 0.973           | 2.647                 | 54.77%          |
| llama-3.2-3b/qlora.yaml           | llama-3.2-3b           | 2      | 512            | 2                | 1xA10G  | 14.89 min        | $1.4 | 10.30 GB    | 1.031           | 2.804                 | 55.08%          |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| mistral-7b-v0.2/lora.yaml         | mistral-7b-v0.2        | 4      | 512            | 2                | 1xA10G  | 31.00 min        | $0.9 | 20.66 GB    | 0.801           | 2.228                 | 55.7%           |
| mistral-7b-v0.2/lora.yaml         | mistral-7b-v0.2        | 4      | 512            | 2                | 4xA10G  | 31.00 min        | $2.5 | 20.66 GB    | 0.802           | 2.229                 | 55.5%           |
| mistral-7b-v0.2/qlora.yaml        | mistral-7b-v0.2        | 4      | 512            | 2                | 1xA10G  | 44.75 min        | $1.3 | 14.29 GB    | 0.813           | 2.255                 | 56.5%           |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| mistral-7b/lora.yaml              | mistral-7b             | 4      | 512            | 2                | 1xA10G  | 31.01 min        | $0.9 | 20.66 GB    | 0.794           | 2.211                 | 57.9%           |
| mistral-7b/lora.yaml              | mistral-7b             | 4      | 512            | 2                | 4xA10G  | 31.03 min        | $2.5 | 20.66 GB    | 0.796           | 2.218                 | 57.9%           |
| mistral-7b/qlora.yaml             | mistral-7b             | 4      | 512            | 2                | 1xA10G  | 44.75 min        | $1.3 | 14.29 GB    | 0.803           | 2.231                 | 57.9%           |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| phi-2/full.yaml                   | phi-2                  | 1      | 512            | 4                | 4xA10G  | 11.87 min        | $1.0 | 14.44 GB    | 1.305           | 3.688                 | 38.4%           |
| phi-2/lora.yaml                   | phi-2                  | 1      | 512            | 4                | 1xA10G  | 3.78 min         | $0.1 | 13.98 GB    | 0.819           | 2.269                 | 53.0%           |
| phi-2/lora.yaml                   | phi-2                  | 1      | 512            | 4                | 4xA10G  | 3.78 min         | $0.3 | 13.98 GB    | 0.820           | 2.271                 | 52.4%           |
| phi-2/qlora.yaml                  | phi-2                  | 1      | 512            | 4                | 1xA10G  | 4.51 min         | $0.1 | 14.27 GB    | 0.837           | 2.310                 | 52.3%           |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| phi-3/full.yaml                   | Phi-3-mini-4k-instruct | 1      | 512            | 4                | 1xA10G  | 6.93 min         | $0.2 | 17.01 GB    | 0.714           | 2.043                 | 69.81%          |
| phi-3/lora.yaml                   | Phi-3-mini-4k-instruct | 1      | 512            | 4                | 1xA10G  | 6.46 min         | $0.2 | 19.75 GB    | 0.707           | 2.028                 | 69.70%          |
| phi-3/qlora.yaml                  | Phi-3-mini-4k-instruct | 1      | 512            | 4                | 1xA10G  | 7.47 min         | $0.2 | 19.13 GB    | 0.729           | 2.074                 | 68.96%          |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| stablelm-base-alpha-3b/full.yaml  | stablelm-base-alpha-3b | 1      | 512            | 1                | 4xA10G  | 70.13 min        | $5.6 | 21.23 GB    | 1.513           | 4.540                 | 23.2%           |
| stablelm-base-alpha-3b/lora.yaml  | stablelm-base-alpha-3b | 4      | 512            | 1                | 1xA10G  | 13.07 min        | $0.4 | 8.58 GB     | 1.361           | 3.900                 | 25.9%           |
| stablelm-base-alpha-3b/lora.yaml  | stablelm-base-alpha-3b | 4      | 512            | 1                | 4xA10G  | 13.16 min        | $1.1 | 8.58 GB     | 1.362           | 3.906                 | 25.9%           |
| stablelm-base-alpha-3b/qlora.yaml | stablelm-base-alpha-3b | 4      | 512            | 1                | 1xA10G  | 25.86 min        | $0.8 | 5.24 GB     | 1.388           | 4.009                 | 26.1%           |
|                                   |                        |        |                |                  |         |                  |      |             |                 |                       |                 |
| tiny-llama/full.yaml              | tiny-llama             | 1      | 512            | 4                | 1xA10G  | 2.58 min         | $0.1 | 14.10 GB    | 1.088           | 2.968                 | 24.6%           |
| tiny-llama/full.yaml              | tiny-llama             | 1      | 512            | 4                | 4xA10G  | 2.57 min         | $0.2 | 14.10 GB    | 1.088           | 2.968                 | 24.5%           |
| tiny-llama/lora.yaml              | tiny-llama             | 3      | 512            | 8                | 1xA10G  | 8.09 min         | $0.2 | 13.50 GB    | 1.039           | 2.826                 | 25.5%           |
| tiny-llama/qlora.yaml             | tiny-llama             | 3      | 512            | 8                | 1xA10G  | 8.70 min         | $0.3 | 16.24 GB    | 1.056           | 2.874                 | 25.3%           |

*OOM = Out of memory
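
The "Validation perplexity" column is related to the "Validation loss" column: perplexity is the exponential of the mean cross-entropy loss. The short sketch below checks this against a few rows copied from the table; the helper name `perplexity` is illustrative, and small differences in the last digit are expected because the table reports rounded losses.

```python
import math


def perplexity(loss: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(loss)


# (validation loss, reported perplexity) pairs copied from the table above.
rows = {
    "falcon-7b/lora.yaml (1xA10G)": (0.945, 2.573),
    "llama-2-7b/lora.yaml (1xA10G)": (0.802, 2.230),
    "phi-3/lora.yaml (1xA10G)": (0.707, 2.028),
}

for name, (loss, reported) in rows.items():
    print(f"{name}: exp({loss}) = {perplexity(loss):.3f} (table: {reported})")
```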


&nbsp;
## Extending the context length

If you require a longer sequence length than the one used in a given config file, you can either edit the `max_seq_length` in the config file or pass an additional argument when running the finetuning command, for example, `--max_seq_length 4096` to override the sequence length provided in the config file.

&nbsp;
## Training on GPUs without bfloat16 support

If you are training on GPUs without bfloat16 support, you need to change the `precision` option to `16-true` (16-bit floating point precision) or `16-mixed` (16/32-bit mixed precision):

```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --precision 16-true
```
or

```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --precision 16-mixed
```

Note that `16-true` is more compute- and memory-efficient, but it can sometimes lead to training convergence issues. In that case, it's recommended to use `16-mixed`.
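
As a rough illustration of the memory difference, the sketch below estimates the footprint of the model weights alone. It assumes that `16-true` stores weights in 16 bits (2 bytes per parameter), while `16-mixed` keeps 32-bit weights (4 bytes per parameter) and only runs selected operations in 16-bit. The 3B parameter count is hypothetical, and gradients, optimizer state, and activations are ignored, so real peak memory will be higher.

```python
# Bytes used to store each weight under a given precision setting.
# Assumption: "16-true" keeps 16-bit weights; "16-mixed" keeps 32-bit weights.
BYTES_PER_PARAM = {"16-true": 2, "16-mixed": 4, "32-true": 4}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Rough weight-only footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Hypothetical model size, for illustration only.
num_params = 3e9
for precision in ("16-true", "16-mixed"):
    print(f"{precision}: ~{weight_memory_gb(num_params, precision):.1f} GB for weights")
```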

&nbsp;
## Multi-GPU experiments

All runs are single-GPU experiments; use `--devices 4` to utilize more than one GPU:


```bash
litgpt finetune lora \
  --config config_hub/finetune/phi-2/lora.yaml \
  --devices 4
```
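
When changing the number of devices, it helps to see how `global_batch_size` and `micro_batch_size` interact. The config comments describe `global_batch_size` as the number of samples between optimizer steps across data-parallel ranks and `micro_batch_size` as the number of samples per rank, so the number of gradient accumulation iterations follows from dividing one by the other. The sketch below shows that arithmetic; the function name and the validation check are assumptions for illustration, and LitGPT's own implementation may differ.

```python
def gradient_accumulation_iters(
    global_batch_size: int,
    micro_batch_size: int,
    devices: int = 1,
    num_nodes: int = 1,
) -> int:
    """Micro-batches each rank processes before one optimizer step."""
    per_rank_batch = global_batch_size // (devices * num_nodes)
    # Illustrative sanity check: the per-rank batch must split evenly.
    if per_rank_batch < micro_batch_size or per_rank_batch % micro_batch_size:
        raise ValueError(
            "global_batch_size must be a positive multiple of "
            "devices * num_nodes * micro_batch_size"
        )
    return per_rank_batch // micro_batch_size


# Values from falcon-7b/lora.yaml: global_batch_size=8, micro_batch_size=1.
single_gpu = gradient_accumulation_iters(8, 1, devices=1)
four_gpus = gradient_accumulation_iters(8, 1, devices=4)
print(f"1 GPU: {single_gpu} accumulation iters, 4 GPUs: {four_gpus} accumulation iters")
```

With the global batch size fixed, adding devices reduces the number of accumulation iterations per rank instead of changing the effective batch size.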


================================================
FILE: config_hub/finetune/falcon-7b/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/tiiuae/falcon-7b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-falcon-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 4

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/falcon-7b/qlora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/tiiuae/falcon-7b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-falcon-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16
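# Note (a sketch, assuming the standard LoRA formulation): the low-rank update is
# scaled by lora_alpha / lora_r, so the values above give 16 / 32 = 0.5. Raising
# lora_r without raising lora_alpha therefore shrinks that scale.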

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.05
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4
    download_dir: data/alpaca2k

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1
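  # Worked example (assuming gradient accumulation is derived as
  # global_batch_size / (devices * num_nodes) / micro_batch_size, which is an
  # assumption about the trainer and not stated in this file):
  #   8 / (1 * 1) / 1 = 8 micro-batches accumulated per optimizer step.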

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 4

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/gemma-2b/full.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/google/gemma-2b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/full-gemma-2b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 4

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 16

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1
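  # Worked example for this multi-GPU setup (assuming the global batch is split
  # evenly across data-parallel ranks, which is not stated in this file):
  #   16 / (4 devices * 1 node) = 4 samples per rank per optimizer step
  #   4 / micro_batch_size 1    = 4 micro-batches accumulated on each rank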

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 100

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 1

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps: 50
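  # With global_batch_size: 16, a cap of 50 optimizer steps consumes at most
  # 50 * 16 = 800 samples. The cap is also below save_interval: 800, so the
  # interval-based checkpoint would presumably never trigger during such a run.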

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/gemma-2b/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/google/gemma-2b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-gemma-2b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 8

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.1

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: true

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: true

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: true

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: true

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 6

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 200

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/gemma-2b/qlora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/google/gemma-2b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-gemma-2b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 16

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.1

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: true

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: true

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: true

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: true

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 6

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 200

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/gemma-7b/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/google/gemma-7b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-gemma-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 16

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.1

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: true

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: true

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: true

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: true

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 6

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 200

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/gemma-7b/qlora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/google/gemma-7b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-gemma-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 16

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.1

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: true

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: true

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: true

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: true

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 6

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 200

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/gemma2-2b/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/google/gemma-2-2b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-gemma-2-2b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 8

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.1

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: true

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: true

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: true

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: true

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 6

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 200

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/gemma2-2b/qlora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/google/gemma-2-2b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-gemma-2-2b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['nf4', 'nf4-dq', 'fp4', 'fp4-dq', 'int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 16

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.1

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: true

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: true

# Whether to apply LoRA to the weights of the MLP in the attention block. (type: bool, default: False)
lora_mlp: true

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: true

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 6

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 200

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/gemma2-9b/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/google/gemma-2-9b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-gemma-2-9b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 16

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.1

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: true

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: true

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: true

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: true

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 6

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 200

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/gemma2-9b/qlora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/google/gemma-2-9b

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-gemma-2-9b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 16

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.1

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: true

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: true

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: true

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: true

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.03847
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 800

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 6

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 200

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-2-7b/full.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/finetune/full)
out_dir: out/finetune/full-llama2-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# How many devices/GPUs to use (type: Union[int, str], default: 1)
devices: 4

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# Path to a checkpoint directory to resume from in case training was interrupted, or ``True`` to resume
# from the latest checkpoint in ``out_dir``. An error will be raised if no checkpoint is found. Passing
# ``'auto'`` will resume from the latest checkpoint but not error if no checkpoint exists.
# (type: Union[bool, Literal["auto"], Path], default: False)
resume: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 64)
  global_batch_size: 64

  # Number of samples per data-parallel rank (type: int, default: 1)
  micro_batch_size: 4
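  # Worked example for the two batch-size settings above. This assumes (not stated in this file) that
  # gradient accumulation is derived as global_batch_size / (devices * num_nodes) / micro_batch_size:
  #   samples per rank per optimizer step = 64 / (4 * 1) = 16
  #   accumulation iterations per rank    = 16 / 4       = 4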

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 25

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 1

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 600)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-2-7b/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-llama2-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16
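# Worked example, assuming the conventional LoRA scaling factor lora_alpha / lora_r (an assumption, not stated in this file):
#   scaling = 16 / 32 = 0.5, so the low-rank update would be multiplied by 0.5 before being added to the frozen layer's output.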

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 4

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-2-7b/qlora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-2-7b-hf

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-llama2-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.05
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4
    download_dir: data/alpaca2k

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 4

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-3-8b/full.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Meta-Llama-3-8B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/finetune/full)
out_dir: out/finetune/full-llama-3-8b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# How many devices/GPUs to use (type: Union[int, str], default: 1)
devices: 4

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# Path to a checkpoint directory to resume from in case training was interrupted, or ``True`` to resume
# from the latest checkpoint in ``out_dir``. An error will be raised if no checkpoint is found. Passing
# ``'auto'`` will resume from the latest checkpoint but not error if no checkpoint exists.
# (type: Union[bool, Literal["auto"], Path], default: False)
resume: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 64)
  global_batch_size: 64

  # Number of samples per data-parallel rank (type: int, default: 1)
  micro_batch_size: 4

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 25

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 1

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 600)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.1

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-3-8b/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Meta-Llama-3-8B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-llama-3-8b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
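    # Learning rate (type: float, default: 0.001)
    lr: 0.0002

    # Weight decay coefficient (type: float, default: 0.01)
    weight_decay: 0.0

    # Coefficients for the running averages of the gradient and its square (type: tuple, default: (0.9,0.999))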
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-3-8b/qlora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Meta-Llama-3-8B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/finetune/lora)
out_dir: out/finetune/qlora-llama3-8b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16
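#
# Worked example (assuming the usual LoRA convention that the low-rank update is
# scaled by lora_alpha / lora_r): with lora_r: 32 and lora_alpha: 16, the
# update is scaled by 16 / 32 = 0.5.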

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to the output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.05
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4
    download_dir: data/alpaca2k

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
  tie_embeddings:

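  # Maximum gradient norm used for gradient clipping (type: Optional[float], default: null)
  max_norm:

  # Minimum learning rate reached by the learning rate schedule (type: float, default: 6e-05)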
  min_lr: 6.0e-05
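  #
  # Schedule sketch (an assumption about how these values combine: linear warmup
  # followed by cosine decay): the learning rate ramps up to the optimizer's
  # lr (0.0002 below) over the first lr_warmup_steps (10) steps, then decays
  # toward min_lr (6.0e-05) over the remaining optimizer steps.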

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of validation batches to evaluate per evaluation call (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
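    # Learning rate (type: float, default: 0.001)
    lr: 0.0002

    # Weight decay coefficient (type: float, default: 0.01)
    weight_decay: 0.0

    # Coefficients for the running averages of the gradient and its square (type: tuple, default: (0.9,0.999))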
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-3.1-8b/full.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Meta-Llama-3.1-8B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/finetune/full)
out_dir: out/finetune/full-llama-3.1-8b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# How many devices/GPUs to use (type: Union[int, str], default: 1)
devices: 4

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# Path to a checkpoint directory to resume from in case training was interrupted, or ``True`` to resume
# from the latest checkpoint in ``out_dir``. An error will be raised if no checkpoint is found. Passing
# ``'auto'`` will resume from the latest checkpoint but not error if no checkpoint exists.
# (type: Union[bool, Literal["auto"], Path], default: False)
resume: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 64)
  global_batch_size: 64

  # Number of samples per data-parallel rank (type: int, default: 1)
  micro_batch_size: 4
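  #
  # Worked example (a sketch, assuming gradient accumulation is derived as
  # global_batch_size // (devices * num_nodes) // micro_batch_size):
  #   per-rank batch     = 64 // (4 * 1) = 16 samples
  #   accumulation iters = 16 // 4       = 4 micro-batches per optimizer step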

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 25

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 1

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

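  # Maximum gradient norm used for gradient clipping (type: Optional[float], default: null)
  max_norm:

  # Minimum learning rate reached by the learning rate schedule (type: float, default: 6e-05)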
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 600)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of validation batches to evaluate per evaluation call (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
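    # Learning rate (type: float, default: 0.001)
    lr: 0.0002

    # Weight decay coefficient (type: float, default: 0.01)
    weight_decay: 0.1

    # Coefficients for the running averages of the gradient and its square (type: tuple, default: (0.9,0.999))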
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-3.1-8b/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Meta-Llama-3.1-8B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/finetune/lora)
out_dir: out/finetune/lora-llama-3.1-8b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to the output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

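  # Maximum gradient norm used for gradient clipping (type: Optional[float], default: null)
  max_norm:

  # Minimum learning rate reached by the learning rate schedule (type: float, default: 6e-05)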
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of validation batches to evaluate per evaluation call (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
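    # Learning rate (type: float, default: 0.001)
    lr: 0.0002

    # Weight decay coefficient (type: float, default: 0.01)
    weight_decay: 0.0

    # Coefficients for the running averages of the gradient and its square (type: tuple, default: (0.9,0.999))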
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-3.1-8b/qlora.yaml
================================================
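# Usage sketch (assumes the `litgpt` CLI is installed, the checkpoint below has
# been downloaded, and the command is run from the repository root):
#   litgpt finetune_lora --config config_hub/finetune/llama-3.1-8b/qlora.yaml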
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Meta-Llama-3.1-8B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/finetune/lora)
out_dir: out/finetune/qlora-llama3.1-8b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to the output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.05
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4
    download_dir: data/alpaca2k

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
  tie_embeddings:

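  # Maximum gradient norm used for gradient clipping (type: Optional[float], default: null)
  max_norm:

  # Minimum learning rate reached by the learning rate schedule (type: float, default: 6e-05)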
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of validation batches to evaluate per evaluation call (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
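    # Learning rate (type: float, default: 0.001)
    lr: 0.0002

    # Weight decay coefficient (type: float, default: 0.01)
    weight_decay: 0.0

    # Coefficients for the running averages of the gradient and its square (type: tuple, default: (0.9,0.999))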
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-3.2-1B/full.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-3.2-1B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/finetune/full)
out_dir: out/finetune/full-llama-3.2-1B

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# How many devices/GPUs to use (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# Path to a checkpoint directory to resume from in case training was interrupted, or ``True`` to resume
# from the latest checkpoint in ``out_dir``. An error will be raised if no checkpoint is found. Passing
# ``'auto'`` will resume from the latest checkpoint but not error if no checkpoint exists.
# (type: Union[bool, Literal["auto"], Path], default: False)
# resume: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 64)
  global_batch_size: 64

  # Number of samples per data-parallel rank (type: int, default: 1)
  micro_batch_size: 4

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 25

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 1

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

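  # Maximum gradient norm used for gradient clipping (type: Optional[float], default: null)
  max_norm:

  # Minimum learning rate reached by the learning rate schedule (type: float, default: 6e-05)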
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 600)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of validation batches to evaluate per evaluation call (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
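    # Learning rate (type: float, default: 0.001)
    lr: 0.0002

    # Weight decay coefficient (type: float, default: 0.01)
    weight_decay: 0.1

    # Coefficients for the running averages of the gradient and its square (type: tuple, default: (0.9,0.999))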
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-3.2-1B/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-3.2-1B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/finetune/lora)
out_dir: out/finetune/lora-llama-3.2-1B

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to the output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

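  # Maximum gradient norm used for gradient clipping (type: Optional[float], default: null)
  max_norm:

  # Minimum learning rate reached by the learning rate schedule (type: float, default: 6e-05)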
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of validation batches to evaluate per evaluation call (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
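    # Learning rate (type: float, default: 0.001)
    lr: 0.0002

    # Weight decay coefficient (type: float, default: 0.01)
    weight_decay: 0.0

    # Coefficients for the running averages of the gradient and its square (type: tuple, default: (0.9,0.999))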
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-3.2-1B/qlora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-3.2-1B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/finetune/lora)
out_dir: out/finetune/qlora-llama3.2-1b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to the output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.05
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4
    download_dir: data/alpaca2k

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
  tie_embeddings:

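  # Maximum gradient norm used for gradient clipping (type: Optional[float], default: null)
  max_norm:

  # Minimum learning rate reached by the learning rate schedule (type: float, default: 6e-05)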
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of validation batches to evaluate per evaluation call (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
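    # Learning rate (type: float, default: 0.001)
    lr: 0.0002

    # Weight decay coefficient (type: float, default: 0.01)
    weight_decay: 0.0

    # Coefficients for the running averages of the gradient and its square (type: tuple, default: (0.9,0.999))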
    betas:
      - 0.9
      - 0.95


================================================
FILE: config_hub/finetune/llama-3.2-3B/full.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-3.2-3B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/finetune/full)
out_dir: out/finetune/full-llama-3.2-3B

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# How many devices/GPUs to use (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# Path to a checkpoint directory to resume from in case training was interrupted, or ``True`` to resume
# from the latest checkpoint in ``out_dir``. An error will be raised if no checkpoint is found. Passing
# ``'auto'`` will resume from the latest checkpoint but not error if no checkpoint exists.
# (type: Union[bool, Literal["auto"], Path], default: False)
# resume: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 64)
  global_batch_size: 64

  # Number of samples per data-parallel rank (type: int, default: 1)
  micro_batch_size: 4

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 25

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 1

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 600)
  interval: 25

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.1

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95
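
# Worked example (a sketch, not part of the upstream config): with the values above, and
# assuming gradient accumulation is derived as
#   global_batch_size / (devices * num_nodes) / micro_batch_size,
# each optimizer step accumulates 64 / (1 * 1) / 4 = 16 micro-batches of 4 samples.
# With max_seq_length: 512, one optimizer step covers at most 64 * 512 = 32,768 tokens.
# Hypothetical multi-GPU variant: setting devices: 4 would give 64 / (4 * 1) / 4 = 4
# accumulation steps per rank while keeping the same global batch size.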


================================================
FILE: config_hub/finetune/llama-3.2-3B/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-3.2-3B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-llama-3.2-3B

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 1

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95
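
# Worked example (a sketch, not part of the upstream config): assuming the standard LoRA
# update W + (lora_alpha / lora_r) * B @ A, the values above give a scaling factor of
# 16 / 32 = 0.5.
# Rough adapter size, assuming Llama-3.2-3B uses hidden size 3072, 28 layers, and a
# 1024-wide value projection (8 KV heads x head size 128); these dimensions are not stated
# in this file:
#   query: 32 * (3072 + 3072) = 196,608 parameters per layer
#   value: 32 * (3072 + 1024) = 131,072 parameters per layer
#   total: (196,608 + 131,072) * 28 = 9,175,040 trainable LoRA parameters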


================================================
FILE: config_hub/finetune/llama-3.2-3B/qlora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/meta-llama/Llama-3.2-3B

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-llama3.2-3b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize: bnb.nf4

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    val_split_fraction: 0.05
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4
    download_dir: data/alpaca2k

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 2

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95
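
# Worked example (a sketch, not part of the upstream config): assuming Alpaca2k holds about
# 2,000 samples (the dataset size is not stated in this file), val_split_fraction: 0.05
# leaves roughly 1,900 training samples. With global_batch_size: 8 that is about
# 1,900 / 8 = 237 optimizer steps per epoch, or about 475 steps over epochs: 2.
# Under those assumptions, save_interval: 200 would write intermediate checkpoints at
# steps 200 and 400, and interval: 100 would trigger evaluation at steps 100 through 400.
# Each optimizer step accumulates 8 / (1 * 1) / 2 = 4 micro-batches of 2 samples.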


================================================
FILE: config_hub/finetune/mistral-7b/lora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/mistralai/Mistral-7B-v0.1

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/lora-mistral-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quantize.md`` for more information. (type: Optional[Literal['bnb.nf4', 'bnb.nf4-dq', 'bnb.fp4', 'bnb.fp4-dq', 'bnb.int8-training']], default: null)
quantize:

# How many devices/GPUs to use. (type: Union[int, str], default: 1)
devices: 1

# How many nodes to use. (type: int, default: 1)
num_nodes: 1

# The LoRA rank. (type: int, default: 8)
lora_r: 32

# The LoRA alpha. (type: int, default: 16)
lora_alpha: 16

# The LoRA dropout value. (type: float, default: 0.05)
lora_dropout: 0.05

# Whether to apply LoRA to the query weights in attention. (type: bool, default: True)
lora_query: true

# Whether to apply LoRA to the key weights in attention. (type: bool, default: False)
lora_key: false

# Whether to apply LoRA to the value weights in attention. (type: bool, default: True)
lora_value: true

# Whether to apply LoRA to the output projection in the attention block. (type: bool, default: False)
lora_projection: false

# Whether to apply LoRA to the weights of the MLP in the transformer block. (type: bool, default: False)
lora_mlp: false

# Whether to apply LoRA to output head in GPT. (type: bool, default: False)
lora_head: false

# Data-related arguments. If not provided, the default is ``litgpt.data.Alpaca``.
data:
  class_path: litgpt.data.Alpaca2k
  init_args:
    mask_prompt: false
    prompt_style: alpaca
    ignore_index: -100
    seed: 42
    num_workers: 4

# Training-related arguments. See ``litgpt.args.TrainArgs`` for details
train:
  # Number of optimizer steps between saving checkpoints (type: Optional[int], default: 1000)
  save_interval: 200

  # Number of iterations between logging calls (type: int, default: 1)
  log_interval: 1

  # Number of samples between optimizer steps across data-parallel ranks (type: int, default: 128)
  global_batch_size: 8

  # Number of samples per data-parallel rank (type: int, default: 4)
  micro_batch_size: 2

  # Number of iterations with learning rate warmup active (type: int, default: 100)
  lr_warmup_steps: 10

  # Number of epochs to train on (type: Optional[int], default: 5)
  epochs: 4

  # Total number of tokens to train on (type: Optional[int], default: null)
  max_tokens:

  # Limits the number of optimizer steps to run. (type: Optional[int], default: null)
  max_steps:

  # Limits the length of samples. Off by default (type: Optional[int], default: null)
  max_seq_length: 512

  # Whether to tie the embedding weights with the language modeling head weights. (type: Optional[bool], default: null)
  tie_embeddings:

  #   (type: Optional[float], default: null)
  max_norm:

  #   (type: float, default: 6e-05)
  min_lr: 6.0e-05

# Evaluation-related arguments. See ``litgpt.args.EvalArgs`` for details
eval:
  # Number of optimizer steps between evaluation calls (type: int, default: 100)
  interval: 100

  # Number of tokens to generate (type: Optional[int], default: 100)
  max_new_tokens: 100

  # Number of iterations (type: int, default: 100)
  max_iters: 100

  # Whether to evaluate on the validation set at the beginning of the training
  initial_validation: false

  # Whether to evaluate on the validation set at the end of the training
  final_validation: true

# The name of the logger to send metrics to. (type: LoggerChoice, i.e. Literal['wandb', 'tensorboard', 'csv', 'mlflow', 'litlogger'], default: csv)
logger_name: csv

# The random seed to use for reproducibility. (type: int, default: 1337)
seed: 1337

# Optimizer-related arguments
optimizer:
  class_path: torch.optim.AdamW

  init_args:
    #   (type: float, default: 0.001)
    lr: 0.0002

    #   (type: float, default: 0.01)
    weight_decay: 0.0

    #   (type: tuple, default: (0.9,0.999))
    betas:
      - 0.9
      - 0.95
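
# Usage sketch (not part of the upstream config; the command names below are assumptions
# based on the litgpt CLI and may differ between releases):
#   litgpt download mistralai/Mistral-7B-v0.1
#   litgpt finetune_lora --config config_hub/finetune/mistral-7b/lora.yaml
# Individual values can usually be overridden on the command line with dotted keys,
# for example --train.epochs 1 or --train.micro_batch_size 1 to reduce memory use.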


================================================
FILE: config_hub/finetune/mistral-7b/qlora.yaml
================================================
# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/stabilityai/stablelm-base-alpha-3b)
checkpoint_dir: checkpoints/mistralai/Mistral-7B-v0.1

# Directory in which to save checkpoints and logs. (type: <class 'Path'>, default: out/lora)
out_dir: out/finetune/qlora-mistral-7b

# The precision to use for finetuning. Possible choices: "bf16-true", "bf16-mixed", "32-true". (type: Optional[str], default: null)
precision: bf16-true

# If set, quantize the model with this algorithm. See ``tutorials/quan

SYMBOL INDEX (1090 symbols across 117 files)

FILE: extensions/thunder/pretrain.py
  function forward_and_loss (line 49) | def forward_and_loss(model: nn.Module, input_ids: torch.Tensor, targets:...
  function setup (line 56) | def setup(
  function main (line 189) | def main(
  function fit (line 276) | def fit(
  function validate (line 403) | def validate(fabric: L.Fabric, model: nn.Module, val_dataloader: DataLoa...
  function get_dataloaders (line 423) | def get_dataloaders(
  function get_lr (line 436) | def get_lr(learning_rate: float, it: int, warmup_iters: int, max_iters: ...
  function initialize_weights (line 450) | def initialize_weights(fabric: L.Fabric, model: GPT, n_layer: int, n_emb...
  function init_out_dir (line 472) | def init_out_dir(out_dir: Path) -> Path:
  function save_checkpoint (line 478) | def save_checkpoint(fabric, state, tokenizer_dir, checkpoint_file):
  function validate_args (line 490) | def validate_args(train: TrainArgs, eval: EvalArgs, initial_checkpoint_d...
  function jit (line 508) | def jit(fn: Callable, executors: List[str]) -> Any:

FILE: extensions/thunder/strategies/thunder_ddp.py
  class ThunderDDPStrategy (line 36) | class ThunderDDPStrategy(ParallelStrategy):
    method __init__ (line 37) | def __init__(
    method root_device (line 82) | def root_device(self) -> torch.device:
    method num_nodes (line 87) | def num_nodes(self) -> int:
    method num_nodes (line 91) | def num_nodes(self, num_nodes: int) -> None:
    method num_processes (line 96) | def num_processes(self) -> int:
    method distributed_sampler_kwargs (line 101) | def distributed_sampler_kwargs(self) -> Dict[str, Any]:
    method _configure_launcher (line 105) | def _configure_launcher(self) -> None:
    method process_group_backend (line 111) | def process_group_backend(self) -> Optional[str]:
    method _configure_launcher (line 115) | def _configure_launcher(self) -> None:
    method setup_environment (line 120) | def setup_environment(self) -> None:
    method setup_module (line 125) | def setup_module(self, module: Module) -> Module:
    method module_to_device (line 148) | def module_to_device(self, module: Module) -> None:
    method all_reduce (line 152) | def all_reduce(
    method barrier (line 160) | def barrier(self, *args: Any, **kwargs: Any) -> None:
    method broadcast (line 169) | def broadcast(self, obj: TBroadcast, src: int = 0) -> TBroadcast:
    method _setup_distributed (line 177) | def _setup_distributed(self) -> None:
    method _get_process_group_backend (line 183) | def _get_process_group_backend(self) -> str:
    method _set_world_ranks (line 186) | def _set_world_ranks(self) -> None:
  class _ThunderDataParalellBackwardSyncControl (line 195) | class _ThunderDataParalellBackwardSyncControl(_BackwardSyncControl):
    method __init__ (line 196) | def __init__(self):
    method no_backward_sync (line 200) | def no_backward_sync(self, module: Module, enabled: bool) -> ContextMa...
  class _SyncGradsContextManager (line 246) | class _SyncGradsContextManager:
    method __init__ (line 247) | def __init__(self, module: Module) -> None:
    method __enter__ (line 251) | def __enter__(self) -> None:
    method __exit__ (line 257) | def __exit__(self, exc_type: Any, exc_value: Any, traceback: Any) -> N...

FILE: extensions/thunder/strategies/thunder_fsdp.py
  class ThunderFSDPStrategy (line 46) | class ThunderFSDPStrategy(ParallelStrategy, _Sharded):
    method __init__ (line 47) | def __init__(
    method root_device (line 129) | def root_device(self) -> torch.device:
    method num_nodes (line 134) | def num_nodes(self) -> int:
    method num_processes (line 138) | def num_processes(self) -> int:
    method distributed_sampler_kwargs (line 143) | def distributed_sampler_kwargs(self) -> Dict[str, Any]:
    method _configure_launcher (line 147) | def _configure_launcher(self) -> None:
    method setup_environment (line 153) | def setup_environment(self) -> None:
    method setup_module (line 158) | def setup_module(self, module: Module) -> Module:
    method module_to_device (line 193) | def module_to_device(self, module: Module) -> None:
    method module_init_context (line 197) | def module_init_context(self, empty_init: Optional[bool] = None) -> Co...
    method module_sharded_context (line 209) | def module_sharded_context(self) -> ContextManager:
    method all_reduce (line 213) | def all_reduce(
    method barrier (line 221) | def barrier(self, *args: Any, **kwargs: Any) -> None:
    method broadcast (line 230) | def broadcast(self, obj: TBroadcast, src: int = 0) -> TBroadcast:
    method clip_gradients_norm (line 239) | def clip_gradients_norm(
    method save_checkpoint (line 250) | def save_checkpoint(
    method load_checkpoint (line 310) | def load_checkpoint(
    method _setup_distributed (line 399) | def _setup_distributed(self) -> None:
    method _set_world_ranks (line 406) | def _set_world_ranks(self) -> None:
  function _is_sharded_checkpoint (line 415) | def _is_sharded_checkpoint(path: Path) -> bool:
  function _is_full_checkpoint (line 420) | def _is_full_checkpoint(path: Path) -> bool:
  function _get_state_dict (line 424) | def _get_state_dict(
  function _unwrap_tom (line 453) | def _unwrap_tom(obj: object) -> object:

FILE: extensions/thunder/unsloth/executor.py
  function unsloth_cross_entropy_meta (line 36) | def unsloth_cross_entropy_meta(logits: TensorProxy, labels: TensorProxy)...
  function unsloth_cross_entropy_backward_impl (line 54) | def unsloth_cross_entropy_backward_impl(dlosses: Tensor, logits: Tensor,...
  function unsloth_cross_entropy_backward_meta (line 59) | def unsloth_cross_entropy_backward_meta(
  function unsloth_cross_entropy_checker (line 70) | def unsloth_cross_entropy_checker(
  function cross_entropy_to_unsloth (line 92) | def cross_entropy_to_unsloth(
  function unsloth_cross_entropy_grad (line 113) | def unsloth_cross_entropy_grad(
  function swiglu (line 158) | def swiglu(e: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
  class ThunderLLaMAMLP (line 162) | class ThunderLLaMAMLP(OriginalLLaMAMLP):
    method forward (line 163) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  function swiglu_forward_meta (line 173) | def swiglu_forward_meta(e: TensorProxy, g: TensorProxy) -> TensorProxy:
  function unsloth_swiglu_backward_meta (line 185) | def unsloth_swiglu_backward_meta(DW: TensorProxy, e: TensorProxy, g: Ten...
  function unsloth_swiglu_backward_fn (line 189) | def unsloth_swiglu_backward_fn(DW: Tensor, e: Tensor, g: Tensor) -> Tupl...
  function swiglu_to_unsloth_checker (line 204) | def swiglu_to_unsloth_checker(e: TensorProxy, g: TensorProxy) -> bool:
  function unsloth_swiglu_grad (line 208) | def unsloth_swiglu_grad(e: TensorProxy, g: TensorProxy) -> TensorProxy:
  function apply_rope_meta (line 231) | def apply_rope_meta(x: TensorProxy, cos: TensorProxy, sin: TensorProxy) ...
  function unsloth_apply_rope_meta (line 240) | def unsloth_apply_rope_meta(
  function unsloth_apply_rope_backward_meta (line 256) | def unsloth_apply_rope_backward_meta(
  function apply_rope_to_unsloth_checker (line 267) | def apply_rope_to_unsloth_checker(x: TensorProxy, cos: TensorProxy, sin:...
  function unsloth_apply_rope_grad (line 271) | def unsloth_apply_rope_grad(x: TensorProxy, cos: TensorProxy, sin: Tenso...

FILE: extensions/thunder/unsloth/kernels/cross_entropy_loss.py
  function _cross_entropy_forward (line 27) | def _cross_entropy_forward(
  function _chunked_cross_entropy_forward (line 83) | def _chunked_cross_entropy_forward(
  function _cross_entropy_backward (line 149) | def _cross_entropy_backward(
  function _cross_entropy_forward_impl (line 204) | def _cross_entropy_forward_impl(logits, labels):
  function _cross_entropy_backward_impl (line 262) | def _cross_entropy_backward_impl(dlosses, logits, logsumexp, labels):

FILE: extensions/thunder/unsloth/kernels/rope_embedding.py
  function _rope_embedding (line 32) | def _rope_embedding(
  function _rope_embedding_forward_impl (line 86) | def _rope_embedding_forward_impl(Q, cos, sin):
  function _rope_embedding_backward_impl (line 126) | def _rope_embedding_backward_impl(dY, cos, sin, n_groups, BLOCK_SIZE, nu...

FILE: extensions/thunder/unsloth/kernels/swiglu.py
  function _fg_kernel (line 25) | def _fg_kernel(
  function swiglu_fg_kernel (line 52) | def swiglu_fg_kernel(e, g):
  function _DWf_DW_dfg_kernel (line 71) | def _DWf_DW_dfg_kernel(
  function swiglu_DWf_DW_dfg_kernel (line 120) | def swiglu_DWf_DW_dfg_kernel(DW, e, g):

FILE: extensions/thunder/unsloth/kernels/utils.py
  function calculate_settings (line 25) | def calculate_settings(n):

FILE: extensions/xla/finetune/adapter.py
  function setup (line 54) | def setup(
  function main (line 76) | def main(fabric: L.Fabric, data_dir: Path, checkpoint_dir: Path, out_dir...
  function train (line 122) | def train(
  function validate (line 222) | def validate(
  function get_batch (line 254) | def get_batch(fabric: L.Fabric, data: List[Dict], longest_seq_length: in...
  function get_longest_seq_length (line 272) | def get_longest_seq_length(data: List[Dict]) -> int:
  function save_adapter_checkpoint (line 277) | def save_adapter_checkpoint(fabric: L.Fabric, model: torch.nn.Module, fi...

FILE: extensions/xla/generate/adapter.py
  function setup (line 25) | def setup(
  function main (line 60) | def main(

FILE: extensions/xla/generate/base.py
  function generate (line 27) | def generate(
  function setup (line 97) | def setup(
  function main (line 125) | def main(

FILE: extensions/xla/scripts/prepare_alpaca.py
  function prepare (line 19) | def prepare(
  function download_if_missing (line 86) | def download_if_missing(file_path: Path, file_url: str) -> None:
  function prepare_sample (line 99) | def prepare_sample(example: dict, tokenizer: Tokenizer, max_length: int,...
  function generate_prompt (line 129) | def generate_prompt(example: dict) -> str:

FILE: extensions/xla/utils.py
  function rank_print (line 16) | def rank_print(fabric: L.Fabric, message: object, *, flush: bool = True,...
  function materialize_parameters (line 25) | def materialize_parameters(module: torch.nn.Module, device: torch.device...
  function sequential_load_and_fsdp_wrap (line 34) | def sequential_load_and_fsdp_wrap(

FILE: litgpt/__main__.py
  function _check_commands (line 57) | def _check_commands():
  function main (line 63) | def main() -> None:

FILE: litgpt/adapter.py
  class Config (line 25) | class Config(BaseConfig):
  class GPT (line 30) | class GPT(BaseModel):
    method __init__ (line 32) | def __init__(self, config: Config) -> None:
    method from_name (line 49) | def from_name(cls, name: str, **kwargs: Any) -> Self:
    method _init_weights (line 52) | def _init_weights(self, module: nn.Module) -> None:
  class Block (line 59) | class Block(BaseBlock):
    method __init__ (line 60) | def __init__(self, config: Config, block_idx: int) -> None:
  class CausalSelfAttention (line 65) | class CausalSelfAttention(BaseCausalSelfAttention):
    method __init__ (line 69) | def __init__(self, config: Config, block_idx: int) -> None:
    method scaled_dot_product_attention (line 79) | def scaled_dot_product_attention(
    method reset_parameters (line 111) | def reset_parameters(self) -> None:
    method _load_from_state_dict (line 115) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  function mark_only_adapter_as_trainable (line 122) | def mark_only_adapter_as_trainable(model: GPT) -> None:
  function adapter_filter (line 128) | def adapter_filter(key: str, value: Any) -> bool:

FILE: litgpt/adapter_v2.py
  class Config (line 28) | class Config(BaseConfig):
    method mlp_class (line 30) | def mlp_class(self) -> Type:
  function adapter_filter (line 34) | def adapter_filter(key: str, value: Any) -> bool:
  class AdapterV2Linear (line 50) | class AdapterV2Linear(torch.nn.Module):
    method __init__ (line 51) | def __init__(self, in_features: int, out_features: int, **kwargs) -> N...
    method forward (line 57) | def forward(self, x: torch.Tensor) -> torch.Tensor:
    method reset_parameters (line 60) | def reset_parameters(self) -> None:
  class GPT (line 65) | class GPT(BaseModel):
    method __init__ (line 67) | def __init__(self, config: Config) -> None:
    method from_name (line 84) | def from_name(cls, name: str, **kwargs: Any) -> Self:
    method _init_weights (line 87) | def _init_weights(self, module: nn.Module) -> None:
    method _load_from_state_dict (line 93) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  class Block (line 100) | class Block(BaseBlock):
    method __init__ (line 101) | def __init__(self, config: Config, block_idx: int) -> None:
  class CausalSelfAttention (line 107) | class CausalSelfAttention(BaseCausalSelfAttention):
    method __init__ (line 111) | def __init__(self, config: Config, block_idx: int) -> None:
    method _load_from_state_dict (line 119) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  class GptNeoxMLP (line 141) | class GptNeoxMLP(litgpt.model.GptNeoxMLP):
    method __init__ (line 142) | def __init__(self, config: Config) -> None:
    method _load_from_state_dict (line 148) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  class LLaMAMLP (line 160) | class LLaMAMLP(litgpt.model.LLaMAMLP):
    method __init__ (line 161) | def __init__(self, config: Config, intermediate_size: Optional[int] = ...
    method _load_from_state_dict (line 169) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  class GemmaMLP (line 183) | class GemmaMLP(LLaMAMLP):
    method forward (line 184) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class LLaMAMoE (line 191) | class LLaMAMoE(litgpt.model.LLaMAMoE):
    method __init__ (line 192) | def __init__(self, config: Config) -> None:
    method _load_from_state_dict (line 200) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  function mark_only_adapter_v2_as_trainable (line 207) | def mark_only_adapter_v2_as_trainable(model: GPT) -> None:

FILE: litgpt/api.py
  class LLM (line 37) | class LLM(torch.nn.Module):
    method __init__ (line 38) | def __init__(
    method tokenizer (line 76) | def tokenizer(self):
    method state_dict (line 79) | def state_dict(self, destination=None, prefix="", keep_vars=False):
    method load_state_dict (line 82) | def load_state_dict(self, state_dict, strict=True):
    method forward (line 85) | def forward(
    method trainer_setup (line 100) | def trainer_setup(self, trainer_ckpt: Optional[Path] = None) -> None:
    method save (line 126) | def save(self, out_dir: Optional[Path] = None, prompt_style: Optional[...
    method load (line 148) | def load(
    method distribute (line 256) | def distribute(
    method generate (line 461) | def generate(
    method _text_to_token_ids (line 570) | def _text_to_token_ids(self, prompt: str, sys_prompt: Optional[str] = ...
    method benchmark (line 576) | def benchmark(self, num_iterations=1, **kwargs):
  class Preprocessor (line 619) | class Preprocessor:
    method __init__ (line 624) | def __init__(self, tokenizer: Tokenizer, device: str = "cpu") -> None:
    method encode (line 628) | def encode(self, text: str) -> torch.Tensor:
    method decode (line 631) | def decode(self, token_ids: torch.Tensor) -> str:
  function calculate_number_of_devices (line 635) | def calculate_number_of_devices(devices):
  function benchmark_dict_to_markdown_table (line 643) | def benchmark_dict_to_markdown_table(data):
  function pull_request_benchmark_util (line 666) | def pull_request_benchmark_util(model_name="microsoft/phi-2", num_iterat...

FILE: litgpt/args.py
  class TrainArgs (line 9) | class TrainArgs:
    method __post_init__ (line 42) | def __post_init__(self) -> None:
    method gradient_accumulation_iters (line 57) | def gradient_accumulation_iters(self, devices: int, num_nodes: int = 1...
    method batch_size (line 63) | def batch_size(self, devices: int, num_nodes: int = 1) -> int:
    method warmup_iters (line 69) | def warmup_iters(self, devices: int, num_nodes: int, max_iters: int, t...
  class EvalArgs (line 79) | class EvalArgs:
  class LogArgs (line 98) | class LogArgs:

FILE: litgpt/chat/base.py
  function generate (line 28) | def generate(
  function process_prompt (line 77) | def process_prompt(
  function interact (line 123) | def interact(multiline, model, tokenizer, prompt_style, fabric, temperat...
  function main (line 151) | def main(

FILE: litgpt/config.py
  function find_multiple (line 12) | def find_multiple(n: int, k: int) -> int:
  class Config (line 26) | class Config:
    method __post_init__ (line 118) | def __post_init__(self):
    method from_name (line 186) | def from_name(cls, name: str, **kwargs: Any) -> Optional[Self]:
    method from_file (line 206) | def from_file(cls, path: Union[str, Path], **kwargs: Any) -> Self:
    method from_checkpoint (line 215) | def from_checkpoint(cls, path: Path, **kwargs: Any) -> Self:
    method mlp_class (line 224) | def mlp_class(self) -> Type:
    method norm_class (line 231) | def norm_class(self) -> Type:
  function check_indicator_and_length (line 252) | def check_indicator_and_length(

FILE: litgpt/data/alpaca.py
  class Alpaca (line 21) | class Alpaca(DataModule):
    method __post_init__ (line 49) | def __post_init__(self) -> None:
    method connect (line 54) | def connect(
    method prepare_data (line 61) | def prepare_data(self) -> None:
    method setup (line 65) | def setup(self, stage: str = "") -> None:
    method train_dataloader (line 94) | def train_dataloader(self) -> DataLoader:
    method val_dataloader (line 104) | def val_dataloader(self) -> DataLoader:
  function download_if_missing (line 114) | def download_if_missing(file_path: Path, file_url: str, mode: str = "w",...

FILE: litgpt/data/alpaca_2k.py
  class Alpaca2k (line 12) | class Alpaca2k(Alpaca):
    method prepare_data (line 24) | def prepare_data(self) -> None:
    method setup (line 29) | def setup(self, stage: str = "") -> None:

FILE: litgpt/data/alpaca_gpt4.py
  class AlpacaGPT4 (line 13) | class AlpacaGPT4(Alpaca):

FILE: litgpt/data/base.py
  class DataModule (line 15) | class DataModule(LightningDataModule):
    method connect (line 19) | def connect(
    method setup (line 30) | def setup(self, stage: str = "") -> None:
    method __repr__ (line 34) | def __repr__(self) -> str:
  class SFTDataset (line 38) | class SFTDataset(Dataset):
    method __init__ (line 58) | def __init__(
    method __len__ (line 78) | def __len__(self) -> int:
    method __getitem__ (line 81) | def __getitem__(self, idx: int) -> Dict[str, Union[Tensor, Dict[str, i...
  function get_sft_collate_fn (line 111) | def get_sft_collate_fn(max_seq_length: int = -1, pad_id: int = 0, ignore...
  function _sft_collate_fn (line 121) | def _sft_collate_fn(

FILE: litgpt/data/deita.py
  class Deita (line 17) | class Deita(DataModule):
    method __post_init__ (line 43) | def __post_init__(self) -> None:
    method connect (line 48) | def connect(
    method prepare_data (line 55) | def prepare_data(self) -> None:
    method setup (line 60) | def setup(self, stage: str = "") -> None:
    method train_dataloader (line 84) | def train_dataloader(self) -> DataLoader:
    method val_dataloader (line 94) | def val_dataloader(self) -> DataLoader:
  function format_dataset (line 104) | def format_dataset(dataset: List[dict], include_multi_turn_conversations...

FILE: litgpt/data/flan.py
  class FLAN (line 22) | class FLAN(DataModule):
    method __post_init__ (line 48) | def __post_init__(self):
    method connect (line 62) | def connect(
    method prepare_data (line 69) | def prepare_data(self) -> None:
    method train_dataloader (line 77) | def train_dataloader(self):
    method val_dataloader (line 80) | def val_dataloader(self):
    method _dataloader (line 83) | def _dataloader(self, split: str) -> DataLoader:
  function load_jsonl (line 108) | def load_jsonl(filename: Path) -> List[Dict[str, str]]:
  function _transform (line 116) | def _transform(item: dict) -> dict:
  function _supported_subsets (line 122) | def _supported_subsets() -> Set[str]:

FILE: litgpt/data/json_data.py
  class JSON (line 18) | class JSON(DataModule):
    method __post_init__ (line 45) | def __post_init__(self):
    method connect (line 69) | def connect(
    method setup (line 76) | def setup(self, stage: str = "") -> None:
    method train_dataloader (line 96) | def train_dataloader(self) -> DataLoader:
    method val_dataloader (line 106) | def val_dataloader(self) -> DataLoader:
    method get_splits (line 115) | def get_splits(self) -> Tuple:
    method find_split (line 138) | def find_split(self, split_name: str) -> Optional[Path]:
  function load_split (line 145) | def load_split(json_path: Path) -> Any:

FILE: litgpt/data/lima.py
  class LIMA (line 17) | class LIMA(DataModule):
    method __post_init__ (line 46) | def __post_init__(self):
    method connect (line 57) | def connect(
    method prepare_data (line 64) | def prepare_data(self) -> None:
    method setup (line 69) | def setup(self, stage: str = "") -> None:
    method train_dataloader (line 100) | def train_dataloader(self) -> DataLoader:
    method val_dataloader (line 110) | def val_dataloader(self) -> DataLoader:
  function format_dataset (line 120) | def format_dataset(dataset_partition: dict, include_multi_turn_conversat...

FILE: litgpt/data/lit_data.py
  class LitData (line 14) | class LitData(DataModule):
    method __post_init__ (line 33) | def __post_init__(self) -> None:
    method connect (line 38) | def connect(
    method train_dataloader (line 44) | def train_dataloader(self) -> DataLoader:
    method val_dataloader (line 48) | def val_dataloader(self) -> DataLoader:
    method _dataloader (line 52) | def _dataloader(self, input_dir: str, train: bool):

FILE: litgpt/data/longform.py
  class LongForm (line 20) | class LongForm(DataModule):
    method __post_init__ (line 42) | def __post_init__(self) -> None:
    method connect (line 47) | def connect(
    method prepare_data (line 54) | def prepare_data(self) -> None:
    method train_dataloader (line 59) | def train_dataloader(self):
    method val_dataloader (line 62) | def val_dataloader(self):
    method _dataloader (line 65) | def _dataloader(self, split: str) -> DataLoader:
  function _transform (line 88) | def _transform(item: dict) -> dict:

FILE: litgpt/data/microllama.py
  class MicroLlama (line 10) | class MicroLlama(TinyLlama):
    method __init__ (line 13) | def __init__(self, data_path: Union[str, Path] = Path("data/"), seed: ...

FILE: litgpt/data/openwebtext.py
  class OpenWebText (line 15) | class OpenWebText(DataModule):
    method __post_init__ (line 32) | def __post_init__(self) -> None:
    method connect (line 38) | def connect(
    method prepare_data (line 45) | def prepare_data(self) -> None:
    method train_dataloader (line 83) | def train_dataloader(self) -> DataLoader:
    method val_dataloader (line 96) | def val_dataloader(self) -> DataLoader:

FILE: litgpt/data/prepare_slimpajama.py
  class SlimPajamaDataRecipe (line 13) | class SlimPajamaDataRecipe(DataChunkRecipe):
    method __init__ (line 16) | def __init__(self, tokenizer: Tokenizer, chunk_size: int):
    method prepare_structure (line 20) | def prepare_structure(self, input_dir):
    method prepare_item (line 24) | def prepare_item(self, filepath):
  function prepare (line 36) | def prepare(

FILE: litgpt/data/prepare_starcoder.py
  class StarcoderDataRecipe (line 18) | class StarcoderDataRecipe(DataChunkRecipe):
    method __init__ (line 21) | def __init__(self, tokenizer: Tokenizer, chunk_size: int):
    method prepare_structure (line 25) | def prepare_structure(self, input_dir):
    method prepare_item (line 29) | def prepare_item(self, item_metadata):
  function prepare (line 52) | def prepare(

FILE: litgpt/data/text_files.py
  class TextFiles (line 16) | class TextFiles(DataModule):
    method __post_init__ (line 39) | def __post_init__(self) -> None:
    method connect (line 47) | def connect(self, tokenizer: Optional[Tokenizer] = None, batch_size: i...
    method prepare_data (line 52) | def prepare_data(self) -> None:
    method train_dataloader (line 108) | def train_dataloader(self) -> DataLoader:
    method val_dataloader (line 122) | def val_dataloader(self) -> DataLoader:
  function tokenize (line 136) | def tokenize(filename: str, tokenizer: Tokenizer):
  function validate_tokenizer (line 143) | def validate_tokenizer(tokenizer: Tokenizer) -> None:

FILE: litgpt/data/tinyllama.py
  class TinyLlama (line 13) | class TinyLlama(DataModule):
    method __post_init__ (line 33) | def __post_init__(self):
    method connect (line 44) | def connect(
    method prepare_data (line 50) | def prepare_data(self) -> None:
    method train_dataloader (line 59) | def train_dataloader(self) -> DataLoader:
    method val_dataloader (line 92) | def val_dataloader(self) -> DataLoader:

FILE: litgpt/data/tinystories.py
  class TinyStories (line 20) | class TinyStories(DataModule):
    method __post_init__ (line 38) | def __post_init__(self) -> None:
    method connect (line 43) | def connect(self, tokenizer: Optional[Tokenizer] = None, batch_size: i...
    method prepare_data (line 48) | def prepare_data(self) -> None:
    method train_dataloader (line 81) | def train_dataloader(self) -> DataLoader:
    method val_dataloader (line 94) | def val_dataloader(self) -> DataLoader:
  function tokenize (line 108) | def tokenize(filename: str, tokenizer: Tokenizer):
  function download (line 124) | def download(data_dir: Path):

FILE: litgpt/deploy/serve.py
  class BaseLitAPI (line 21) | class BaseLitAPI(LitAPI):
    method __init__ (line 22) | def __init__(
    method setup (line 50) | def setup(self, device: str) -> None:
    method decode_request (line 71) | def decode_request(self, request: Dict[str, Any]) -> Any:
  class SimpleLitAPI (line 76) | class SimpleLitAPI(BaseLitAPI):
    method __init__ (line 77) | def __init__(
    method setup (line 103) | def setup(self, device: str):
    method predict (line 106) | def predict(self, inputs: str) -> Any:
    method encode_response (line 116) | def encode_response(self, output: str) -> Dict[str, Any]:
  class StreamLitAPI (line 121) | class StreamLitAPI(BaseLitAPI):
    method __init__ (line 122) | def __init__(
    method setup (line 148) | def setup(self, device: str):
    method predict (line 151) | def predict(self, inputs: torch.Tensor) -> Any:
    method encode_response (line 161) | def encode_response(self, output):
  class OpenAISpecLitAPI (line 166) | class OpenAISpecLitAPI(BaseLitAPI):
    method __init__ (line 167) | def __init__(
    method setup (line 193) | def setup(self, device: str):
    method decode_request (line 213) | def decode_request(self, request: "ChatCompletionRequest") -> Any:
    method predict (line 217) | def predict(self, inputs: str, context: dict) -> Any:
  function run_server (line 234) | def run_server(

FILE: litgpt/eval/evaluate.py
  function prepare_results (line 15) | def prepare_results(results, save_filepath, print_results=True):
  function convert_and_evaluate (line 27) | def convert_and_evaluate(

FILE: litgpt/finetune/adapter.py
  function setup (line 48) | def setup(
  function main (line 151) | def main(
  function fit (line 244) | def fit(
  function validate (line 391) | def validate(
  function generate_example (line 412) | def generate_example(fabric: L.Fabric, model: GPT, tokenizer: Tokenizer,...
  function get_lr_scheduler (line 444) | def get_lr_scheduler(optimizer, warmup_steps: int, max_steps: int):
  function get_dataloaders (line 451) | def get_dataloaders(
  function get_longest_seq_length (line 464) | def get_longest_seq_length(data: List[Dict]) -> Tuple[int, int]:
  function save_adapter_checkpoint (line 472) | def save_adapter_checkpoint(fabric: L.Fabric, model: torch.nn.Module, fi...
  function validate_args (line 477) | def validate_args(train: TrainArgs, eval: EvalArgs) -> None:

FILE: litgpt/finetune/adapter_v2.py
  function setup (line 49) | def setup(
  function main (line 153) | def main(
  function fit (line 261) | def fit(
  function validate (line 418) | def validate(
  function generate_example (line 439) | def generate_example(fabric: L.Fabric, model: GPT, tokenizer: Tokenizer,...
  function get_lr_scheduler (line 467) | def get_lr_scheduler(optimizer, warmup_steps: int, max_steps: int):
  function get_dataloaders (line 474) | def get_dataloaders(
  function get_longest_seq_length (line 487) | def get_longest_seq_length(data: List[Dict]) -> Tuple[int, int]:
  function save_adapter_v2_checkpoint (line 495) | def save_adapter_v2_checkpoint(fabric: L.Fabric, model: torch.nn.Module,...
  function validate_args (line 500) | def validate_args(train: TrainArgs, eval: EvalArgs) -> None:

FILE: litgpt/finetune/full.py
  function setup (line 44) | def setup(
  function main (line 126) | def main(
  function fit (line 209) | def fit(
  function validate (line 363) | def validate(
  function generate_example (line 383) | def generate_example(fabric: L.Fabric, model: GPT, tokenizer: Tokenizer,...
  function get_lr_scheduler (line 415) | def get_lr_scheduler(optimizer, warmup_steps: int, max_steps: int):
  function get_dataloaders (line 422) | def get_dataloaders(
  function get_longest_seq_length (line 435) | def get_longest_seq_length(data: List[Dict]) -> Tuple[int, int]:
  function validate_args (line 443) | def validate_args(train: TrainArgs, eval: EvalArgs) -> None:

FILE: litgpt/finetune/lora.py
  function setup (line 49) | def setup(
  function main (line 183) | def main(
  function fit (line 285) | def fit(
  function validate (line 440) | def validate(
  function generate_example (line 461) | def generate_example(fabric: L.Fabric, model: GPT, tokenizer: Tokenizer,...
  function get_lr_scheduler (line 490) | def get_lr_scheduler(optimizer, warmup_steps: int, max_steps: int):
  function get_dataloaders (line 497) | def get_dataloaders(
  function get_longest_seq_length (line 510) | def get_longest_seq_length(data: List[Dict]) -> Tuple[int, int]:
  function parallelize_fn (line 518) | def parallelize_fn(model, device_mesh, activation_checkpointing=True):
  function save_lora_checkpoint (line 542) | def save_lora_checkpoint(fabric: L.Fabric, model: torch.nn.Module, file_...
  function validate_args (line 559) | def validate_args(train: TrainArgs, eval: EvalArgs) -> None:

FILE: litgpt/finetune/lora_legacy.py
  function setup (line 49) | def setup(
  function main (line 183) | def main(
  function fit (line 278) | def fit(
  function validate (line 425) | def validate(
  function generate_example (line 446) | def generate_example(fabric: L.Fabric, model: GPT, tokenizer: Tokenizer,...
  function get_lr_scheduler (line 475) | def get_lr_scheduler(optimizer, warmup_steps: int, max_steps: int):
  function get_dataloaders (line 482) | def get_dataloaders(
  function get_longest_seq_length (line 495) | def get_longest_seq_length(data: List[Dict]) -> Tuple[int, int]:
  function save_lora_checkpoint (line 503) | def save_lora_checkpoint(fabric: L.Fabric, model: torch.nn.Module, file_...
  function validate_args (line 508) | def validate_args(train: TrainArgs, eval: EvalArgs) -> None:

FILE: litgpt/generate/adapter.py
  function main (line 28) | def main(

FILE: litgpt/generate/adapter_v2.py
  function main (line 28) | def main(

FILE: litgpt/generate/base.py
  function multinomial_num_samples_1 (line 30) | def multinomial_num_samples_1(probs: torch.Tensor) -> torch.Tensor:
  function sample_top_p (line 38) | def sample_top_p(logits: torch.Tensor, top_p: float) -> torch.Tensor:
  function sample (line 53) | def sample(
  function next_token (line 76) | def next_token(
  function batched_sample (line 88) | def batched_sample(logits: list[torch.Tensor], kwargs: list[dict]) -> to...
  function batched_next_token (line 95) | def batched_next_token(
  function generate_fn (line 130) | def generate_fn(
  function batched_generate_fn (line 241) | def batched_generate_fn(
  function generate (line 374) | def generate(
  function main (line 431) | def main(

FILE: litgpt/generate/full.py
  function main (line 27) | def main(

FILE: litgpt/generate/sequentially.py
  function sequential (line 36) | def sequential(model: GPT, root: torch.device, max_seq_length: int, devi...
  function chunk_sizes (line 96) | def chunk_sizes(num_units: int, devices: int) -> List[int]:
  function layer_to_device (line 102) | def layer_to_device(
  function move_block_input (line 117) | def move_block_input(device: torch.device, module: torch.nn.Module, ins):
  function move_block_output (line 123) | def move_block_output(device: torch.device, module: torch.nn.Module, ins...
  function replace_device (line 128) | def replace_device(module: torch.nn.Module, replace: torch.device, by: t...
  function main (line 146) | def main(

FILE: litgpt/generate/speculative_decoding.py
  function sample (line 32) | def sample(
  function speculative_decoding (line 60) | def speculative_decoding(
  function generate (line 172) | def generate(
  function setup_model (line 306) | def setup_model(config: Config, max_returned_tokens: int, fabric: L.Fabr...
  function load_model (line 319) | def load_model(checkpoint_dir: Path, fabric: L.Fabric) -> Tuple[Config, ...
  function main (line 329) | def main(

FILE: litgpt/generate/tp.py
  function tensor_parallel_linear (line 33) | def tensor_parallel_linear(fabric: L.Fabric, linear: torch.nn.Linear, st...
  function tensor_parallel_mlp (line 53) | def tensor_parallel_mlp(fabric: L.Fabric, mlp: Union[GptNeoxMLP, LLaMAML...
  function tensor_parallel_attn (line 72) | def tensor_parallel_attn(fabric: L.Fabric, attn: CausalSelfAttention) ->...
  function all_reduce_output (line 78) | def all_reduce_output(world_size: int, module: torch.nn.Module, ins, out...
  function tensor_parallel (line 84) | def tensor_parallel(fabric: L.Fabric, model: GPT) -> GPT:
  function main (line 103) | def main(

FILE: litgpt/lora.py
  class LoRALayer (line 64) | class LoRALayer(nn.Module):
    method __init__ (line 65) | def __init__(self, r: int, lora_alpha: int, lora_dropout: float):
  class LoRALinear (line 89) | class LoRALinear(LoRALayer):
    method __init__ (line 91) | def __init__(
    method reset_parameters (line 130) | def reset_parameters(self) -> None:
    method get_lora_AB (line 138) | def get_lora_AB(self) -> torch.Tensor:
    method merge (line 142) | def merge(self) -> None:
    method forward (line 165) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class LoRAQKVLinear (line 175) | class LoRAQKVLinear(LoRALinear):
    method __init__ (line 177) | def __init__(
    method lora_ind (line 265) | def lora_ind(self) -> torch.Tensor:
    method zero_pad (line 285) | def zero_pad(self, x: torch.Tensor) -> torch.Tensor:
    method conv1d (line 325) | def conv1d(self, input: torch.Tensor, weight: torch.Tensor) -> torch.T...
    method get_lora_AB (line 361) | def get_lora_AB(self) -> torch.Tensor:
    method merge (line 373) | def merge(self) -> None:
    method forward (line 378) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  function mark_only_lora_as_trainable (line 414) | def mark_only_lora_as_trainable(model: nn.Module, bias: str = "none") ->...
  function lora_filter (line 447) | def lora_filter(key: str, value: Any) -> bool:
  class Config (line 452) | class Config(BaseConfig):
    method mlp_class (line 475) | def mlp_class(self) -> Type:
  class GPT (line 479) | class GPT(BaseModel):
    method __init__ (line 481) | def __init__(self, config: Config) -> None:
    method from_name (line 504) | def from_name(cls, name: str, **kwargs: Any) -> Self:
    method _init_weights (line 507) | def _init_weights(self, module: nn.Module) -> None:
    method _load_from_state_dict (line 513) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  class Block (line 520) | class Block(BaseBlock):
    method __init__ (line 521) | def __init__(self, config: Config, block_idx: int) -> None:
  class CausalSelfAttention (line 527) | class CausalSelfAttention(BaseCausalSelfAttention):
    method __init__ (line 528) | def __init__(self, config: Config, block_idx: int) -> None:
    method _load_from_state_dict (line 553) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  function create_lora_linear (line 572) | def create_lora_linear(
  class GptNeoxMLP (line 593) | class GptNeoxMLP(litgpt.model.GptNeoxMLP):
    method __init__ (line 594) | def __init__(self, config: Config) -> None:
    method _load_from_state_dict (line 600) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  class LLaMAMLP (line 612) | class LLaMAMLP(litgpt.model.LLaMAMLP):
    method __init__ (line 613) | def __init__(self, config: Config, intermediate_size: Optional[int] = ...
    method _load_from_state_dict (line 621) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  class GemmaMLP (line 635) | class GemmaMLP(LLaMAMLP):
    method forward (line 636) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class LLaMAMoE (line 643) | class LLaMAMoE(litgpt.model.LLaMAMoE):
    method __init__ (line 644) | def __init__(self, config: Config) -> None:
    method _load_from_state_dict (line 652) | def _load_from_state_dict(self, state_dict: Dict, prefix: str, *args: ...
  function merge_lora_weights (line 659) | def merge_lora_weights(model: GPT) -> None:

FILE: litgpt/model.py
  class GPT (line 22) | class GPT(nn.Module):
    method __init__ (line 23) | def __init__(self, config: Config) -> None:
    method max_seq_length (line 40) | def max_seq_length(self) -> int:
    method max_seq_length (line 44) | def max_seq_length(self, value: int) -> None:
    method reset_parameters (line 70) | def reset_parameters(self) -> None:
    method _init_weights (line 74) | def _init_weights(self, module: nn.Module) -> None:
    method forward (line 85) | def forward(
    method from_name (line 184) | def from_name(cls, name: str, **kwargs: Any) -> Self:
    method rope_cache (line 187) | def rope_cache(self, device: Optional[torch.device] = None) -> Tuple[t...
    method rope_cache_length (line 261) | def rope_cache_length(self) -> int:
    method set_kv_cache (line 274) | def set_kv_cache(
    method clear_kv_cache (line 303) | def clear_kv_cache(self) -> None:
  class Block (line 309) | class Block(nn.Module):
    method __init__ (line 310) | def __init__(
    method forward (line 345) | def forward(
  class CausalSelfAttention (line 390) | class CausalSelfAttention(nn.Module):
    method __init__ (line 391) | def __init__(self, config: Config, block_idx: int) -> None:
    method forward (line 430) | def forward(
    method scaled_dot_product_attention (line 576) | def scaled_dot_product_attention(
    method build_kv_cache (line 598) | def build_kv_cache(
    method _load_from_state_dict (line 637) | def _load_from_state_dict(self, state_dict: dict, prefix: str, *args: ...
  class MultiheadLatentAttention (line 649) | class MultiheadLatentAttention(nn.Module):
    method __init__ (line 650) | def __init__(self, config: Config, block_idx: int) -> None:
    method forward (line 685) | def forward(
    method scaled_dot_product_attention (line 763) | def scaled_dot_product_attention(
    method build_kv_cache (line 785) | def build_kv_cache(
  class GptNeoxMLP (line 804) | class GptNeoxMLP(nn.Module):
    method __init__ (line 805) | def __init__(self, config: Config, intermediate_size: Optional[int] = ...
    method forward (line 812) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class LLaMAMLP (line 818) | class LLaMAMLP(nn.Module):
    method __init__ (line 819) | def __init__(self, config: Config, intermediate_size: Optional[int] = ...
    method forward (line 827) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class GemmaMLP (line 834) | class GemmaMLP(LLaMAMLP):
    method forward (line 835) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class LLaMAMoE (line 842) | class LLaMAMoE(nn.Module):
    method __init__ (line 843) | def __init__(self, config: Config) -> None:
    method forward (line 859) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  class GroupedTopkRouter (line 888) | class GroupedTopkRouter(nn.Module):
    method __init__ (line 894) | def __init__(self, config: Config) -> None:
    method get_topk_indices (line 901) | def get_topk_indices(self, scores: torch.Tensor) -> torch.Tensor:
    method forward (line 921) | def forward(self, x: torch.Tensor) -> torch.Tensor:
  function yarn_get_mscale (line 933) | def yarn_get_mscale(scale=1, mscale=1):
  function build_rope_cache (line 939) | def build_rope_cache(
  function batched_index_select (line 1075) | def batched_index_select(t, dim, idx):
  function batched_index_copy_ (line 1094) | def batched_index_copy_(t, dim, idx, val):
  function apply_rope (line 1144) | def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) ->...
  function apply_rope_interleave (line 1176) | def apply_rope_interleave(x: torch.Tensor, cos: torch.Tensor, sin: torch...
  function do_softcapping (line 1217) | def do_softcapping(x: torch.Tensor, thresh: float) -> torch.Tensor:
  class KVCache (line 1221) | class KVCache(nn.Module):
    method __init__ (line 1227) | def __init__(
    method forward (line 1243) | def forward(self, input_pos: torch.Tensor, k: torch.Tensor, v: torch.T...
    method reset_parameters (line 1283) | def reset_parameters(self) -> None:
  function build_mask_cache (line 1288) | def build_mask_cache(max_seq_length: int, device: Optional[torch.device]...
  class RMSNorm (line 1293) | class RMSNorm(torch.nn.Module):
    method __init__ (line 1300) | def __init__(self, size: int, dim: int = -1, eps: float = 1e-6, add_un...
    method forward (line 1307) | def forward(self, x: torch.Tensor) -> torch.Tensor:
    method reset_parameters (line 1316) | def reset_parameters(self) -> None:

FILE: litgpt/parser_config.py
  function parser_commands (line 8) | def parser_commands() -> List[str]:
  function save_hyperparameters (line 34) | def save_hyperparameters(

FILE: litgpt/pretrain.py
  function setup (line 49) | def setup(
  function main (line 177) | def main(
  function fit (line 288) | def fit(
  function validate (line 427) | def validate(
  function get_dataloaders (line 451) | def get_dataloaders(
  function get_lr (line 464) | def get_lr(learning_rate: float, it: int, warmup_iters: int, max_iters: ...
  function initialize_weights (line 478) | def initialize_weights(fabric: L.Fabric, model: GPT, n_layer: int, n_emb...
  function save_checkpoint (line 500) | def save_checkpoint(fabric, state, tokenizer_dir, checkpoint_file):
  function validate_args (line 512) | def validate_args(train: TrainArgs, eval: EvalArgs, initial_checkpoint_d...

FILE: litgpt/prompts.py
  class PromptStyle (line 17) | class PromptStyle:
    method apply (line 21) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
    method stop_tokens (line 24) | def stop_tokens(self, tokenizer: "Tokenizer") -> Tuple[List[int], ...]:
    method from_name (line 28) | def from_name(cls, name: str) -> "PromptStyle":
    method from_config (line 32) | def from_config(cls, config: Config) -> "PromptStyle":
  class Default (line 36) | class Default(PromptStyle):
    method apply (line 37) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
    method stop_tokens (line 40) | def stop_tokens(self, tokenizer: "Tokenizer") -> Tuple[List[int], ...]:
  class Alpaca (line 44) | class Alpaca(PromptStyle):
    method apply (line 45) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class FLAN (line 60) | class FLAN(PromptStyle):
    method apply (line 61) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Longform (line 69) | class Longform(PromptStyle):
    method apply (line 70) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class StableLMAlpha (line 78) | class StableLMAlpha(PromptStyle):
    method apply (line 79) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
    method stop_tokens (line 89) | def stop_tokens(self, tokenizer: "Tokenizer") -> Tuple[List[int], ...]:
  class StableLMZephyr (line 98) | class StableLMZephyr(PromptStyle):
    method apply (line 99) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Falcon (line 103) | class Falcon(PromptStyle):
    method apply (line 104) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
    method stop_tokens (line 107) | def stop_tokens(self, tokenizer: "Tokenizer") -> Tuple[List[int], ...]:
  class Falcon3 (line 117) | class Falcon3(PromptStyle):
    method apply (line 118) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
    method stop_tokens (line 121) | def stop_tokens(self, tokenizer: "Tokenizer") -> Tuple[List[int], ...]:
  class Llama2FunctionCalling (line 128) | class Llama2FunctionCalling(PromptStyle):
    method apply (line 129) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Llama2 (line 155) | class Llama2(PromptStyle):
    method apply (line 156) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Llama3 (line 170) | class Llama3(PromptStyle):
    method apply (line 171) | def apply(
    method stop_tokens (line 216) | def stop_tokens(self, tokenizer: "Tokenizer") -> Tuple[List[int], ...]:
  class R1Base (line 223) | class R1Base(PromptStyle):
    method apply (line 224) | def apply(
    method stop_tokens (line 265) | def stop_tokens(self, tokenizer: "Tokenizer") -> Tuple[List[int], ...]:
  class FreeWilly2 (line 272) | class FreeWilly2(PromptStyle):
    method apply (line 273) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Platypus (line 278) | class Platypus(PromptStyle):
    method apply (line 279) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class StableCode (line 283) | class StableCode(PromptStyle):
    method apply (line 284) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class CodeLlama (line 288) | class CodeLlama(PromptStyle):
    method apply (line 289) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Phi1 (line 300) | class Phi1(PromptStyle):
    method apply (line 301) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
    method stop_tokens (line 304) | def stop_tokens(self, tokenizer: "Tokenizer") -> Tuple[List[int], ...]:
  class Phi2 (line 315) | class Phi2(PromptStyle):
    method apply (line 316) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Phi3 (line 320) | class Phi3(PromptStyle):
    method apply (line 321) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Phi4 (line 326) | class Phi4(PromptStyle):
    method apply (line 327) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Phi4Reasoning (line 335) | class Phi4Reasoning(PromptStyle):
    method apply (line 336) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Phi4Mini (line 344) | class Phi4Mini(PromptStyle):
    method apply (line 345) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Phi4MiniReasoning (line 353) | class Phi4MiniReasoning(PromptStyle):
    method apply (line 354) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class TinyLlama (line 359) | class TinyLlama(PromptStyle):
    method apply (line 360) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Gemma (line 365) | class Gemma(PromptStyle):
    method apply (line 366) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class OLMo (line 370) | class OLMo(PromptStyle):
    method apply (line 371) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class ChatML (line 375) | class ChatML(PromptStyle):
    method __init__ (line 376) | def __init__(self, system_message: Optional[str] = None):
    method apply (line 379) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  class Qwen2_5 (line 386) | class Qwen2_5(ChatML):
    method __init__ (line 387) | def __init__(self):
  class Qwen2_5_Math (line 391) | class Qwen2_5_Math(ChatML):
    method __init__ (line 392) | def __init__(self):
  class QwQ (line 396) | class QwQ(ChatML):
    method __init__ (line 397) | def __init__(self):
  class Qwen3 (line 403) | class Qwen3(ChatML):
    method __init__ (line 404) | def __init__(self):
  class SmolLM2 (line 408) | class SmolLM2(ChatML):
    method __init__ (line 409) | def __init__(self):
  class Salamandra (line 413) | class Salamandra(ChatML):
    method __init__ (line 414) | def __init__(self):
  function model_name_to_prompt_style (line 456) | def model_name_to_prompt_style(model_name: str) -> PromptStyle:
  function save_prompt_style (line 520) | def save_prompt_style(style: Union[str, PromptStyle], checkpoint_dir: Pa...
  function load_prompt_style (line 529) | def load_prompt_style(checkpoint_dir: Path) -> PromptStyle:
  function has_prompt_style (line 539) | def has_prompt_style(checkpoint_dir: Path) -> bool:

FILE: litgpt/scripts/convert_hf_checkpoint.py
  function copy_weights_gpt_neox (line 28) | def copy_weights_gpt_neox(
  function copy_weights_falcon (line 81) | def copy_weights_falcon(
  function copy_weights_hf_llama (line 139) | def copy_weights_hf_llama(
  function copy_weights_gemma_2 (line 226) | def copy_weights_gemma_2(
  function copy_weights_gemma_3 (line 294) | def copy_weights_gemma_3(
  function copy_weights_phi (line 397) | def copy_weights_phi(
  function copy_weights_qwen_2_5 (line 493) | def copy_weights_qwen_2_5(
  function copy_weights_olmo2 (line 563) | def copy_weights_olmo2(
  function copy_weights_qwen_3 (line 642) | def copy_weights_qwen_3(
  function qkv_reassemble (line 727) | def qkv_reassemble(
  function layer_template (line 748) | def layer_template(layer_name: str, num_matches: int = 1) -> Tuple[str, ...
  function load_param (line 756) | def load_param(
  function convert_hf_checkpoint (line 772) | def convert_hf_checkpoint(

FILE: litgpt/scripts/convert_lit_checkpoint.py
  function copy_weights_falcon (line 18) | def copy_weights_falcon(
  function copy_weights_gpt_neox (line 66) | def copy_weights_gpt_neox(
  function copy_weights_llama (line 103) | def copy_weights_llama(
  function copy_weights_gemma_2 (line 169) | def copy_weights_gemma_2(
  function copy_weights_gemma_3 (line 218) | def copy_weights_gemma_3(
  function copy_weights_phi (line 269) | def copy_weights_phi(
  function copy_weights_qwen_2_5 (line 348) | def copy_weights_qwen_2_5(
  function copy_weights_olmo2 (line 396) | def copy_weights_olmo2(
  function copy_weights_qwen_3 (line 454) | def copy_weights_qwen_3(
  function qkv_reassemble (line 520) | def qkv_reassemble(param: Union[torch.Tensor, NotYetLoadedTensor], confi...
  function check_conversion_supported (line 538) | def check_conversion_supported(lit_weights: Dict[str, torch.Tensor]) -> ...
  function convert_lit_checkpoint (line 546) | def convert_lit_checkpoint(checkpoint_dir: Path, output_dir: Path) -> None:

FILE: litgpt/scripts/convert_pretrained_checkpoint.py
  function convert_pretrained_checkpoint (line 12) | def convert_pretrained_checkpoint(checkpoint_dir: Path, output_dir: Path...

FILE: litgpt/scripts/download.py
  function download_from_hub (line 14) | def download_from_hub(
  function find_weight_files (line 101) | def find_weight_files(repo_id: str, access_token: Optional[str]) -> Tupl...
  function gated_repo_catcher (line 114) | def gated_repo_catcher(repo_id: str, access_token: Optional[str]):

FILE: litgpt/scripts/merge_lora.py
  function merge_lora (line 17) | def merge_lora(
  function load_lora_metadata (line 86) | def load_lora_metadata(checkpoint_dir: Path) -> Tuple[Dict[str, Any], Pa...

FILE: litgpt/tokenizer.py
  class Tokenizer (line 12) | class Tokenizer:
    method __init__ (line 13) | def __init__(self, checkpoint_dir: Union[Path, str]) -> None:
    method vocab_size (line 73) | def vocab_size(self) -> int:
    method token_to_id (line 80) | def token_to_id(self, token: str) -> int:
    method check_if_bos_token_used (line 91) | def check_if_bos_token_used(self, checkpoint_dir: Path) -> bool:
    method encode (line 108) | def encode(
    method decode (line 144) | def decode(self, tensor: torch.Tensor) -> str:
    method decode_stream (line 155) | def decode_stream(

FILE: litgpt/utils.py
  function init_out_dir (line 47) | def init_out_dir(out_dir: Path) -> Path:
  function find_resume_path (line 55) | def find_resume_path(resume: Union[bool, Literal["auto"], Path], out_dir...
  function num_parameters (line 69) | def num_parameters(module: nn.Module, requires_grad: Optional[bool] = No...
  function reset_parameters (line 81) | def reset_parameters(module: nn.Module) -> None:
  function check_valid_checkpoint_dir (line 88) | def check_valid_checkpoint_dir(
  class SavingProxyForStorage (line 138) | class SavingProxyForStorage:
    method __init__ (line 139) | def __init__(self, obj, saver, protocol_version=5):
    method __reduce_ex__ (line 162) | def __reduce_ex__(self, protocol_version):
  class SavingProxyForTensor (line 166) | class SavingProxyForTensor:
    method __init__ (line 167) | def __init__(self, tensor, saver, protocol_version=5):
    method __reduce_ex__ (line 186) | def __reduce_ex__(self, protocol_version):
  class IncrementalPyTorchPickler (line 192) | class IncrementalPyTorchPickler(pickle.Pickler):
    method __init__ (line 193) | def __init__(self, saver, *args, **kwargs):
    method persistent_id (line 200) | def persistent_id(self, obj):
  class incremental_save (line 248) | class incremental_save:
    method __init__ (line 249) | def __init__(self, name):
    method __enter__ (line 256) | def __enter__(self):
    method store_early (line 259) | def store_early(self, tensor):
    method save (line 264) | def save(self, obj):
    method _write_storage_and_return_key (line 275) | def _write_storage_and_return_key(self, storage):
    method __exit__ (line 294) | def __exit__(self, type, value, traceback):
  function chunked_cross_entropy (line 301) | def chunked_cross_entropy(
  function map_old_state_dict_weights (line 353) | def map_old_state_dict_weights(state_dict: Dict, mapping: Mapping, prefi...
  function get_default_supported_precision (line 362) | def get_default_supported_precision(training: bool) -> str:
  function load_checkpoint (line 382) | def load_checkpoint(fabric: L.Fabric, model: nn.Module, checkpoint_path:...
  function load_checkpoint_update (line 400) | def load_checkpoint_update(
  function load_from_full_model_state_dict (line 413) | def load_from_full_model_state_dict(
  function flops_per_param (line 448) | def flops_per_param(max_seq_length: int, n_layer: int, n_embd: int, n_pa...
  function estimate_flops (line 457) | def estimate_flops(model: "GPT", training: bool) -> int:
  class CycleIterator (line 481) | class CycleIterator:
    method __init__ (line 493) | def __init__(self, iterable: Iterable) -> None:
    method __next__ (line 498) | def __next__(self) -> Any:
    method __iter__ (line 508) | def __iter__(self) -> Self:
  function copy_config_files (line 512) | def copy_config_files(source_dir: Path, out_dir: Path) -> None:
  function CLI (line 524) | def CLI(*args: Any, **kwargs: Any) -> Any:
  function capture_hparams (line 533) | def capture_hparams() -> Dict[str, Any]:
  function save_config (line 548) | def save_config(config: "Config", checkpoint_dir: Path) -> None:
  function parse_devices (line 554) | def parse_devices(devices: Union[str, int]) -> int:
  function choose_logger (line 562) | def choose_logger(
  function get_argument_names (line 609) | def get_argument_names(cls):
  function instantiate_bnb_optimizer (line 618) | def instantiate_bnb_optimizer(optimizer, model_parameters):
  function instantiate_torch_optimizer (line 635) | def instantiate_torch_optimizer(optimizer, model_parameters, **kwargs):
  function extend_checkpoint_dir (line 670) | def extend_checkpoint_dir(checkpoint_dir: Path) -> Path:
  function check_file_size_on_cpu_and_warn (line 681) | def check_file_size_on_cpu_and_warn(checkpoint_path, device, size_limit=...
  function auto_download_checkpoint (line 697) | def auto_download_checkpoint(model_name, access_token=None, ignore_token...
  function check_nvlink_connectivity (line 718) | def check_nvlink_connectivity(fabric=None):
  function _check_nvidia_connectivity (line 746) | def _check_nvidia_connectivity(custom_print):
  function _check_amd_connectivity (line 781) | def _check_amd_connectivity(custom_print):
  function fix_and_load_json (line 827) | def fix_and_load_json(s):
  function create_finetuning_performance_report (line 844) | def create_finetuning_performance_report(training_time, token_counts, de...
  function select_sft_generate_example (line 868) | def select_sft_generate_example(eval, data):
  function _RunIf (line 897) | def _RunIf(thunder: bool = False, **kwargs):
  function kill_process_tree (line 910) | def kill_process_tree(pid: int):

FILE: tests/conftest.py
  function fake_checkpoint_dir (line 23) | def fake_checkpoint_dir(tmp_path):
  class TensorLike (line 34) | class TensorLike:
    method __eq__ (line 35) | def __eq__(self, other):
  function tensor_like (line 40) | def tensor_like():
  class FloatLike (line 44) | class FloatLike:
    method __eq__ (line 45) | def __eq__(self, other):
  function float_like (line 50) | def float_like():
  function restore_default_dtype (line 55) | def restore_default_dtype():
  function destroy_process_group (line 61) | def destroy_process_group():
  function turn_off_tf32_and_set_seed (line 71) | def turn_off_tf32_and_set_seed(monkeypatch):
  class MockTokenizer (line 78) | class MockTokenizer:
    method encode (line 84) | def encode(self, text: str, bos: Optional[bool] = None, eos: bool = Fa...
    method decode (line 94) | def decode(self, tokens: torch.Tensor) -> str:
  function mock_tokenizer (line 99) | def mock_tokenizer():
  function alpaca_path (line 104) | def alpaca_path(tmp_path):
  function dolly_path (line 111) | def dolly_path(tmp_path):
  function longform_path (line 118) | def longform_path(tmp_path):
  function pytest_collection_modifyitems (line 128) | def pytest_collection_modifyitems(items: List[pytest.Function], config: ...

FILE: tests/convert/test_hf_checkpoint.py
  function test_llama2_70b_conversion (line 12) | def test_llama2_70b_conversion():
  function test_convert_hf_checkpoint (line 105) | def test_convert_hf_checkpoint(tmp_path, model_name):
  function test_qkv_reassemble (line 125) | def test_qkv_reassemble():

FILE: tests/convert/test_lit_checkpoint.py
  function test_convert_lit_checkpoint (line 42) | def test_convert_lit_checkpoint(tmp_path, model_name):
  function test_against_falcon_40b (line 64) | def test_against_falcon_40b():
  function test_against_original_gpt_neox (line 94) | def test_against_original_gpt_neox():
  function test_against_hf_llama2 (line 133) | def test_against_hf_llama2(ours_kwargs):
  function test_against_mixtral (line 167) | def test_against_mixtral(model_name):
  function test_against_olmo (line 209) | def test_against_olmo(model_name):
  function test_against_original_open_llama_3b (line 252) | def test_against_original_open_llama_3b():
  function test_against_hf_phi (line 281) | def test_against_hf_phi(model_name):
  function test_against_hf_phi_3 (line 316) | def test_against_hf_phi_3(model_name):
  function test_against_original_stablelm_zephyr_3b (line 354) | def test_against_original_stablelm_zephyr_3b():
  function test_against_original_gemma (line 402) | def test_against_original_gemma(model_name, device, dtype):
  function test_against_original_gemma_2 (line 462) | def test_against_original_gemma_2(model_name, device, dtype):
  function test_against_original_gemma_3 (line 535) | def test_against_original_gemma_3(model_name, device, dtype):
  function test_check_conversion_supported_adapter (line 590) | def test_check_conversion_supported_adapter():
  function test_check_conversion_supported_lora (line 600) | def test_check_conversion_supported_lora():
  function test_against_original_qwen_2_5 (line 634) | def test_against_original_qwen_2_5(model_name, device, dtype):
  function test_qkv_reassemble (line 681) | def test_qkv_reassemble():

FILE: tests/convert/test_pretrained_checkpoint.py
  function test_convert_pretrained_checkpoint (line 10) | def test_convert_pretrained_checkpoint(tmp_path, fake_checkpoint_dir):

FILE: tests/data/test_alpaca.py
  function test_alpaca (line 6) | def test_alpaca(mock_tokenizer, alpaca_path):

FILE: tests/data/test_base.py
  function test_sft_dataset (line 15) | def test_sft_dataset(max_seq_length, ignore_index, mask_prompt, mock_tok...
  function test_sft_collate_fn_padding (line 49) | def test_sft_collate_fn_padding(pad_id, ignore_index):
  function test_sft_collate_fn_truncation (line 74) | def test_sft_collate_fn_truncation():

FILE: tests/data/test_deita.py
  function test_format_dataset (line 9) | def test_format_dataset():
  function test_deita (line 47) | def test_deita(_, format_dataset_mock, mock_tokenizer, tmp_path):

FILE: tests/data/test_json.py
  function test_json (line 12) | def test_json(as_jsonl, tmp_path, mock_tokenizer):
  function test_json_input_validation (line 69) | def test_json_input_validation(tmp_path):
  function test_json_with_splits (line 95) | def test_json_with_splits(as_jsonl, tmp_path, mock_tokenizer):

FILE: tests/data/test_lit_data.py
  function test_input_dir_and_splits (line 13) | def test_input_dir_and_splits(dl_mock, tmp_path):
  function test_dataset_args (line 42) | def test_dataset_args(streaming_dataloader_mock, streaming_dataset_mock,...

FILE: tests/data/test_longform.py
  function test_longform (line 6) | def test_longform(mock_tokenizer, longform_path):

FILE: tests/data/test_openwebtext.py
  function test_openwebtext (line 17) | def test_openwebtext(_, __, optimize_mock, tmp_path, mock_tokenizer):

FILE: tests/data/test_textfiles.py
  class Tokenizer (line 10) | class Tokenizer:
    method encode (line 13) | def encode(self, text, bos, eos):
  function tokenize (line 19) | def tokenize(data):
  function fake_chunk (line 24) | def fake_chunk(path, data):
  function test_textfiles_datamodule (line 35) | def test_textfiles_datamodule(tmp_path):
  class MockTokenizer (line 71) | class MockTokenizer:
    method encode (line 76) | def encode(self, text, bos=True, eos=False, device=None, max_length=-1):
    method decode (line 87) | def decode(self, tensor):
    method decode_stream (line 99) | def decode_stream(self, token_stream, device=None):
    method vocab_size (line 104) | def vocab_size(self):
  function test_textfiles_token_loader (line 108) | def test_textfiles_token_loader(tmp_path):

FILE: tests/data/test_tinyllama.py
  function test_tinyllama (line 12) | def test_tinyllama(_, tmp_path):

FILE: tests/data/test_tinystories.py
  function tokenize (line 10) | def tokenize(data):
  function fake_chunk (line 15) | def fake_chunk(path, data):
  function test_pretok_dataset (line 35) | def test_pretok_dataset(tmp_path, max_seq_len, expected):
  function test_tokenize (line 47) | def test_tokenize(tmp_path, monkeypatch):
  function test_tinystories_datamodule (line 70) | def test_tinystories_datamodule(tmp_path):

FILE: tests/ext_thunder/test_thunder_distributed.py
  function test_thunder_strategy_ddp_input_parsing (line 24) | def test_thunder_strategy_ddp_input_parsing():
  function test_no_backward_sync_thunder (line 32) | def test_no_backward_sync_thunder(choice):
  function test_jit_ddp_before_setup (line 81) | def test_jit_ddp_before_setup(jit):
  function test_strategy_ddp_setup_already_traced (line 98) | def test_strategy_ddp_setup_already_traced():
  function test_thunder_strategy_fsdp_input_parsing (line 114) | def test_thunder_strategy_fsdp_input_parsing():
  function test_save_checkpoint_invalid_settings_raise (line 127) | def test_save_checkpoint_invalid_settings_raise(tmp_path):
  class Submodule (line 160) | class Submodule(torch.nn.Module):
    method __init__ (line 161) | def __init__(self, h: int):
    method forward (line 165) | def forward(self, x):
  class MyModel (line 170) | class MyModel(torch.nn.Module):
    method __init__ (line 171) | def __init__(self, h: int):
    method forward (line 177) | def forward(self):
    method reset_parameters (line 181) | def reset_parameters(self):
  function test_materialize_meta_tensors (line 187) | def test_materialize_meta_tensors():
  class StatefulThing (line 203) | class StatefulThing:
    method state_dict (line 204) | def state_dict(self):
    method load_state_dict (line 207) | def load_state_dict(self, state_dict):
  class TensorLike (line 211) | class TensorLike:
    method __init__ (line 212) | def __init__(self, device: Optional[Union[str, torch.device]] = None, ...
    method __eq__ (line 216) | def __eq__(self, other):
  function test_save_load_full_checkpoint (line 226) | def test_save_load_full_checkpoint(tmp_path):
  function test_load_full_checkpoint_only_model (line 278) | def test_load_full_checkpoint_only_model(tmp_path):
  function distributed_ckpt_to_regular (line 312) | def distributed_ckpt_to_regular(path):
  function test_save_load_sharded_checkpoint (line 348) | def test_save_load_sharded_checkpoint(tmp_path):
  function test_jit_fsdp_before_setup (line 403) | def test_jit_fsdp_before_setup(jit):
  function test_strategy_fsdp_setup_already_traced (line 420) | def test_strategy_fsdp_setup_already_traced():

FILE: tests/ext_thunder/test_thunder_pretrain.py
  function test_pretrain_thunder (line 19) | def test_pretrain_thunder(tmp_path, monkeypatch):

FILE: tests/ext_thunder/test_unsloth_executor.py
  function test_unsloth_cross_entropy (line 11) | def test_unsloth_cross_entropy(reduction):
  function test_unsloth_rope (line 46) | def test_unsloth_rope():
  function test_unsloth_swiglu (line 76) | def test_unsloth_swiglu():
  function test_unsloth_gpt (line 106) | def test_unsloth_gpt():

FILE: tests/generate/test_adapter.py
  function test_main (line 23) | def test_main(fake_checkpoint_dir, monkeypatch, version, tensor_like):
  function test_cli (line 72) | def test_cli(version):

FILE: tests/generate/test_main.py
  function test_generate (line 29) | def test_generate(max_seq_length):
  function test_main (line 61) | def test_main(fake_checkpoint_dir, monkeypatch, tensor_like):
  function test_cli (line 105) | def test_cli():
  function test_sample (line 113) | def test_sample(temperature):
  function test_generate_different_results_with_different_top_p (line 129) | def test_generate_different_results_with_different_top_p():

FILE: tests/generate/test_sequentially.py
  function test_layer_to_device (line 40) | def test_layer_to_device(n_layer, devices, expected):
  function path_to_device (line 50) | def path_to_device(model):
  function test_replace_device (line 54) | def test_replace_device():
  function _test_model_1device (line 98) | def _test_model_1device(accelerator):
  function test_model_1device_cuda (line 145) | def test_model_1device_cuda():
  function test_model_1device_cpu (line 149) | def test_model_1device_cpu():
  function test_model_forward_hooks (line 154) | def test_model_forward_hooks():
  function test_base_with_sequentially (line 269) | def test_base_with_sequentially(tmp_path):
  function test_cli (line 296) | def test_cli():

FILE: tests/generate/test_tp.py
  function test_tensor_parallel_linear (line 19) | def test_tensor_parallel_linear():
  function test_tensor_parallel_llama (line 87) | def test_tensor_parallel_llama(name, expected):
  function test_tp (line 110) | def test_tp(tmp_path):
  function test_cli (line 136) | def test_cli():

FILE: tests/generate/utils.py
  function find_forward_hooks (line 4) | def find_forward_hooks(module):

FILE: tests/test_adapter.py
  function test_config_identical (line 32) | def test_config_identical():
  function test_adapter_filter (line 46) | def test_adapter_filter(tmp_path):
  function test_adapter_script (line 63) | def test_adapter_script(tmp_path, fake_checkpoint_dir, monkeypatch, alpa...
  function test_adapter_gpt_init_weights (line 110) | def test_adapter_gpt_init_weights():
  function test_adapter_compile (line 124) | def test_adapter_compile():
  function test_adapter_bitsandbytes (line 143) | def test_adapter_bitsandbytes(monkeypatch, tmp_path, fake_checkpoint_dir...
  function test_against_hf_gemma (line 255) | def test_against_hf_gemma(model_name):
  function test_against_original_gemma_2 (line 312) | def test_against_original_gemma_2(model_name, device, dtype):
  function test_against_original_gemma_3 (line 383) | def test_against_original_gemma_3(model_name, device, dtype):
  function test_load_legacy_state_dict (line 436) | def test_load_legacy_state_dict():

FILE: tests/test_adapter_v2.py
  function test_config_identical (line 33) | def test_config_identical():
  function test_adapter_v2_filter (line 45) | def test_adapter_v2_filter(tmp_path):
  function test_adapter_v2_script (line 80) | def test_adapter_v2_script(tmp_path, fake_checkpoint_dir, monkeypatch, a...
  function test_adapter_v2_gpt_init_weights (line 127) | def test_adapter_v2_gpt_init_weights():
  function test_base_model_can_be_adapter_v2_loaded (line 140) | def test_base_model_can_be_adapter_v2_loaded(name):
  function test_adapter_v2_compile (line 153) | def test_adapter_v2_compile():
  function test_against_hf_mixtral (line 172) | def test_against_hf_mixtral():
  function test_against_hf_gemma (line 218) | def test_against_hf_gemma(model_name):
  function test_against_original_gemma_2 (line 262) | def test_against_original_gemma_2(model_name):
  function test_against_original_gemma_3 (line 326) | def test_against_original_gemma_3(model_name):
  function test_adapter_v2_bitsandbytes (line 386) | def test_adapter_v2_bitsandbytes(monkeypatch, tmp_path, fake_checkpoint_...
  function test_load_legacy_state_dict (line 542) | def test_load_legacy_state_dict():

FILE: tests/test_api.py
  function mock_llm (line 33) | def mock_llm():
  function test_load_model (line 43) | def test_load_model(mock_llm):
  function test_generate (line 52) | def test_generate(mock_llm):
  function test_stream_generate (line 60) | def test_stream_generate(mock_llm):
  function test_generate_token_ids (line 73) | def test_generate_token_ids(mock_llm):
  function test_calculate_number_of_devices (line 83) | def test_calculate_number_of_devices():
  function test_llm_load_random_init (line 89) | def test_llm_load_random_init(tmp_path):
  function test_llm_load_hub_init (line 115) | def test_llm_load_hub_init(tmp_path):
  function test_model_not_initialized (line 128) | def test_model_not_initialized(tmp_path):
  function test_more_than_1_device_for_sequential_gpu (line 141) | def test_more_than_1_device_for_sequential_gpu(tmp_path):
  function test_more_than_1_device_for_tensor_parallel_gpu (line 174) | def test_more_than_1_device_for_tensor_parallel_gpu(tmp_path):
  function test_sequential_tp_incompatibility_with_random_weights (line 188) | def test_sequential_tp_incompatibility_with_random_weights(strategy, tmp...
  function test_sequential_tp_cpu (line 201) | def test_sequential_tp_cpu(strategy, tmp_path):
  function test_initialization_for_trainer (line 213) | def test_initialization_for_trainer(tmp_path):
  function test_quantization_is_applied (line 225) | def test_quantization_is_applied(tmp_path):
  function test_fixed_kv_cache (line 236) | def test_fixed_kv_cache(tmp_path):
  function test_invalid_accelerator (line 248) | def test_invalid_accelerator(tmp_path):
  function test_returned_benchmark_dir (line 254) | def test_returned_benchmark_dir(tmp_path):
  function test_benchmark_dict_to_markdown_table_single_values (line 276) | def test_benchmark_dict_to_markdown_table_single_values():
  function test_benchmark_dict_to_markdown_table_multiple_values (line 298) | def test_benchmark_dict_to_markdown_table_multiple_values():
  function test_state_dict (line 364) | def test_state_dict(tmp_path):
  function test_save_method (line 373) | def test_save_method(tmp_path):
  function test_forward_method (line 397) | def test_forward_method(tmp_path):
  function test_precision_selection (line 411) | def test_precision_selection(tmp_path):

FILE: tests/test_args.py
  function test_compute_warmup_iters (line 7) | def test_compute_warmup_iters():

FILE: tests/test_batch.py
  function create_llm (line 22) | def create_llm(tmp_path, batch_size, max_seq_length, device) -> tuple[LL...
  function test_batched_equivalence (line 40) | def test_batched_equivalence(tmp_path):
  function test_simple_batch (line 94) | def test_simple_batch():
  function test_batch_generate (line 133) | def test_batch_generate(tmp_path):
  function test_batch_generate_equivalence (line 257) | def test_batch_generate_equivalence(tmp_path):

FILE: tests/test_chat.py
  function test_generate (line 39) | def test_generate(monkeypatch, generated, stop_tokens, expected):
  function test_decode (line 69) | def test_decode():
  function test_main (line 94) | def test_main(mocked_input, stop_iteration, fake_checkpoint_dir, monkeyp...
  function test_cli (line 134) | def test_cli():
  function test_merge_lora_if_needed (line 144) | def test_merge_lora_if_needed(mocked_merge_lora, mocked_input, fake_chec...
  function test_litgpt_chat_endtoend (line 166) | def test_litgpt_chat_endtoend():
  function test_litgpt_generate_endtoend (line 191) | def test_litgpt_generate_endtoend():

FILE: tests/test_ci.py
  function test_gpu_ci_installs_bitsandbytes (line 9) | def test_gpu_ci_installs_bitsandbytes():

FILE: tests/test_cli.py
  function test_cli (line 12) | def test_cli():
  function test_pretrain_allows_max_steps (line 60) | def test_pretrain_allows_max_steps():
  function test_rewrite_finetune_command (line 79) | def test_rewrite_finetune_command():

FILE: tests/test_config.py
  function test_config (line 11) | def test_config():
  function test_from_hf_name (line 29) | def test_from_hf_name():
  function test_nonexisting_name (line 39) | def test_nonexisting_name():
  function test_short_and_hf_names_are_equal_unless_on_purpose (line 45) | def test_short_and_hf_names_are_equal_unless_on_purpose(config):
  function test_from_hf_name_with_org_string (line 53) | def test_from_hf_name_with_org_string():
  function test_from_checkpoint (line 72) | def test_from_checkpoint(tmp_path):
  function test_head_size (line 103) | def test_head_size(head_size):
  function test_find_multiple (line 109) | def test_find_multiple():

FILE: tests/test_config_hub.py
  function test_config_help (line 39) | def test_config_help(script_file, config_file, monkeypatch):

FILE: tests/test_deepseek_moe.py
  function test_deepseek_moe_litgpt_vs_hf (line 15) | def test_deepseek_moe_litgpt_vs_hf(batch_size, seq_len, device):
  function sync_weights (line 94) | def sync_weights(litgpt_model, hf_model):

FILE: tests/test_distributed.py
  function test_no_backward_sync (line 10) | def test_no_backward_sync(strategy):

FILE: tests/test_evaluate.py
  function test_evaluate_script (line 19) | def test_evaluate_script(tmp_path):
  function test_cli (line 72) | def test_cli():

FILE: tests/test_full.py
  function test_full_script (line 18) | def test_full_script(tmp_path, fake_checkpoint_dir, monkeypatch, alpaca_...

FILE: tests/test_generate_speculatively.py
  function test_speculative_decoding_target_never_accepts_draft_tokens (line 19) | def test_speculative_decoding_target_never_accepts_draft_tokens():
  function test_speculative_decoding_target_always_accepts_draft_tokens (line 45) | def test_speculative_decoding_target_always_accepts_draft_tokens():
  function test_speculative_decoding_target_sometimes_accepts_draft_tokens (line 71) | def test_speculative_decoding_target_sometimes_accepts_draft_tokens():
  function test_generate (line 106) | def test_generate(max_seq_length, speculative_k):
  function test_main (line 130) | def test_main(fake_checkpoint_dir, monkeypatch, tensor_like):
  function test_cli (line 211) | def test_cli():

FILE: tests/test_lora.py
  function test_lora_layer_replacement (line 45) | def test_lora_layer_replacement():
  function test_lora_merge (line 55) | def test_lora_merge():
  function test_lora_mqa_gqa (line 99) | def test_lora_mqa_gqa():
  function test_lora_ind_correctness (line 186) | def test_lora_ind_correctness(n_head, n_query_groups, enable_lora):
  function test_lora_filter (line 227) | def test_lora_filter(tmp_path):
  function test_lora_script (line 246) | def test_lora_script(tmp_path, fake_checkpoint_dir, monkeypatch, alpaca_...
  function test_lora_init_when_linear_overridden (line 293) | def test_lora_init_when_linear_overridden():
  function test_lora_linear_utilization (line 318) | def test_lora_linear_utilization(apply_to, target_layer_names, mlp_class...
  function test_lora_gpt_apply_lora_forward_no_exception (line 354) | def test_lora_gpt_apply_lora_forward_no_exception(apply_to):
  function test_lora_gpt_query_groups_merge_and_forward_no_exception (line 368) | def test_lora_gpt_query_groups_merge_and_forward_no_exception(n_query_gr...
  function test_lora_qkv_linear_compare_conv1d (line 406) | def test_lora_qkv_linear_compare_conv1d(head_size, n_head, enable_lora):
  function test_lora_linear_weights_merged_status (line 430) | def test_lora_linear_weights_merged_status(rank, expected_merged):
  function test_lora_qkv_linear_weights_merged_status (line 441) | def test_lora_qkv_linear_weights_merged_status(rank, enable_lora, expect...
  function test_lora_merge_with_bitsandbytes (line 450) | def test_lora_merge_with_bitsandbytes():
  function test_lora_gpt_init_weights (line 517) | def test_lora_gpt_init_weights():
  function test_base_model_can_be_lora_loaded (line 530) | def test_base_model_can_be_lora_loaded(name):
  function test_lora_compile (line 553) | def test_lora_compile():
  function test_against_hf_mixtral (line 584) | def test_against_hf_mixtral():
  function test_against_hf_gemma (line 635) | def test_against_hf_gemma(model_name):
  function test_against_original_gemma_2 (line 690) | def test_against_original_gemma_2(model_name):
  function test_against_original_gemma_3 (line 746) | def test_against_original_gemma_3(model_name):
  function test_lora_bitsandbytes (line 800) | def test_lora_bitsandbytes(monkeypatch, tmp_path, fake_checkpoint_dir, a...
  function test_lora_model_fsdp_init (line 924) | def test_lora_model_fsdp_init():
  function test_zero_pad_cpu_and_mocked_mps (line 958) | def test_zero_pad_cpu_and_mocked_mps():
  function test_load_legacy_state_dict (line 997) | def test_load_legacy_state_dict():
  function test_parallelize_fn (line 1016) | def test_parallelize_fn():
  function test_load_from_full_model_state_dict (line 1089) | def test_load_from_full_model_state_dict():

FILE: tests/test_merge_lora.py
  function test_merge_lora (line 24) | def test_merge_lora(tmp_path, fake_checkpoint_dir, pretrained_dtype, lor...
  function test_load_lora_metadata (line 77) | def test_load_lora_metadata(fake_checkpoint_dir):

FILE: tests/test_model.py
  function test_against_gpt_neox_model (line 76) | def test_against_gpt_neox_model(rotary_pct, batch_size, n_embd, parallel...
  function test_against_hf_falcon (line 145) | def test_against_hf_falcon(kwargs, device, dtype):
  function test_against_original_open_llama_3b (line 191) | def test_against_original_open_llama_3b(device, dtype):
  function test_against_hf_llama_2_and_3 (line 255) | def test_against_hf_llama_2_and_3(ours_kwargs, device, dtype):
  function test_against_hf_phi (line 304) | def test_against_hf_phi(model_name, device, dtype):
  function test_against_hf_phi_3 (line 364) | def test_against_hf_phi_3(model_name, device, dtype):
  function test_against_mistral_hf_models (line 429) | def test_against_mistral_hf_models(device, dtype, model_name):
  function test_against_mathstral_hf_models (line 493) | def test_against_mathstral_hf_models(device, dtype):
  function test_against_hf_mixtral (line 538) | def test_against_hf_mixtral(model_name):
  function test_against_olmo (line 599) | def test_against_olmo(model_name, device, dtype):
  function test_against_olmo2 (line 658) | def test_against_olmo2(model_name, device, dtype):
  function test_against_original_stablelm_zephyr_3b (line 717) | def test_against_original_stablelm_zephyr_3b(device, dtype):
  function test_against_original_gemma (line 768) | def test_against_original_gemma(model_name, device, dtype):
  function test_against_original_gemma_2 (line 825) | def test_against_original_gemma_2(model_name, device, dtype):
  function test_against_original_gemma_3 (line 895) | def test_against_original_gemma_3(model_name, device, dtype):
  function test_against_multimodal_gemma_3 (line 966) | def test_against_multimodal_gemma_3(model_name, device, dtype):
  function test_against_original_qwen_2_5 (line 1040) | def test_against_original_qwen_2_5(model_name, device, dtype):
  function test_against_original_qwen_3 (line 1113) | def test_against_original_qwen_3(model_name, device, dtype):
  function test_against_original_qwen_3_moe (line 1174) | def test_against_original_qwen_3_moe(model_name, device, dtype):
  function test_against_original_salamandra (line 1240) | def test_against_original_salamandra(model_name, device, dtype):
  function test_against_original_smollm2 (line 1300) | def test_against_original_smollm2(model_name, device, dtype):
  function test_against_hf_falcon3 (line 1360) | def test_against_hf_falcon3(model_name, device, dtype):
  function test_model_compile (line 1404) | def test_model_compile():
  function test_kv_cache (line 1427) | def test_kv_cache(max_seq_length):
  function test_model_kv_cache_amp (line 1458) | def test_model_kv_cache_amp():
  function test_rope_cache_length (line 1469) | def test_rope_cache_length(model_name):
  function test_sdpa_choice (line 1491) | def test_sdpa_choice(config):
  function test_sdpa_choice_kv_cache (line 1543) | def test_sdpa_choice_kv_cache(config):
  function test_rope_init_under_fsdp (line 1595) | def test_rope_init_under_fsdp():
  function test_reset_parameters_device (line 1614) | def test_reset_parameters_device():
  function test_batched_index_copy_modes (line 1622) | def test_batched_index_copy_modes():
  function test_load_legacy_state_dict (line 1684) | def test_load_legacy_state_dict():
  function test_kv_cache_buffer_shape (line 1708) | def test_kv_cache_buffer_shape(n_query_groups):
  function test_rope_cos_sin_shapes_if_rope_n_elem_is_odd (line 1732) | def test_rope_cos_sin_shapes_if_rope_n_elem_is_odd(rotary_percentage, fi...
  function test_forward_with_without_input_pos_maxp1 (line 1748) | def test_forward_with_without_input_pos_maxp1():

FILE: tests/test_multihead_latent_attention.py
  function test_multihead_latent_attention_kv_cache (line 12) | def test_multihead_latent_attention_kv_cache():
  function test_multihead_latent_attention_with_mask (line 40) | def test_multihead_latent_attention_with_mask():
  function test_multihead_latent_attention_litgpt_vs_hf (line 78) | def test_multihead_latent_attention_litgpt_vs_hf(batch_size, seq_len, de...
  function sync_weights (line 139) | def sync_weights(litgpt_model, hf_model):

FILE: tests/test_pretrain.py
  function test_optimizer_args (line 23) | def test_optimizer_args(_, tmp_path):
  function test_pretrain (line 49) | def test_pretrain(_, tmp_path):
  function test_initial_checkpoint_dir (line 93) | def test_initial_checkpoint_dir(_, load_mock, tmp_path):
  function test_initialize_weights (line 113) | def test_initialize_weights(strategy, expected):

FILE: tests/test_prompts.py
  function test_default_prompt_style (line 23) | def test_default_prompt_style(mock_tokenizer):
  function test_sys_prompt (line 31) | def test_sys_prompt(mock_tokenizer, sys_prompt: Optional[str]):
  function test_sys_prompt_with_kwargs (line 41) | def test_sys_prompt_with_kwargs(mock_tokenizer, sys_prompt: Optional[str]):
  function test_prompt_style_from_name (line 50) | def test_prompt_style_from_name():
  function test_prompt_style_from_config (line 55) | def test_prompt_style_from_config():
  function test_apply_prompts (line 93) | def test_apply_prompts():
  class CustomPromptStyle (line 104) | class CustomPromptStyle(PromptStyle):
    method apply (line 105) | def apply(self, prompt: str, *, sys_prompt: Optional[str] = None, **kw...
  function test_save_load_prompt_style (line 109) | def test_save_load_prompt_style(tmp_path):
  function test_multiturn_prompt (line 133) | def test_multiturn_prompt():

FILE: tests/test_readme.py
  function run_command (line 22) | def run_command(command):
  function _wait_and_check_response (line 37) | def _wait_and_check_response(waiting: int = 30):
  function test_download_model (line 54) | def test_download_model():
  function test_download_books (line 71) | def test_download_books():
  function test_chat_with_model (line 86) | def test_chat_with_model():
  function test_chat_with_quantized_model (line 95) | def test_chat_with_quantized_model():
  function test_finetune_model (line 105) | def test_finetune_model(tmp_path):
  function test_pretrain_model (line 151) | def test_pretrain_model(tmp_path):
  function test_continue_pretrain_model (line 188) | def test_continue_pretrain_model(tmp_path):
  function test_serve (line 220) | def test_serve():

FILE: tests/test_rope.py
  function test_rope_gptneox (line 14) | def test_rope_gptneox():
  function test_rope_llama_2 (line 36) | def test_rope_llama_2():
  function test_rope_llama_3 (line 82) | def test_rope_llama_3():
  function test_rope_llama_3_1 (line 128) | def test_rope_llama_3_1():
  function test_rope_llama_3_2 (line 181) | def test_rope_llama_3_2():
  function test_rope_gemma_3 (line 234) | def test_rope_gemma_3():
  function test_rope_cos_sin_shapes_if_rope_n_elem_is_odd (line 284) | def test_rope_cos_sin_shapes_if_rope_n_elem_is_odd():

FILE: tests/test_serve.py
  function _wait_and_check_response (line 22) | def _wait_and_check_response(waiting: int = 30):
  function test_simple (line 40) | def test_simple(tmp_path):
  function test_quantize (line 75) | def test_quantize(tmp_path):
  function test_multi_gpu_serve (line 110) | def test_multi_gpu_serve(tmp_path):
  function test_serve_with_openai_spec_missing_chat_template (line 145) | def test_serve_with_openai_spec_missing_chat_template(tmp_path):
  function test_serve_with_openai_spec (line 180) | def test_serve_with_openai_spec(tmp_path):
  function test_serve_with_generate_strategy (line 266) | def test_serve_with_generate_strategy(tmp_path, generate_strategy):

FILE: tests/test_tokenizer.py
  function test_tokenizer_against_hf (line 21) | def test_tokenizer_against_hf(config, tmp_path):
  function test_tokenizer_input_validation (line 89) | def test_tokenizer_input_validation():
  function test_tokenizer_bos_eos (line 99) | def test_tokenizer_bos_eos(

FILE: tests/test_trainer_support.py
  class LitLLM (line 17) | class LitLLM(L.LightningModule):
    method __init__ (line 18) | def __init__(self, checkpoint_dir, tokenizer_dir=None, trainer_ckpt_pa...
    method setup (line 24) | def setup(self, stage):
    method training_step (line 27) | def training_step(self, batch):
    method validation_step (line 32) | def validation_step(self, batch):
    method configure_optimizers (line 37) | def configure_optimizers(self):
  function test_download_model (line 45) | def test_download_model():
  function test_usecase1_pretraining_from_random_weights (line 51) | def test_usecase1_pretraining_from_random_weights(tmp_path):
  function test_usecase2_continued_pretraining_from_checkpoint (line 75) | def test_usecase2_continued_pretraining_from_checkpoint(tmp_path):
  function test_usecase3_resume_from_trainer_checkpoint (line 95) | def test_usecase3_resume_from_trainer_checkpoint(tmp_path):
  function test_usecase4_manually_save_and_resume (line 132) | def test_usecase4_manually_save_and_resume(tmp_path):

FILE: tests/test_types.py
  function test_logger_types_match_constants (line 8) | def test_logger_types_match_constants():

FILE: tests/test_utils.py
  function test_check_valid_checkpoint_dir (line 55) | def test_check_valid_checkpoint_dir(tmp_path):
  function test_incremental_write (line 104) | def test_incremental_write(tmp_path):
  function test_chunked_cross_entropy (line 129) | def test_chunked_cross_entropy(ignore_index, B):
  function test_num_parameters (line 165) | def test_num_parameters():
  function test_num_parameters_bitsandbytes (line 180) | def test_num_parameters_bitsandbytes(mode):
  function test_cycle_iterator (line 193) | def test_cycle_iterator():
  function test_parse_devices (line 210) | def test_parse_devices():
  function test_copy_config_files (line 228) | def test_copy_config_files(fake_checkpoint_dir, tmp_path):
  function test_capture_hparams (line 235) | def test_capture_hparams():
  function _test_function (line 255) | def _test_function(out_dir: Path, foo: bool = False, bar: int = 1):
  function test_save_hyperparameters (line 259) | def test_save_hyperparameters(tmp_path):
  function _test_function2 (line 271) | def _test_function2(out_dir: Path, foo: bool = False, bar: int = 1):
  function test_save_hyperparameters_known_commands (line 287) | def test_save_hyperparameters_known_commands(command, tmp_path):
  function test_choose_logger (line 299) | def test_choose_logger(tmp_path):
  function test_init_out_dir (line 322) | def test_init_out_dir(path_type, input_path, expected):
  function test_find_resume_path (line 337) | def test_find_resume_path(tmp_path):
  function model_parameters (line 365) | def model_parameters():
  function test_instantiate_bnb_optimizer_with_str (line 369) | def test_instantiate_bnb_optimizer_with_str(model_parameters):
  function test_instantiate_bnb_optimizer_with_dict (line 377) | def test_instantiate_bnb_optimizer_with_dict(model_parameters):
  function test_instantiate_bnb_optimizer_with_invalid_str (line 387) | def test_instantiate_bnb_optimizer_with_invalid_str(model_parameters):
  function test_instantiate_torch_optimizer_with_str (line 392) | def test_instantiate_torch_optimizer_with_str(model_parameters):
  function test_instantiate_torch_optimizer_with_class (line 398) | def test_instantiate_torch_optimizer_with_class(model_parameters):
  function test_extend_checkpoint_dir_is_prefixed (line 414) | def test_extend_checkpoint_dir_is_prefixed(input_path, expected):
  function test_extend_checkpoint_dir (line 438) | def test_extend_checkpoint_dir(input_path, expected):
  function test_extend_checkpoint_dir_dont_exist (line 462) | def test_extend_checkpoint_dir_dont_exist(input_path, expected):
  function test_file_size_below_limit_on_cpu (line 466) | def test_file_size_below_limit_on_cpu():
  function test_file_size_above_limit_on_cpu (line 474) | def test_file_size_above_limit_on_cpu():
  function test_file_size_above_limit_on_gpu (line 484) | def test_file_size_above_limit_on_gpu():
  function mock_cuda_is_available_true (line 493) | def mock_cuda_is_available_true(monkeypatch):
  function mock_nvidia_device_properties (line 499) | def mock_nvidia_device_properties(monkeypatch):
  function mock_amd_device_properties (line 507) | def mock_amd_device_properties(monkeypatch):
  function all_nvlink_connected_output (line 515) | def all_nvlink_connected_output():
  function test_all_nvlink_connected (line 527) | def test_all_nvlink_connected(
  function nvlink_partially_connected_output (line 537) | def nvlink_partially_connected_output():
  function test_nvlink_partially_connected_output (line 554) | def test_nvlink_partially_connected_output(
  function nvlink_not_connected_output (line 567) | def nvlink_not_connected_output():
  function test_nvlink_not_connected_output (line 589) | def test_nvlink_not_connected_output(
  function nvlink_all_gpu_connected_but_other_connected_output (line 602) | def nvlink_all_gpu_connected_but_other_connected_output():
  function test_nvlink_all_gpu_connected_but_other_connected_output (line 653) | def test_nvlink_all_gpu_connected_but_other_connected_output(
  function nvidia_smi_nvlink_output_dual_gpu_no_numa (line 666) | def nvidia_smi_nvlink_output_dual_gpu_no_numa():
  function test_check_nvlink_connectivity__returns_fully_connected_when_nvidia_all_nvlink_two_gpus (line 688) | def test_check_nvlink_connectivity__returns_fully_connected_when_nvidia_...
  function rocm_smi_xgmi_output_multi_gpu (line 698) | def rocm_smi_xgmi_output_multi_gpu():
  function test_check_nvlink_connectivity__returns_fully_connected_when_amd_all_xgmi_8_gpus (line 722) | def test_check_nvlink_connectivity__returns_fully_connected_when_amd_all...
  function test_check_nvlink_connectivity__returns_no_gpus_when_no_gpus (line 732) | def test_check_nvlink_connectivity__returns_no_gpus_when_no_gpus(mock_ru...
  function test_check_nvlink_connectivity__returns_unrecognized_vendor_when_unrecognized_vendor (line 740) | def test_check_nvlink_connectivity__returns_unrecognized_vendor_when_unr...
  function test_fix_and_load_json (line 751) | def test_fix_and_load_json():
  function test_select_sft_generate_example (line 805) | def test_select_sft_generate_example():

FILE: tests/test_yarn.py
  function test_deepseek_v3_block_with_yarn (line 15) | def test_deepseek_v3_block_with_yarn(batch_size, seq_len, device):
  function sync_weights (line 177) | def sync_weights(litgpt_model, hf_model):
  function sync_block_weights (line 191) | def sync_block_weights(block_litgpt, block_hf):

FILE: tutorials/examples/ptl-trainer/litgpt_ptl_medium.py
  class LitLLM (line 9) | class LitLLM(L.LightningModule):
    method __init__ (line 10) | def __init__(self):
    method on_train_start (line 22) | def on_train_start(self):
    method training_step (line 26) | def training_step(self, batch):
    method configure_optimizers (line 33) | def configure_optimizers(self):

FILE: tutorials/examples/ptl-trainer/litgpt_ptl_small.py
  class LitLLM (line 10) | class LitLLM(L.LightningModule):
    method __init__ (line 11) | def __init__(self, checkpoint_dir, tokenizer_dir=None, trainer_ckpt_pa...
    method setup (line 17) | def setup(self, stage):
    method training_step (line 20) | def training_step(self, batch):
    method validation_step (line 25) | def validation_step(self, batch):
    method configure_optimizers (line 30) | def configure_optimizers(self):
  function find_latest_checkpoint (line 97) | def find_latest_checkpoint(directory):

FILE: tutorials/full_finetune_example.py
  function validate (line 35) | def validate(model, val_dataloader):
  function train (line 48) | def train(fabric, model, optimizer, scheduler, train_dataloader, val_dat...
  function main (line 80) | def main(fabric):

Condensed preview: 233 files, each showing path, character count, and a content snippet. The full structured content is 1,939K chars.

[
  {
    "path": ".devcontainer/Dockerfile",
    "chars": 803,
    "preview": "# See here for image contents: https://github.com/devcontainers/images/blob/main/src/python/.devcontainer/Dockerfile\n\n# "
  },
  {
    "path": ".devcontainer/devcontainer.json",
    "chars": 4042,
    "preview": "// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:\n// https://github.co"
  },
  {
    "path": ".github/CODEOWNERS",
    "chars": 133,
    "preview": "* @lantiga @t-vi @lianakoleva @KaelanDt @k223kim @andyland\n/README.md                           @williamfalcon @lantiga "
  },
  {
    "path": ".github/ISSUE_TEMPLATE/ask-a-question.md",
    "chars": 144,
    "preview": "---\nname: Ask a Question\nabout: Ask and answer questions related to LitGPT\ntitle: ''\nlabels: question\n\n---\n\nPlease descr"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug-report.yaml",
    "chars": 1792,
    "preview": "name: Bug Report\ndescription: Report errors related to LitGPT\ntitle: \"Description\"\nlabels: bug\nbody:\n  - type: markdown\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature-request.md",
    "chars": 184,
    "preview": "---\nname: Suggest a Feature\nabout: Propose a new feature or enhancement\ntitle: ''\nlabels: enhancement\n\n---\n\nPlease descr"
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 1283,
    "preview": "# Basic dependabot.yml file with\n# minimum configuration for two package managers\n\nversion: 2\nupdates:\n  # Enable versio"
  },
  {
    "path": ".github/workflows/check-links.yml",
    "chars": 721,
    "preview": "name: Check hyperlinks\n\non:\n  push:\n    branches:\n      - main\n  pull_request:\n    branches:\n      - main\n\njobs:\n  test:"
  },
  {
    "path": ".github/workflows/cpu-tests.yml",
    "chars": 5396,
    "preview": "name: CPU tests\n\non:\n  push:\n    branches: [main]\n  pull_request_target:\n    branches: [main]\n    types: [opened, reopen"
  },
  {
    "path": ".github/workflows/mkdocs-deploy.yml",
    "chars": 746,
    "preview": "name: Deploy MkDocs\n\non:\n  push:\n    branches: [main]\n\npermissions:\n  contents: write\n\njobs:\n  deploy:\n    runs-on: ubun"
  },
  {
    "path": ".github/workflows/publish-pkg.yml",
    "chars": 1123,
    "preview": "# To create a release, create a tag and push it to GitHub:\n#git tag -a \"v0.0.1-beta\" -m \"beta version testing\"\n#git push"
  },
  {
    "path": ".gitignore",
    "chars": 294,
    "preview": ".ipynb_checkpoints/\n__pycache__\n.idea\n.DS_Store\n*.egg-info\nbuild\ndist\n.venv\n.venv/\n.vscode\nuv.lock\n\n# data\ndata\ndatasets"
  },
  {
    "path": ".lightning/workflows/tests.yaml",
    "chars": 2986,
    "preview": "trigger:\n  push:\n    branches: [\"main\"]\n  pull_request:\n    branches: [\"main\"]\n\nimage: \"pytorchlightning/lightning-thund"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 2549,
    "preview": "# Copyright The Lightning team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use th"
  },
  {
    "path": "CITATION.cff",
    "chars": 335,
    "preview": "cff-version: 1.2.0\nmessage: \"If you use this software, you can cite it as shown below.\"\ntitle: \"LitGPT\"\nabstract: \"20+ h"
  },
  {
    "path": "LICENSE",
    "chars": 11344,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "README.md",
    "chars": 31080,
    "preview": "<div align=\"center\">\n\n\n# ⚡ LitGPT\n\n**20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.*"
  },
  {
    "path": "config_hub/finetune/README.md",
    "chars": 17392,
    "preview": "## Config files\n\nThe table below lists the performances you can expect from the provided config files. Note that you can"
  },
  {
    "path": "config_hub/finetune/falcon-7b/lora.yaml",
    "chars": 4228,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/falcon-7b/qlora.yaml",
    "chars": 4280,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/gemma-2b/full.yaml",
    "chars": 3217,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/gemma-2b/lora.yaml",
    "chars": 4252,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/gemma-2b/qlora.yaml",
    "chars": 4262,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/gemma-7b/lora.yaml",
    "chars": 4254,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/gemma-7b/qlora.yaml",
    "chars": 4262,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/gemma2-2b/lora.yaml",
    "chars": 4256,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/gemma2-2b/qlora.yaml",
    "chars": 4266,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/gemma2-9b/lora.yaml",
    "chars": 4257,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/gemma2-9b/qlora.yaml",
    "chars": 4266,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-2-7b/full.yaml",
    "chars": 3573,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-2-7b/lora.yaml",
    "chars": 4236,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-2-7b/qlora.yaml",
    "chars": 4288,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3-8b/full.yaml",
    "chars": 3576,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3-8b/lora.yaml",
    "chars": 4239,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3-8b/qlora.yaml",
    "chars": 4290,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3.1-8b/full.yaml",
    "chars": 3580,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3.1-8b/lora.yaml",
    "chars": 4243,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3.1-8b/qlora.yaml",
    "chars": 4294,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3.2-1B/full.yaml",
    "chars": 3577,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3.2-1B/lora.yaml",
    "chars": 4238,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3.2-1B/qlora.yaml",
    "chars": 4289,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3.2-3B/full.yaml",
    "chars": 3577,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3.2-3B/lora.yaml",
    "chars": 4238,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/llama-3.2-3B/qlora.yaml",
    "chars": 4289,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/mistral-7b/lora.yaml",
    "chars": 4238,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/mistral-7b/qlora.yaml",
    "chars": 4290,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/mistral-7b-v0.2/lora.yaml",
    "chars": 4236,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/mistral-7b-v0.2/qlora.yaml",
    "chars": 4288,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/phi-2/full.yaml",
    "chars": 3189,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/phi-2/lora.yaml",
    "chars": 4250,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/phi-2/qlora.yaml",
    "chars": 4259,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/phi-3/full.yaml",
    "chars": 3139,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/phi-3/lora.yaml",
    "chars": 4204,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/phi-3/qlora.yaml",
    "chars": 4213,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/stablelm-base-alpha-3b/full.yaml",
    "chars": 3247,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/stablelm-base-alpha-3b/lora.yaml",
    "chars": 4259,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/stablelm-base-alpha-3b/qlora.yaml",
    "chars": 4311,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/tiny-llama/full.yaml",
    "chars": 3258,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/tiny-llama/lora.yaml",
    "chars": 4297,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/finetune/tiny-llama/qlora.yaml",
    "chars": 4306,
    "preview": "# The path to the base model's checkpoint directory to load for finetuning. (type: <class 'Path'>, default: checkpoints/"
  },
  {
    "path": "config_hub/pretrain/debug.yaml",
    "chars": 4129,
    "preview": "# The name of the model to pretrain. Choose from names in ``litgpt.config``. Mutually exclusive with\n# ``model_config``."
  },
  {
    "path": "config_hub/pretrain/microllama.yaml",
    "chars": 4281,
    "preview": "# The name of the model to pretrain. Choose from names in ``litgpt.config``. Mutually exclusive with\n# ``model_config``."
  },
  {
    "path": "config_hub/pretrain/tinyllama.yaml",
    "chars": 4153,
    "preview": "# The name of the model to pretrain. Choose from names in ``litgpt.config``. Mutually exclusive with\n# ``model_config``."
  },
  {
    "path": "config_hub/pretrain/tinystories.yaml",
    "chars": 4498,
    "preview": "# The name of the model to pretrain. Choose from names in ``litgpt.config``. Mutually exclusive with\n# ``model_config``."
  },
  {
    "path": "extensions/thunder/README.md",
    "chars": 30710,
    "preview": "# Lightning Thunder: a source-to-source compiler for PyTorch\n\n[Lightning Thunder](https://github.com/Lightning-AI/lightn"
  },
  {
    "path": "extensions/thunder/__init__.py",
    "chars": 194,
    "preview": "import sys\nfrom pathlib import Path\n\n# support running without installing as a package, adding extensions to the Python "
  },
  {
    "path": "extensions/thunder/pretrain.py",
    "chars": 20468,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport math\nimport os\nimport pprint\n"
  },
  {
    "path": "extensions/thunder/strategies/__init__.py",
    "chars": 118,
    "preview": "from .thunder_ddp import ThunderDDPStrategy  # noqa: F401\nfrom .thunder_fsdp import ThunderFSDPStrategy  # noqa: F401\n"
  },
  {
    "path": "extensions/thunder/strategies/thunder_ddp.py",
    "chars": 10413,
    "preview": "\"\"\"Fabric Strategy to support Thunder DDP: To be upstreamed into Fabric eventually.\"\"\"\n\nfrom contextlib import nullconte"
  },
  {
    "path": "extensions/thunder/strategies/thunder_fsdp.py",
    "chars": 21058,
    "preview": "\"\"\"Fabric Strategy to support Thunder FSDP: To be upstreamed into Fabric eventually.\"\"\"\n\nimport shutil\nfrom contextlib i"
  },
  {
    "path": "extensions/thunder/unsloth/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "extensions/thunder/unsloth/executor.py",
    "chars": 8782,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport sys\nfrom pathlib import Path\nf"
  },
  {
    "path": "extensions/thunder/unsloth/kernels/__init__.py",
    "chars": 352,
    "preview": "from .cross_entropy_loss import _cross_entropy_backward_impl, _cross_entropy_forward_impl  # noqa: F401\nfrom .rope_embed"
  },
  {
    "path": "extensions/thunder/unsloth/kernels/cross_entropy_loss.py",
    "chars": 8170,
    "preview": "# Copyright 2023-present Daniel Han-Chen & the Unsloth team. All rights reserved.\n#\n# Licensed under the Apache License,"
  },
  {
    "path": "extensions/thunder/unsloth/kernels/rope_embedding.py",
    "chars": 4531,
    "preview": "# Copyright 2023-present Daniel Han-Chen & the Unsloth team. All rights reserved.\n#\n# Licensed under the Apache License,"
  },
  {
    "path": "extensions/thunder/unsloth/kernels/swiglu.py",
    "chars": 3465,
    "preview": "# Copyright 2023-present Daniel Han-Chen & the Unsloth team. All rights reserved.\n#\n# Licensed under the Apache License,"
  },
  {
    "path": "extensions/thunder/unsloth/kernels/utils.py",
    "chars": 1255,
    "preview": "# Copyright 2023-present Daniel Han-Chen & the Unsloth team. All rights reserved.\n#\n# Licensed under the Apache License,"
  },
  {
    "path": "extensions/xla/README.md",
    "chars": 7758,
    "preview": "# TPU support\n\nThis project utilizes [`Fabric`](https://lightning.ai/docs/fabric/stable), which supports TPUs via [PyTor"
  },
  {
    "path": "extensions/xla/__init__",
    "chars": 194,
    "preview": "import sys\nfrom pathlib import Path\n\n# support running without installing as a package, adding extensions to the Python "
  },
  {
    "path": "extensions/xla/finetune/__init__",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "extensions/xla/finetune/adapter.py",
    "chars": 11904,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nimport sys\nimport time\nfro"
  },
  {
    "path": "extensions/xla/generate/__init__",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "extensions/xla/generate/adapter.py",
    "chars": 4860,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport sys\nimport time\nfrom pathlib "
  },
  {
    "path": "extensions/xla/generate/base.py",
    "chars": 6863,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport sys\nimport time\nfrom pathlib "
  },
  {
    "path": "extensions/xla/scripts/__init__",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "extensions/xla/scripts/prepare_alpaca.py",
    "chars": 5659,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\n\"\"\"Implementation derived from https"
  },
  {
    "path": "extensions/xla/utils.py",
    "chars": 5131,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport itertools\nfrom functools impo"
  },
  {
    "path": "litgpt/__init__.py",
    "chars": 882,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport logging\nimport re\n\nfrom litgp"
  },
  {
    "path": "litgpt/__main__.py",
    "chars": 3355,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport warnings\n\nimport torch\nfrom j"
  },
  {
    "path": "litgpt/adapter.py",
    "chars": 5362,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\n\"\"\"Implementation of the paper:\n\nLLa"
  },
  {
    "path": "litgpt/adapter_v2.py",
    "chars": 8933,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\n\"\"\"Implementation of the paper:\n\nLLa"
  },
  {
    "path": "litgpt/api.py",
    "chars": 31374,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n#\n# This file implements the LitGPT P"
  },
  {
    "path": "litgpt/args.py",
    "chars": 5024,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport math\nimport warnings\nfrom data"
  },
  {
    "path": "litgpt/chat/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "litgpt/chat/base.py",
    "chars": 10783,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport sys\nimport time\nfrom pathlib "
  },
  {
    "path": "litgpt/config.py",
    "chars": 98599,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nfrom copy import deepcopy\nfrom datac"
  },
  {
    "path": "litgpt/constants.py",
    "chars": 1387,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\"\"\"Centralized package availability c"
  },
  {
    "path": "litgpt/data/__init__.py",
    "chars": 1036,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nfrom litgpt.data.alpaca import Alpac"
  },
  {
    "path": "litgpt/data/alpaca.py",
    "chars": 5799,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\"\"\"Implementation derived from https:"
  },
  {
    "path": "litgpt/data/alpaca_2k.py",
    "chars": 2013,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\n\nfrom dataclasses import dataclass, "
  },
  {
    "path": "litgpt/data/alpaca_gpt4.py",
    "chars": 950,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\n\nfrom dataclasses import dataclass, "
  },
  {
    "path": "litgpt/data/base.py",
    "chars": 6321,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nfrom abc import abstractmethod\nfrom f"
  },
  {
    "path": "litgpt/data/deita.py",
    "chars": 4664,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\"\"\"Implementation derived from https:"
  },
  {
    "path": "litgpt/data/flan.py",
    "chars": 6951,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport json\nfrom dataclasses import "
  },
  {
    "path": "litgpt/data/json_data.py",
    "chars": 6532,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport json\nimport warnings\nfrom dat"
  },
  {
    "path": "litgpt/data/lima.py",
    "chars": 5453,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\"\"\"Implementation derived from https:"
  },
  {
    "path": "litgpt/data/lit_data.py",
    "chars": 2888,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport os\nfrom dataclasses import dat"
  },
  {
    "path": "litgpt/data/longform.py",
    "chars": 3485,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport json\nfrom dataclasses import "
  },
  {
    "path": "litgpt/data/microllama.py",
    "chars": 539,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nfrom dataclasses import dataclass\nfro"
  },
  {
    "path": "litgpt/data/openwebtext.py",
    "chars": 4389,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport os\nfrom dataclasses import dat"
  },
  {
    "path": "litgpt/data/prepare_slimpajama.py",
    "chars": 2087,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport json\nimport os\nimport time\nfr"
  },
  {
    "path": "litgpt/data/prepare_starcoder.py",
    "chars": 2401,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nimport time\nimport traceba"
  },
  {
    "path": "litgpt/data/text_files.py",
    "chars": 6213,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport glob\nimport os\nfrom dataclasse"
  },
  {
    "path": "litgpt/data/tinyllama.py",
    "chars": 4310,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nfrom dataclasses import dataclass, fi"
  },
  {
    "path": "litgpt/data/tinystories.py",
    "chars": 5682,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport glob\nimport json\nimport os\nfro"
  },
  {
    "path": "litgpt/deploy/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "litgpt/deploy/serve.py",
    "chars": 12031,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport json\nimport sys\nfrom pathlib i"
  },
  {
    "path": "litgpt/eval/evaluate.py",
    "chars": 4809,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport json\nimport os\nfrom pathlib i"
  },
  {
    "path": "litgpt/finetune/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "litgpt/finetune/adapter.py",
    "chars": 20475,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport dataclasses\nimport math\nimport"
  },
  {
    "path": "litgpt/finetune/adapter_v2.py",
    "chars": 21501,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport dataclasses\nimport math\nimport"
  },
  {
    "path": "litgpt/finetune/full.py",
    "chars": 18953,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport dataclasses\nimport math\nimport"
  },
  {
    "path": "litgpt/finetune/lora.py",
    "chars": 23151,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport dataclasses\nimport math\nimport"
  },
  {
    "path": "litgpt/finetune/lora_legacy.py",
    "chars": 21446,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport dataclasses\nimport math\nimport"
  },
  {
    "path": "litgpt/generate/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "litgpt/generate/adapter.py",
    "chars": 6510,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport sys\nimport time\nimport warnin"
  },
  {
    "path": "litgpt/generate/adapter_v2.py",
    "chars": 6531,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport sys\nimport time\nimport warnin"
  },
  {
    "path": "litgpt/generate/base.py",
    "chars": 24112,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport sys\nimport time\nimport warnin"
  },
  {
    "path": "litgpt/generate/full.py",
    "chars": 6287,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport sys\nimport time\nimport warnin"
  },
  {
    "path": "litgpt/generate/sequentially.py",
    "chars": 13271,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport itertools\nimport logging\nimpo"
  },
  {
    "path": "litgpt/generate/speculative_decoding.py",
    "chars": 20479,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport sys\nimport time\nimport warnin"
  },
  {
    "path": "litgpt/generate/tp.py",
    "chars": 11687,
    "preview": "\"\"\"Tensor-parallel implementation adapted from https://github.com/pytorch-labs/gpt-fast/blob/14df27/tp.py\"\"\"\n\nimport log"
  },
  {
    "path": "litgpt/lora.py",
    "chars": 30616,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\n# Derived from https://github.com/mi"
  },
  {
    "path": "litgpt/model.py",
    "chars": 59176,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\n\"\"\"Full definition of a decoder-only"
  },
  {
    "path": "litgpt/parser_config.py",
    "chars": 1631,
    "preview": "import sys\nfrom pathlib import Path\nfrom typing import List, Optional\n\nfrom litgpt.utils import CLI\n\n\ndef parser_command"
  },
  {
    "path": "litgpt/pretrain.py",
    "chars": 21179,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport math\nimport pprint\nimport tim"
  },
  {
    "path": "litgpt/prompts.py",
    "chars": 23504,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport importlib\nimport re\nfrom abc i"
  },
  {
    "path": "litgpt/scripts/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "litgpt/scripts/convert_hf_checkpoint.py",
    "chars": 41764,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport gc\nimport json\nimport os\nimpo"
  },
  {
    "path": "litgpt/scripts/convert_lit_checkpoint.py",
    "chars": 27333,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport gc\nfrom collections import de"
  },
  {
    "path": "litgpt/scripts/convert_pretrained_checkpoint.py",
    "chars": 2202,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nfrom pathlib import Path\nfrom pprint"
  },
  {
    "path": "litgpt/scripts/download.py",
    "chars": 6123,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport importlib.util\nimport os\nfrom"
  },
  {
    "path": "litgpt/scripts/merge_lora.py",
    "chars": 4668,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\n\"\"\"This script merges the LoRA weigh"
  },
  {
    "path": "litgpt/tokenizer.py",
    "chars": 8413,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport json\nfrom pathlib import Path"
  },
  {
    "path": "litgpt/types.py",
    "chars": 583,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\"\"\"Type aliases used across LitGPT mo"
  },
  {
    "path": "litgpt/utils.py",
    "chars": 36611,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\n\"\"\"Utility functions for training an"
  },
  {
    "path": "pyproject.toml",
    "chars": 3789,
    "preview": "[build-system]\nbuild-backend = \"setuptools.build_meta\"\n\nrequires = [\n  \"setuptools>=68.2.2\",\n  \"wheel>=0.41.2\",\n]\n\n[proj"
  },
  {
    "path": "tests/conftest.py",
    "chars": 5918,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nimport shutil\nimport sys\nf"
  },
  {
    "path": "tests/convert/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/convert/test_hf_checkpoint.py",
    "chars": 9047,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nfrom unittest import mock\n\nimport py"
  },
  {
    "path": "tests/convert/test_lit_checkpoint.py",
    "chars": 29148,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nfrom dataclasses import as"
  },
  {
    "path": "tests/convert/test_pretrained_checkpoint.py",
    "chars": 1076,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\n\nimport torch\n\nfrom litgpt"
  },
  {
    "path": "tests/data/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/data/_fixtures/alpaca.json",
    "chars": 11762,
    "preview": "[\n  {\n    \"instruction\": \"Give three tips for staying healthy.\",\n    \"input\": \"\",\n    \"output\": \"1. Eat a balanced diet "
  },
  {
    "path": "tests/data/_fixtures/dolly.json",
    "chars": 16066,
    "preview": "[\n  {\n    \"instruction\": \"When did Virgin Australia start operating?\",\n    \"context\": \"Virgin Australia, the trading nam"
  },
  {
    "path": "tests/data/_fixtures/longform_train.json",
    "chars": 26457,
    "preview": "[\n  {\n    \"input\": \"What are the positions held by Beto O'Rourke, Lupe Valdez, and Veronica Escobar on decriminalizing u"
  },
  {
    "path": "tests/data/_fixtures/longform_val.json",
    "chars": 31861,
    "preview": "[\n  {\n    \"input\": \"The Big Mistake\\n\\nThis day was full of joy and happiness, but something went wrong after when she t"
  },
  {
    "path": "tests/data/test_alpaca.py",
    "chars": 1349,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nfrom litgpt.data import Alpaca\nfrom l"
  },
  {
    "path": "tests/data/test_base.py",
    "chars": 3981,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nfrom typing import Optional\n\nimport "
  },
  {
    "path": "tests/data/test_deita.py",
    "chars": 2773,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nfrom unittest import mock\n\nfrom litgp"
  },
  {
    "path": "tests/data/test_json.py",
    "chars": 5141,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport json\nfrom typing import Option"
  },
  {
    "path": "tests/data/test_lit_data.py",
    "chars": 2175,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport sys\nfrom unittest import mock\n"
  },
  {
    "path": "tests/data/test_longform.py",
    "chars": 1326,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nfrom litgpt.data import LongForm\nfrom"
  },
  {
    "path": "tests/data/test_openwebtext.py",
    "chars": 2202,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport sys\nfrom unittest import mock\n"
  },
  {
    "path": "tests/data/test_textfiles.py",
    "chars": 3738,
    "preview": "import json\n\nimport torch\nfrom litdata import TokensLoader, optimize\nfrom torch.utils._pytree import tree_map\n\nfrom litg"
  },
  {
    "path": "tests/data/test_tinyllama.py",
    "chars": 1470,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nfrom unittest import mock\n\nimport pyt"
  },
  {
    "path": "tests/data/test_tinystories.py",
    "chars": 2961,
    "preview": "import json\n\nimport pytest\nimport torch\nfrom litdata import optimize\nfrom litdata.streaming import StreamingDataset, Tok"
  },
  {
    "path": "tests/ext_thunder/__init__.py",
    "chars": 314,
    "preview": "import sys\nfrom pathlib import Path\n\n# support running without installing as a package, adding extensions to the Python "
  },
  {
    "path": "tests/ext_thunder/test_thunder_distributed.py",
    "chars": 16154,
    "preview": "import os\nimport sys\nfrom pathlib import Path\nfrom typing import Optional, Tuple, Union\n\nimport pytest\nimport torch\nfrom"
  },
  {
    "path": "tests/ext_thunder/test_thunder_networks.py",
    "chars": 255,
    "preview": "\"\"\"Run thunder tests as part of LitGPT CI\"\"\"\n\nfrom litgpt.constants import _THUNDER_AVAILABLE\n\nif _THUNDER_AVAILABLE:\n  "
  },
  {
    "path": "tests/ext_thunder/test_thunder_pretrain.py",
    "chars": 1983,
    "preview": "import os\nfrom contextlib import redirect_stdout\nfrom io import StringIO\nfrom unittest.mock import Mock\n\nimport torch\nfr"
  },
  {
    "path": "tests/ext_thunder/test_unsloth_executor.py",
    "chars": 5102,
    "preview": "import pytest\nimport torch\n\nfrom litgpt import GPT, Config\nfrom litgpt.model import apply_rope, build_rope_cache\nfrom li"
  },
  {
    "path": "tests/generate/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/generate/test_adapter.py",
    "chars": 2882,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nimport re\nimport subproces"
  },
  {
    "path": "tests/generate/test_main.py",
    "chars": 5095,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nimport re\nimport subproces"
  },
  {
    "path": "tests/generate/test_sequentially.py",
    "chars": 11936,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport itertools\nimport subprocess\ni"
  },
  {
    "path": "tests/generate/test_tp.py",
    "chars": 5470,
    "preview": "import subprocess\nimport sys\nfrom dataclasses import asdict, replace\nfrom pathlib import Path\nfrom unittest.mock import "
  },
  {
    "path": "tests/generate/utils.py",
    "chars": 558,
    "preview": "from collections import defaultdict\n\n\ndef find_forward_hooks(module):\n    mapping = defaultdict(list)\n    for name, subm"
  },
  {
    "path": "tests/test_adapter.py",
    "chars": 18030,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport os\nfrom contextlib import redi"
  },
  {
    "path": "tests/test_adapter_v2.py",
    "chars": 22912,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport os\nfrom contextlib import redi"
  },
  {
    "path": "tests/test_api.py",
    "chars": 16260,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nimport re\nimport sys\nfrom "
  },
  {
    "path": "tests/test_args.py",
    "chars": 1953,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport pytest\n\nfrom litgpt.args impor"
  },
  {
    "path": "tests/test_batch.py",
    "chars": 9929,
    "preview": "import warnings\nfrom pathlib import Path\n\nimport lightning as L\nimport pytest\nimport torch\n\nimport litgpt\nfrom litgpt.ap"
  },
  {
    "path": "tests/test_chat.py",
    "chars": 7817,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport os\nimport re\nimport subprocess"
  },
  {
    "path": "tests/test_ci.py",
    "chars": 334,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nfrom lightning.fabric.plugins.precis"
  },
  {
    "path": "tests/test_cli.py",
    "chars": 2956,
    "preview": "import sys\nfrom contextlib import redirect_stdout\nfrom io import StringIO\nfrom unittest import mock\n\nimport pytest\nfrom "
  },
  {
    "path": "tests/test_config.py",
    "chars": 4190,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport pytest\nimport yaml\n\nimport li"
  },
  {
    "path": "tests/test_config_hub.py",
    "chars": 2614,
    "preview": "import importlib\nimport importlib.util\nfrom pathlib import Path\nfrom unittest import mock\nfrom unittest.mock import Mock"
  },
  {
    "path": "tests/test_deepseek_moe.py",
    "chars": 4059,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport pytest\nimport torch\nfrom tran"
  },
  {
    "path": "tests/test_distributed.py",
    "chars": 1657,
    "preview": "import pytest\nimport torch\nfrom lightning import Fabric\n\nfrom litgpt.utils import _RunIf\n\n\n@_RunIf(min_cuda_gpus=2, stan"
  },
  {
    "path": "tests/test_evaluate.py",
    "chars": 2750,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport subprocess\nfrom contextlib im"
  },
  {
    "path": "tests/test_full.py",
    "chars": 2937,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nfrom contextlib import red"
  },
  {
    "path": "tests/test_generate_speculatively.py",
    "chars": 8197,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport re\nimport subprocess\nfrom con"
  },
  {
    "path": "tests/test_lora.py",
    "chars": 45072,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\nimport os\nfrom contextlib import redi"
  },
  {
    "path": "tests/test_merge_lora.py",
    "chars": 3924,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nimport shutil\nfrom context"
  },
  {
    "path": "tests/test_model.py",
    "chars": 63221,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nfrom copy import deepcopy\nfrom funct"
  },
  {
    "path": "tests/test_multihead_latent_attention.py",
    "chars": 4998,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport pytest\nimport torch\nfrom tran"
  },
  {
    "path": "tests/test_pretrain.py",
    "chars": 5161,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nfrom contextlib import red"
  },
  {
    "path": "tests/test_prompts.py",
    "chars": 7242,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nfrom typing import Optional\n\nimport "
  },
  {
    "path": "tests/test_readme.py",
    "chars": 8459,
    "preview": "# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.\n\nimport os\nimport platform\nimport sub"
  }
]

// ... and 33 more files (download for full content)

About this extraction

This page contains the full source code of the Lightning-AI/litgpt GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 233 files (1.8 MB), approximately 484.2k tokens, and a symbol index with 1090 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
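The file index above is a JSON array of objects with `path`, `chars`, and `preview` keys, so it can be summarized with a few lines of Python. The following is a minimal sketch, assuming the index has been saved or pasted as valid JSON. The helper name `summarize_index` is hypothetical and not part of GitExtract or LitGPT, and the sample reuses three entries from the listing with their previews shortened to a placeholder.

```python
import json
from collections import defaultdict


def summarize_index(entries):
    """Group index entries by top-level path component.

    Hypothetical helper: returns {top_level: {"files": n, "chars": total}}.
    """
    totals = defaultdict(lambda: {"files": 0, "chars": 0})
    for entry in entries:
        # "litgpt/utils.py" -> "litgpt"; a root-level file keeps its own name.
        top = entry["path"].split("/", 1)[0]
        totals[top]["files"] += 1
        totals[top]["chars"] += entry["chars"]
    return dict(totals)


# Three entries copied from the index above; previews replaced by "..." (placeholder).
sample = json.loads("""[
  {"path": "litgpt/types.py", "chars": 583, "preview": "..."},
  {"path": "litgpt/utils.py", "chars": 36611, "preview": "..."},
  {"path": "pyproject.toml", "chars": 3789, "preview": "..."}
]""")

summary = summarize_index(sample)
for name, stats in sorted(summary.items()):
    print(f"{name}: {stats['files']} files, {stats['chars']} chars")
```

Grouping by the first path component is only one possible choice; grouping by file extension or by a deeper directory level works the same way with a different key.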

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
