Showing preview only (5,209K chars total). Download the full file or copy to clipboard to get everything.
Repository: rasbt/LLMs-from-scratch
Branch: main
Commit: 130cc1f63cb7
Files: 306
Total size: 4.9 MB
Directory structure:
gitextract_2il4qejt/
├── .github/
│ ├── ISSUE_TEMPLATE/
│ │ ├── ask-a-question.md
│ │ └── bug-report.yaml
│ ├── scripts/
│ │ └── check_double_quotes.py
│ └── workflows/
│ ├── basic-tests-latest-python.yml
│ ├── basic-tests-linux-uv.yml
│ ├── basic-tests-macos-uv.yml
│ ├── basic-tests-old-pytorch.yml
│ ├── basic-tests-pip.yml
│ ├── basic-tests-pixi.yml
│ ├── basic-tests-pytorch-rc.yml
│ ├── basic-tests-windows-uv-pip.yml
│ ├── basic-tests-windows-uv-pip.yml.disabled
│ ├── basic-tests-windows-uv.yml.disabled
│ ├── check-links.yml
│ ├── check-spelling-errors.yml
│ └── pep8-linter.yml
├── .gitignore
├── .gitmodules
├── CITATION.cff
├── LICENSE.txt
├── README.md
├── appendix-A/
│ ├── 01_main-chapter-code/
│ │ ├── DDP-script-torchrun.py
│ │ ├── DDP-script.py
│ │ ├── README.md
│ │ ├── code-part1.ipynb
│ │ ├── code-part2.ipynb
│ │ └── exercise-solutions.ipynb
│ ├── 02_setup-recommendations/
│ │ └── README.md
│ └── README.md
├── appendix-B/
│ └── README.md
├── appendix-C/
│ └── README.md
├── appendix-D/
│ ├── 01_main-chapter-code/
│ │ ├── appendix-D.ipynb
│ │ └── previous_chapters.py
│ └── README.md
├── appendix-E/
│ ├── 01_main-chapter-code/
│ │ ├── appendix-E.ipynb
│ │ ├── gpt_download.py
│ │ └── previous_chapters.py
│ └── README.md
├── ch01/
│ ├── README.md
│ └── reading-recommendations.md
├── ch02/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch02.ipynb
│ │ ├── dataloader.ipynb
│ │ └── exercise-solutions.ipynb
│ ├── 02_bonus_bytepair-encoder/
│ │ ├── README.md
│ │ ├── bpe_openai_gpt2.py
│ │ ├── compare-bpe-tiktoken.ipynb
│ │ └── requirements-extra.txt
│ ├── 03_bonus_embedding-vs-matmul/
│ │ ├── README.md
│ │ └── embeddings-and-linear-layers.ipynb
│ ├── 04_bonus_dataloader-intuition/
│ │ ├── README.md
│ │ └── dataloader-intuition.ipynb
│ ├── 05_bpe-from-scratch/
│ │ ├── README.md
│ │ ├── bpe-from-scratch-simple.ipynb
│ │ ├── bpe-from-scratch.ipynb
│ │ └── tests.py
│ └── README.md
├── ch03/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch03.ipynb
│ │ ├── exercise-solutions.ipynb
│ │ ├── multihead-attention.ipynb
│ │ └── small-text-sample.txt
│ ├── 02_bonus_efficient-multihead-attention/
│ │ ├── README.md
│ │ ├── mha-implementations.ipynb
│ │ └── tests/
│ │ └── test_mha_implementations.py
│ ├── 03_understanding-buffers/
│ │ ├── README.md
│ │ └── understanding-buffers.ipynb
│ └── README.md
├── ch04/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch04.ipynb
│ │ ├── exercise-solutions.ipynb
│ │ ├── gpt.py
│ │ ├── previous_chapters.py
│ │ └── tests.py
│ ├── 02_performance-analysis/
│ │ ├── README.md
│ │ ├── flops-analysis.ipynb
│ │ └── requirements-extra.txt
│ ├── 03_kv-cache/
│ │ ├── README.md
│ │ ├── gpt_ch04.py
│ │ ├── gpt_with_kv_cache.py
│ │ ├── gpt_with_kv_cache_optimized.py
│ │ └── tests.py
│ ├── 04_gqa/
│ │ ├── README.md
│ │ ├── gpt_with_kv_gqa.py
│ │ ├── gpt_with_kv_mha.py
│ │ ├── memory_estimator_gqa.py
│ │ └── plot_memory_estimates_gqa.py
│ ├── 05_mla/
│ │ ├── README.md
│ │ ├── gpt_with_kv_mha.py
│ │ ├── gpt_with_kv_mla.py
│ │ ├── memory_estimator_mla.py
│ │ └── plot_memory_estimates_mla.py
│ ├── 06_swa/
│ │ ├── README.md
│ │ ├── gpt_with_kv_mha.py
│ │ ├── gpt_with_kv_swa.py
│ │ ├── memory_estimator_swa.py
│ │ └── plot_memory_estimates_swa.py
│ ├── 07_moe/
│ │ ├── README.md
│ │ ├── gpt_with_kv_ffn.py
│ │ ├── gpt_with_kv_moe.py
│ │ ├── memory_estimator_moe.py
│ │ └── plot_memory_estimates_moe.py
│ ├── 08_deltanet/
│ │ ├── README.md
│ │ └── plot_memory_estimates_gated_deltanet.py
│ └── README.md
├── ch05/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch05.ipynb
│ │ ├── exercise-solutions.ipynb
│ │ ├── gpt_download.py
│ │ ├── gpt_generate.py
│ │ ├── gpt_train.py
│ │ ├── previous_chapters.py
│ │ └── tests.py
│ ├── 02_alternative_weight_loading/
│ │ ├── README.md
│ │ ├── weight-loading-hf-safetensors.ipynb
│ │ ├── weight-loading-hf-transformers.ipynb
│ │ └── weight-loading-pytorch.ipynb
│ ├── 03_bonus_pretraining_on_gutenberg/
│ │ ├── README.md
│ │ ├── prepare_dataset.py
│ │ ├── pretraining_simple.py
│ │ └── tests.py
│ ├── 04_learning_rate_schedulers/
│ │ └── README.md
│ ├── 05_bonus_hparam_tuning/
│ │ ├── README.md
│ │ └── hparam_search.py
│ ├── 06_user_interface/
│ │ ├── README.md
│ │ ├── app_orig.py
│ │ ├── app_own.py
│ │ └── requirements-extra.txt
│ ├── 07_gpt_to_llama/
│ │ ├── README.md
│ │ ├── converting-gpt-to-llama2.ipynb
│ │ ├── converting-llama2-to-llama3.ipynb
│ │ ├── previous_chapters.py
│ │ ├── requirements-extra.txt
│ │ ├── standalone-llama32.ipynb
│ │ └── tests/
│ │ ├── test-requirements-extra.txt
│ │ ├── test_llama32_nb.py
│ │ └── tests_rope_and_parts.py
│ ├── 08_memory_efficient_weight_loading/
│ │ ├── README.md
│ │ ├── memory-efficient-state-dict.ipynb
│ │ └── previous_chapters.py
│ ├── 09_extending-tokenizers/
│ │ ├── README.md
│ │ └── extend-tiktoken.ipynb
│ ├── 10_llm-training-speed/
│ │ ├── 00_orig.py
│ │ ├── 01_opt_single_gpu.py
│ │ ├── 02_opt_multi_gpu_ddp.py
│ │ └── README.md
│ ├── 11_qwen3/
│ │ ├── README.md
│ │ ├── qwen3-chat-interface/
│ │ │ ├── README.md
│ │ │ ├── qwen3-chat-interface-multiturn.py
│ │ │ ├── qwen3-chat-interface.py
│ │ │ └── requirements-extra.txt
│ │ ├── standalone-qwen3-moe-plus-kvcache.ipynb
│ │ ├── standalone-qwen3-moe.ipynb
│ │ ├── standalone-qwen3-plus-kvcache.ipynb
│ │ ├── standalone-qwen3.ipynb
│ │ └── tests/
│ │ ├── test_qwen3_kvcache_nb.py
│ │ └── test_qwen3_nb.py
│ ├── 12_gemma3/
│ │ ├── README.md
│ │ ├── standalone-gemma3-plus-kvcache.ipynb
│ │ ├── standalone-gemma3.ipynb
│ │ └── tests/
│ │ ├── test_gemma3_kv_nb.py
│ │ └── test_gemma3_nb.py
│ ├── 13_olmo3/
│ │ ├── README.md
│ │ ├── standalone-olmo3-plus-kv-cache.ipynb
│ │ ├── standalone-olmo3.ipynb
│ │ └── tests/
│ │ ├── olmo3_layer_debugger.py
│ │ ├── test_olmo3_kvcache_nb.py
│ │ └── test_olmo3_nb.py
│ ├── 14_ch05_with_other_llms/
│ │ ├── README.md
│ │ ├── ch05-llama32.ipynb
│ │ └── ch05-qwen3.ipynb
│ ├── 15_tiny-aya/
│ │ ├── README.md
│ │ ├── standalone-tiny-aya-plus-kv-cache.ipynb
│ │ ├── standalone-tiny-aya.ipynb
│ │ └── tests/
│ │ ├── test_tiny_aya_kvcache_nb.py
│ │ ├── test_tiny_aya_nb.py
│ │ └── tiny_aya_layer_debugger.py
│ ├── 16_qwen3.5/
│ │ ├── README.md
│ │ ├── qwen3.5-plus-kv-cache.ipynb
│ │ ├── qwen3.5.ipynb
│ │ ├── qwen3_5_transformers.py
│ │ └── tests/
│ │ ├── qwen3_5_layer_debugger.py
│ │ └── test_qwen3_5_nb.py
│ └── README.md
├── ch06/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch06.ipynb
│ │ ├── exercise-solutions.ipynb
│ │ ├── gpt_class_finetune.py
│ │ ├── gpt_download.py
│ │ ├── load-finetuned-model.ipynb
│ │ ├── previous_chapters.py
│ │ └── tests.py
│ ├── 02_bonus_additional-experiments/
│ │ ├── README.md
│ │ ├── additional_experiments.py
│ │ ├── gpt_download.py
│ │ └── previous_chapters.py
│ ├── 03_bonus_imdb-classification/
│ │ ├── README.md
│ │ ├── download_prepare_dataset.py
│ │ ├── gpt_download.py
│ │ ├── previous_chapters.py
│ │ ├── requirements-extra.txt
│ │ ├── sklearn-baseline.ipynb
│ │ ├── train_bert_hf.py
│ │ ├── train_bert_hf_spam.py
│ │ ├── train_gpt.py
│ │ └── train_sklearn_logreg.py
│ ├── 04_user_interface/
│ │ ├── README.md
│ │ ├── app.py
│ │ └── requirements-extra.txt
│ └── README.md
├── ch07/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch07.ipynb
│ │ ├── exercise-solutions.ipynb
│ │ ├── exercise_experiments.py
│ │ ├── gpt_download.py
│ │ ├── gpt_instruction_finetuning.py
│ │ ├── instruction-data-with-response.json
│ │ ├── instruction-data.json
│ │ ├── load-finetuned-model.ipynb
│ │ ├── ollama_evaluate.py
│ │ ├── previous_chapters.py
│ │ └── tests.py
│ ├── 02_dataset-utilities/
│ │ ├── README.md
│ │ ├── create-passive-voice-entries.ipynb
│ │ ├── find-near-duplicates.py
│ │ ├── instruction-examples.json
│ │ └── requirements-extra.txt
│ ├── 03_model-evaluation/
│ │ ├── README.md
│ │ ├── eval-example-data.json
│ │ ├── llm-instruction-eval-ollama.ipynb
│ │ ├── llm-instruction-eval-openai.ipynb
│ │ ├── requirements-extra.txt
│ │ └── scores/
│ │ ├── correlation-analysis.ipynb
│ │ ├── gpt4-model-1-response.json
│ │ ├── gpt4-model-2-response.json
│ │ ├── llama3-8b-model-1-response.json
│ │ └── llama3-8b-model-2-response.json
│ ├── 04_preference-tuning-with-dpo/
│ │ ├── README.md
│ │ ├── create-preference-data-ollama.ipynb
│ │ ├── dpo-from-scratch.ipynb
│ │ ├── instruction-data-with-preference.json
│ │ └── previous_chapters.py
│ ├── 05_dataset-generation/
│ │ ├── README.md
│ │ ├── instruction-data-llama3-7b.json
│ │ ├── llama3-ollama.ipynb
│ │ ├── reflection-gpt4.ipynb
│ │ └── requirements-extra.txt
│ ├── 06_user_interface/
│ │ ├── README.md
│ │ ├── app.py
│ │ └── requirements-extra.txt
│ └── README.md
├── conftest.py
├── pixi.toml
├── pkg/
│ └── llms_from_scratch/
│ ├── README.md
│ ├── __init__.py
│ ├── appendix_a.py
│ ├── appendix_d.py
│ ├── appendix_e.py
│ ├── ch02.py
│ ├── ch03.py
│ ├── ch04.py
│ ├── ch05.py
│ ├── ch06.py
│ ├── ch07.py
│ ├── generate.py
│ ├── kv_cache/
│ │ ├── __init__.py
│ │ ├── generate.py
│ │ ├── gpt2.py
│ │ ├── llama3.py
│ │ ├── qwen3.py
│ │ └── utils.py
│ ├── kv_cache_batched/
│ │ ├── __init__.py
│ │ ├── generate.py
│ │ ├── qwen3.py
│ │ └── utils.py
│ ├── llama3.py
│ ├── qwen3.py
│ ├── tests/
│ │ ├── test_appendix_a.py
│ │ ├── test_appendix_d.py
│ │ ├── test_appendix_e.py
│ │ ├── test_ch02.py
│ │ ├── test_ch03.py
│ │ ├── test_ch04.py
│ │ ├── test_ch05.py
│ │ ├── test_ch06.py
│ │ ├── test_ch07.py
│ │ ├── test_generate.py
│ │ ├── test_llama3.py
│ │ └── test_qwen3.py
│ └── utils.py
├── pyproject.toml
├── requirements.txt
└── setup/
├── 01_optional-python-setup-preferences/
│ ├── README.md
│ ├── native-pixi.md
│ └── native-uv.md
├── 02_installing-python-libraries/
│ ├── README.md
│ ├── python_environment_check.ipynb
│ ├── python_environment_check.py
│ └── tests.py
├── 03_optional-docker-environment/
│ ├── .devcontainer/
│ │ ├── Dockerfile
│ │ ├── README.md
│ │ └── devcontainer.json
│ └── README.md
├── 04_optional-aws-sagemaker-notebook/
│ ├── README.md
│ └── cloudformation-template.yml
└── README.md
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/ISSUE_TEMPLATE/ask-a-question.md
================================================
---
name: Ask a Question
about: Ask questions related to the book
title: ''
labels: [question]
assignees: rasbt
---
If you have a question that is not a bug, please consider asking it in this GitHub repository's [discussion forum](https://github.com/rasbt/LLMs-from-scratch/discussions).
================================================
FILE: .github/ISSUE_TEMPLATE/bug-report.yaml
================================================
# GitHub issue form for reporting bugs in the book content or code.
name: Bug Report
description: Report errors related to the book content or code
title: "Description"
labels: [bug]
assignees: rasbt
body:
  # Intro text shown at the top of the issue form
  - type: markdown
    attributes:
      value: |
        Thank you for taking the time to report an issue. Please fill out the details below to help resolve it.
  # Free-form description of the bug (required)
  - type: textarea
    id: bug_description
    attributes:
      label: Bug description
      description: A description of the issue.
      placeholder: |
        Please provide a description of what the bug or issue is.
    validations:
      required: true
  - type: dropdown
    id: operating_system
    attributes:
      label: What operating system are you using?
      description: If applicable, please select the operating system where you experienced this issue.
      options:
        - "Unknown"
        - "macOS"
        - "Linux"
        - "Windows"
    validations:
      required: False
  - type: dropdown
    id: compute_environment
    attributes:
      label: Where do you run your code?
      description: Please select the computing environment where you ran this code.
      options:
        - "Local (laptop, desktop)"
        - "Lightning AI Studio"
        - "Google Colab"
        - "Other cloud environment (AWS, Azure, GCP)"
    validations:
      required: False
  # Output of the repository's environment-check script, pasted by the reporter
  - type: textarea
    id: environment
    attributes:
      label: Environment
      description: |
        Please provide details about your Python environment via the environment collection script or notebook located at
        https://github.com/rasbt/LLMs-from-scratch/tree/main/setup/02_installing-python-libraries.
        For your convenience, you can download and run the script from your terminal as follows:
        ```bash
        curl --ssl-no-revoke -O https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/setup/02_installing-python-libraries/python_environment_check.py \
        -O https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/requirements.txt
        python python_environment_check.py
        ```
        The script will print your Python environment information in the following format
        ```console
        [OK] Your Python version is 3.11.4
        [OK] torch 2.3.1
        [OK] jupyterlab 4.2.2
        [OK] tiktoken 0.7.0
        [OK] matplotlib 3.9.0
        [OK] numpy 1.26.4
        [OK] tensorflow 2.16.1
        [OK] tqdm 4.66.4
        [OK] pandas 2.2.2
        [OK] psutil 5.9.8
        ```
        You can simply copy and paste the outputs of this script below.
      value: |
        ```
        ```
    validations:
      required: false
================================================
FILE: .github/scripts/check_double_quotes.py
================================================
# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt)
# Source for "Build a Reasoning Model (From Scratch)": https://mng.bz/lZ5B
# Code repository: https://github.com/rasbt/reasoning-from-scratch
# Verify that Python source files (and optionally notebooks) use double quotes for strings.
import argparse
import ast
import io
import json
import sys
import tokenize
from pathlib import Path
# Directory names skipped entirely during the scan: VCS metadata, tool caches,
# virtual environments, and build artifacts.
EXCLUDED_DIRS = {
    ".git",
    ".hg",
    ".mypy_cache",
    ".pytest_cache",
    ".ruff_cache",
    ".svn",
    ".tox",
    ".venv",
    "__pycache__",
    "build",
    "dist",
    "node_modules",
}

# Lowercase string-prefix characters that may precede a quote (r, u, f, b).
PREFIX_CHARS = {"r", "u", "f", "b"}

# Quote constants used when classifying string tokens.
SINGLE_QUOTE = "'"
DOUBLE_QUOTE = "\""
TRIPLE_SINGLE = SINGLE_QUOTE * 3
TRIPLE_DOUBLE = DOUBLE_QUOTE * 3
def should_skip(path):
    """Return True if any path component is one of the excluded directories."""
    return any(component in EXCLUDED_DIRS for component in path.parts)
def collect_fstring_expr_string_positions(source):
    """
    Return a set of (lineno, col_offset) pairs for string literals that appear
    inside formatted expressions of f-strings (e.g. the 'key' in f"{d['key']}").

    These positions are exempt from the double-quote check, since enforcing
    double quotes inside f-string expressions is unnecessarily strict.

    Parameters
    ----------
    source : str
        Python source code to analyze.

    Returns
    -------
    set
        Positions of exempt string constants; empty if ``source`` does not
        parse (the tokenizer-based caller reports its own errors).
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return set()

    positions = set()

    class Collector(ast.NodeVisitor):
        def visit_JoinedStr(self, node):
            for value in node.values:
                if isinstance(value, ast.FormattedValue):
                    self._collect_from_expr(value.value)
            # Continue walking to catch nested f-strings within expressions
            self.generic_visit(node)

        def _collect_from_expr(self, node):
            # Since Python 3.8, ast.parse always represents string literals as
            # ast.Constant.  The former ``ast.Str`` fallback was dead code and
            # triggers DeprecationWarning (removal scheduled) on modern
            # interpreters, so it was dropped.
            if isinstance(node, ast.Constant) and isinstance(node.value, str):
                positions.add((node.lineno, node.col_offset))
            else:
                for child in ast.iter_child_nodes(node):
                    self._collect_from_expr(child)

    Collector().visit(tree)
    return positions
def check_quotes_in_source(source, path):
    """
    Tokenize ``source`` and report every non-triple-quoted string literal that
    uses single quotes.

    Parameters
    ----------
    source : str
        Python source code to scan.
    path : pathlib.Path or str
        Path used only for formatting violation messages.

    Returns
    -------
    list[str]
        One "<path>:<line>:<col>: uses single quotes" entry per violation.
    """
    violations = []
    ignored_positions = collect_fstring_expr_string_positions(source)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok_type, tok_str, start, _, _ in tokens:
        if tok_type == tokenize.STRING:
            if start in ignored_positions:
                continue
            lowered = tok_str.lower()
            # Strip ALL prefix characters first.  The previous version stripped
            # at most one, so two-character prefixes such as rb'...' or fr'...'
            # escaped detection entirely.
            while lowered and lowered[0] in PREFIX_CHARS:
                lowered = lowered[1:]
            # Ignore triple-quoted strings.  This must happen AFTER prefix
            # stripping; otherwise prefixed triple-quoted strings such as
            # f'''...''' were falsely reported as single-quote violations.
            if lowered.startswith((TRIPLE_DOUBLE, TRIPLE_SINGLE)):
                continue
            # Report if not using double quotes
            if lowered.startswith(SINGLE_QUOTE):
                line, col = start
                violations.append(f"{path}:{line}:{col}: uses single quotes")
    return violations
def check_file(path):
    """Check one file for quote violations, dispatching on its suffix.

    Any exception (unreadable file, bad encoding, malformed notebook) is
    converted into a single "failed to check" violation entry rather than
    aborting the whole scan.
    """
    try:
        if path.suffix == ".ipynb":
            return check_notebook(path)
        source_text = path.read_text(encoding="utf-8")
        return check_quotes_in_source(source_text, path)
    except Exception as e:
        return [f"{path}: failed to check ({e})"]
def check_notebook(path):
    """Scan every code cell of a Jupyter notebook for quote violations."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    violations = []
    for cell in notebook.get("cells", []):
        # Only code cells contain Python source; markdown/raw cells are skipped.
        if cell.get("cell_type") != "code":
            continue
        cell_source = "".join(cell.get("source", []))
        violations.extend(check_quotes_in_source(cell_source, path))
    return violations
def parse_args():
    """Build the command-line interface and return the parsed namespace."""
    cli = argparse.ArgumentParser(description="Verify double-quoted string literals.")
    cli.add_argument(
        "--include-notebooks",
        action="store_true",
        help="Also scan Jupyter notebooks (.ipynb files) for single-quoted strings.",
    )
    return cli.parse_args()
def main():
    """Walk the project tree and report all quote-style violations.

    Returns 0 when every scanned file is clean and 1 otherwise, so the result
    is suitable as a process exit code.
    """
    args = parse_args()
    project_root = Path(".").resolve()

    # Python files are always scanned; notebooks only when requested.
    candidates = sorted(project_root.rglob("*.py"))
    if args.include_notebooks:
        candidates += sorted(project_root.rglob("*.ipynb"))

    violations = []
    for candidate in candidates:
        if not should_skip(candidate):
            violations.extend(check_file(candidate))

    if not violations:
        print("All files use double quotes correctly.")
        return 0

    print("\n".join(violations))
    print(f"\n{len(violations)} violations found.")
    return 1


if __name__ == "__main__":
    sys.exit(main())
================================================
FILE: .github/workflows/basic-tests-latest-python.yml
================================================
# CI: run the core test suite on the newest PyTorch-compatible Python version.
name: Test latest PyTorch-compatible Python version

on:
  push:
    branches: [ main ]
    paths:
      - '**/*.py' # Run workflow for changes in Python files
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  pull_request:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: "3.13"

      - name: Install dependencies
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          uv sync --dev --python=3.13
          uv add pytest-ruff nbval

      - name: Test Selected Python Scripts
        run: |
          source .venv/bin/activate
          pytest setup/02_installing-python-libraries/tests.py
          pytest ch04/01_main-chapter-code/tests.py
          pytest ch05/01_main-chapter-code/tests.py
          pytest ch06/01_main-chapter-code/tests.py

      - name: Validate Selected Jupyter Notebooks
        run: |
          source .venv/bin/activate
          pytest --nbval ch02/01_main-chapter-code/dataloader.ipynb
          pytest --nbval ch03/01_main-chapter-code/multihead-attention.ipynb
          pytest --nbval ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb
================================================
FILE: .github/workflows/basic-tests-linux-uv.yml
================================================
# CI: full Linux test pass using uv, including bonus-material and package tests.
name: Code tests Linux

on:
  push:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  pull_request:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  workflow_dispatch:

# Cancel superseded runs on the same ref to save CI minutes
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  uv-tests:
    name: Code tests (Linux)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Set up Python (uv)
        uses: actions/setup-python@v6
        with:
          python-version: "3.13"

      - name: Install uv and dependencies
        shell: bash
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          uv sync --dev # tests for backwards compatibility
          uv pip install -r ch05/07_gpt_to_llama/tests/test-requirements-extra.txt
          uv add pytest-ruff nbval

      - name: Test Selected Python Scripts (uv)
        shell: bash
        run: |
          source .venv/bin/activate
          pytest setup/02_installing-python-libraries/tests.py
          pytest ch03/02_bonus_efficient-multihead-attention/tests/test_mha_implementations.py
          pytest ch04/01_main-chapter-code/tests.py
          pytest ch04/03_kv-cache/tests.py
          pytest ch05/01_main-chapter-code/tests.py
          pytest ch05/07_gpt_to_llama/tests/tests_rope_and_parts.py
          pytest ch05/07_gpt_to_llama/tests/test_llama32_nb.py
          pytest ch05/11_qwen3/tests/test_qwen3_nb.py
          pytest ch05/12_gemma3/tests/test_gemma3_nb.py
          pytest ch05/12_gemma3/tests/test_gemma3_kv_nb.py
          pytest ch05/13_olmo3/tests/test_olmo3_nb.py
          pytest ch05/13_olmo3/tests/test_olmo3_kvcache_nb.py
          pytest ch06/01_main-chapter-code/tests.py

      - name: Validate Selected Jupyter Notebooks (uv)
        shell: bash
        run: |
          source .venv/bin/activate
          pytest --nbval ch02/01_main-chapter-code/dataloader.ipynb
          pytest --nbval ch03/01_main-chapter-code/multihead-attention.ipynb
          pytest --nbval ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb

      - name: Test Selected Bonus Materials
        shell: bash
        run: |
          source .venv/bin/activate
          pytest ch02/05_bpe-from-scratch/tests.py

      - name: Test Selected Bonus Materials
        shell: bash
        run: |
          source .venv/bin/activate
          pytest pkg/llms_from_scratch/tests/
================================================
FILE: .github/workflows/basic-tests-macos-uv.yml
================================================
# CI: macOS test pass using uv (backward-compat check against Python 3.10).
name: Code tests macOS

on:
  push:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  pull_request:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  workflow_dispatch:

# Cancel superseded runs on the same ref to save CI minutes
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  uv-tests:
    name: Code tests (macOS)
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v6

      - name: Set up Python (uv)
        uses: actions/setup-python@v6
        with:
          python-version: "3.13"

      - name: Install uv and dependencies
        shell: bash
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          uv sync --dev --python=3.10 # tests for backwards compatibility
          uv pip install -r ch05/07_gpt_to_llama/tests/test-requirements-extra.txt
          uv add pytest-ruff nbval

      - name: Test Selected Python Scripts (uv)
        shell: bash
        run: |
          source .venv/bin/activate
          pytest setup/02_installing-python-libraries/tests.py
          pytest ch04/01_main-chapter-code/tests.py
          pytest ch05/01_main-chapter-code/tests.py
          pytest ch05/07_gpt_to_llama/tests/tests_rope_and_parts.py
          pytest ch05/07_gpt_to_llama/tests/test_llama32_nb.py
          pytest ch05/11_qwen3/tests/test_qwen3_nb.py
          pytest ch05/12_gemma3/tests/test_gemma3_nb.py
          pytest ch05/12_gemma3/tests/test_gemma3_kv_nb.py
          pytest ch06/01_main-chapter-code/tests.py

      - name: Validate Selected Jupyter Notebooks (uv)
        shell: bash
        run: |
          source .venv/bin/activate
          pytest --nbval ch02/01_main-chapter-code/dataloader.ipynb
          pytest --nbval ch03/01_main-chapter-code/multihead-attention.ipynb
          pytest --nbval ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb
================================================
FILE: .github/workflows/basic-tests-old-pytorch.yml
================================================
# CI: backward-compatibility matrix against older PyTorch releases.
name: Test PyTorch 2.3 and 2.5

on:
  push:
    branches: [ main ]
    paths:
      - '**/*.py' # Run workflow for changes in Python files
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  pull_request:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Each matrix entry installs the pinned torch version below
        pytorch-version: [ 2.3.0, 2.5.0 ]
    steps:
      - uses: actions/checkout@v6

      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: "3.13"

      - name: Install dependencies
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          uv sync --dev --python=3.10 # tests for backwards compatibility
          uv pip install -r ch05/07_gpt_to_llama/tests/test-requirements-extra.txt
          uv pip install torch==${{ matrix.pytorch-version }} pytest-ruff nbval

      - name: Test Selected Python Scripts
        run: |
          source .venv/bin/activate
          pytest setup/02_installing-python-libraries/tests.py
          pytest ch04/01_main-chapter-code/tests.py
          pytest ch05/01_main-chapter-code/tests.py
          pytest ch06/01_main-chapter-code/tests.py

      - name: Validate Selected Jupyter Notebooks
        run: |
          source .venv/bin/activate
          pytest --nbval ch02/01_main-chapter-code/dataloader.ipynb
          pytest --nbval ch03/01_main-chapter-code/multihead-attention.ipynb
          pytest --nbval ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb
================================================
FILE: .github/workflows/basic-tests-pip.yml
================================================
# CI: plain-pip installation path (no uv/pixi) on Ubuntu with Python 3.10.
name: Code tests (plain pip)

on:
  push:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  pull_request:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  workflow_dispatch:

# Cancel superseded runs on the same ref to save CI minutes
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  pip-tests:
    name: Pip Tests (Ubuntu Only)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: "3.10" # tests for backwards compatibility

      - name: Create Virtual Environment and Install Dependencies
        run: |
          python -m venv .venv
          source .venv/bin/activate
          pip install --upgrade pip
          # Necessary because there is not much storage space on this runner:
          pip install torch==2.9.1+cpu --index-url https://download.pytorch.org/whl/cpu
          pip install -r requirements.txt --no-deps
          pip install jupyterlab pandas tensorflow matplotlib
          pip install pytest pytest-ruff nbval

      - name: Test Selected Python Scripts
        run: |
          source .venv/bin/activate
          pytest setup/02_installing-python-libraries/tests.py
          pytest ch04/01_main-chapter-code/tests.py
          pytest ch05/01_main-chapter-code/tests.py
          pytest ch06/01_main-chapter-code/tests.py

      - name: Validate Selected Jupyter Notebooks
        run: |
          source .venv/bin/activate
          pytest --nbval ch02/01_main-chapter-code/dataloader.ipynb
          pytest --nbval ch03/01_main-chapter-code/multihead-attention.ipynb
          pytest --nbval ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb
================================================
FILE: .github/workflows/basic-tests-pixi.yml
================================================
# CI: pixi-managed environments on Ubuntu and Windows.
name: Code tests (pixi)

on:
  push:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  pull_request:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  workflow_dispatch:

# Cancel superseded runs on the same ref to save CI minutes
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
    steps:
      - uses: actions/checkout@v6

      - name: Set up pixi (without caching)
        uses: prefix-dev/setup-pixi@v0.8.2
        with:
          environments: tests
          cache: false

      - name: List installed packages
        run: |
          pixi list --environment tests
          pixi run --environment tests pip install "huggingface-hub>=0.30.0,<1.0"

      - name: Test Selected Python Scripts
        shell: pixi run --environment tests bash -e {0}
        run: |
          pytest setup/02_installing-python-libraries/tests.py
          pytest ch04/01_main-chapter-code/tests.py
          pytest ch05/01_main-chapter-code/tests.py
          pytest ch06/01_main-chapter-code/tests.py

      - name: Validate Selected Jupyter Notebooks
        shell: pixi run --environment tests bash -e {0}
        run: |
          pytest --nbval ch02/01_main-chapter-code/dataloader.ipynb
          pytest --nbval ch03/01_main-chapter-code/multihead-attention.ipynb
          pytest --nbval ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb
================================================
FILE: .github/workflows/basic-tests-pytorch-rc.yml
================================================
# CI: smoke-test against the PyTorch nightly / release-candidate CPU build.
name: Test latest PyTorch nightly / release candidate

on:
  push:
    branches: [ main ]
    paths:
      - '**/*.py' # Run workflow for changes in Python files
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'
  pull_request:
    branches: [ main ]
    paths:
      - '**/*.py'
      - '**/*.ipynb'
      - '**/*.yaml'
      - '**/*.yml'
      - '**/*.sh'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6

      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: "3.13"

      - name: Install dependencies
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          uv sync --dev # tests for backwards compatibility
          uv add pytest-ruff nbval
          uv pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

      - name: Test Selected Python Scripts
        run: |
          source .venv/bin/activate
          pytest setup/02_installing-python-libraries/tests.py
          pytest ch04/01_main-chapter-code/tests.py
          pytest ch05/01_main-chapter-code/tests.py
          pytest ch06/01_main-chapter-code/tests.py

      - name: Validate Selected Jupyter Notebooks
        run: |
          source .venv/bin/activate
          pytest --nbval ch02/01_main-chapter-code/dataloader.ipynb
          pytest --nbval ch03/01_main-chapter-code/multihead-attention.ipynb
          pytest --nbval ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb
================================================
FILE: .github/workflows/basic-tests-windows-uv-pip.yml
================================================
name: Code tests Windows (uv/pip)
on:
push:
branches: [ main ]
paths:
- '**/*.py'
- '**/*.ipynb'
- '**/*.yaml'
- '**/*.yml'
- '**/*.sh'
pull_request:
branches: [ main ]
paths:
- '**/*.py'
- '**/*.ipynb'
- '**/*.yaml'
- '**/*.yml'
- '**/*.sh'
jobs:
test:
runs-on: windows-latest
steps:
- name: Checkout Code
uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: '3.11'
- name: Install dependencies
shell: bash
run: |
export PATH="$HOME/.local/bin:$PATH"
python -m pip install --upgrade pip
pip install uv
uv venv --python=python3.11
source .venv/Scripts/activate
pip install -r requirements.txt  # because of a dependency issue on Windows when using `uv pip`
pip install tensorflow-io-gcs-filesystem==0.31.0 # Explicit for Windows
pip install -r ch05/07_gpt_to_llama/tests/test-requirements-extra.txt
pip install pytest-ruff nbval
pip install -e .
- name: Run Python Tests
shell: bash
run: |
source .venv/Scripts/activate
pytest setup/02_installing-python-libraries/tests.py
pytest ch04/01_main-chapter-code/tests.py
pytest ch05/01_main-chapter-code/tests.py
pytest ch05/07_gpt_to_llama/tests/tests_rope_and_parts.py
pytest ch05/07_gpt_to_llama/tests/test_llama32_nb.py
pytest ch05/11_qwen3/tests/test_qwen3_nb.py
pytest ch06/01_main-chapter-code/tests.py
- name: Run Jupyter Notebook Tests
shell: bash
run: |
source .venv/Scripts/activate
pytest --nbval ch02/01_main-chapter-code/dataloader.ipynb
pytest --nbval ch03/01_main-chapter-code/multihead-attention.ipynb
pytest --nbval ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb
================================================
FILE: .github/workflows/basic-tests-windows-uv-pip.yml.disabled
================================================
name: Code tests Windows (uv/pip)
on:
push:
branches: [ main ]
paths:
- '**/*.py'
- '**/*.ipynb'
- '**/*.yaml'
- '**/*.yml'
- '**/*.sh'
pull_request:
branches: [ main ]
paths:
- '**/*.py'
- '**/*.ipynb'
- '**/*.yaml'
- '**/*.yml'
- '**/*.sh'
jobs:
test:
runs-on: windows-latest
steps:
- name: Checkout Code
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.13"
- name: Install dependencies
shell: pwsh
run: |
$env:Path = "C:\Users\runneradmin\.local\bin;$env:Path"
python -m pip install --upgrade pip
python -m pip install uv
uv venv --python=python3.11
. .\.venv\Scripts\Activate.ps1
$env:UV_PIP_OPTS="--no-binary tensorflow-io-gcs-filesystem"
uv pip install -r requirements.txt
uv pip install -r ch05/07_gpt_to_llama/tests/test-requirements-extra.txt
uv pip install pytest-ruff nbval
uv pip install --force-reinstall matplotlib "numpy<2.1"
- name: Run Python Tests
shell: pwsh
run: |
$env:Path = "C:\Users\runneradmin\.local\bin;$env:Path"
. .\.venv\Scripts\Activate.ps1
pytest --ruff setup/02_installing-python-libraries/tests.py
pytest --ruff ch04/01_main-chapter-code/tests.py
pytest --ruff ch05/01_main-chapter-code/tests.py
pytest --ruff ch05/07_gpt_to_llama/tests/tests.py
pytest --ruff ch06/01_main-chapter-code/tests.py
- name: Run Jupyter Notebook Tests
shell: pwsh
run: |
$env:Path = "C:\Users\runneradmin\.local\bin;$env:Path"
. .\.venv\Scripts\Activate.ps1
pytest --ruff --nbval ch02/01_main-chapter-code/dataloader.ipynb
pytest --ruff --nbval ch03/01_main-chapter-code/multihead-attention.ipynb
pytest --ruff --nbval ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb
================================================
FILE: .github/workflows/basic-tests-windows-uv.yml.disabled
================================================
name: Code tests Windows (uv)
on:
push:
branches: [ main ]
paths:
- '**/*.py'
- '**/*.ipynb'
- '**/*.yaml'
- '**/*.yml'
- '**/*.sh'
pull_request:
branches: [ main ]
paths:
- '**/*.py'
- '**/*.ipynb'
- '**/*.yaml'
- '**/*.yml'
- '**/*.sh'
jobs:
test:
runs-on: windows-latest
steps:
- name: Checkout Code
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.13"
- name: Install dependencies
shell: pwsh
run: |
# Prepend local bin directory to PATH
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
$env:Path = "C:\Users\runneradmin\.local\bin;$env:Path"
uv sync --dev --python=3.10
$env:UV_PIP_OPTS="--no-binary tensorflow-io-gcs-filesystem"
uv pip install -r requirements.txt
uv pip install matplotlib # for some reason Windows requires this
uv pip install -r ch05/07_gpt_to_llama/tests/test-requirements-extra.txt
uv add pytest-ruff nbval
- name: Run Python Tests
shell: pwsh
run: |
. .\.venv\Scripts\Activate.ps1
pytest --ruff setup/02_installing-python-libraries/tests.py
pytest --ruff ch04/01_main-chapter-code/tests.py
pytest --ruff ch05/01_main-chapter-code/tests.py
pytest --ruff ch06/01_main-chapter-code/tests.py
- name: Run Jupyter Notebook Tests
shell: pwsh
run: |
. .\.venv\Scripts\Activate.ps1
pytest --ruff --nbval ch02/01_main-chapter-code/dataloader.ipynb
pytest --ruff --nbval ch03/01_main-chapter-code/multihead-attention.ipynb
pytest --ruff --nbval ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb
================================================
FILE: .github/workflows/check-links.yml
================================================
name: Check hyperlinks
on:
push:
branches:
- main
pull_request:
branches:
- main
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: "3.10"
- name: Install dependencies
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --dev
uv add pytest-check-links
- name: Check links
env:
CHECK_LINKS_TIMEOUT: "10"
run: |
source .venv/bin/activate
pytest --check-links ./ \
--check-links-ignore "https://platform.openai.com/*" \
--check-links-ignore "https://openai.com/*" \
--check-links-ignore "https://arena.lmsys.org" \
--check-links-ignore "https?://localhost(:\\d+)?/.*" \
--check-links-ignore "https?://127[.]0[.]0[.]1(:\\d+)?/.*" \
--check-links-ignore "https://mng\\.bz/.*" \
--check-links-ignore "https://github\\.com/.*" \
--check-links-ignore "https://unsloth.ai/blog/gradient" \
--check-links-ignore "https://www.reddit.com/r/*" \
--check-links-ignore "https://code.visualstudio.com/*" \
--check-links-ignore "https://arxiv.org/*" \
--check-links-ignore "https://ai.stanford.edu/~amaas/data/sentiment/" \
--check-links-ignore "https://x.com/*" \
--check-links-ignore "https://scholar.google.com/*"
================================================
FILE: .github/workflows/check-spelling-errors.yml
================================================
name: Spell Check
on:
push:
branches:
- main
pull_request:
branches:
- main
jobs:
spellcheck:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: "3.10"
- name: Install codespell
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --dev --python=3.10
uv add codespell
- name: Run codespell
run: |
source .venv/bin/activate
codespell -L "ocassion,occassion,ot,te,tje" **/*.{txt,md,py,ipynb}
================================================
FILE: .github/workflows/pep8-linter.yml
================================================
name: PEP8 Style checks
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
jobs:
flake8:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: "3.13"
- name: Install ruff (a faster flake8 equivalent)
run: |
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --dev --python=3.10
uv add ruff
- name: Run ruff with exceptions
run: |
source .venv/bin/activate
ruff check .
================================================
FILE: .gitignore
================================================
# Reports
reports/
# Configs and keys
.chainlit
ch05/07_gpt_to_llama/config.json
ch07/02_dataset-utilities/config.json
ch07/03_model-evaluation/config.json
# Graphics
appendix-D/01_main-chapter-code/1.pdf
appendix-D/01_main-chapter-code/2.pdf
appendix-D/01_main-chapter-code/3.pdf
appendix-E/01_main-chapter-code/loss-plot.pdf
ch04/04_gqa/kv_bytes_vs_context_length.pdf
ch04/05_mla/kv_bytes_vs_context_length.pdf
ch04/06_swa/kv_bytes_vs_context_length.pdf
ch04/07_moe/ffn_vs_moe.pdf
ch04/08_deltanet/deltanet_memory_plot.pdf
ch05/01_main-chapter-code/loss-plot.pdf
ch05/01_main-chapter-code/temperature-plot.pdf
ch05/01_main-chapter-code/the-verdict.txt
ch06/01_main-chapter-code/loss-plot.pdf
ch06/01_main-chapter-code/accuracy-plot.pdf
ch07/01_main-chapter-code/loss-plot.pdf
ch07/01_main-chapter-code/loss-plot-standalone.pdf
ch07/01_main-chapter-code/loss-plot-baseline.pdf
ch07/01_main-chapter-code/loss-plot-mask-instructions.pdf
ch07/01_main-chapter-code/loss-plot-phi3-prompt.pdf
ch07/01_main-chapter-code/loss-plot-alpaca52k.pdf
ch07/04_preference-tuning-with-dpo/reward margins-plot.pdf
# Checkpoint files
appendix-A/01_main-chapter-code/model.pth
appendix-E/01_main-chapter-code/gpt2
ch05/01_main-chapter-code/gpt2/
ch05/02_alternative_weight_loading/checkpoints
ch05/02_alternative_weight_loading/*.safetensors
ch05/01_main-chapter-code/model.pth
ch05/01_main-chapter-code/model_and_optimizer.pth
ch05/03_bonus_pretraining_on_gutenberg/model_checkpoints
ch05/06_user_interface/gpt2
ch05/07_gpt_to_llama/.cache
ch05/07_gpt_to_llama/Llama-2-7b
ch05/07_gpt_to_llama/Llama-2-7b-chat
ch05/07_gpt_to_llama/Llama-3-8B
ch05/07_gpt_to_llama/Llama-3-8B-Instruct
ch05/07_gpt_to_llama/Llama-3.1-8B
ch05/07_gpt_to_llama/Llama-3.1-8B-Instruct
ch05/07_gpt_to_llama/Llama-3.2-1B
ch05/07_gpt_to_llama/Llama-3.2-1B-Instruct
ch05/07_gpt_to_llama/Llama-3.2-3B
ch05/07_gpt_to_llama/Llama-3.2-3B-Instruct
ch05/07_gpt_to_llama/llama3.2-1B-instruct.pth
ch05/07_gpt_to_llama/tokenizer.model
ch05/10_llm-training-speed/middlemarch.txt
ch05/10_llm-training-speed/loss.pdf
ch05/10_llm-training-speed/model.pth
ch05/11_qwen3/Qwen3-0.6B
ch05/11_qwen3/Qwen3-0.6B-Base
ch05/11_qwen3/Qwen3-1.7B
ch05/11_qwen3/Qwen3-1.7B-Base
ch05/11_qwen3/Qwen3-4B
ch05/11_qwen3/Qwen3-4B-Base
ch05/11_qwen3/Qwen3-8B
ch05/11_qwen3/Qwen3-8B-Base
ch05/11_qwen3/Qwen3-32B
ch05/11_qwen3/Qwen3-32B-Base
ch05/12_gemma3/gemma-3-270M-it
ch05/12_gemma3/gemma-3-270M
ch05/13_olmo3/Olmo-3-1025-7B
ch05/13_olmo3/Olmo-3-1125-32B
ch05/13_olmo3/Olmo-3-7B-Instruct
ch05/13_olmo3/Olmo-3-32B-Instruct
ch05/13_olmo3/Olmo-3-7B-Think
ch05/13_olmo3/Olmo-3-32B-Think
ch05/13_olmo3/Olmo-3-7B-RLZero-IF
ch05/13_olmo3/Olmo-3-32B-RLZero-IF
ch06/01_main-chapter-code/gpt2
ch06/02_bonus_additional-experiments/gpt2
ch06/03_bonus_imdb-classification/gpt2
ch07/01_main-chapter-code/gpt2-medium355M-sft-baseline.pth
ch07/01_main-chapter-code/gpt2-medium355M-sft-mask-instructions.pth
ch07/01_main-chapter-code/gpt2-medium355M-sft-phi3-prompt.pth
ch07/01_main-chapter-code/gpt2-medium355M-sft-alpaca52k.pth
ch07/01_main-chapter-code/gpt2-medium355M-sft-lora.pth
ch07/01_main-chapter-code/gpt2-medium355M-sft.pth
ch07/01_main-chapter-code/gpt2-medium355M-sft-standalone.pth
ch07/01_main-chapter-code/Smalltestmodel-sft-standalone.pth
ch07/01_main-chapter-code/gpt2/
gemma-3-270m/
gemma-3-270m-it/
Qwen3-0.6B-Base/
Qwen3-0.6B/
tokenizer-base.json
tokenizer-reasoning.json
tokenizer.json
config.json
bpe_merges.txt
# Datasets
the-verdict.txt
appendix-E/01_main-chapter-code/sms_spam_collection.zip
appendix-E/01_main-chapter-code/sms_spam_collection
appendix-E/01_main-chapter-code/train.csv
appendix-E/01_main-chapter-code/test.csv
appendix-E/01_main-chapter-code/validation.csv
ch02/01_main-chapter-code/number-data.txt
ch02/05_bpe-from-scratch/the-verdict.txt
ch05/03_bonus_pretraining_on_gutenberg/gutenberg
ch05/03_bonus_pretraining_on_gutenberg/gutenberg_preprocessed
ch06/01_main-chapter-code/sms_spam_collection.zip
ch06/01_main-chapter-code/sms_spam_collection
ch06/01_main-chapter-code/test.csv
ch06/01_main-chapter-code/train.csv
ch06/01_main-chapter-code/validation.csv
ch06/01_main-chapter-code/review_classifier.pth
ch06/02_bonus_additional-experiments/test.csv
ch06/02_bonus_additional-experiments/train.csv
ch06/02_bonus_additional-experiments/validation.csv
ch06/02_bonus_additional-experiments/sms_spam_collection.zip
ch06/02_bonus_additional-experiments/sms_spam_collection
ch06/03_bonus_imdb-classification/aclImdb/
ch06/03_bonus_imdb-classification/aclImdb_v1.tar.gz
ch06/03_bonus_imdb-classification/test.csv
ch06/03_bonus_imdb-classification/train.csv
ch06/03_bonus_imdb-classification/validation.csv
ch07/01_main-chapter-code/instruction-data-with-response-standalone.json
ch07/01_main-chapter-code/instruction-data-with-response-baseline.json
ch07/01_main-chapter-code/instruction-data-with-response-mask-instructions.json
ch07/01_main-chapter-code/loss-plot-lora.pdf
ch07/01_main-chapter-code/instruction-data-with-response-alpaca52k.json
ch07/01_main-chapter-code/instruction-data-with-response-lora.json
ch07/01_main-chapter-code/instruction-data-with-response-phi3-prompt.json
ch07/02_dataset-utilities/instruction-examples-modified.json
ch07/04_preference-tuning-with-dpo/gpt2-medium355M-sft.pth
ch07/04_preference-tuning-with-dpo/loss-plot.pdf
# Tokenizer files
ch02/05_bpe-from-scratch/bpe_merges.txt
ch02/05_bpe-from-scratch/encoder.json
ch02/05_bpe-from-scratch/vocab.bpe
ch02/05_bpe-from-scratch/vocab.json
encoder.json
vocab.bpe
vocab.json
# Other
ch0?/0?_user_interface/.chainlit/
ch0?/0?_user_interface/chainlit.md
ch0?/0?_user_interface/.files
*.lock
# Temporary and OS-related files
chainlit.md
Untitled.ipynb
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
*.key
solution/
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject the date/other info into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
.python-version
uv.lock
pixi.lock
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
# vscode
.vscode/
# pixi environments
.pixi
*.egg-info
================================================
FILE: .gitmodules
================================================
[submodule "reasoning-from-scratch"]
path = reasoning-from-scratch
url = https://github.com/rasbt/reasoning-from-scratch
branch = main
================================================
FILE: CITATION.cff
================================================
cff-version: 1.2.0
message: "If you use this book or its accompanying code, please cite it as follows."
title: "Build A Large Language Model (From Scratch), Published by Manning, ISBN 978-1633437166"
abstract: "This book provides a comprehensive, step-by-step guide to implementing a ChatGPT-like large language model from scratch in PyTorch."
date-released: 2024-09-12
authors:
- family-names: "Raschka"
given-names: "Sebastian"
license: "Apache-2.0"
url: "https://www.manning.com/books/build-a-large-language-model-from-scratch"
repository-code: "https://github.com/rasbt/LLMs-from-scratch"
keywords:
- large language models
- natural language processing
- artificial intelligence
- PyTorch
- machine learning
- deep learning
================================================
FILE: LICENSE.txt
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
explicitly excluding any books specific to this software and any related images,
and includes but is not limited to software source code,
documentation source (excluding books and images related to this software),
and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2023-2026 Sebastian Raschka
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
# Build a Large Language Model (From Scratch)
This repository contains the code for developing, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book [Build a Large Language Model (From Scratch)](https://amzn.to/4fqvn0D).
<br>
<br>
<a href="https://amzn.to/4fqvn0D"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/cover.jpg?123" width="250px"></a>
<br>
In [*Build a Large Language Model (From Scratch)*](http://mng.bz/orYv), you'll learn and understand how large language models (LLMs) work from the inside out by coding them from the ground up, step by step. In this book, I'll guide you through creating your own LLM, explaining each stage with clear text, diagrams, and examples.
The method described in this book for training and developing your own small-but-functional model for educational purposes mirrors the approach used in creating large-scale foundational models such as those behind ChatGPT. In addition, this book includes code for loading the weights of larger pretrained models for finetuning.
- Link to the official [source code repository](https://github.com/rasbt/LLMs-from-scratch)
- [Link to the book at Manning (the publisher's website)](http://mng.bz/orYv)
- [Link to the book page on Amazon.com](https://www.amazon.com/gp/product/1633437167)
- ISBN 9781633437166
<a href="http://mng.bz/orYv#reviews"><img src="https://sebastianraschka.com//images/LLMs-from-scratch-images/other/reviews.png" width="220px"></a>
<br>
<br>
To download a copy of this repository, click on the [Download ZIP](https://github.com/rasbt/LLMs-from-scratch/archive/refs/heads/main.zip) button or execute the following command in your terminal:
```bash
git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git
```
<br>
(If you downloaded the code bundle from the Manning website, please consider visiting the official code repository on GitHub at [https://github.com/rasbt/LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch) for the latest updates.)
<br>
<br>
# Table of Contents
Please note that this `README.md` file is a Markdown (`.md`) file. If you have downloaded this code bundle from the Manning website and are viewing it on your local computer, I recommend using a Markdown editor or previewer for proper viewing. If you haven't installed a Markdown editor yet, [Ghostwriter](https://ghostwriter.kde.org) is a good free option.
You can alternatively view this and other files on GitHub at [https://github.com/rasbt/LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch) in your browser, which renders Markdown automatically.
<br>
<br>
> **Tip:**
> If you're seeking guidance on installing Python and Python packages and setting up your code environment, I suggest reading the [README.md](setup/README.md) file located in the [setup](setup) directory.
<br>
<br>
[](https://github.com/rasbt/LLMs-from-scratch/actions/workflows/basic-tests-linux-uv.yml)
[](https://github.com/rasbt/LLMs-from-scratch/actions/workflows/basic-tests-windows-uv-pip.yml)
[](https://github.com/rasbt/LLMs-from-scratch/actions/workflows/basic-tests-macos-uv.yml)
| Chapter Title | Main Code (for Quick Access) | All Code + Supplementary |
|------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
| [Setup recommendations](setup) <br/>[How to best read this book](https://sebastianraschka.com/blog/2025/reading-books.html) | - | - |
| Ch 1: Understanding Large Language Models | No code | - |
| Ch 2: Working with Text Data | - [ch02.ipynb](ch02/01_main-chapter-code/ch02.ipynb)<br/>- [dataloader.ipynb](ch02/01_main-chapter-code/dataloader.ipynb) (summary)<br/>- [exercise-solutions.ipynb](ch02/01_main-chapter-code/exercise-solutions.ipynb) | [./ch02](./ch02) |
| Ch 3: Coding Attention Mechanisms | - [ch03.ipynb](ch03/01_main-chapter-code/ch03.ipynb)<br/>- [multihead-attention.ipynb](ch03/01_main-chapter-code/multihead-attention.ipynb) (summary) <br/>- [exercise-solutions.ipynb](ch03/01_main-chapter-code/exercise-solutions.ipynb)| [./ch03](./ch03) |
| Ch 4: Implementing a GPT Model from Scratch | - [ch04.ipynb](ch04/01_main-chapter-code/ch04.ipynb)<br/>- [gpt.py](ch04/01_main-chapter-code/gpt.py) (summary)<br/>- [exercise-solutions.ipynb](ch04/01_main-chapter-code/exercise-solutions.ipynb) | [./ch04](./ch04) |
| Ch 5: Pretraining on Unlabeled Data | - [ch05.ipynb](ch05/01_main-chapter-code/ch05.ipynb)<br/>- [gpt_train.py](ch05/01_main-chapter-code/gpt_train.py) (summary) <br/>- [gpt_generate.py](ch05/01_main-chapter-code/gpt_generate.py) (summary) <br/>- [exercise-solutions.ipynb](ch05/01_main-chapter-code/exercise-solutions.ipynb) | [./ch05](./ch05) |
| Ch 6: Finetuning for Text Classification | - [ch06.ipynb](ch06/01_main-chapter-code/ch06.ipynb) <br/>- [gpt_class_finetune.py](ch06/01_main-chapter-code/gpt_class_finetune.py) <br/>- [exercise-solutions.ipynb](ch06/01_main-chapter-code/exercise-solutions.ipynb) | [./ch06](./ch06) |
| Ch 7: Finetuning to Follow Instructions | - [ch07.ipynb](ch07/01_main-chapter-code/ch07.ipynb)<br/>- [gpt_instruction_finetuning.py](ch07/01_main-chapter-code/gpt_instruction_finetuning.py) (summary)<br/>- [ollama_evaluate.py](ch07/01_main-chapter-code/ollama_evaluate.py) (summary)<br/>- [exercise-solutions.ipynb](ch07/01_main-chapter-code/exercise-solutions.ipynb) | [./ch07](./ch07) |
| Appendix A: Introduction to PyTorch | - [code-part1.ipynb](appendix-A/01_main-chapter-code/code-part1.ipynb)<br/>- [code-part2.ipynb](appendix-A/01_main-chapter-code/code-part2.ipynb)<br/>- [DDP-script.py](appendix-A/01_main-chapter-code/DDP-script.py)<br/>- [exercise-solutions.ipynb](appendix-A/01_main-chapter-code/exercise-solutions.ipynb) | [./appendix-A](./appendix-A) |
| Appendix B: References and Further Reading | No code | [./appendix-B](./appendix-B) |
| Appendix C: Exercise Solutions | - [list of exercise solutions](appendix-C) | [./appendix-C](./appendix-C) |
| Appendix D: Adding Bells and Whistles to the Training Loop | - [appendix-D.ipynb](appendix-D/01_main-chapter-code/appendix-D.ipynb) | [./appendix-D](./appendix-D) |
| Appendix E: Parameter-efficient Finetuning with LoRA | - [appendix-E.ipynb](appendix-E/01_main-chapter-code/appendix-E.ipynb) | [./appendix-E](./appendix-E) |
<br>
The mental model below summarizes the contents covered in this book.
<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/mental-model.jpg" width="650px">
<br>
## Prerequisites
The most important prerequisite is a strong foundation in Python programming.
With this knowledge, you will be well prepared to explore the fascinating world of LLMs
and understand the concepts and code examples presented in this book.
If you have some experience with deep neural networks, you may find certain concepts more familiar, as LLMs are built upon these architectures.
This book uses PyTorch to implement the code from scratch without using any external LLM libraries. While proficiency in PyTorch is not a prerequisite, familiarity with PyTorch basics is certainly useful. If you are new to PyTorch, Appendix A provides a concise introduction to PyTorch. Alternatively, you may find my book, [PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs](https://sebastianraschka.com/teaching/pytorch-1h/), helpful for learning about the essentials.
<br>
## Hardware Requirements
The code in the main chapters of this book is designed to run on conventional laptops within a reasonable timeframe and does not require specialized hardware. This approach ensures that a wide audience can engage with the material. Additionally, the code automatically utilizes GPUs if they are available. (Please see the [setup](https://github.com/rasbt/LLMs-from-scratch/blob/main/setup/README.md) doc for additional recommendations.)
## Video Course
[A 17-hour and 15-minute companion video course](https://www.manning.com/livevideo/master-and-build-large-language-models) where I code through each chapter of the book. The course is organized into chapters and sections that mirror the book's structure so that it can be used as a standalone alternative to the book or complementary code-along resource.
<a href="https://www.manning.com/livevideo/master-and-build-large-language-models"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/video-screenshot.webp?123" width="350px"></a>
## Companion Book / Sequel
[*Build A Reasoning Model (From Scratch)*](https://mng.bz/lZ5B), while a standalone book, can be considered as a sequel to *Build A Large Language Model (From Scratch)*.
It starts with a pretrained model and implements different reasoning approaches, including inference-time scaling, reinforcement learning, and distillation, to improve the model's reasoning capabilities.
Similar to *Build A Large Language Model (From Scratch)*, [*Build A Reasoning Model (From Scratch)*](https://mng.bz/lZ5B) takes a hands-on approach implementing these methods from scratch.
<a href="https://mng.bz/lZ5B"><img src="https://sebastianraschka.com/images/reasoning-from-scratch-images/cover.webp?123" width="120px"></a>
- Amazon link (TBD)
- [Manning link](https://mng.bz/lZ5B)
- [GitHub repository](https://github.com/rasbt/reasoning-from-scratch)
<br>
## Exercises
Each chapter of the book includes several exercises. The solutions are summarized in Appendix C, and the corresponding code notebooks are available in the main chapter folders of this repository (for example, [./ch02/01_main-chapter-code/exercise-solutions.ipynb](./ch02/01_main-chapter-code/exercise-solutions.ipynb)).
In addition to the code exercises, you can download a free 170-page PDF titled [Test Yourself On Build a Large Language Model (From Scratch)](https://www.manning.com/books/test-yourself-on-build-a-large-language-model-from-scratch) from the Manning website. It contains approximately 30 quiz questions and solutions per chapter to help you test your understanding.
<a href="https://www.manning.com/books/test-yourself-on-build-a-large-language-model-from-scratch"><img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/test-yourself-cover.jpg?123" width="150px"></a>
## Bonus Material
Several folders contain optional materials as a bonus for interested readers:
- **Setup**
- [Python Setup Tips](setup/01_optional-python-setup-preferences)
- [Installing Python Packages and Libraries Used in This Book](setup/02_installing-python-libraries)
- [Docker Environment Setup Guide](setup/03_optional-docker-environment)
- **Chapter 2: Working With Text Data**
- [Byte Pair Encoding (BPE) Tokenizer From Scratch](ch02/05_bpe-from-scratch/bpe-from-scratch-simple.ipynb)
- [Comparing Various Byte Pair Encoding (BPE) Implementations](ch02/02_bonus_bytepair-encoder)
- [Understanding the Difference Between Embedding Layers and Linear Layers](ch02/03_bonus_embedding-vs-matmul)
- [Dataloader Intuition With Simple Numbers](ch02/04_bonus_dataloader-intuition)
- **Chapter 3: Coding Attention Mechanisms**
- [Comparing Efficient Multi-Head Attention Implementations](ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb)
- [Understanding PyTorch Buffers](ch03/03_understanding-buffers/understanding-buffers.ipynb)
- **Chapter 4: Implementing a GPT Model From Scratch**
- [FLOPs Analysis](ch04/02_performance-analysis/flops-analysis.ipynb)
- [KV Cache](ch04/03_kv-cache)
- [Attention Alternatives](ch04/#attention-alternatives)
- [Grouped-Query Attention](ch04/04_gqa)
- [Multi-Head Latent Attention](ch04/05_mla)
- [Sliding Window Attention](ch04/06_swa)
- [Gated DeltaNet](ch04/08_deltanet)
- [Mixture-of-Experts (MoE)](ch04/07_moe)
- **Chapter 5: Pretraining on Unlabeled Data**
- [Alternative Weight Loading Methods](ch05/02_alternative_weight_loading/)
- [Pretraining GPT on the Project Gutenberg Dataset](ch05/03_bonus_pretraining_on_gutenberg)
- [Adding Bells and Whistles to the Training Loop](ch05/04_learning_rate_schedulers)
- [Optimizing Hyperparameters for Pretraining](ch05/05_bonus_hparam_tuning)
- [Building a User Interface to Interact With the Pretrained LLM](ch05/06_user_interface)
- [Converting GPT to Llama](ch05/07_gpt_to_llama)
- [Memory-efficient Model Weight Loading](ch05/08_memory_efficient_weight_loading/memory-efficient-state-dict.ipynb)
- [Extending the Tiktoken BPE Tokenizer with New Tokens](ch05/09_extending-tokenizers/extend-tiktoken.ipynb)
- [PyTorch Performance Tips for Faster LLM Training](ch05/10_llm-training-speed)
- [LLM Architectures](ch05/#llm-architectures-from-scratch)
- [Llama 3.2 From Scratch](ch05/07_gpt_to_llama/standalone-llama32.ipynb)
- [Qwen3 Dense and Mixture-of-Experts (MoE) From Scratch](ch05/11_qwen3/)
- [Gemma 3 From Scratch](ch05/12_gemma3/)
- [Olmo 3 From Scratch](ch05/13_olmo3/)
- [Tiny Aya From Scratch](ch05/15_tiny-aya/)
- [Qwen3.5 From Scratch](ch05/16_qwen3.5/)
- [Chapter 5 with other LLMs as Drop-In Replacement (e.g., Llama 3, Qwen 3)](ch05/14_ch05_with_other_llms/)
- **Chapter 6: Finetuning for classification**
- [Additional Experiments Finetuning Different Layers and Using Larger Models](ch06/02_bonus_additional-experiments)
- [Finetuning Different Models on the 50k IMDb Movie Review Dataset](ch06/03_bonus_imdb-classification)
- [Building a User Interface to Interact With the GPT-based Spam Classifier](ch06/04_user_interface)
- **Chapter 7: Finetuning to follow instructions**
- [Dataset Utilities for Finding Near Duplicates and Creating Passive Voice Entries](ch07/02_dataset-utilities)
- [Evaluating Instruction Responses Using the OpenAI API and Ollama](ch07/03_model-evaluation)
- [Generating a Dataset for Instruction Finetuning](ch07/05_dataset-generation/llama3-ollama.ipynb)
- [Improving a Dataset for Instruction Finetuning](ch07/05_dataset-generation/reflection-gpt4.ipynb)
- [Generating a Preference Dataset With Llama 3.1 70B and Ollama](ch07/04_preference-tuning-with-dpo/create-preference-data-ollama.ipynb)
- [Direct Preference Optimization (DPO) for LLM Alignment](ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb)
- [Building a User Interface to Interact With the Instruction-Finetuned GPT Model](ch07/06_user_interface)
More bonus material from the [Reasoning From Scratch](https://github.com/rasbt/reasoning-from-scratch) repository:
- **Qwen3 (From Scratch) Basics**
- [Qwen3 Source Code Walkthrough](https://github.com/rasbt/reasoning-from-scratch/blob/main/chC/01_main-chapter-code/chC_main.ipynb)
- [Optimized Qwen3](https://github.com/rasbt/reasoning-from-scratch/tree/main/ch02/03_optimized-LLM)
- **Evaluation**
- [Verifier-Based Evaluation (MATH-500)](https://github.com/rasbt/reasoning-from-scratch/tree/main/ch03)
- [Multiple-Choice Evaluation (MMLU)](https://github.com/rasbt/reasoning-from-scratch/blob/main/chF/02_mmlu)
- [LLM Leaderboard Evaluation](https://github.com/rasbt/reasoning-from-scratch/blob/main/chF/03_leaderboards)
- [LLM-as-a-Judge Evaluation](https://github.com/rasbt/reasoning-from-scratch/blob/main/chF/04_llm-judge)
- **Inference Scaling**
- [Self-Consistency](https://github.com/rasbt/reasoning-from-scratch/blob/main/ch04/01_main-chapter-code/ch04_main.ipynb)
- [Self-Refinement](https://github.com/rasbt/reasoning-from-scratch/blob/main/ch05/01_main-chapter-code/ch05_main.ipynb)
- **Reinforcement Learning** (RL)
- [RLVR with GRPO From Scratch](https://github.com/rasbt/reasoning-from-scratch/blob/main/ch06/01_main-chapter-code/ch06_main.ipynb)
<br>
## Questions, Feedback, and Contributing to This Repository
I welcome all sorts of feedback, best shared via the [Manning Forum](https://livebook.manning.com/forum?product=raschka&page=1) or [GitHub Discussions](https://github.com/rasbt/LLMs-from-scratch/discussions). Likewise, if you have any questions or just want to bounce ideas off others, please don't hesitate to post these in the forum as well.
Please note that since this repository contains the code corresponding to a print book, I currently cannot accept contributions that would extend the contents of the main chapter code, as it would introduce deviations from the physical book. Keeping it consistent helps ensure a smooth experience for everyone.
## Citation
If you find this book or code useful for your research, please consider citing it.
Chicago-style citation:
> Raschka, Sebastian. *Build A Large Language Model (From Scratch)*. Manning, 2024. ISBN: 978-1633437166.
BibTeX entry:
```
@book{build-llms-from-scratch-book,
author = {Sebastian Raschka},
title = {Build A Large Language Model (From Scratch)},
publisher = {Manning},
year = {2024},
isbn = {978-1633437166},
url = {https://www.manning.com/books/build-a-large-language-model-from-scratch},
github = {https://github.com/rasbt/LLMs-from-scratch}
}
```
================================================
FILE: appendix-A/01_main-chapter-code/DDP-script-torchrun.py
================================================
# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
# Source for "Build a Large Language Model From Scratch"
# - https://www.manning.com/books/build-a-large-language-model-from-scratch
# Code: https://github.com/rasbt/LLMs-from-scratch
# Appendix A: Introduction to PyTorch (Part 3)
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
# NEW imports:
import os
import platform
from torch.utils.data.distributed import DistributedSampler
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed import init_process_group, destroy_process_group
# NEW: function to initialize a distributed process group (1 process / GPU)
# this allows communication among processes
def ddp_setup(rank, world_size):
    """
    Initialize the distributed process group (one process per GPU) so that
    the participating processes can communicate with each other.
    Compatible with torchrun: rendezvous environment variables already set
    by torchrun are left untouched.
    Arguments:
        rank: a unique process ID
        world_size: total number of processes in the group
    """
    # Only set MASTER_ADDR and MASTER_PORT if not already defined by torchrun
    if "MASTER_ADDR" not in os.environ:
        os.environ["MASTER_ADDR"] = "localhost"
    if "MASTER_PORT" not in os.environ:
        # Any free port on the machine works here
        os.environ["MASTER_PORT"] = "12345"
    # initialize process group
    if platform.system() == "Windows":
        # Disable libuv because PyTorch for Windows isn't built with support
        os.environ["USE_LIBUV"] = "0"
        # Windows users may have to use "gloo" instead of "nccl" as backend
        # gloo: Facebook Collective Communication Library
        init_process_group(backend="gloo", rank=rank, world_size=world_size)
    else:
        # nccl: NVIDIA Collective Communication Library
        init_process_group(backend="nccl", rank=rank, world_size=world_size)
    # Pin this process to the CUDA device matching its rank
    torch.cuda.set_device(rank)
class ToyDataset(Dataset):
    """Minimal Dataset that wraps pre-built feature and label tensors."""

    def __init__(self, X, y):
        # Store references to the tensors; no copies are made.
        self.features = X
        self.labels = y

    def __getitem__(self, index):
        # Return the (feature, label) pair at the requested position.
        return self.features[index], self.labels[index]

    def __len__(self):
        # The number of samples equals the number of labels.
        return self.labels.shape[0]
class NeuralNetwork(torch.nn.Module):
    """A small fully connected classifier with two hidden layers (30 and 20 units)."""

    def __init__(self, num_inputs, num_outputs):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(num_inputs, 30),   # 1st hidden layer
            torch.nn.ReLU(),
            torch.nn.Linear(30, 20),           # 2nd hidden layer
            torch.nn.ReLU(),
            torch.nn.Linear(20, num_outputs),  # output layer
        )

    def forward(self, x):
        # Return raw, unnormalized class scores (logits).
        return self.layers(x)
def prepare_dataset():
    """Build the toy train/test tensors and return their DataLoaders.

    The training loader uses a DistributedSampler so each DDP process sees a
    disjoint chunk of the data; shuffling is therefore delegated to the
    sampler instead of the DataLoader.
    """
    X_train = torch.tensor(
        [[-1.2, 3.1], [-0.9, 2.9], [-0.5, 2.6], [2.3, -1.1], [2.7, -1.5]]
    )
    y_train = torch.tensor([0, 0, 0, 1, 1])
    X_test = torch.tensor([[-0.8, 2.8], [2.6, -1.6]])
    y_test = torch.tensor([0, 1])
    # Uncomment these lines to increase the dataset size to run this script on up to 8 GPUs:
    # factor = 4
    # X_train = torch.cat([X_train + torch.randn_like(X_train) * 0.1 for _ in range(factor)])
    # y_train = y_train.repeat(factor)
    # X_test = torch.cat([X_test + torch.randn_like(X_test) * 0.1 for _ in range(factor)])
    # y_test = y_test.repeat(factor)
    train_ds = ToyDataset(X_train, y_train)
    test_ds = ToyDataset(X_test, y_test)
    # NEW: chunk batches across GPUs without overlapping samples
    train_sampler = DistributedSampler(train_ds)
    train_loader = DataLoader(
        dataset=train_ds,
        batch_size=2,
        shuffle=False,  # NEW: False because the DistributedSampler handles shuffling
        pin_memory=True,
        drop_last=True,
        sampler=train_sampler,  # NEW
    )
    test_loader = DataLoader(dataset=test_ds, batch_size=2, shuffle=False)
    return train_loader, test_loader
# NEW: wrapper
def main(rank, world_size, num_epochs):
    """
    Per-process training entry point: initializes DDP, trains the toy model,
    evaluates it, and tears the process group down again.
    Arguments:
        rank: this process's unique ID (also used as the CUDA device index)
        world_size: total number of participating processes
        num_epochs: number of passes over the training set
    """
    ddp_setup(rank, world_size)  # NEW: initialize process groups
    train_loader, test_loader = prepare_dataset()
    model = NeuralNetwork(num_inputs=2, num_outputs=2)
    model.to(rank)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
    model = DDP(model, device_ids=[rank])  # NEW: wrap model with DDP
    # the core model is now accessible as model.module
    for epoch in range(num_epochs):
        # NEW: Set sampler to ensure each epoch has a different shuffle order
        train_loader.sampler.set_epoch(epoch)
        model.train()
        for features, labels in train_loader:
            features, labels = features.to(rank), labels.to(rank)  # New: use rank
            logits = model(features)
            loss = F.cross_entropy(logits, labels)  # Loss function
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # LOGGING
            print(f"[GPU{rank}] Epoch: {epoch+1:03d}/{num_epochs:03d}"
                  f" | Batchsize {labels.shape[0]:03d}"
                  f" | Train/Val Loss: {loss:.2f}")
    model.eval()
    try:
        train_acc = compute_accuracy(model, train_loader, device=rank)
        print(f"[GPU{rank}] Training accuracy", train_acc)
        test_acc = compute_accuracy(model, test_loader, device=rank)
        print(f"[GPU{rank}] Test accuracy", test_acc)
    ####################################################
    # NEW (not in the book):
    # With more processes than training batches, a rank can end up with an
    # empty loader, which surfaces as a ZeroDivisionError in compute_accuracy.
    except ZeroDivisionError as e:
        raise ZeroDivisionError(
            f"{e}\n\nThis script is designed for 2 GPUs. You can run it as:\n"
            "torchrun --nproc_per_node=2 DDP-script-torchrun.py\n"
            f"Or, to run it on {torch.cuda.device_count()} GPUs, uncomment the code on lines 103 to 107."
        )
    ####################################################
    destroy_process_group()  # NEW: cleanly exit distributed mode
def compute_accuracy(model, dataloader, device):
    """
    Return the fraction of correctly classified examples in `dataloader`.
    Arguments:
        model: a module whose forward pass returns per-class logits
        dataloader: yields (features, labels) batches
        device: target device the batches are moved to before the forward pass
    Raises:
        ZeroDivisionError: if the dataloader yields no batches (caught by the
            caller in `main` to print a helpful multi-GPU hint).
    """
    model = model.eval()
    correct = 0.0
    total_examples = 0
    # Note: the original loop used enumerate() but never used the index,
    # so we iterate over the batches directly.
    for features, labels in dataloader:
        features, labels = features.to(device), labels.to(device)
        with torch.no_grad():
            logits = model(features)
        # Predicted class = index of the largest logit per row
        predictions = torch.argmax(logits, dim=1)
        compare = labels == predictions
        correct += torch.sum(compare)
        total_examples += len(compare)
    return (correct / total_examples).item()
if __name__ == "__main__":
    # NEW: Use environment variables set by torchrun if available, otherwise default to single-process.
    if "WORLD_SIZE" in os.environ:
        world_size = int(os.environ["WORLD_SIZE"])
    else:
        world_size = 1
    # Prefer LOCAL_RANK (the per-node process index set by torchrun) over
    # the global RANK; fall back to 0 when running without torchrun.
    if "LOCAL_RANK" in os.environ:
        rank = int(os.environ["LOCAL_RANK"])
    elif "RANK" in os.environ:
        rank = int(os.environ["RANK"])
    else:
        rank = 0
    # Only print on rank 0 to avoid duplicate prints from each GPU process
    if rank == 0:
        print("PyTorch version:", torch.__version__)
        print("CUDA available:", torch.cuda.is_available())
        print("Number of GPUs available:", torch.cuda.device_count())
    # Seed so every process initializes identical model weights
    torch.manual_seed(123)
    num_epochs = 3
    main(rank, world_size, num_epochs)
================================================
FILE: appendix-A/01_main-chapter-code/DDP-script.py
================================================
# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).
# Source for "Build a Large Language Model From Scratch"
# - https://www.manning.com/books/build-a-large-language-model-from-scratch
# Code: https://github.com/rasbt/LLMs-from-scratch
# Appendix A: Introduction to PyTorch (Part 3)
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
# NEW imports:
import os
import platform
import torch.multiprocessing as mp
from torch.utils.data.distributed import DistributedSampler
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed import init_process_group, destroy_process_group
# NEW: function to initialize a distributed process group (1 process / GPU)
# this allows communication among processes
def ddp_setup(rank, world_size):
    """
    Initialize the distributed process group (one process per GPU) so that
    the spawned processes can communicate with each other.
    Arguments:
        rank: a unique process ID
        world_size: total number of processes in the group
    """
    # rank of machine running rank:0 process
    # here, we assume all GPUs are on the same machine
    os.environ["MASTER_ADDR"] = "localhost"
    # any free port on the machine
    os.environ["MASTER_PORT"] = "12345"
    # initialize process group
    if platform.system() == "Windows":
        # Disable libuv because PyTorch for Windows isn't built with support
        os.environ["USE_LIBUV"] = "0"
        # Windows users may have to use "gloo" instead of "nccl" as backend
        # gloo: Facebook Collective Communication Library
        init_process_group(backend="gloo", rank=rank, world_size=world_size)
    else:
        # nccl: NVIDIA Collective Communication Library
        init_process_group(backend="nccl", rank=rank, world_size=world_size)
    # Pin this process to the CUDA device matching its rank
    torch.cuda.set_device(rank)
class ToyDataset(Dataset):
    """Tiny in-memory Dataset over pre-constructed feature/label tensors."""

    def __init__(self, X, y):
        # Hold on to the given tensors directly (no copying).
        self.features = X
        self.labels = y

    def __getitem__(self, index):
        sample = self.features[index]
        target = self.labels[index]
        return sample, target

    def __len__(self):
        # One label per sample, so the label count is the dataset length.
        return self.labels.shape[0]
class NeuralNetwork(torch.nn.Module):
    """Two-hidden-layer (30, 20 units) MLP producing class logits."""

    def __init__(self, num_inputs, num_outputs):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(num_inputs, 30),   # 1st hidden layer
            torch.nn.ReLU(),
            torch.nn.Linear(30, 20),           # 2nd hidden layer
            torch.nn.ReLU(),
            torch.nn.Linear(20, num_outputs),  # output layer
        )

    def forward(self, x):
        # Raw class scores; softmax/cross-entropy is applied by the caller.
        return self.layers(x)
def prepare_dataset():
    """Create the toy train/test data and return (train_loader, test_loader).

    The training loader relies on a DistributedSampler to split the samples
    across DDP processes without overlap, so the DataLoader's own shuffling
    is disabled.
    """
    X_train = torch.tensor(
        [[-1.2, 3.1], [-0.9, 2.9], [-0.5, 2.6], [2.3, -1.1], [2.7, -1.5]]
    )
    y_train = torch.tensor([0, 0, 0, 1, 1])
    X_test = torch.tensor([[-0.8, 2.8], [2.6, -1.6]])
    y_test = torch.tensor([0, 1])
    # Uncomment these lines to increase the dataset size to run this script on up to 8 GPUs:
    # factor = 4
    # X_train = torch.cat([X_train + torch.randn_like(X_train) * 0.1 for _ in range(factor)])
    # y_train = y_train.repeat(factor)
    # X_test = torch.cat([X_test + torch.randn_like(X_test) * 0.1 for _ in range(factor)])
    # y_test = y_test.repeat(factor)
    train_ds = ToyDataset(X_train, y_train)
    test_ds = ToyDataset(X_test, y_test)
    # NEW: chunk batches across GPUs without overlapping samples
    train_sampler = DistributedSampler(train_ds)
    train_loader = DataLoader(
        dataset=train_ds,
        batch_size=2,
        shuffle=False,  # NEW: False because the DistributedSampler handles shuffling
        pin_memory=True,
        drop_last=True,
        sampler=train_sampler,  # NEW
    )
    test_loader = DataLoader(dataset=test_ds, batch_size=2, shuffle=False)
    return train_loader, test_loader
# NEW: wrapper
def main(rank, world_size, num_epochs):
    """
    Per-process training entry point (invoked by mp.spawn): initializes DDP,
    trains the toy model, evaluates it, and tears the process group down.
    Arguments:
        rank: this process's unique ID (also used as the CUDA device index)
        world_size: total number of participating processes
        num_epochs: number of passes over the training set
    """
    ddp_setup(rank, world_size)  # NEW: initialize process groups
    train_loader, test_loader = prepare_dataset()
    model = NeuralNetwork(num_inputs=2, num_outputs=2)
    model.to(rank)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
    model = DDP(model, device_ids=[rank])  # NEW: wrap model with DDP
    # the core model is now accessible as model.module
    for epoch in range(num_epochs):
        # NEW: Set sampler to ensure each epoch has a different shuffle order
        train_loader.sampler.set_epoch(epoch)
        model.train()
        for features, labels in train_loader:
            features, labels = features.to(rank), labels.to(rank)  # New: use rank
            logits = model(features)
            loss = F.cross_entropy(logits, labels)  # Loss function
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # LOGGING
            print(f"[GPU{rank}] Epoch: {epoch+1:03d}/{num_epochs:03d}"
                  f" | Batchsize {labels.shape[0]:03d}"
                  f" | Train/Val Loss: {loss:.2f}")
    model.eval()
    try:
        train_acc = compute_accuracy(model, train_loader, device=rank)
        print(f"[GPU{rank}] Training accuracy", train_acc)
        test_acc = compute_accuracy(model, test_loader, device=rank)
        print(f"[GPU{rank}] Test accuracy", test_acc)
    ####################################################
    # NEW (not in the book):
    # With more processes than training batches, a rank can end up with an
    # empty loader, which surfaces as a ZeroDivisionError in compute_accuracy.
    except ZeroDivisionError as e:
        raise ZeroDivisionError(
            f"{e}\n\nThis script is designed for 2 GPUs. You can run it as:\n"
            "CUDA_VISIBLE_DEVICES=0,1 python DDP-script.py\n"
            f"Or, to run it on {torch.cuda.device_count()} GPUs, uncomment the code on lines 103 to 107."
        )
    ####################################################
    destroy_process_group()  # NEW: cleanly exit distributed mode
def compute_accuracy(model, dataloader, device):
    """Return the fraction of examples in `dataloader` the model classifies correctly.

    Note: dividing by zero batches raises ZeroDivisionError, which the caller
    (`main`) catches to print a helpful multi-GPU hint.
    """
    model = model.eval()
    correct = 0.0
    total_examples = 0
    for idx, (features, labels) in enumerate(dataloader):
        features = features.to(device)
        labels = labels.to(device)
        with torch.no_grad():
            logits = model(features)
        # Predicted class = index of the largest logit per row
        hits = labels == torch.argmax(logits, dim=1)
        correct += torch.sum(hits)
        total_examples += len(hits)
    return (correct / total_examples).item()
if __name__ == "__main__":
    # This script may not work for GPUs > 2 due to the small dataset
    # Run `CUDA_VISIBLE_DEVICES=0,1 python DDP-script.py` if you have GPUs > 2
    print("PyTorch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("Number of GPUs available:", torch.cuda.device_count())
    # Seed so every spawned process initializes identical model weights
    torch.manual_seed(123)
    # NEW: spawn new processes
    # note that spawn will automatically pass the rank
    num_epochs = 3
    world_size = torch.cuda.device_count()
    # NOTE(review): on a machine with no CUDA GPUs world_size is 0 and
    # mp.spawn launches nothing — presumably intentional for this
    # CUDA-only demo; confirm before relying on CPU runs.
    mp.spawn(main, args=(world_size, num_epochs), nprocs=world_size)
    # nprocs=world_size spawns one process per GPU
================================================
FILE: appendix-A/01_main-chapter-code/README.md
================================================
# Appendix A: Introduction to PyTorch
### Main Chapter Code
- [code-part1.ipynb](code-part1.ipynb) contains all the section A.1 to A.8 code as it appears in the chapter
- [code-part2.ipynb](code-part2.ipynb) contains all the section A.9 GPU code as it appears in the chapter
- [DDP-script.py](DDP-script.py) contains the script to demonstrate multi-GPU usage (note that Jupyter Notebooks only support single GPUs, so this is a script, not a notebook). You can run it as `python DDP-script.py`. If your machine has more than 2 GPUs, run it as `CUDA_VISIBLE_DEVICES=0,1 python DDP-script.py`.
- [exercise-solutions.ipynb](exercise-solutions.ipynb) contains the exercise solutions for this chapter
### Optional Code
- [DDP-script-torchrun.py](DDP-script-torchrun.py) is an optional version of the `DDP-script.py` script that runs via the PyTorch `torchrun` command instead of spawning and managing multiple processes ourselves via `multiprocessing.spawn`. The `torchrun` command has the advantage of automatically handling distributed initialization, including multi-node coordination, which slightly simplifies the setup process. You can use this script via `torchrun --nproc_per_node=2 DDP-script-torchrun.py`
================================================
FILE: appendix-A/01_main-chapter-code/code-part1.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"id": "f896245e-57c4-48fd-854f-9e43f22e10c9",
"metadata": {},
"source": [
"<table style=\"width:100%\">\n",
"<tr>\n",
"<td style=\"vertical-align:middle; text-align:left;\">\n",
"<font size=\"2\">\n",
"Supplementary code for the <a href=\"http://mng.bz/orYv\">Build a Large Language Model From Scratch</a> book by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
"<br>Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
"</font>\n",
"</td>\n",
"<td style=\"vertical-align:middle; text-align:left;\">\n",
"<a href=\"http://mng.bz/orYv\"><img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp\" width=\"100px\"></a>\n",
"</td>\n",
"</tr>\n",
"</table>\n"
]
},
{
"cell_type": "markdown",
"id": "ca7fc8a0-280c-4979-b0c7-fc3a99b3b785",
"metadata": {},
"source": [
"# Appendix A: Introduction to PyTorch (Part 1)"
]
},
{
"cell_type": "markdown",
"id": "f5bf13d2-8fc2-483e-88cc-6b4310221e68",
"metadata": {},
"source": [
"## A.1 What is PyTorch"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "96ee5660-5327-48e2-9104-a882b3b2afa4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.9.1\n"
]
}
],
"source": [
"import torch\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f73ad4e4-7ec6-4467-a9e9-0cdf6d195264",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False\n"
]
}
],
"source": [
"print(torch.cuda.is_available())"
]
},
{
"cell_type": "markdown",
"id": "397ba1ab-3306-4965-8618-1ed5f24fb939",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/1.webp\" width=\"400px\">"
]
},
{
"cell_type": "markdown",
"id": "1e3c0555-88f6-4515-8c99-aa56b0769d54",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/2.webp\" width=\"300px\">\n",
"\n",
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/3.webp\" width=\"300px\">\n",
"\n",
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/4.webp\" width=\"500px\">\n",
"\n",
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/5.webp\" width=\"500px\">"
]
},
{
"cell_type": "markdown",
"id": "2100cf2e-7459-4ab3-92a8-43e86ab35a9b",
"metadata": {},
"source": [
"## A.2 Understanding tensors"
]
},
{
"cell_type": "markdown",
"id": "3c484e87-bfc9-4105-b0a7-1e23b2a72a30",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/6.webp\" width=\"400px\">"
]
},
{
"cell_type": "markdown",
"id": "26d7f785-e048-42bc-9182-a556af6bb7f4",
"metadata": {},
"source": [
"### A.2.1 Scalars, vectors, matrices, and tensors"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "a3a464d6-cec8-4363-87bd-ea4f900baced",
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import numpy as np\n",
"\n",
"# create a 0D tensor (scalar) from a Python integer\n",
"tensor0d = torch.tensor(1)\n",
"\n",
"# create a 1D tensor (vector) from a Python list\n",
"tensor1d = torch.tensor([1, 2, 3])\n",
"\n",
"# create a 2D tensor from a nested Python list\n",
"tensor2d = torch.tensor([[1, 2], \n",
" [3, 4]])\n",
"\n",
"# create a 3D tensor from a nested Python list\n",
"tensor3d_1 = torch.tensor([[[1, 2], [3, 4]], \n",
" [[5, 6], [7, 8]]])\n",
"\n",
"# create a 3D tensor from NumPy array\n",
"ary3d = np.array([[[1, 2], [3, 4]], \n",
" [[5, 6], [7, 8]]])\n",
"tensor3d_2 = torch.tensor(ary3d) # Copies NumPy array\n",
"tensor3d_3 = torch.from_numpy(ary3d) # Shares memory with NumPy array"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "dbe14c47-499a-4d48-b354-a0e6fd957872",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[1, 2],\n",
" [3, 4]],\n",
"\n",
" [[5, 6],\n",
" [7, 8]]])\n"
]
}
],
"source": [
"ary3d[0, 0, 0] = 999\n",
"print(tensor3d_2) # remains unchanged"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e3e4c23a-cdba-46f5-a2dc-5fb32bf9117b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[999, 2],\n",
" [ 3, 4]],\n",
"\n",
" [[ 5, 6],\n",
" [ 7, 8]]])\n"
]
}
],
"source": [
"print(tensor3d_3) # changes because of memory sharing"
]
},
{
"cell_type": "markdown",
"id": "63dec48d-2b60-41a2-ac06-fef7e718605a",
"metadata": {},
"source": [
"### A.2.2 Tensor data types"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3f48c014-e1a2-4a53-b5c5-125812d4034c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.int64\n"
]
}
],
"source": [
"tensor1d = torch.tensor([1, 2, 3])\n",
"print(tensor1d.dtype)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "5429a086-9de2-4ac7-9f14-d087a7507394",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.float32\n"
]
}
],
"source": [
"floatvec = torch.tensor([1.0, 2.0, 3.0])\n",
"print(floatvec.dtype)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "a9a438d1-49bb-481c-8442-7cc2bb3dd4af",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.float32\n"
]
}
],
"source": [
"floatvec = tensor1d.to(torch.float32)\n",
"print(floatvec.dtype)"
]
},
{
"cell_type": "markdown",
"id": "2020deb5-aa02-4524-b311-c010f4ad27ff",
"metadata": {},
"source": [
"### A.2.3 Common PyTorch tensor operations"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c02095f2-8a48-4953-b3c9-5313d4362ce7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[1, 2, 3],\n",
" [4, 5, 6]])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tensor2d = torch.tensor([[1, 2, 3], \n",
" [4, 5, 6]])\n",
"tensor2d"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f33e1d45-5b2c-4afe-b4b2-66ac4099fd1a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([2, 3])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tensor2d.shape"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f3a4129d-f870-4e03-9c32-cd8521cb83fe",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[1, 2],\n",
" [3, 4],\n",
" [5, 6]])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tensor2d.reshape(3, 2)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "589ac0a7-adc7-41f3-b721-155f580e9369",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[1, 2],\n",
" [3, 4],\n",
" [5, 6]])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tensor2d.view(3, 2)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "344e307f-ba5d-4f9a-a791-2c75a3d1417e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[1, 4],\n",
" [2, 5],\n",
" [3, 6]])"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tensor2d.T"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "19a75030-6a41-4ca8-9aae-c507ae79225c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[14, 32],\n",
" [32, 77]])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tensor2d.matmul(tensor2d.T)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "e7c950bc-d640-4203-b210-3ac8932fe4d4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[14, 32],\n",
" [32, 77]])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tensor2d @ tensor2d.T"
]
},
{
"cell_type": "markdown",
"id": "4c15bdeb-78e2-4870-8a4f-a9f591666f38",
"metadata": {},
"source": [
"## A.3 Seeing models as computation graphs"
]
},
{
"cell_type": "markdown",
"id": "0f3e16c3-07df-44b6-9106-a42fb24452a9",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/7.webp\" width=\"600px\">"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "22af61e9-0443-4705-94d7-24c21add09c7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor(0.0852)\n"
]
}
],
"source": [
"import torch.nn.functional as F\n",
"\n",
"y = torch.tensor([1.0]) # true label\n",
"x1 = torch.tensor([1.1]) # input feature\n",
"w1 = torch.tensor([2.2]) # weight parameter\n",
"b = torch.tensor([0.0]) # bias unit\n",
"\n",
"z = x1 * w1 + b # net input\n",
"a = torch.sigmoid(z) # activation & output\n",
"\n",
"loss = F.binary_cross_entropy(a, y)\n",
"print(loss)"
]
},
{
"cell_type": "markdown",
"id": "f9424f26-2bac-47e7-b834-92ece802247c",
"metadata": {},
"source": [
"## A.4 Automatic differentiation made easy"
]
},
{
"cell_type": "markdown",
"id": "33aa2ee4-6f1d-448d-8707-67cd5278233c",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/8.webp\" width=\"600px\">"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "ebf5cef7-48d6-4d2a-8ab0-0fb10bdd7d1a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(tensor([-0.0898]),)\n",
"(tensor([-0.0817]),)\n"
]
}
],
"source": [
"import torch.nn.functional as F\n",
"from torch.autograd import grad\n",
"\n",
"y = torch.tensor([1.0])\n",
"x1 = torch.tensor([1.1])\n",
"w1 = torch.tensor([2.2], requires_grad=True)\n",
"b = torch.tensor([0.0], requires_grad=True)\n",
"\n",
"z = x1 * w1 + b \n",
"a = torch.sigmoid(z)\n",
"\n",
"loss = F.binary_cross_entropy(a, y)\n",
"\n",
"grad_L_w1 = grad(loss, w1, retain_graph=True)\n",
"grad_L_b = grad(loss, b, retain_graph=True)\n",
"\n",
"print(grad_L_w1)\n",
"print(grad_L_b)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "93c5875d-f6b2-492c-b5ef-7e132f93a4e0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([-0.0898])\n",
"tensor([-0.0817])\n"
]
}
],
"source": [
"loss.backward()\n",
"\n",
"print(w1.grad)\n",
"print(b.grad)"
]
},
{
"cell_type": "markdown",
"id": "f53bdd7d-44e6-40ab-8a5a-4eef74ef35dc",
"metadata": {},
"source": [
"## A.5 Implementing multilayer neural networks"
]
},
{
"cell_type": "markdown",
"id": "d6cb9787-2bc8-4379-9e8c-a3401ac63c51",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/9.webp\" width=\"500px\">"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "84b749e1-7768-4cfe-94d6-a08c7feff4a1",
"metadata": {},
"outputs": [],
"source": [
"class NeuralNetwork(torch.nn.Module):\n",
" def __init__(self, num_inputs, num_outputs):\n",
" super().__init__()\n",
"\n",
" self.layers = torch.nn.Sequential(\n",
" \n",
" # 1st hidden layer\n",
" torch.nn.Linear(num_inputs, 30),\n",
" torch.nn.ReLU(),\n",
"\n",
" # 2nd hidden layer\n",
" torch.nn.Linear(30, 20),\n",
" torch.nn.ReLU(),\n",
"\n",
" # output layer\n",
" torch.nn.Linear(20, num_outputs),\n",
" )\n",
"\n",
" def forward(self, x):\n",
" logits = self.layers(x)\n",
" return logits"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "c5b59e2e-1930-456d-93b9-f69263e3adbe",
"metadata": {},
"outputs": [],
"source": [
"model = NeuralNetwork(50, 3)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "39d02a21-33e7-4879-8fd2-d6309faf2f8d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NeuralNetwork(\n",
" (layers): Sequential(\n",
" (0): Linear(in_features=50, out_features=30, bias=True)\n",
" (1): ReLU()\n",
" (2): Linear(in_features=30, out_features=20, bias=True)\n",
" (3): ReLU()\n",
" (4): Linear(in_features=20, out_features=3, bias=True)\n",
" )\n",
")\n"
]
}
],
"source": [
"print(model)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "94535738-de02-4c2a-9b44-1cd186fa990a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of trainable model parameters: 2213\n"
]
}
],
"source": [
"num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
"print(\"Total number of trainable model parameters:\", num_params)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "2c394106-ad71-4ccb-a3c9-9b60af3fa748",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Parameter containing:\n",
"tensor([[ 0.0979, 0.0412, 0.1005, ..., -0.0544, -0.0804, 0.0842],\n",
" [-0.0115, 0.0382, -0.0261, ..., 0.0573, 0.1094, 0.1364],\n",
" [ 0.0162, -0.0050, 0.0752, ..., 0.1298, 0.1250, -0.0117],\n",
" ...,\n",
" [-0.0312, 0.1319, -0.0954, ..., -0.1066, -0.0970, -0.0373],\n",
" [ 0.0563, -0.1373, -0.1226, ..., 0.0154, -0.0969, 0.0113],\n",
" [-0.0872, -0.0098, 0.0322, ..., -0.0108, 0.1091, -0.1043]],\n",
" requires_grad=True)\n"
]
}
],
"source": [
"print(model.layers[0].weight)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "b201882b-9285-4db9-bb63-43afe6a2ff9e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Parameter containing:\n",
"tensor([[-0.0577, 0.0047, -0.0702, ..., 0.0222, 0.1260, 0.0865],\n",
" [ 0.0502, 0.0307, 0.0333, ..., 0.0951, 0.1134, -0.0297],\n",
" [ 0.1077, -0.1108, 0.0122, ..., 0.0108, -0.1049, -0.1063],\n",
" ...,\n",
" [-0.0787, 0.1259, 0.0803, ..., 0.1218, 0.1303, -0.1351],\n",
" [ 0.1359, 0.0175, -0.0673, ..., 0.0674, 0.0676, 0.1058],\n",
" [ 0.0790, 0.1343, -0.0293, ..., 0.0344, -0.0971, -0.0509]],\n",
" requires_grad=True)\n"
]
}
],
"source": [
"torch.manual_seed(123)\n",
"\n",
"model = NeuralNetwork(50, 3)\n",
"print(model.layers[0].weight)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "1da9a35e-44f3-460c-90fe-304519736fd6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([30, 50])\n"
]
}
],
"source": [
"print(model.layers[0].weight.shape)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "57eadbae-90fe-43a3-a33f-c23a095ba42a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[-0.1262, 0.1080, -0.1792]], grad_fn=<AddmmBackward0>)\n"
]
}
],
"source": [
"torch.manual_seed(123)\n",
"\n",
"X = torch.rand((1, 50))\n",
"out = model(X)\n",
"print(out)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "48d720cb-ef73-4b7b-92e0-8198a072defd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[-0.1262, 0.1080, -0.1792]])\n"
]
}
],
"source": [
"with torch.no_grad():\n",
" out = model(X)\n",
"print(out)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "10df3640-83c3-4061-a74d-08f07a5cc6ac",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.3113, 0.3934, 0.2952]])\n"
]
}
],
"source": [
"with torch.no_grad():\n",
" out = torch.softmax(model(X), dim=1)\n",
"print(out)"
]
},
{
"cell_type": "markdown",
"id": "19858180-0f26-43a8-b2c3-7ed40abf9f85",
"metadata": {},
"source": [
"## A.6 Setting up efficient data loaders"
]
},
{
"cell_type": "markdown",
"id": "0f98d8fc-5618-47a2-bc72-153818972a24",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/10.webp\" width=\"600px\">"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "b9dc2745-8be8-4344-80ef-325f02cda7b7",
"metadata": {},
"outputs": [],
"source": [
"X_train = torch.tensor([\n",
" [-1.2, 3.1],\n",
" [-0.9, 2.9],\n",
" [-0.5, 2.6],\n",
" [2.3, -1.1],\n",
" [2.7, -1.5]\n",
"])\n",
"\n",
"y_train = torch.tensor([0, 0, 0, 1, 1])"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "88283948-5fca-461a-98a1-788b6be191d5",
"metadata": {},
"outputs": [],
"source": [
"X_test = torch.tensor([\n",
" [-0.8, 2.8],\n",
" [2.6, -1.6],\n",
"])\n",
"\n",
"y_test = torch.tensor([0, 1])"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "edf323e2-1789-41a0-8e44-f3cab16e5f5d",
"metadata": {},
"outputs": [],
"source": [
"from torch.utils.data import Dataset\n",
"\n",
"\n",
"class ToyDataset(Dataset):\n",
" def __init__(self, X, y):\n",
" self.features = X\n",
" self.labels = y\n",
"\n",
" def __getitem__(self, index):\n",
" one_x = self.features[index]\n",
" one_y = self.labels[index] \n",
" return one_x, one_y\n",
"\n",
" def __len__(self):\n",
" return self.labels.shape[0]\n",
"\n",
"train_ds = ToyDataset(X_train, y_train)\n",
"test_ds = ToyDataset(X_test, y_test)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "b7014705-1fdc-4f72-b892-d8db8bebc331",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(train_ds)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "3ec6627a-4c3f-481a-b794-d2131be95eaf",
"metadata": {},
"outputs": [],
"source": [
"from torch.utils.data import DataLoader\n",
"\n",
"torch.manual_seed(123)\n",
"\n",
"train_loader = DataLoader(\n",
" dataset=train_ds,\n",
" batch_size=2,\n",
" shuffle=True,\n",
" num_workers=0\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "8c9446de-5e4b-44fa-bf9a-a63e2661027e",
"metadata": {},
"outputs": [],
"source": [
"test_ds = ToyDataset(X_test, y_test)\n",
"\n",
"test_loader = DataLoader(\n",
" dataset=test_ds,\n",
" batch_size=2,\n",
" shuffle=False,\n",
" num_workers=0\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "99d4404c-9884-419f-979c-f659742d86ef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Batch 1: tensor([[ 2.3000, -1.1000],\n",
" [-0.9000, 2.9000]]) tensor([1, 0])\n",
"Batch 2: tensor([[-1.2000, 3.1000],\n",
" [-0.5000, 2.6000]]) tensor([0, 0])\n",
"Batch 3: tensor([[ 2.7000, -1.5000]]) tensor([1])\n"
]
}
],
"source": [
"for idx, (x, y) in enumerate(train_loader):\n",
" print(f\"Batch {idx+1}:\", x, y)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "9d003f7e-7a80-40bf-a7fb-7a0d7dbba9db",
"metadata": {},
"outputs": [],
"source": [
"train_loader = DataLoader(\n",
" dataset=train_ds,\n",
" batch_size=2,\n",
" shuffle=True,\n",
" num_workers=0,\n",
" drop_last=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "4db4d7f4-82da-44a4-b94e-ee04665d9c3c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Batch 1: tensor([[-1.2000, 3.1000],\n",
" [-0.5000, 2.6000]]) tensor([0, 0])\n",
"Batch 2: tensor([[ 2.3000, -1.1000],\n",
" [-0.9000, 2.9000]]) tensor([1, 0])\n"
]
}
],
"source": [
"for idx, (x, y) in enumerate(train_loader):\n",
" print(f\"Batch {idx+1}:\", x, y)"
]
},
{
"cell_type": "markdown",
"id": "eb03ed57-df38-4ee0-a553-0863450df39b",
"metadata": {},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/11.webp\" width=\"600px\">"
]
},
{
"cell_type": "markdown",
"id": "d904ca82-e50f-4f3d-a3ac-fc6ca53dd00e",
"metadata": {},
"source": [
"## A.7 A typical training loop"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "93f1791a-d887-4fc5-a307-5e5bde9e06f6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch: 001/003 | Batch 001/002 | Train/Val Loss: 0.75\n",
"Epoch: 001/003 | Batch 002/002 | Train/Val Loss: 0.65\n",
"Epoch: 002/003 | Batch 001/002 | Train/Val Loss: 0.44\n",
"Epoch: 002/003 | Batch 002/002 | Train/Val Loss: 0.13\n",
"Epoch: 003/003 | Batch 001/002 | Train/Val Loss: 0.03\n",
"Epoch: 003/003 | Batch 002/002 | Train/Val Loss: 0.00\n"
]
}
],
"source": [
"import torch.nn.functional as F\n",
"\n",
"\n",
"torch.manual_seed(123)\n",
"model = NeuralNetwork(num_inputs=2, num_outputs=2)\n",
"optimizer = torch.optim.SGD(model.parameters(), lr=0.5)\n",
"\n",
"num_epochs = 3\n",
"\n",
"for epoch in range(num_epochs):\n",
" \n",
" model.train()\n",
" for batch_idx, (features, labels) in enumerate(train_loader):\n",
"\n",
" logits = model(features)\n",
" \n",
" loss = F.cross_entropy(logits, labels) # Loss function\n",
" \n",
" optimizer.zero_grad()\n",
" loss.backward()\n",
" optimizer.step()\n",
" \n",
" ### LOGGING\n",
" print(f\"Epoch: {epoch+1:03d}/{num_epochs:03d}\"\n",
" f\" | Batch {batch_idx+1:03d}/{len(train_loader):03d}\"\n",
" f\" | Train/Val Loss: {loss:.2f}\")\n",
"\n",
" model.eval()\n",
" # Optional model evaluation"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "00dcf57f-6a7e-4af7-aa5a-df2cb0866fa5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 2.8569, -4.1618],\n",
" [ 2.5382, -3.7548],\n",
" [ 2.0944, -3.1820],\n",
" [-1.4814, 1.4816],\n",
" [-1.7176, 1.7342]])\n"
]
}
],
"source": [
"model.eval()\n",
"\n",
"with torch.no_grad():\n",
" outputs = model(X_train)\n",
"\n",
"print(outputs)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "19be7390-18b8-43f9-9841-d7fb1919f6fd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.9991, 0.0009],\n",
" [0.9982, 0.0018],\n",
" [0.9949, 0.0051],\n",
" [0.0491, 0.9509],\n",
" [0.0307, 0.9693]])\n",
"tensor([0, 0, 0, 1, 1])\n"
]
}
],
"source": [
"torch.set_printoptions(sci_mode=False)\n",
"probas = torch.softmax(outputs, dim=1)\n",
"print(probas)\n",
"\n",
"predictions = torch.argmax(probas, dim=1)\n",
"print(predictions)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "07e7e530-f8d3-429c-9f5e-cf8078078c0e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([0, 0, 0, 1, 1])\n"
]
}
],
"source": [
"predictions = torch.argmax(outputs, dim=1)\n",
"print(predictions)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "5f756f0d-63c8-41b5-a5d8-01baa847e026",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([True, True, True, True, True])"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictions == y_train"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "da274bb0-f11c-4c81-a880-7a031fbf2943",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor(5)"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"torch.sum(predictions == y_train)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "16d62314-8dee-45b0-8f55-9e5aae2b24f4",
"metadata": {},
"outputs": [],
"source": [
"def compute_accuracy(model, dataloader):\n",
"\n",
" model.eval()\n",
" correct = 0.0\n",
" total_examples = 0\n",
" \n",
" for idx, (features, labels) in enumerate(dataloader):\n",
" \n",
" with torch.no_grad():\n",
" logits = model(features)\n",
" \n",
" predictions = torch.argmax(logits, dim=1)\n",
" compare = labels == predictions\n",
" correct += torch.sum(compare)\n",
" total_examples += len(compare)\n",
"\n",
" return (correct / total_examples).item()"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "4f6c9c17-2a5f-46c0-804b-873f169b729a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compute_accuracy(model, train_loader)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "311ed864-e21e-4aac-97c7-c6086caef27a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compute_accuracy(model, test_loader)"
]
},
{
"cell_type": "markdown",
"id": "4d5cd469-3a45-4394-944b-3ce543f41dac",
"metadata": {},
"source": [
"## A.8 Saving and loading models"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "b013127d-a2c3-4b04-9fb3-a6a7c88d83c5",
"metadata": {},
"outputs": [],
"source": [
"torch.save(model.state_dict(), \"model.pth\")"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "b2b428c2-3a44-4d91-97c4-8298cf2b51eb",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<All keys matched successfully>"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = NeuralNetwork(2, 2) # needs to match the original model exactly\n",
"model.load_state_dict(torch.load(\"model.pth\", weights_only=True))"
]
},
{
"cell_type": "markdown",
"id": "f891c013-43da-4a05-973d-997be313d2d8",
"metadata": {},
"source": [
"## A.9 Optimizing training performance with GPUs"
]
},
{
"cell_type": "markdown",
"id": "e68ae888-cabf-49c9-bad6-ecdce774db57",
"metadata": {},
"source": [
"### A.9.1 PyTorch computations on GPU devices"
]
},
{
"cell_type": "markdown",
"id": "141c845f-efe3-4614-b376-b8b7a9a2c887",
"metadata": {},
"source": [
"See [code-part2.ipynb](code-part2.ipynb)"
]
},
{
"cell_type": "markdown",
"id": "99811829-b817-42ea-b03e-d35374debcc0",
"metadata": {},
"source": [
"### A.9.2 Single-GPU training"
]
},
{
"cell_type": "markdown",
"id": "0b21456c-4af7-440f-9e78-37770277b5bc",
"metadata": {},
"source": [
"See [code-part2.ipynb](code-part2.ipynb)"
]
},
{
"cell_type": "markdown",
"id": "db6eb2d1-a341-4489-b04b-635c26945333",
"metadata": {},
"source": [
"### A.9.3 Training with multiple GPUs"
]
},
{
"cell_type": "markdown",
"id": "9d049a81-5fb0-49b5-9d6a-17a9976d8520",
"metadata": {},
"source": [
"See [DDP-script.py](DDP-script.py)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
================================================
FILE: appendix-A/01_main-chapter-code/code-part2.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "AAAnDw04iAm4"
},
"source": [
"<table style=\"width:100%\">\n",
"<tr>\n",
"<td style=\"vertical-align:middle; text-align:left;\">\n",
"<font size=\"2\">\n",
"Supplementary code for the <a href=\"http://mng.bz/orYv\">Build a Large Language Model From Scratch</a> book by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
"<br>Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
"</font>\n",
"</td>\n",
"<td style=\"vertical-align:middle; text-align:left;\">\n",
"<a href=\"http://mng.bz/orYv\"><img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp\" width=\"100px\"></a>\n",
"</td>\n",
"</tr>\n",
"</table>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "O9i6kzBsZVaZ"
},
"source": [
"# Appendix A: Introduction to PyTorch (Part 2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ppbG5d-NZezH"
},
"source": [
"## A.9 Optimizing training performance with GPUs"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6jH0J_DPZhbn"
},
"source": [
"### A.9.1 PyTorch computations on GPU devices"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "RM7kGhwMF_nO",
"outputId": "b1872617-aacd-46fa-e5f3-f130fd81b246"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.4.0+cu121\n"
]
}
],
"source": [
"import torch\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "OXLCKXhiUkZt",
"outputId": "e9ca3c58-d92c-4c8b-a9c9-cd7fcc1fedb4"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n"
]
}
],
"source": [
"print(torch.cuda.is_available())"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MTTlfh53Va-T",
"outputId": "bae76cb5-d1d3-441f-a7c5-93a161e2e86a"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([5., 7., 9.])\n"
]
}
],
"source": [
"tensor_1 = torch.tensor([1., 2., 3.])\n",
"tensor_2 = torch.tensor([4., 5., 6.])\n",
"\n",
"print(tensor_1 + tensor_2)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Z4LwTNw7Vmmb",
"outputId": "9ad97923-bc8e-4c49-88bf-48dc1de56804"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([5., 7., 9.], device='cuda:0')\n"
]
}
],
"source": [
"tensor_1 = tensor_1.to(\"cuda\")\n",
"tensor_2 = tensor_2.to(\"cuda\")\n",
"\n",
"print(tensor_1 + tensor_2)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 158
},
"id": "tKT6URN1Vuft",
"outputId": "8396eb18-47c8-47a1-c1b6-8bcb9480fb52"
},
"outputs": [
{
"ename": "RuntimeError",
"evalue": "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/tmp/ipykernel_2321/2079609735.py\u001b[0m in \u001b[0;36m<cell line: 2>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mtensor_1\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtensor_1\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mto\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"cpu\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtensor_1\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mtensor_2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mRuntimeError\u001b[0m: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"
]
}
],
"source": [
"tensor_1 = tensor_1.to(\"cpu\")\n",
"print(tensor_1 + tensor_2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c8j1cWDcWAMf"
},
"source": [
"### A.9.2 Single-GPU training"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "GyY59cjieitv"
},
"outputs": [],
"source": [
"X_train = torch.tensor([\n",
" [-1.2, 3.1],\n",
" [-0.9, 2.9],\n",
" [-0.5, 2.6],\n",
" [2.3, -1.1],\n",
" [2.7, -1.5]\n",
"])\n",
"\n",
"y_train = torch.tensor([0, 0, 0, 1, 1])\n",
"\n",
"X_test = torch.tensor([\n",
" [-0.8, 2.8],\n",
" [2.6, -1.6],\n",
"])\n",
"\n",
"y_test = torch.tensor([0, 1])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "v41gKqEJempa"
},
"outputs": [],
"source": [
"from torch.utils.data import Dataset\n",
"\n",
"\n",
"class ToyDataset(Dataset):\n",
" def __init__(self, X, y):\n",
" self.features = X\n",
" self.labels = y\n",
"\n",
" def __getitem__(self, index):\n",
" one_x = self.features[index]\n",
" one_y = self.labels[index]\n",
" return one_x, one_y\n",
"\n",
" def __len__(self):\n",
" return self.labels.shape[0]\n",
"\n",
"train_ds = ToyDataset(X_train, y_train)\n",
"test_ds = ToyDataset(X_test, y_test)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "UPGVRuylep8Y"
},
"outputs": [],
"source": [
"from torch.utils.data import DataLoader\n",
"\n",
"torch.manual_seed(123)\n",
"\n",
"train_loader = DataLoader(\n",
" dataset=train_ds,\n",
" batch_size=2,\n",
" shuffle=True,\n",
" num_workers=1,\n",
" drop_last=True\n",
")\n",
"\n",
"test_loader = DataLoader(\n",
" dataset=test_ds,\n",
" batch_size=2,\n",
" shuffle=False,\n",
" num_workers=1\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "drhg6IXofAXh"
},
"outputs": [],
"source": [
"class NeuralNetwork(torch.nn.Module):\n",
" def __init__(self, num_inputs, num_outputs):\n",
" super().__init__()\n",
"\n",
" self.layers = torch.nn.Sequential(\n",
"\n",
" # 1st hidden layer\n",
" torch.nn.Linear(num_inputs, 30),\n",
" torch.nn.ReLU(),\n",
"\n",
" # 2nd hidden layer\n",
" torch.nn.Linear(30, 20),\n",
" torch.nn.ReLU(),\n",
"\n",
" # output layer\n",
" torch.nn.Linear(20, num_outputs),\n",
" )\n",
"\n",
" def forward(self, x):\n",
" logits = self.layers(x)\n",
" return logits"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "7jaS5sqPWCY0",
"outputId": "8a5cd93d-671c-4abf-d5cd-97845f300ffd"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch: 001/003 | Batch 001/002 | Train/Val Loss: 0.75\n",
"Epoch: 001/003 | Batch 002/002 | Train/Val Loss: 0.65\n",
"Epoch: 002/003 | Batch 001/002 | Train/Val Loss: 0.44\n",
"Epoch: 002/003 | Batch 002/002 | Train/Val Loss: 0.13\n",
"Epoch: 003/003 | Batch 001/002 | Train/Val Loss: 0.03\n",
"Epoch: 003/003 | Batch 002/002 | Train/Val Loss: 0.00\n"
]
}
],
"source": [
"import torch.nn.functional as F\n",
"\n",
"\n",
"torch.manual_seed(123)\n",
"model = NeuralNetwork(num_inputs=2, num_outputs=2)\n",
"\n",
"device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\") # NEW\n",
"model.to(device) # NEW\n",
"\n",
"# Note that the book originally used the following line, but the \"model =\" is redundant\n",
"# model = model.to(device) # NEW\n",
"\n",
"optimizer = torch.optim.SGD(model.parameters(), lr=0.5)\n",
"\n",
"num_epochs = 3\n",
"\n",
"for epoch in range(num_epochs):\n",
"\n",
" model.train()\n",
" for batch_idx, (features, labels) in enumerate(train_loader):\n",
"\n",
" features, labels = features.to(device), labels.to(device) # NEW\n",
" logits = model(features)\n",
" loss = F.cross_entropy(logits, labels) # Loss function\n",
"\n",
" optimizer.zero_grad()\n",
" loss.backward()\n",
" optimizer.step()\n",
"\n",
" ### LOGGING\n",
" print(f\"Epoch: {epoch+1:03d}/{num_epochs:03d}\"\n",
" f\" | Batch {batch_idx+1:03d}/{len(train_loader):03d}\"\n",
" f\" | Train/Val Loss: {loss:.2f}\")\n",
"\n",
" model.eval()\n",
" # Optional model evaluation"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"id": "4qrlmnPPe7FO"
},
"outputs": [],
"source": [
"def compute_accuracy(model, dataloader, device):\n",
"\n",
" model = model.eval()\n",
" correct = 0.0\n",
" total_examples = 0\n",
"\n",
" for idx, (features, labels) in enumerate(dataloader):\n",
"\n",
" features, labels = features.to(device), labels.to(device) # New\n",
"\n",
" with torch.no_grad():\n",
" logits = model(features)\n",
"\n",
" predictions = torch.argmax(logits, dim=1)\n",
" compare = labels == predictions\n",
" correct += torch.sum(compare)\n",
" total_examples += len(compare)\n",
"\n",
" return (correct / total_examples).item()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1_-BfkfEf4HX",
"outputId": "9453154f-0a5b-4a44-a3c9-f010e08d5a2c"
},
"outputs": [
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compute_accuracy(model, train_loader, device=device)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "iYtXKBGEgKss",
"outputId": "d6cc870a-34de-490e-e5d3-23e6956744bd"
},
"outputs": [
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compute_accuracy(model, test_loader, device=device)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nc2LGFVbiAnB"
},
"source": [
"### A.9.3 Training with multiple GPUs"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cOUza9iQiAnC"
},
"source": [
"See [DDP-script.py](DDP-script.py)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YOYk5Fh7iAnC"
},
"source": [
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/12.webp\" width=\"600px\">\n",
"<img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/appendix-a_compressed/13.webp\" width=\"600px\">"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: appendix-A/01_main-chapter-code/exercise-solutions.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table style=\"width:100%\">\n",
"<tr>\n",
"<td style=\"vertical-align:middle; text-align:left;\">\n",
"<font size=\"2\">\n",
"Supplementary code for the <a href=\"http://mng.bz/orYv\">Build a Large Language Model From Scratch</a> book by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
"<br>Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
"</font>\n",
"</td>\n",
"<td style=\"vertical-align:middle; text-align:left;\">\n",
"<a href=\"http://mng.bz/orYv\"><img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp\" width=\"100px\"></a>\n",
"</td>\n",
"</tr>\n",
"</table>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise A.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [Python Setup Tips](../../setup/01_optional-python-setup-preferences/README.md) document in this repository contains additional recommendations and tips to set up your Python environment.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise A.2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [Installing Libraries Used In This Book document](../../setup/02_installing-python-libraries/README.md) and [directory](../../setup/02_installing-python-libraries/) contains utilities to check whether your environment is set up correctly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise A.3"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"class NeuralNetwork(torch.nn.Module):\n",
" def __init__(self, num_inputs, num_outputs):\n",
" super().__init__()\n",
"\n",
" self.layers = torch.nn.Sequential(\n",
" \n",
" # 1st hidden layer\n",
" torch.nn.Linear(num_inputs, 30),\n",
" torch.nn.ReLU(),\n",
"\n",
" # 2nd hidden layer\n",
" torch.nn.Linear(30, 20),\n",
" torch.nn.ReLU(),\n",
"\n",
" # output layer\n",
" torch.nn.Linear(20, num_outputs),\n",
" )\n",
"\n",
" def forward(self, x):\n",
" logits = self.layers(x)\n",
" return logits"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total number of trainable model parameters: 752\n"
]
}
],
"source": [
"model = NeuralNetwork(2, 2)\n",
"\n",
"num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
"print(\"Total number of trainable model parameters:\", num_params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise A.4"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "qGgnamiyLJxp"
},
"outputs": [],
"source": [
"import torch\n",
"\n",
"a = torch.rand(100, 200)\n",
"b = torch.rand(200, 300)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "CvGvIeVkLzXE",
"outputId": "44d027be-0787-4348-9c06-4e559d94d0e1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"63.8 µs ± 8.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"%timeit a @ b"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "OmRtZLa9L2ZG"
},
"outputs": [],
"source": [
"a, b = a.to(\"cuda\"), b.to(\"cuda\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "duLEhXDPL6k0",
"outputId": "3486471d-fd62-446f-9855-2d01f41fd101"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"13.8 µs ± 425 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
]
}
],
"source": [
"%timeit a @ b"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "V100",
"machine_shape": "hm",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: appendix-A/02_setup-recommendations/README.md
================================================
## Python and Environment Setup Recommendations
Please see the [README.md](../../setup/README.md) in the [setup](../../setup) directory for Python installation and setup recommendations.
================================================
FILE: appendix-A/README.md
================================================
# Appendix A: Introduction to PyTorch
## Main Chapter Code
- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code
## Bonus Materials
- [02_setup-recommendations](02_setup-recommendations) contains Python installation and setup recommendations.
================================================
FILE: appendix-B/README.md
================================================
# Appendix B: References and Further Reading
- No code in this appendix
================================================
FILE: appendix-C/README.md
================================================
# Appendix C: Exercise Solutions
- [Chapter 2 exercise solutions](../ch02/01_main-chapter-code/exercise-solutions.ipynb)
- [Chapter 3 exercise solutions](../ch03/01_main-chapter-code/exercise-solutions.ipynb)
- [Chapter 4 exercise solutions](../ch04/01_main-chapter-code/exercise-solutions.ipynb)
- [Chapter 5 exercise solutions](../ch05/01_main-chapter-code/exercise-solutions.ipynb)
- [Chapter 6 exercise solutions](../ch06/01_main-chapter-code/exercise-solutions.ipynb)
- [Chapter 7 exercise solutions](../ch07/01_main-chapter-code/exercise-solutions.ipynb)
================================================
FILE: appendix-D/01_main-chapter-code/appendix-D.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"id": "9a5936bd-af17-4a7e-a4d2-e910411708ea",
"metadata": {},
"source": [
"<table style=\"width:100%\">\n",
"<tr>\n",
"<td style=\"vertical-align:middle; text-align:left;\">\n",
"<font size=\"2\">\n",
"Supplementary code for the <a href=\"http://mng.bz/orYv\">Build a Large Language Model From Scratch</a> book by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
"<br>Code repository: <a href=\"https://github.com/rasbt/LLMs-from-scratch\">https://github.com/rasbt/LLMs-from-scratch</a>\n",
"</font>\n",
"</td>\n",
"<td style=\"vertical-align:middle; text-align:left;\">\n",
"<a href=\"http://mng.bz/orYv\"><img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp\" width=\"100px\"></a>\n",
"</td>\n",
"</tr>\n",
"</table>\n"
]
},
{
"cell_type": "markdown",
"id": "af53bcb1-ff9d-49c7-a0bc-5b8d32ff975b",
"metadata": {},
"source": [
"## Appendix D: Adding Bells and Whistles to the Training Loop"
]
},
{
"cell_type": "markdown",
"id": "4f58c142-9434-49af-b33a-356b80a45b86",
"metadata": {},
"source": [
"- In this appendix, we add a few more advanced features to the training function, which are used in typical pretraining and finetuning; finetuning is covered in chapters 6 and 7\n",
"- The next three sections below discuss learning rate warmup, cosine decay, and gradient clipping\n",
"- The final section adds these techniques to the training function"
]
},
{
"cell_type": "markdown",
"id": "744def4f-c03f-42ee-97bb-5d7d5b89b723",
"metadata": {},
"source": [
"- We start by initializing a model reusing the code from chapter 5:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "8755bd5e-bc06-4e6e-9e63-c7c82b816cbe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch version: 2.9.0\n"
]
}
],
"source": [
"from importlib.metadata import version\n",
"import torch\n",
"\n",
"print(\"torch version:\", version(\"torch\"))\n",
"\n",
"\n",
"from previous_chapters import GPTModel\n",
"# If the `previous_chapters.py` file is not available locally,\n",
"# you can import it from the `llms-from-scratch` PyPI package.\n",
"# For details, see: https://github.com/rasbt/LLMs-from-scratch/tree/main/pkg\n",
"# E.g.,\n",
"# from llms_from_scratch.ch04 import GPTModel\n",
"\n",
"GPT_CONFIG_124M = {\n",
" \"vocab_size\": 50257, # Vocabulary size\n",
" \"context_length\": 256, # Shortened context length (orig: 1024)\n",
" \"emb_dim\": 768, # Embedding dimension\n",
" \"n_heads\": 12, # Number of attention heads\n",
" \"n_layers\": 12, # Number of layers\n",
" \"drop_rate\": 0.1, # Dropout rate\n",
" \"qkv_bias\": False # Query-key-value bias\n",
"}\n",
"\n",
"if torch.cuda.is_available():\n",
" device = torch.device(\"cuda\")\n",
"elif torch.backends.mps.is_available():\n",
" # Use PyTorch 2.9 or newer for stable mps results\n",
" major, minor = map(int, torch.__version__.split(\".\")[:2])\n",
" if (major, minor) >= (2, 9):\n",
" device = torch.device(\"mps\")\n",
" else:\n",
" device = torch.device(\"cpu\")\n",
"else:\n",
" device = torch.device(\"cpu\")\n",
"\n",
"print(\"Device:\", device)\n",
"\n",
"torch.manual_seed(123)\n",
"model = GPTModel(GPT_CONFIG_124M)\n",
"model.eval(); # Disable dropout during inference"
]
},
{
"cell_type": "markdown",
"id": "51574e57-a098-412c-83e8-66dafa5a0b99",
"metadata": {},
"source": [
"- Next, using the same code we used in chapter 5, we initialize the data loaders:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "386ca110-2bb4-42f1-bd54-8836df80acaa",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\nimport os\\nimport urllib.request\\n\\nif not os.path.exists(file_path):\\n with urllib.request.urlopen(url) as response:\\n text_data = response.read().decode(\\'utf-8\\')\\n with open(file_path, \"w\", encoding=\"utf-8\") as file:\\n file.write(text_data)\\nelse:\\n with open(file_path, \"r\", encoding=\"utf-8\") as file:\\n text_data = file.read()\\n'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import os\n",
"import requests\n",
"\n",
"file_path = \"the-verdict.txt\"\n",
"url = \"https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/ch02/01_main-chapter-code/the-verdict.txt\"\n",
"\n",
"if not os.path.exists(file_path):\n",
" response = requests.get(url, timeout=30)\n",
" response.raise_for_status()\n",
" text_data = response.text\n",
" with open(file_path, \"w\", encoding=\"utf-8\") as file:\n",
" file.write(text_data)\n",
"else:\n",
" with open(file_path, \"r\", encoding=\"utf-8\") as file:\n",
" text_data = file.read()\n",
"\n",
"# The book originally used the following code below\n",
"# However, urllib uses older protocol settings that\n",
"# can cause problems for some readers using a VPN.\n",
"# The `requests` version above is more robust\n",
"# in that regard.\n",
"\n",
"\"\"\"\n",
"import os\n",
"import urllib.request\n",
"\n",
"if not os.path.exists(file_path):\n",
" with urllib.request.urlopen(url) as response:\n",
" text_data = response.read().decode('utf-8')\n",
" with open(file_path, \"w\", encoding=\"utf-8\") as file:\n",
" file.write(text_data)\n",
"else:\n",
" with open(file_path, \"r\", encoding=\"utf-8\") as file:\n",
" text_data = file.read()\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ae96992b-536a-4684-a924-658b9ffb7e9c",
"metadata": {},
"outputs": [],
"source": [
"from previous_chapters import create_dataloader_v1\n",
"# Alternatively:\n",
"# from llms_from_scratch.ch02 import create_dataloader_v1\n",
"\n",
"\n",
"# Train/validation ratio\n",
"train_ratio = 0.90\n",
"split_idx = int(train_ratio * len(text_data))\n",
"\n",
"\n",
"torch.manual_seed(123)\n",
"\n",
"train_loader = create_dataloader_v1(\n",
" text_data[:split_idx],\n",
" batch_size=2,\n",
" max_length=GPT_CONFIG_124M[\"context_length\"],\n",
" stride=GPT_CONFIG_124M[\"context_length\"],\n",
" drop_last=True,\n",
" shuffle=True,\n",
" num_workers=0\n",
")\n",
"\n",
"val_loader = create_dataloader_v1(\n",
" text_data[split_idx:],\n",
" batch_size=2,\n",
" max_length=GPT_CONFIG_124M[\"context_length\"],\n",
" stride=GPT_CONFIG_124M[\"context_length\"],\n",
" drop_last=False,\n",
" shuffle=False,\n",
" num_workers=0\n",
")"
]
},
{
"cell_type": "markdown",
"id": "939c08d8-257a-41c6-b842-019f7897ac74",
"metadata": {},
"source": [
"## D.1 Learning rate warmup"
]
},
{
"cell_type": "markdown",
"id": "7fafcd30-ddf7-4a9f-bcf4-b13c052b3133",
"metadata": {},
"source": [
"- When training complex models like LLMs, implementing learning rate warmup can help stabilize the training\n",
"- In learning rate warmup, we gradually increase the learning rate from a very low value (`initial_lr`) to a user-specified maximum (`peak_lr`)\n",
"- This way, the model will start the training with small weight updates, which helps decrease the risk of large destabilizing updates during the training"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "2bb4790b-b8b6-4e9e-adf4-704a04b31ddf",
"metadata": {},
"outputs": [],
"source": [
"n_epochs = 15\n",
"initial_lr = 0.0001\n",
"peak_lr = 0.01"
]
},
{
"cell_type": "markdown",
"id": "5bf3a8da-abc4-4b80-a5d8-f1cc1c7cc5f3",
"metadata": {},
"source": [
"- Typically, the number of warmup steps is between 0.1% and 20% of the total number of steps\n",
"- We can compute the increment as the difference between the `peak_lr` and `initial_lr` divided by the number of warmup steps"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "5f6d083f-1b25-4c23-b46d-ef7783446690",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"27\n"
]
}
],
"source": [
"total_steps = len(train_loader) * n_epochs\n",
"warmup_steps = int(0.2 * total_steps) # 20% warmup\n",
"print(warmup_steps)"
]
},
{
"cell_type": "markdown",
"id": "4b6bbdc8-0104-459e-a7ed-b08be8578709",
"metadata": {},
"source": [
"- Note that the print book accidentally includes a leftover code line, `warmup_steps = 20`, which is not used and can be safely ignored"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e075f80e-a398-4809-be1d-8019e1d31c90",
"metadata": {},
"outputs": [],
"source": [
"lr_increment = (peak_lr - initial_lr) / warmup_steps\n",
"\n",
"global_step = -1\n",
"track_lrs = []\n",
"\n",
"optimizer = torch.optim.AdamW(model.parameters(), weight_decay=0.1)\n",
"\n",
"for epoch in range(n_epochs):\n",
" for input_batch, target_batch in train_loader:\n",
" optimizer.zero_grad()\n",
" global_step += 1\n",
" \n",
" if global_step < warmup_steps:\n",
" lr = initial_lr + global_step * lr_increment\n",
" else:\n",
" lr = peak_lr\n",
" \n",
" # Apply the calculated learning rate to the optimizer\n",
" for param_group in optimizer.param_groups:\n",
" param_group[\"lr\"] = lr\n",
" track_lrs.append(optimizer.param_groups[0][\"lr\"])\n",
" \n",
" # Calculate loss and update weights\n",
" # ..."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "cb6da121-eeed-4023-bdd8-3666c594b4ed",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAekAAAEiCAYAAADd4SrgAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQAANRFJREFUeJzt3Ql4U1XaB/C3e6F0oS10YytQLVBoFWwt9htEliJlE5Xl4xNEhsWpyjIjyq5+OBWQkYFBgdEPnGdksY4i7RS0grJDWUqhlE0piy1doQst3ZL7Pe8pySSQ1hbS3nuT/+958iT35iQ5N2nz5pz7nnNsJEmSCAAAABTHVu4KAAAAgGkI0gAAAAqFIA0AAKBQCNIAAAAKhSANAACgUAjSAAAACoUgDQAAoFAI0gAAAAplL3cF1Eqr1VJ2dja5urqSjY2N3NUBAACZ8dxgpaWl5O/vT7a25mkDI0g/IA7Q7du3N8uHAAAAluP69evUrl07szwXgvQD4ha07sNwc3Mzy4cBAADqVVJSIhpvuvhgDgjSD0jXxc0BGkEaAAB0zHkKFIljAAAACoUgDQAAoFAI0gAAAAole5Beu3YtderUiZydnSkiIoJSUlLqLR8fH0/BwcGifM+ePSkpKcno/q+//poGDx5MXl5e4rzAqVOn7nuOiooKio2NFWVatWpFzz//POXm5pr92AAAAFQbpLdt20Zz5syhJUuW0MmTJyk0NJSio6MpLy/PZPlDhw7R+PHjacqUKZSamkqjRo0Sl/T0dH2ZsrIyioqKomXLltX5urNnz6aEhAQR8Pfu3SuGU40ePbpJjhEAAOBB2Ug8+lom3HJ+4okn6G9/+5t+ghBOX3/99dfp7bffvq/82LFjRRBOTEzU73vyyScpLCyM1q1bZ1T2ypUrFBgYKII5369TXFxMbdq0oc2bN9MLL7wg9p0/f566detGhw8fFs/X0FR7d3d38XzI7gYAgJImiAuyDcGqqqqiEydO0Lx58/T7eIaWgQMHimBpCu/nlrchbnlv3769wa/Lr1ldXS1eR4e7zzt06FBvkK6srBQXww8Dmo5WK9H7SefofA7eZwBoeva2tvT5K+GkNLIF6YKCAtJoNOTj42O0n7e5ZWtKTk6OyfK8v6G4rKOjI3l4eDTqeeLi4ujdd99t8OvAwzlyuZA+O5CJtxEAmoWjnewpWiZhMpMG4ha/YSteN7MMNI0dadni+pngtjQyzB9vMwA0KVuFrsEgW5D29vYmOzu7+7KqedvX19fkY3h/Y8rX9Rzc1V5UVGTUmv6t53FychIXaHpVNVramV7bq/H7qEDq29UbbzsAWCXZ2vfc5dy7d2/avXu3fh8njvF2ZGSkycfwfsPyLDk5uc7ypvBrOjg4GD3PhQsX6Nq1a416Hmg6+y/lU/Gdamrr6kQRnb3wVgOA1ZK1u5u7jydNmkR9+vSh8PBwWrVqlcjenjx5srh/4sSJFBAQIM4Hs5kzZ1K/fv1o5cqVFBMTQ1u3bqXjx4/Thg0b9M958+ZNEXB5WJUuADNuJfOFM+94CBe/tqenp8jA42xyDtANzeyG5unqjunlR3a2yuyCAgCw+CDNQ6ry8/Np8eLFImmLh0rt2rVLnxzGwdZwTc6+ffuKoVMLFy6k+fPnU1BQkMjsDgkJ0ZfZsWOHPsizcePGiWsei/3OO++I2x999JF4Xp7EhDO2OUP8448/bsYjh7rcqdJQckbtKY3hoTgXDQDWTdZx0mqGcdJNIyEtm17fkkrtPVvQvjf7m3U1GQAAtcUFZeacg1UHaTa8lz8CNABYPQRpUAxOFvvpQr64PQLDrgAAEKRBOb47m0NVGi0FtW1Fj/q4yl0dAADZoSUNiuvqHhGKrm4AAIYgDYpQcLuSDv1SKG4jqxsAoBaCNChC0pkbpNFK1KudO3XydpG7OgAAioAgDYrr6gYAgFoI0iC7rKI7dOzKLeIh0cN6IUgDAOggSIPsEu+2op/o5E
m+7s5yVwcAQDEQpEExc3WjqxsAwBiCNMjql/zbdDa7hOxtbWhoTz98GgAABhCkQREJY1FB3uTp4ohPAwDAAII0yIbXdtlhMFc3AAAYQ5AG2WTcKKHL+WXkZG9Lg3vULk8KAAD/gSANstG1op8Jbkuuzg74JAAA7oEgDbLQaiVKTLshbmMaUAAA0xCkQRap12+JSUxaOdmLljQAANwPQRpkseNUbVf34O4+5Oxgh08BAMAEBGlodjUaLf37DLq6AQB+C4I0NLsjl29Swe0qat3SQYyPBgAA0xCkodntSMsS18/29CMHO/wJAgDUBd+Q0KwqazS0Mz1H3MZc3QAA9UOQhma190I+lVbUkI+bk1j1CgAA6oYgDc0q4XRtwhivG21na4N3HwCgHgjS0GzKq2roh4xccRtd3QAAvw1BGppNckYu3anWUEevltSrnTveeQCA34AgDc0mQTcNaC9/srFBVzcAwG9BkIZmUVxeTXsv5onbI8KwLCUAQEMgSEOz2HX2BlVrJHrUx5Ue8XHFuw4A0AAI0tCsXd1oRQMAqChIr127ljp16kTOzs4UERFBKSkp9ZaPj4+n4OBgUb5nz56UlJRkdL8kSbR48WLy8/OjFi1a0MCBA+nSpUtGZS5evEgjR44kb29vcnNzo6ioKPrxxx+b5PiAKK+0gg79UqA/Hw0AACoI0tu2baM5c+bQkiVL6OTJkxQaGkrR0dGUl1d77vJehw4dovHjx9OUKVMoNTWVRo0aJS7p6en6MsuXL6fVq1fTunXr6OjRo+Ti4iKes6KiQl9m2LBhVFNTQ3v27KETJ06I1+V9OTm1M2GBeSWdvkFaiSisvQd18GqJtxcAoKEkGYWHh0uxsbH6bY1GI/n7+0txcXEmy48ZM0aKiYkx2hcRESFNnz5d3NZqtZKvr6+0YsUK/f1FRUWSk5OTtGXLFrGdn58v8WHv27dPX6akpETsS05ObnDdi4uLxWP4Gur33NoDUse3EqVP91/GWwUAFqu4CeKCbC3pqqoq0Yrl7mgdW1tbsX348GGTj+H9huUZt5J15TMzM0Vr2LCMu7u76EbXlfHy8qJHH32U/vGPf1BZWZloUa9fv57atm1LvXv3bqKjtV7Xb5bTyWtFxCOuhvXyk7s6AACqYi/XCxcUFJBGoyEfHx+j/bx9/vx5k4/hAGyqvK6bWnddXxken/vDDz+IbnJXV1fxw4AD9K5du6h169Z11reyslJcdEpKShp9zNYo8e40oE8GepGPm7Pc1QEAUBXZE8eaGyeWxcbGisC8f/9+kajGAXv48OF040ZtQDElLi5OtMp1l/bt2zdrvdVqR1q2uB4eioQxAADVBGnOrLazs6Pc3Nq5nHV429fX1+RjeH995XXX9ZXhZLHExETaunUrPfXUU/T444/Txx9/LDLBP//88zrrO2/ePCouLtZfrl+//oBHbj1+ziulczdKyN7Whp4NMf2ZAgCAAoO0o6OjOAe8e/du/T6tViu2IyMjTT6G9xuWZ8nJyfrygYGBIhgbluFuac7y1pUpLy8X19zNbYi3+fXr4uTkJIZrGV6gfjvujo3+3SNtqLWLI94uAAC1nJNmPPxq0qRJ1KdPHwoPD6dVq1aJZK7JkyeL+ydOnEgBAQGiq5nNnDmT+vXrRytXrqSYmBjRGj5+/Dht2LBBf7551qxZtHTpUgoKChJBe9GiReTv7y+6tBkHaz73zK/L46m5Bf33v/9dJJ3xc4L5Tisk6Lu6kTAGAKC6ID127FjKz88XwZITu8LCwkQCly7x69q1a0Yt3r59+9LmzZtp4cKFNH/+fBGIt2/fTiEhIfoyc+fOFYF+2rRpVFRUJCYq4efkyU903ey8vWDBAnrmmWeourqaevToQd9++60YLw3mcTa7hDILysjJ3pYGdUdXNwDAg7DhcVh46xqPu9E5gYzPT6Pr+35/TjpHG/ZdppiefrR2wuP4EwMAi1fSBHHB6rK7oelptYZd3cjqBgB4UAjSYH
bHr96iG8UV5OpkT08/2gbvMADAA0KQBrPTtaIH9/AlZwc7vMMAAA8IQRrMqkajpaQzWJYSAMAcEKTBrA7+UkiFZVXk6eJIfbt44d0FAHgICNLQJF3dQ3v6koMd/rwAAB4GvkXBbCqqNfRdeu1CJiNCA/DOAgA8JARpMJufLuRTaWUN+bk7U5+Oda8oBgAADYMgDWaTcLq2q5vXjba1tcE7CwDwkBCkwSzKKmto97na1cfQ1Q0AIGOQrqmpoR9++IHWr19PpaWlYl92djbdvn3bTNUCtUnOyKWKai0FertQSABWCAMAkGWBjatXr9KQIUPE4heVlZU0aNAgcnV1pWXLlontdevWmaVioC47dNOA9vITq5EBAIAMLWleLpKXlrx165ZY5lHnueeeu2+tZ7AOReVVtO9ivrg9IgxzdQMAyNaS3r9/Px06dIgcHR2N9nfq1ImysrLMVjFQj53pOVSjlaibnxt1besqd3UAAKy3Ja3Vakmj0dy3/9dffxXd3mB9dpzSrXjlJ3dVAACsO0gPHjyYVq1apd/m84+cMLZkyRIaOnSouesHCpdXUkFHMgvF7eG90NUNACBrd/fKlSspOjqaunfvThUVFfTf//3fdOnSJfL29qYtW7aYtXKgfImnb5AkET3ewYPae7aUuzoAANYdpNu1a0dpaWm0bds2cc2t6ClTptCECROMEsnAyrK6Q9GKBgCQPUjv27eP+vbtK4IyXwzHTvN9v/vd78xdR1Co6zfL6dT1IuLJxWJ64Xw0AIDs56T79+9PN2/evG9/cXGxuA+srxUd2cWL2ro6y10dAACL0+ggLUmSyckqCgsLycXFxVz1AhUtS4mEMQAAmbu7R48eLa45QL/88svk5OSkv4+HZJ0+fVp0g4N1uJhbSudzSsnBzoaeDUFXNwCArEHa3d1d35Lm8dCGSWI8scmTTz5JU6dObZJKgnJb0f0eaUPuLR3krg4AgHUH6Y0bN+pnFvvTn/6Erm0rxj/UkNUNAKDA7G6etASs2+lfi+lqYTk5O9jSwG4+clcHAMBiNTpIs6+++oq+/PJLsRJWVVWV0X0nT540V91A4V3dHKBdnB7oTwgAAJoiu3v16tU0efJk8vHxodTUVAoPDycvLy+6fPkyPfvss419OlAZrVYSs4yxEZjABABAWUH6448/pg0bNtCaNWtEwtjcuXMpOTmZ3njjDTFWGixbypWblFNSQa7O9tTv0TZyVwcAwKI1OkhzF7duqBVneJeWlorbL730EubutqKu7iE9fMnJ3k7u6gAAWLRGB2lfX1/9jGMdOnSgI0eOiNuZmZki6xcsV7VGS0ln7nZ1h2GubgAAxQXpZ555hnbs2CFu87np2bNn06BBg2js2LH03HPPNUUdQSEO/FxAt8qrybuVI0V29pK7OgAAFq/RQZrPRy9YsEDcjo2Npf/7v/+jbt260XvvvUeffPJJoyuwdu1aMfba2dmZIiIiKCUlpd7y8fHxFBwcLMr37NmTkpKSjO7n1vzixYvJz89PdMcPHDhQLKV5r3//+9/i9bhM69atadSoUY2uu7VJOFXb1T20px/Z2zX6TwcAABqpUd+0vNLV0qVLKScnR79v3LhxIuP79ddfF4lkjcHLXc6ZM0eMveahW6GhoWKt6ry8PJPlDx06ROPHjxdLY3JmOQdWvqSnp+vLLF++XNRn3bp1dPToUTHpCj8nr32t869//UucQ+eeAF5u8+DBg2JdbKhbRbWGvs/IFbeR1Q0A0EykRnJxcZEyMzMlcwgPD5diY2P12xqNRvL395fi4uJMlh8zZowUExNjtC8iIkKaPn26uK3VaiVfX19pxYoV+vuLiookJycnacuWLWK7urpaCggIkD799NOHqntxcTGfgBfX1iDpdLbU8a1EqW/cbkmj0cpdHQAAxWmKuNDoPssBAwbQ3r17H/rHAU+CcuLECdEdrWNrayu2Dx8+bPIxvN+wPONWsq48J69xK9+wDM85zt3aujLcYs
/KyhKv9dhjj4lucR7fbdgah/vppgEd1suPbHkBaQAAaHKNni6KA9rbb79NZ86cod69e983h/eIESMa9DwFBQVi9SyeFMUQb58/f97kYzgAmyqv637XXddXhiddYe+88w795S9/EefDV65cSU8//TRdvHiRPD09Tb52ZWWluOiUlJSQtSitqKY952tPQQzHBCYAAMoN0n/4wx/ENQe4e/Eylhx4lUyr1YprTn57/vnn9YuHtGvXTiSlTZ8+3eTj4uLi6N133yVrlJyRS5U1WurcxoV6+LvJXR0AAKth+yBBrq5LYwK0t7c32dnZUW5ubTKSDm/zWGxTeH995XXX9ZXh7m3WvXt3/f28Nnbnzp3FRC11mTdvnphRTXe5fv06WQv9ile9/MUPMQAAaB6yjaPhTHDuLt+9e7d+Hwd63o6MjDT5GN5vWJ7xlKS68oGBgSIYG5bhbmnO8taV4dfkoHzhwgV9merqarpy5Qp17NixzvryY9zc3Iwu1uBmWRUduFQgbmMCEwCA5iXrEkY8/GrSpEnUp08fsVDHqlWrqKysTAyNYhMnTqSAgADR1cxmzpxJ/fr1E+eQY2JiaOvWrXT8+HExdptxK2/WrFlimFhQUJAI2osWLSJ/f3/9OGgOrjNmzBDDvtq3by8C84oVK8R9L774omzvhVLtTL9BNVpJdHN3adNK7uoAAFgVWYM0z1KWn58vJh/hxK6wsDDatWuXPvGLu585C1uH5wzfvHkzLVy4kObPny8C8fbt2ykkJERfhhf84EA/bdo0KioqoqioKPGcPPmJDgdle3t7MVb6zp07Ivt7z549YlITMLbj7gQmSBgDAGh+NjwOS4bXVT3uRufhXXx+2lK7vnOKKyjyg93EfyEH336GAjxayF0lAACriguY2xHqlHg6WwToPh1bI0ADAKihu7uu8cF8PpiTqxo7NSgof1lKJIwBAKgkSHt4eNQ7DIfHG7/88ssiMcvwfDKoy5WCMkr7tZh4crFnQ2qHrQEAgMKD9KZNm8REIByIOSOb8cpVn3/+uUjo4kSwDz/8ULSqObkL1NvVzZ7q6k1tXJ3krg4AgFVqdJDmYMxDoMaMGaPfN3z4cLFs5Pr168UY5Q4dOtD777+PIG0JE5hgGlAAANk0uj+al4vkhSnuxft0i1jwsKf6Zu8CZTufU0IXc2+To50tRfcwPfsbAAAoMEjzBCCfffbZfft5H9/HCgsLMebYAhLG+j3ahtxbOMhdHQAAq9Xo7m4+38wzc+3cuZOeeOIJsY9n/eKVq7766iuxfezYMTFRCagPD5tPSLshbo9AVzcAgLqCNC9FyQGZzz/z0o665St55i9e9pG9+uqr5q8pNItT14vo2s1yauFgRwO6tcW7DgCgtmlBeU7sDz74wPy1AcUkjA3q7kMtHWWdNRYAwOo90Lcwz4nNw67y8vL06zPr8KIYoE4arUT/Po2ubgAA1QbphIQEmjBhAt2+fVvMTWo4sQnfRpBWr6OZhZRXWkluzvb0u0fayF0dAACr1+js7j/+8Y/0yiuviCDNLepbt27pLzdv3rT6N9QSsrp5hjFHe8wWBwAgt0Z/E2dlZdEbb7xBLVu2bJoagSyqarS0Mz1H3MZc3QAAKg3S0dHRYsgVWJYDP+dTUXk1ebdyoic7e8ldHQAAeJBz0jExMfTmm29SRkaGmArUwcHhviFaoD47TtV2dQ/r5Ud2vKoGAACoL0hPnTpVXL/33nv33ceJYxqNxjw1g2Zzp0pDyRm54jbm6gYAUHGQvnfIFajfnvN5VFaloQCPFvR4Bw+5qwMAAHchhRdoR1qWvhVd31rhAACgwJb06tWradq0aeTs7Cxu14czv0E9Siqq6ccL+eI25uoGAFAWG4lXVGjANKCc0e3l5SVu1/lkNjZ0+fJlsgYlJSXk7u5OxcXFYlIXtfrqxK/0p/g06tq2FSXP/h1a0gAACooLDWpJZ2ZmmrwNljNXN7ei0dUNAKAsOCdtxQpvV9LBnw
vEbWR1AwBYQHY3D7HatGkT7d692+QCG3v27DFn/aAJJaXniEU1ega4U6C3C95rAAC1B+mZM2eKIM2TmoSEhKCLVMUS7k5ggoQxAAALCdJbt26lL7/8koYOHdo0NYJmkV10h1Ku1C6IEtPLD+86AIAlnJN2dHSkrl27Nk1toNno1o0O7+RJ/h4t8M4DAFjKUpV//etfqQEjt0AFWd3Dw/zlrgoAAJiru/vAgQP0448/0s6dO6lHjx73LbDx9ddfN/YpoZllFpTRmaxisZDG0BBfvP8AAJYSpD08POi5555rmtpAs0i424p+qqs3ebVywrsOAGAJQbqmpob69+9PgwcPJl9ftMDUiE9TGE5gAgAAFnJO2t7enmbMmEGVlZVmrcTatWupU6dOYm7wiIgISklJqbd8fHw8BQcHi/K8pnVSUtJ9gWjx4sXk5+dHLVq0oIEDB9KlS5dMPhcfS1hYmBhKdurUKbJ0526U0s95t8nR3pYG9/CRuzoAAGDOxLHw8HBKTU0lc9m2bRvNmTOHlixZQidPnqTQ0FCKjo4WE6WYcujQIRo/fjxNmTJF1GPUqFHikp6eri+zfPlysRDIunXr6OjRo+Ti4iKes6Ki4r7nmzt3Lvn7W0+LUteK7v9oG3JzNs4nAAAAhZEaadu2bVLnzp2lNWvWSIcOHZLS0tKMLo0VHh4uxcbG6rc1Go3k7+8vxcXFmSw/ZswYKSYmxmhfRESENH36dHFbq9VKvr6+0ooVK/T3FxUVSU5OTtKWLVuMHpeUlCQFBwdLZ8+e5VR1KTU1tcH1Li4uFo/ha7Xg96Zv3G6p41uJUmJattzVAQCwKMVNEBcanTg2bty4+5ak5K5i7mLma542tKGqqqroxIkTNG/ePP0+W1tb0T19+PBhk4/h/dzyNsSt5O3bt+sXAMnJyRHPocOrknA3Oj9WV//c3FyaOnWqeFzLli1/s67cLW7Yzc+rnajNyWtFlFV0h1wc7eiZ4LZyVwcAAH5Do4O0OVfBKigoEEHdx8f43Chvnz9/3uRjOACbKs/7dffr9tVVhn9QvPzyy+L8ep8+fejKlSu/Wde4uDh69913yRKyugd196EWjnZyVwcAAMwdpDt27Ehqt2bNGiotLTVqwf8WLmvYgueWdPv27UkteCGNxLuzjI3ABCYAAJYZpHUyMjLo2rVrosva0IgRIxr8HN7e3mRnZye6ng3xdl1DvHh/feV117yPs7sNy3AWt26lLu76dnIyHiPMreoJEybQ559/ft/rctl7y6vJkcuFVHC7kjxaOlBU1zZyVwcAAJoiSF++fFlMZnLmzBn9uWjGt1ljzknzPOC9e/cWy15yhjbjpS95+7XXXjP5mMjISHH/rFmz9PuSk5PFfhYYGCgCNZfRBWVu9XKW96uvviq2OfN76dKl+sdnZ2eL89qcac7nri3RjrsrXj0b4iuGXwEAgIUuVcmBkIMgX/OY5sLCQjGn94cfftjoCnAX8qRJk0Qrlod3rVq1isrKymjy5Mni/okTJ1JAQIA4J6x7/X79+tHKlSvFcpm8Ktfx48dpw4YN+h8LHMA5CAcFBYk6Llq0SAyz0v0Q6NChg1EdWrVqJa67dOlC7dq1I0tTVaOlnem1Xd3DMYEJAIDlBmnuJubuYu6q5kxsvkRFRYkgyhnfjR1DPXbsWMrPzxeTj3BiF7d+d+3apU/84i51fg2dvn370ubNm2nhwoU0f/58EYg5Q5vXtjYc+8yBftq0aVRUVCTqx8/Jk59Yo30X86mkoobaujpRRKCX3NUBAIAGsuFxWNQIrVu3FpOOcAuVW56ffvqpmCr0l19+EbN/lZeXkzXgLnQe2lVcXExubm6kZG9sSRWTmLzyVCAtHt5d7uoAAFikkiaIC41uSXOLNS0tTQRpPn/Ls3vxuWXubu7cubNZKgXmU15VQ8kZtYl2w0P/k0gHAADK1+ggzd3M3JXM3nvvPRo2bBj913/9F3l5eYnEK1CW3efy6E61hjp4tq
Sw9h5yVwcAAJoySHMWtE7Xrl3FpCM3b94U3eC6DG9Q3lzd3IrG5wMAoC4PPBbn559/pu+++47u3LlDnp6e5q0VmEXxnWraeyFf3EZWNwCAFQRpHm41YMAAeuSRR2jo0KF040bt0B5elYqHYYFyfHc2h6o0WnrEpxUF+yo7uQ0AAMwQpGfPnk0ODg5iaJThwhQ8lIqHOYHy5uoegbHRAADWcU76+++/F93c9076weOVr169as66wUPIL62kgz8XiNvDelnPetkAAFbdkubMblNLO3LymJrntrY0PMOYViIKbedOnbxd5K4OAAA0R5Dm4Vb/+Mc/9NucMczzbfN4aZ7UBJQ1VzcSxgAArKi7m4MxJ47xfNm8AhZPwXn27FnRkj548GDT1BIaJavoDh2/eot4RBy6ugEArKglzTOOXbx4UcyHPXLkSNH9PXr0aDFnN08TCspJGAvv5Em+7tY5XzkAgNWuJ81zky5YsMBo36+//ioWtNCtRgUKyOoOQ8IYAICamW1hYR4//dlnn5nr6eAB/ZJ/m85ml5C9rQ09G4K5ugEA1MxsQRqUlTAWFeRNni6OclcHAAAeAoK0BeFVRxNOYwITAABLgSBtQbib+3J+GTnZ29Kg7j5yVwcAAJorcYwzuOtTVFT0sHUBMyWMPRPcllydHfB+AgBYS5DmjO7fun/ixInmqBM8AK1WosTTtYudYK5uAAArC9IbN25s2prAQzl57ZaYxKSVkz31D26LdxMAwALgnLSF2HG3q3twDx9ydrCTuzoAAGAGCNIWoEajpaQztV3dmKsbAMByIEhbgMOXC6ngdhW1bulAUV295a4OAACYCYK0BU1gMrSnHznY4SMFALAU+EZXucoaDe06myNuo6sbAMCyIEir3N4L+VRaUUO+bs5i1SsAALAcCNIWktU9rJcf2drayF0dAAAwIwRpFSurrKEfzuWK2+jqBgCwPAjSKsYBuqJaSx29WlKvdvXPCAcAAOqDIG0Bc3XzNKA2NujqBgCwNAjSKlVUXkV7L+aL25irGwDAMikiSK9du5Y6depEzs7OFBERQSkpKfWWj4+Pp+DgYFG+Z8+elJSUdN+6yosXLyY/Pz9q0aIFDRw4kC5duqS//8qVKzRlyhQKDAwU93fp0oWWLFlCVVVVpBa70nOoWiNRsK8rBfm4yl0dAACwxCC9bds2mjNnjgiSJ0+epNDQUIqOjqa8vDyT5Q8dOkTjx48XQTY1NZVGjRolLunp6foyy5cvp9WrV9O6devo6NGj5OLiIp6zoqJC3H/+/HnSarW0fv16Onv2LH300Uei7Pz580ktEk7XdnUjYQwAwIJJMgsPD5diY2P12xqNRvL395fi4uJMlh8zZowUExNjtC8iIkKaPn26uK3VaiVfX19pxYoV+vuLiookJycnacuWLXXWY/ny5VJgYGCD611cXCzx28fXzS235I4U+Hai1PGtROlaYVmzvz4AADRPXJC1Jc3dyydOnBDd0Tq2trZi+/DhwyYfw/sNyzNuJevKZ2ZmUk5OjlEZXuuau9Hrek5WXFxMnp7qmAzk36dvkFYiCmvvQe09W8pdHQAAkHs96aZQUFBAGo2GfHx8jPbzNndJm8IB2FR53q+7X7evrjL3+vnnn2nNmjX04Ycf1lnXyspKcdEpKSkhJWR1AwCA5ZL9nLTcsrKyaMiQIfTiiy/S1KlT6ywXFxcnWuS6S/v27UkO12+W08lrRcQjrniWMQAAsFyyBmlvb2+ys7Oj3NzaWbN0eNvX19fkY3h/feV11w15zuzsbOrfvz/17duXNmzYUG9d582bJ7rEdZfr16+TnAljTwZ6UVs3Z1nqAAAAVhCkHR0dqXfv3rR79279Ps665u3IyEiTj+H9huVZcnKyvjwPq+JgbFiGu6Y5y9vwObkF/fTTT4vX37hxozgXXh8nJydyc3MzusghIe2GuB4Rhq5uAABLJ+s5acbDryZNmkR9+vSh8PBwWrVqFZWVldHkyZPF/R
MnTqSAgADR3cxmzpxJ/fr1o5UrV1JMTAxt3bqVjh8/rm8J88xbs2bNoqVLl1JQUJAI2osWLSJ/f38xVMswQHfs2FGch87Pr50UhNXVgleCn/NK6dyNEnKws6FnQ5RbTwAAsJAgPXbsWBEkefIRTuwKCwujXbt26RO/rl27ZtTK5a7pzZs308KFC8W4Zg7E27dvp5CQEH2ZuXPnikA/bdo0KioqoqioKPGcPPmJruXNyWJ8adeu3X0ToSjVjlO1Xd2/C2pDHi0d5a4OAAA0MRseh9XUL2KJuAudE8j4/HRzdH3zx9T/w5/oSmE5rRobRqMeC2jy1wQAAHnjgtVnd6tFelaJCNDODrY0qLvx8DIAALBMCNIqsSMtS1wP6OZDLk6yn6UAAIBmgCCtAlqtRImna7O6h/dCVjcAgLVAkFaB41dv0Y3iCnJ1sqenH20jd3UAAKCZIEirqKs7OsSXnB3s5K4OAAA0EwRphavWaCnpTO2c41iWEgDAuiBIK9yhXwrpZlkVebk40lNdvOSuDgAANCMEaYXTTWAytKcf2dvh4wIAsCb41lewimoNfX8WXd0AANYKQVrBfrqQR6WVNeTn7kx9OraWuzoAANDMEKQVTLfiFSeM2drayF0dAABoZgjSCnW7soZ+OFe7JvaIUExgAgBgjRCkFSo5I4cqa7QU6O1CPfzlWbsaAADkhSCtgq5uXiMbAACsD4K0At0qq6J9F/PF7RGhfnJXBwAAZIIgrUA703OoRitRNz836trWVe7qAACATBCkFSghrXYCEySMAQBYNwRphcktqaAjmYXi9rBe6OoGALBmCNIKw+tGSxLR4x08qL1nS7mrAwAAMkKQVpgd6OoGAIC7EKQV5FphOaVdLyKeXCymFyYwAQCwdgjSCpJwujZhrG8Xb2rj6iR3dQAAQGYI0gpclnI4xkYDAACCtHJcyCmlC7ml5GBnQ0N6IKsbAADQklbc2Oh+j7Ql95YOclcHAAAUAN3dCiBJkj6rG13dAACggyCtAKd/LaZrN8uphYMdDeruI3d1AABAIRCkFUDXih7Y3YdaOtrLXR0AAFAIBGmZabQSJd4dejUc04ACAIABBGmZpWTepNySSnJztqd+j7aRuzoAAKAgCNIKmcBkSIgvOdnbyV0dAABQEEUE6bVr11KnTp3I2dmZIiIiKCUlpd7y8fHxFBwcLMr37NmTkpKS7suWXrx4Mfn5+VGLFi1o4MCBdOnSJaMyN2/epAkTJpCbmxt5eHjQlClT6Pbt29ScqjVa2nnmhrg9IjSgWV8bAACUT/YgvW3bNpozZw4tWbKETp48SaGhoRQdHU15eXkmyx86dIjGjx8vgmpqaiqNGjVKXNLT0/Vlli9fTqtXr6Z169bR0aNHycXFRTxnRUWFvgwH6LNnz1JycjIlJibSvn37aNq0adScDlwqoFvl1eTdypGe7OzZrK8NAAAqIMksPDxcio2N1W9rNBrJ399fiouLM1l+zJgxUkxMjNG+iIgIafr06eK2VquVfH19pRUrVujvLyoqkpycnKQtW7aI7YyMDIkP/dixY/oyO3fulGxsbKSsrKwG1bu4uFg8B18/qNlbU6WObyVKi7efeeDnAAAAZTBHXLiXrC3pqqoqOnHihOiO1rG1tRXbhw8fNvkY3m9YnnErWVc+MzOTcnJyjMq4u7uLbnRdGb7mLu4+ffroy3B5fm1ueZtSWVlJJSUlRpeHNePpLhTbvwu90Lv9Qz8XAABYHlmDdEFBAWk0GvLxMZ7Ag7c50JrC++srr7v+rTJt27Y1ut/e3p48PT3rfN24uDgR7HWX9u0fPrA+4uNKb0YHU8927g/9XAAAYHlkPyetFvPmzaPi4mL95fr163JXCQAALJysQdrb25vs7OwoNzfXaD9v+/r6mnwM76+vvO76t8rcm5hWU1MjMr7rel0nJyeRCW54AQAAsNgg7ejoSL1796bdu3fr92m1WrEdGRlp8jG837A84wxtXfnAwEARaA
3L8PljPtesK8PXRUVF4ny4zp49e8Rr87lrAAAARZBktnXrVpF5vWnTJpF1PW3aNMnDw0PKyckR97/00kvS22+/rS9/8OBByd7eXvrwww+lc+fOSUuWLJEcHBykM2f+kyH9wQcfiOf49ttvpdOnT0sjR46UAgMDpTt37ujLDBkyRHrssceko0ePSgcOHJCCgoKk8ePHy5rFBwAA6lXcBHFB9tUcxo4dS/n5+WLyEU7aCgsLo127dukTv65duyayrnX69u1LmzdvpoULF9L8+fMpKCiItm/fTiEhIfoyc+fOpbKyMjHumVvMUVFR4jl58hOdL774gl577TUaMGCAeP7nn39ejK0GAABQChuO1HJXQo24C52zvDmJDOenAQCgpAniArK7AQAAFEr27m610nVAmGNSEwAAUL+Su/HAnB3UCNIPqLS0VFybY1ITAACwrPjg7m6eSapwTvoB8XCt7OxscnV1JRsbmwf+1cVBnidGsZTz2jgmdcDnpA74nNT1OXGiM8cDf39/o4Tnh4GW9APiD6Bdu3Zm+RAscXIUHJM64HNSB3xO6sCtZ3N/lyNxDAAAQKEQpAEAABQKQVpGPB/4kiVLxLWlwDGpAz4ndcDnpA5N+TkhcQwAAECh0JIGAABQKARpAAAAhUKQBgAAUCgEaRmtXbuWOnXqJFbn4nWsU1JSSC3i4uLoiSeeEJO5tG3blkaNGkUXLlwwKlNRUUGxsbHk5eVFrVq1EiuN5ebmkhp88MEHYlKCWbNmqfp4srKy6H/+539EnVu0aEE9e/ak48eP6+/n6Qt5BTo/Pz9x/8CBA+nSpUukVBqNhhYtWiTWjef6dunShf73f//XaBpGpR/Tvn37aPjw4WLCC/4b41X8DDWk/jdv3qQJEyaIMbkeHh40ZcoUun37NinxmKqrq+mtt94Sf3suLi6izMSJE8VkUGo9pnvNmDFDlFm1apXZjwlBWibbtm2jOXPmiIzAkydPUmhoKEVHR1NeXh6pwd69e0XAOnLkCCUnJ4t/xMGDB4slQnVmz55NCQkJFB8fL8rzP+Xo0aNJ6Y4dO0br16+nXr16Ge1X2/HcunWLnnrqKXJwcKCdO3dSRkYGrVy5klq3bq0vs3z5crFE67p16+jo0aPiS5T/DvkHiRItW7aMPvnkE/rb3/5G586dE9t8DGvWrFHNMfH/CP+/8490UxpSf/7iP3v2rPjfS0xMFAGFl+ZV4jGVl5eL7zj+ccXXX3/9tfhBP2LECKNyajomQ9988434HuRgfi+zHJPZVqaGRgkPD5diY2P12xqNRvL395fi4uJU+U7m5eWJxc737t0rtouKiiQHBwcpPj5eX+bcuXOizOHDhyWlKi0tlYKCgqTk5GSpX79+0syZM1V7PG+99ZYUFRVV5/1arVby9fWVVqxYod/Hx+nk5CRt2bJFUqKYmBjplVdeMdo3evRoacKECao8Jv77+eabb/TbDal/RkaGeNyxY8f0ZXbu3CnZ2NhIWVlZktKOyZSUlBRR7urVq6o+pl9//VUKCAiQ0tPTpY4dO0offfSR/j5zHRNa0jKoqqqiEydOiG4sw2lGefvw4cOkRrx+KvP09BTXfHzcujY8xuDgYOrQoYOij5F7B2JiYozqrdbj2bFjB/Xp04defPFFcUriscceo7///e/6+zMzMyknJ8fomHhaQz71otRj6tu3L+3evZsuXrwottPS0ujAgQP07LPPqvaYDDWk/nzNXaf82epwef4O4Za3Wr4vuHuYj0Otx6TVaumll16iN998k3r06HHf/eY6JszdLYOCggJxbs3Hx8doP2+fP3+e1Ib/WPncLXethoSEiH38RePo6Kj/JzQ8Rr5PibZu3Sq647i7+15qPJ7Lly+LrmE+rTJ//nxxXG+88YY4jkmTJunrbervUKnH9Pbbb4vFDPgHkp2dnfg/ev/990W3IlPjMRlqSP35mn90GbK3txc/kNVwjNxtz+eox48fr5/nWo3HtGzZMlFH/p
8yxVzHhCANZml9pqenixaNWvFKZDNnzhTnjjiRzxLwjyf+Ff/nP/9ZbHNLmj8nPtfJQVqNvvzyS/riiy9o8+bNovVy6tQp8QORzweq9ZisCfdGjRkzRiTH8Q9ItTpx4gT99a9/FT/qH3QVxIZCd7cMvL29RSvg3sxg3vb19SU1ee2110RCxI8//mi0KhgfB3frFxUVqeIY+Z+Ok/Yef/xx8WuXL5wcxgk8fJtbMmo6HsbZwd27dzfa161bN7GcHtPVW01/h9y1yK3pcePGiWxh7m7khD4ebaDWYzLUkPrz9b0JpjU1NSKTWMnHqAvQV69eFT+GDVeLUtsx7d+/X9SXT3fpvi/4uP74xz+KETvmPCYEaRlwd2Pv3r3FuTXDVg9vR0ZGkhrwL2EO0JzZuGfPHjEkxhAfH2cVGx4jZ3RygFDiMQ4YMIDOnDkjWma6C7dCuRtVd1tNx8P49MO9w+L4XG7Hjh3Fbf7M+MvC8Ji4K5nPlyn1mDhT+N51evkHL///qPWYDDWk/nzNPxb5h6UO/w/ye8DnrpUcoHko2Q8//CCGBBpS2zG99NJLdPr0aaPvC+7N4R+R3333nXmPyQyJb/AAtm7dKjI2N23aJLIAp02bJnl4eEg5OTmqeD9fffVVyd3dXfrpp5+kGzdu6C/l5eX6MjNmzJA6dOgg7dmzRzp+/LgUGRkpLmphmN2txuPhDFp7e3vp/fffly5duiR98cUXUsuWLaV//vOf+jIffPCB+Lv79ttvpdOnT0sjR46UAgMDpTt37khKNGnSJJFNm5iYKGVmZkpff/215O3tLc2dO1c1x8QjCFJTU8WFv4L/8pe/iNu6TOeG1H/IkCHSY489Jh09elQ6cOCAGJEwfvx4RR5TVVWVNGLECKldu3bSqVOnjL4vKisrVXlMptyb3W2uY0KQltGaNWvEl76jo6MYknXkyBFJLfiP1tRl48aN+jL8pfKHP/xBat26tQgOzz33nPjHVGuQVuPxJCQkSCEhIeIHYXBwsLRhwwaj+3nIz6JFiyQfHx9RZsCAAdKFCxckpSopKRGfCf/fODs7S507d5YWLFhg9GWv9GP68ccfTf7v8A+Qhta/sLBQfNm3atVKcnNzkyZPniyCihKPiX9M1fV9wY9T4zE1NEib45iwChYAAIBC4Zw0AACAQiFIAwAAKBSCNAAAgEIhSAMAACgUgjQAAIBCIUgDAAAoFII0AACAQiFIAwAAKBSCNAAAgEIhSAOAXn5+Pr366qtidR8nJyex2EN0dDQdPHhQ3M/L8m3fvh3vGEAzwXrSAKD3/PPPiyU5P//8c+rcubNYIpFXZCosLMS7BCADzN0NAAIvq9e6dWv66aefqF+/fve9K7xOLq+Zq8NLXl65ckXc/vbbb+ndd9+ljIwMsWTfpEmTaMGCBWKdXfFFY2NDH3/8Me3YsUM8P691vXz5cnrhhRfw7gPUA93dACC0atVKXLg7u7Ky8r535dixY+J648aNdOPGDf32/v37aeLEiTRz5kwRpNevX0+bNm2i999/3+jxixYtEi31tLQ0sU73uHHj6Ny5c3j3AeqBljQA6P3rX/+iqVOn0p07d+jxxx8XLWoOpr169ar9wrCxoW+++YZGjRqlf8zAgQNpwIABNG/ePP2+f/7znzR37lzKzs7WP27GjBn0ySef6Ms8+eST4jW4hQ0ApqElDQB63NLlwMrd0kOGDBFd0xxIuWVcF24Zv/fee/qWOF840HNru7y8XF8uMjLS6HG8jZY0QP2QOAYARpydnWnQoEHiwl3Uv//972nJkiX08ssvm3ynbt++Lc5Hjx492uRzAcCDQ0saAOrVvXt3KisrE7cdHBxIo9EY3c8t7QsXLlDXrl3vu9ja/ucr5siRI0aP4+1u3brh3QeoB1rSACDwMKsXX3yRXnnlFXEO2tXVlY4fPy6ysEeOHKnP8OYhWU899ZQYR83Z4IsXL6Zhw4aJsdWcrc2BmbvA09PTaenSpfp3Nz4+nv
r06UNRUVH0xRdfUEpKCn322Wd49wHqgcQxABA4o/udd96h77//nn755Reqrq6m9u3bi8A9f/58atGiBSUkJNCcOXPE0KuAgAD9EKzvvvtOnJdOTU0Vre3g4GDRTc7npsUXjY0NrV27VmSO79u3TwzBWrZsGY0ZMwbvPkA9EKQBoMmZygoHgN+Gc9IAAAAKhSANAACgUEgcA4AmJ0kS3mWAB4CWNAAAgEIhSAMAACgUgjQAAIBCIUgDAAAoFII0AACAQiFIAwAAKBSCNAAAgEIhSAMAACgUgjQAAAAp0/8DaegWV0FT+LQAAAAASUVORK5CYII=",
"text/plain": [
"<Figure size 500x300 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.figure(figsize=(5, 3))\n",
"plt.ylabel(\"Learning rate\")\n",
"plt.xlabel(\"Step\")\n",
"total_training_steps = len(train_loader) * n_epochs\n",
"plt.plot(range(total_training_steps), track_lrs)\n",
"plt.tight_layout(); plt.savefig(\"1.pdf\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "7b3996b6-3f7a-420a-8584-c5760249f3d8",
"metadata": {},
"source": [
"## D.2 Cosine decay"
]
},
{
"cell_type": "markdown",
"id": "c5216214-de79-40cf-a733-b1049a73023c",
"metadata": {},
"source": [
"- Another popular technique for training complex deep neural networks is cosine decay, which also adjusts the learning rate across training epochs\n",
"- In cosine decay, the learning rate follows a cosine curve, decreasing from its initial value to near zero following a half-cosine cycle\n",
"- This gradual reduction is designed to slow the pace of learning as the model begins to improve its weights; it reduces the risk of overshooting minima as the training progresses, which is crucial for stabilizing the training in its later stages\n",
"- Cosine decay is often preferred over linear decay for its smoother transition in learning rate adjustments, but linear decay is also used in practice (for example, [OLMo: Accelerating the Science of Language Models](https://arxiv.org/abs/2402.00838))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4e8d2068-a057-4abf-b478-f02cc37191f6",
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"min_lr = 0.1 * initial_lr\n",
"track_lrs = []\n",
"\n",
"lr_increment = (peak_lr - initial_lr) / warmup_steps\n",
"global_step = -1\n",
"\n",
"for epoch in range(n_epochs):\n",
" for input_batch, target_batch in train_loader:\n",
" optimizer.zero_grad()\n",
" global_step += 1\n",
" \n",
" # Adjust the learning rate based on the current phase (warmup or cosine annealing)\n",
" if global_step < warmup_steps:\n",
" # Linear warmup\n",
" lr = initial_lr + global_step * lr_increment \n",
" else:\n",
" # Cosine annealing after warmup\n",
" progress = ((global_step - warmup_steps) / \n",
" (total_training_steps - warmup_steps))\n",
" lr = min_lr + (peak_lr - min_lr) * 0.5 * (\n",
" 1 + math.cos(math.pi * progress))\n",
" \n",
" # Apply the calculated learning rate to the optimizer\n",
" for param_group in optimizer.param_groups:\n",
" param_group[\"lr\"] = lr\n",
" track_lrs.append(optimizer.param_groups[0][\"lr\"])\n",
" \n",
" # Calculate loss and update weights"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "0e779e33-8a44-4984-bb23-be0603dc4158",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAekAAAEiCAYAAADd4SrgAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjcsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvTLEjVAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAScBJREFUeJzt3Qd0VFXXBuCX9EIKJJBGC0UCJHQSAigiKCgg7ZciUiJKEZWi8oFSxE8/REARQYoiQQVBFGkCSpMaEjqEjnRIhxQC6fOvfWDGBAISSHJnMu+z1l0zc+dmcu4MZM85d599Sul0Oh2IiIjI6Fho3QAiIiLKH4M0ERGRkWKQJiIiMlIM0kREREaKQZqIiMhIMUgTEREZKQZpIiIiI8UgTUREZKSstG6AqcrJycHVq1fh5OSEUqVKad0cIiLSmNQGS0lJgbe3NywsCqcPzCD9iCRAV6xYsVA+BCIiKjkuXbqEChUqFMprMUg/IulB6z8MZ2fnQvkwiIjIdCUnJ6vOmz4+FAYG6UekH+KWAM0gTUREeoV5CZSJY0REREaKQZqIiMhIMUgTEREZKc2D9KxZs1ClShXY2dkhKCgIERERDzx+2bJl8PPzU8cHBARg7dq1eZ5fvnw5nnvuObi5uanrAgcPHrznNdLS0jB06FB1TOnSpdGtWzfExMQU+rkRERGZbJBeunQpRo4ciQkTJmD//v2oV68e2rZti9jY2HyP37VrF3r16oUBAwbgwIED6Ny5s9oiIyMNx6SmpqJFixaYPHnyfX/viBEjsHr1ahXwt27dqqZTde3atUjOkYiI6FGV0snsa41Iz7lJkyaYOXOmoUCIpK+/9dZbGD169D3H9+jRQwXhNWvWGPY1bdoU9evXx5w5c/Ice/78efj6+qpgLs/rJSUloVy5cli8eDH+7//+T+07ceIEatWqhbCwMPV6D5tq7+Liol6P2d1ERJRcBHFBsylYGRkZ2LdvH8aMGWPYJxVa2rRpo4JlfmS/9Lxzk573ihUrHvr3yu/MzMxUv0dPhs8rVar0wCCdnp6uttwfBhWdnBwdPll7HKdjb8DWygJ21pZ3bi3gaGuFcqVtUd7ZDuWdbOHhbAcPZ1s42HBGIRGVLJr9VYuPj0d2djY8PDzy7JfH0rPNT3R0dL7Hy/6HJcfa2NjA1dW1QK8zadIkTJw48aF/Dz2esLMJmL/jXIF+xtPZDjU8SqN6+dKoUd5J3a/l5YzStgzeRGSa+NfrIUmPP3cvXl9ZhorGqoNX1e3TNcvhudqeSM/KRlpmjrpNSctCbEo6YpLTEJeSjtjkNKRmZCM6OU1t20/HG17HohTg5+mMxlXKoFHl25uPqz3rrRORSdAsSLu7u8PS0vKerGp57Onpme/PyP6CHH+/15Ch9sTExDy96X97HVtbW7VR0ZNAvDYySt0f9FQ1BFdz+9efSbqViTOxN3AmNgWnY26oYfJTMSmISkrDsahktX0fdkEdK0Fagv8zfuXRrJo77G0si/yciIhMKkjLkHOjRo2wadMmlaGtTxyTx2+++Wa+PxMcHKyeHz58uGHfhg0b1P6HJb/T2tpavY5MvRInT57ExYsXC/Q6VHS2noxTvWUZvg70LftQP+Nib23oKecWnZSGvReuYd+F62o7ejUZVxJvYVH4RbXJde5m1dzwTC0PPO/vCffS/CJGRMZD0+FuGT7u168fGjdujMDAQEyfPl1lb4eEhKjn+/btCx8fH3U9WAwbNgwtW7bEtGnT0L59eyxZsgR79+7FvHnzDK957do1FXBlWpU+AAvpJcsmmXcyhUt+d9myZVUGnmSTS4B+2MxuKlorD93+7DrU9YKljFc/Bk8XO3So6602cTMjC2F/J2DziVhsORGLq0lp2HIyTm0frjqKJ2u4o3N9HzxXx4OJaERk3kFaplTFxcVh/PjxKmlLpkqtX7/ekBwmwTb3mpzNmjVTU6fGjh2L999/HzVq1FCZ3f7+/oZjVq
1aZQjyomfPnupW5mJ/+OGH6v4XX3yhXld60pKxLRniX3/9dTGeOd3PjfQsbDp++5JGp/o+hf5GSQZ461oeapPZhydjUlTAXh8ZjcOXk/DXyTi1OdhY4rnaHujepCKCq94ujENEZFbzpE0Z50kXjd8OXMaIpYfg6+6Ize+0LNbg+HfcDaw8eBUrD17BhYSbhv01ypdGn+DK6NLAB0521sXWHiIyLUURFxikjejDICBkQYQaeh7WugZGPPuEJm+JfG89eCkRv+y7jN8OXMHNjGy139HGEl0bVkC/ZpVRvXzhrRdLRCVDMoO08WCQLnzXUjMQ+MlGZOXosOmdlqhWrjS0lpKWieX7r+D7sPP4Oy5V7ZPOfdvanhjaqjoCKrho3UQiMhIlquIY0d3WHolSAdrfx9koArSQ4e1+zaqgb3Bl7Po7AaG7zmPDsRisPxqttqeeKIehT1dDUNV/nyZGRFRQDNJkdAVMXqx3OxPbmMi18ebV3dUm869n//U3Vh26im2n4tQWWKUs3m1b86GnjBERmcRSlUTiauItRJy/poaSOxphkM7tCQ8nfNGjPra88zReDqoEG0sL1fbuc8MwIHQPTkanaN1EIiohGKTJKKy+Mze6SZWy8HKxhymo5OaA/3UJwLZRrVSwljndm07Eot2X2/DuskOqaAoR0eNgkCajIEPHolN94+5F369gigTrP0c8paqWyaRGyQxvNfUvTPnjhCqgQkT0KBikSXNSc1vKdVpZlMIL/l4wVZLsNvuVRlj+RjME+ZZFRlYOZm35G62nbcXvh6PU1C4iooJgkCaj6UVLpnQZRxuYuoaVymDJwKaY80ojtZiHLPIxdPF+vDI/XC0AQkT0sBikSVPSu1x18IrRZnU/TjZ4O39PbBzZEm+3rgEbKwvsPJOAdtO3Y9K640jLvF0ghYjoQRikSVNHriThfMJN2Flb4Nnat2u2lySyDObIZ5/AxhEt0aaWh5oHPnfrWTz/5XaEn03QunlEZOQYpElTUitbSABztC250/YlE/zbfo3xTd/G8HC2xbn4VPSYtxvjVkSqqmZERPlhkCbNZOfosObw1SJb8coYyWjBnyNaoldgRfX4h90X0PaLbdhyMlbrphGREWKQJs2En0tATHI6nO2s8NQT7mbzSbjYW2NS17pY/FoQKpV1UGtahyzYgw9+O8LpWkSUB4M0aV7A5Hl/L9haWZrdJ9GsujvWD38SIc2rqMeLwi+iw4wdOHQpUeumEZGRYJAmTcgc4rVHok22gElhcbCxwoSOdbDotSB4OtvhbHwqus3eha82nUZWdo7WzSMijTFIkyZkUYqkW5ko72TLFaQAtXCH9Krb1/VSGeDTNpxSiWWXrt3kv1AiM8YgTZpYeWeou0Ndb1XzmgBXBxvM7NUAX/SoBydbK+y7cB3tZ2xXS2MSkXlikKZil5qehY13As+LZjzUfb8iKF0aVMDaYU+iXkVXJKdl4fXv9+J/a48jk8PfRGaHQZqK3cbjMbiVmY3Kbg6oV8GFn0A+KpZ1wLJBwXi1ua96PG/bWfSYG6aW9CQi88EgTcVu1Z0CJlIGVHqOlD8pJTq+Y21VA9zJzgr7LybihRnb8RfnVBOZDQZpKlbXUzOw9VQczD2ruyCkBvjvbz2JAB8XJN7MREjoHszacoarahGZAQZpKlbrIqNV9nItL2dUL+/Ed78AZUV/GRKMXoGV1HrVU/44qVbWkuv7RFRyMUhTsVp5Z8Ur9qILTgq+TOoagP91CYC1ZSk1z7zr17twISG10D8nIjIODNJUbKKSbiHi/DV1v2MJWpayuL0cVEmtV13OyRYnY1LQ8asdhksIRFSyMEhTsVlzKEoN1TapUgY+rvZ85x9Do8plseatFmhQ6fY0rZAFEQjdeY7vKVEJwyBNxWbVoX+yuunxeTjbqR5198YVkKMDPlx9DONXRrKcKFEJwiBNxeJs3A0cuZKkqou9EODFd70Qr1NP7lYXY573g8xm+z7sAl5duB
fJXKOaqERgkKZi7UW3qO4Ot9K2fNcLkcw1H9SymppPbW9tqeqid/t6F+t+E5UADNJU5HQ6naGACbO6i07bOp5YNjgYHs62OB17A51n7cSBi9eL8DcSUYkP0rNmzUKVKlVgZ2eHoKAgREREPPD4ZcuWwc/PTx0fEBCAtWvX3hMQxo8fDy8vL9jb26NNmzY4ffp0nmNOnTqFTp06wd3dHc7OzmjRogW2bNlSJOdHwNGryWoJRlsrCzxXx5NvSRHy93HByqEtUMfbGQmpGej1zW5sOs4FOohMlaZBeunSpRg5ciQmTJiA/fv3o169emjbti1iY2PzPX7Xrl3o1asXBgwYgAMHDqBz585qi4yMNBzz2WefYcaMGZgzZw7Cw8Ph6OioXjMtLc1wTIcOHZCVlYXNmzdj37596vfKvujo2+sbU9HMjW5TywOlba349hYxTxc7/DwoGC2fKIe0zBy1QMfi8It834lMkU5DgYGBuqFDhxoeZ2dn67y9vXWTJk3K9/ju3bvr2rdvn2dfUFCQbtCgQep+Tk6OztPTUzdlyhTD84mJiTpbW1vdTz/9pB7HxcXp5LS3bdtmOCY5OVnt27Bhw0O3PSkpSf2M3NL9ZWfn6Jr+b6Ou8n/W6NYdieJbVYwysrJ17/x8UL33sk3744T6P0JERaMo4oJmPemMjAzVi5XhaD0LCwv1OCwsLN+fkf25jxfSS9Yff+7cOdUbzn2Mi4uLGkbXH+Pm5oaaNWvi+++/R2pqqupRz507F+XLl0ejRo2K6GzN157z1xCVlKYWiHi6Zjmtm2NWrC0tMOX/6uLtZ6qrxzM2n8GoXw5zyUsiE6LZ2GN8fDyys7Ph4eGRZ788PnHiRL4/IwE4v+P1w9T62wcdI5mwGzduVMPkTk5O6ouBBOj169ejTJky921venq62vSSk5MLfM7maOWdrO52dTxhZ22pdXPMjvx7H/lcTXi42GHcikgs23cZ11IzMKt3Q34eRCZA88Sx4iaJZUOHDlWBefv27SpRTQJ2x44dERUVdd+fmzRpkuqV67eKFSsWa7tNUUZWDtYeuf2edqrvo3VzzFrvoMqY26exSt7bdCIW/RdE4AYX5yAyepoFacmstrS0RExM3sxTeezpmX8GsOx/0PH62wcdI8lia9aswZIlS9C8eXM0bNgQX3/9tcoEX7hw4X3bO2bMGCQlJRm2S5cuPeKZm48dZ+LU0orupW0RXM1N6+aYvWdre2Dhq4EqeW/32Wvo/c1utXQoERkvzYK0jY2Nuga8adMmw76cnBz1ODg4ON+fkf25jxcbNmwwHO/r66uCce5jZFhasrz1x9y8eVPdyjB3bvJYfv/92NraqulauTd6MP3c6A51vVSlMdJe06puWPx6EMo4WOPQ5SR0nxuGmOR/Zj4QkXHRdLhbpl998803qgd7/PhxDBkyRCVzhYSEqOf79u2rerB6w4YNU9eOp02bpq5bf/jhh9i7dy/efPNNw/W34cOH4+OPP8aqVatw5MgR9Rre3t5qSFtIsJZrz/369cOhQ4fUnOn33ntPJZ21b99eo3ei5LmVkY0/j90e0XixPmt1G5O6FVzVFC190ZP/m7MLFxNuf3klIuOiaZDu0aMHpk6dqoqP1K9fHwcPHlRBWJ/4dfHixTzXiZs1a4bFixdj3rx5am7zL7/8ghUrVsDf399wzKhRo/DWW29h4MCBaNKkCW7cuKFeU4qf6IfZ5bHsf+aZZ9C4cWPs2LEDK1euVK9JhWPj8RjczMhGxbL2aFDRlW+rkanh4YRfBjdDpbIOuHTtlgrUp2JStG4WEd2llMzDunsn/TsZRpcEMrk+zaHve722cK8K1ENbVcN7bf34T8pIxSan4ZX54TgVcwOuDtZYGBKIevxSRWQ0ccHssrup6CXdzMTWU7erxjGr27iVd7bD0oHBKjBLkt/L3+xG+NkErZtFRHcwSFOhWxcZhcxsHfw8nfCEhxPfYSNXxtEGi14LQnBVN6RmZKP/gj
0I+5uBmsgYMEhTkS1LyYQx0yHTshaENMFTT5TDrcxshIRGYNeZeK2bRWT2GKSpUMl0nrA7w6Ud6zKr25RIRbh5fRqp8q2yMEdI6B7sOM1ATaQlBmkqVGsOR0FSERtVLoOKZR347ppgoJ7bpxGe8SuP9KwcDFi4B9tOxWndLCKzxSBNhWrVnWUpX6zHXrSpsrWyxOxXGqJNrduB+rXv9+Kvk/kvH0tERYtBmgrN+fhUVcVKqou9EODFd9bEA/XXvRupUqJSg33g9/uw5QQDNVFxY5CmQk8Ya1bNDeWcbPnOmjgbKwvMerkh2tbxQEZ2Dgb9sA+bjueti09ERYtBmgqF1MRZyaHuEhmoZ77cEM/7e6pAPfhHBmqi4sQgTYXiWFQy/o5LVX/U2/rnv4oZmSZrSwvM6NUA7QO81Pz3IT/uZzIZkTEH6aysLGzcuBFz585FSsrter9Xr15V9bDJvIe6n6lZHs521lo3h4ogUE/vWd8w9D3wh70seEJkjEH6woULCAgIQKdOnTB06FDExd2enjF58mS8++67RdFGMnI5OTqsvrMsZSeueFWiA/VXvRqq6Vkyj1qmZ+27cE3rZhGVaAUO0rJcpKwcdf36ddjb2xv2d+nS5Z61nsk87Lt4HVeT0lTVqlZ+5bVuDhUhuZzxde+GaFHdXa1y1v+7PTh8OZHvOZGxBOnt27dj7NixsLGxybO/SpUquHLl9hxZMi/6hLG2dTxVMQwyg8pkfRsh0LcsUtKz0Gd+BI5dTda6WUQlUoGDdE5ODrKzs+/Zf/nyZTg5cTEFc5OZnYO1R6LVfdbqNh8ONlb4rn8TNKjkiqRbmegzPxynuR41kfZB+rnnnsP06dMNj0uVKqUSxiZMmIAXXnihsNtHRm7HmXhcS82Am6MNmldz07o5VIzk8kZoSCD8fZyRkJqBl78Nx7n4VH4GRFoG6WnTpmHnzp2oXbs20tLS8PLLLxuGuiV5jMyLPmGsfV0vWFlyRp+5cbG3xg+vBqllSeNS0tV61Jeu3dS6WUQlRimdVKF4hClYS5cuxaFDh1QvumHDhujdu3eeRLKSLjk5GS4uLkhKSoKzszPM0a2MbDT+eINag/jXIcFoVLms1k0ijcTfSEePuWFqrnxlNwcsGxyM8k52/DzIrCQXQVwocJDetm0bmjVrBisrq3sC965du/DUU0/BHDBIA78fjsLQxfvh42qPHf9ppS59kHkvU9pt9i5cvn5L9ayXDgyGiwPnzJP5SC6CIF3g8clWrVrh2rV750ZKo+Q5Mh+GMqD1vRmgCR7Odlj0WpCq234iOgUhoRG4mZHFd4aoOIO0dLzz6zElJCTA0dHxcdpCJkQyev86ebuQDZelJL3Kbo74YUCgula9/2KiWpQjPeve2SBE9HDyjlk/QNeuXdWtBOj+/fvD1vafVY5kStbhw4fVMDiZhz8io1V5yCc8SquhTSI9P09nLAhpgle+Dcf20/EY9tNBzHy5ARMLiYqyJy3j7LJJT1rmQ+sfy+bp6YmBAwfixx9/fJQ2kAnX6pZeNK9F090aViqDeX0aw8bSAuuPRmPM8iOqfCwRFVFPesGCBepWpltJjW4ObZuv2JQ07Po7Xt1/sZ6P1s0hI9Wihjtm9KqPNxbtx7J9l+Fsb42x7WvxSx1RUV6TlqIlDNDmTbK6pVNUv6IrKrk5aN0cMmLt/L0wuVtddX/+jnP4avMZrZtEVDJ70rn98ssv+Pnnn3Hx4kVkZGTkeW7//v2F1TYyUiu54hUVwEuNKyI5LQv/XXMMn284hTIO1ugTXIXvIVFR9KRnzJiBkJAQeHh44MCBAwgMDISbmxvOnj2L559/vqAvRybmYsJNHLyUCItSt6uMET2MAS188XbrGur++FVH1WgMERVBkP76668xb948fPXVV2olrFGjRmHDhg14++231VxpKtlWHbo9N7pZNXdWlKICGdGmBl4OqgQpnzRi6UHsOnM7r4GICjFIyxC3fqqVlAFNSUlR9/
v06YOffvqpoC9HJpzVTVQQMgvgv5380a6Op5q+N/CHfYi8wi/2RIUapGW6lb7iWKVKlbB79251/9y5c2p6FpVcJ6KTcSrmhppW09bfU+vmkAmytCiF6T3rI8i3LG6kZ6H/gj24kMCVs4gKLUg/88wzWLVqlbov16ZHjBiBZ599Fj169ECXLl0K+nKYNWuWmtZlZ2eHoKAgREREPPD4ZcuWwc/PTx0fEBCAtWvX5nleviiMHz8eXl5eqqffpk0bnD59+p7X+f3339Xvk2PKlCmDzp07F7jt5pow9nTNcqqiFNGjsLO2xDf9GqOWl7NamKPvdxFqBS0iKoQgLdejP/jgA3V/6NCh+O6771CrVi189NFHmD17doFeS1bSGjlypJrWJVnh9erVQ9u2bREbG5vv8bKAR69evTBgwACVtCaBVbbIyEjDMZ999plKbpszZw7Cw8PVdDF5TVlWU+/XX39Vw/PyJUNW8pKlN2XJTbo/+fKzypDVzbnR9Hic7ayxMKQJKpa1x4WEm+i/IAIpaZl8W4nupiuAzMxM3cSJE3WXLl3SFYbAwEDd0KFDDY+zs7N13t7eukmTJuV7fPfu3XXt27fPsy8oKEg3aNAgdT8nJ0fn6empmzJliuH5xMREna2tre6nn34ynIOPj4/u22+/fay2JyUlydi+ujUHe88n6Cr/Z42u9rh1ulsZWVo3h0qIs3E3dA0/+lP92+o1L0yXlsl/W2S6koogLhSoJy3LU0pPVZalfFwyv3rfvn1qOFrPwsJCPQ4LC8v3Z2R/7uOF9JL1x8t18ejo6DzHSNlSGdbWHyM99itXrqjf1aBBAzUsLlPHcvfG6V76XnTbOp5quJKoMPi6OyI0JBCONpbY9XcCRi49hGyWDyV69OHu1q1bY+vWrXhc8fHxamEOmW+dmzyWQJsf2f+g4/W3DzpG5nOLDz/8EGPHjsWaNWvUNemnn3463yU49dLT09Vaobk3c5GVnYPfj9ye19qxPrO6qXAFVHDB3D6NYW1ZSv07m7j6KJNQiR614pj0OkePHo0jR46gUaNG95QIffHFF2HMcnJy1K1cV+/WrZuhLnmFChVUUtqgQYPy/blJkyZh4sSJMEfSw4m/kYGyjjZoUd1d6+ZQCa3z/Xn3+nh7yQF8H3YB5Z1s8eYzt4ufEJmzAgfpN954Q91+/vnn+c6DlN7xw3B3d4elpSViYmLy7JfHMs0rP7L/Qcfrb2WfDGPnPqZ+/frqvn5/7dq1Dc/LsptVq1ZVc8DvZ8yYMSrJTU960hUrVoQ5ZXW/EOAJa8sCD74QPZSO9byRcCMdH64+hql/noJbaVv0CqzEd4/MmsWj9ETvtz1sgBZSrUx64ps2bcrz2vI4ODg435+R/bmPF1LtTH+8r6+vCtS5j5FgKlne+mPkd0pQPnnypOGYzMxMnD9/HpUrV75ve+VnnJ2d82zmIC0zG38cvX2pgFndVNT6N/fF0FbV1P0PfjuCTcfzfiknMjeadoukZ/rNN99g4cKFOH78OIYMGYLU1FQ1NUr07dtX9WD1hg0bhvXr12PatGk4ceKEuq68d+9evPnmm4ae/PDhw/Hxxx+rudwyJC+v4e3tbZgHLcF18ODBatrXn3/+qYK1/F7x0ksvafI+GLMtJ2JV0QlvFzs0qlRG6+aQGXj3uZro3riCWmntzcUHVK14InP1SKtgFRYpgBIXF6eKj0hilwxJSxDWJ37J8LNkYetJOdLFixerhK/3338fNWrUwIoVK+Dv7284RmqJS6AfOHAgEhMT0aJFC/WaUvxEb8qUKSpTXeZK37p1S2V/b968WSWQUf5lQCVhzEJW1SAqYvJl+5MuAYhJTsfWU3F4NXQPlg9phiruefNfiMxBKZmHpXUjTJEMo8v0LllUpKQOfSenZaLxxxuRkZWD399ugTreLlo3icxIanoWes7bjSNXklDZzQG/DmkG99K2WjeLqFjjArOA6L7+PBqjAnS1co6o7VUyv4iQ8XK0tc
J3/f+pSjYgdA9uZjx+jQYiU8IgTfe18uAVQ8KYDEESFbdyTraq2EkZB2scupykrlHLvH0ic1HgIH13QQ/9JktWShUxKhlkwQOZHy24LCVpqVq50vi2XxPYWllg84lYjFsZyWInZDYKHKRdXV1VgtXdm+yXFaVkGpNkTuuLhpBpWnskSpVnrFfBhQk7pLlGlctgRq8GkNzFnyIu4avNZ7RuEpFxBunQ0FA1pUmyqyWzWja57+Pjo1bBkqxqWYXq008/LZoWU/FmdddjGVAyDlI3fuKLddT9zzecws97L2ndJCLjm4Ilc5plnnL37t0N+zp27KjWdp47d64qJFKpUiV88sknKniT6bl07Sb2XbgOuQzNIE3GpE9wFVxNSsPsv/7GmOVHVPnQp2uW17pZRMbTk5Y1nWX1qLvJPv1KUzI3+UElNsm4rT58uxfd1NcNHs7/zC8nMgaj2tZElwY+6nLMG4v248jlJK2bRGQ8QVrqVc+fP/+e/bJPX8s6ISGBhUFKwLKUnbjiFRkhmWkwuVtdtdjLzYxshITuUaM/RCVRgYe7p06dqspnrlu3Dk2aNFH7pDSnlOn85Zdf1OM9e/aoamJkek5Gp+BEdIpaNvB5/38WKSEyJjZWFpj9SkN0n7sbx6OS0e+7CPwypJlaqY3IrHvSshSlBGRZslLWX5ZN7su+Dh06qGOkFnZ+q2SR8Vt16Pbc6JZPlIeLg7XWzSG6Lyc7a4SGNIGPqz3OxqfitYV7cCvj4Rf5ITIFLAv6iEpiWVCpEPvUlC24dO2Wmu7C+dFkCk7HpKDb7F1ITsvCc7U9MPuVRrBknXkqIXHhkRbYkIUrIiIiEBsbe898aFl1ikzTgUuJKkA72FiiTS1mzJJpqOHhpIqdvDI/HH8ei8GHq47io051WCWPSoQCB+nVq1ejd+/euHHjhvqmkLtcpNxnkDb9hLFna3vAwUbTBdKICiTQtyym96iPoYv344fdF+Dlaoc3nq7Od5HM75r0O++8g1dffVUFaelRX79+3bDJ9WkyTVIPec3hKHWfWd1kil4I8MK49rXV/c/Wn8RvBy5r3SSi4g/SV65cwdtvvw0HB4fH/+1kNHafvYb4G+lwdbBGi+rltG4O0SN5tYUvXn/SV91/b9lh7Dgdz3eSzCtIt23bVk25opK54pX0RmR6C5GpGvN8LVUpLytHh8E/7sPRqyx2QqarwBce27dvj/feew/Hjh1TpUCtra3vmaJFpiUtMxvrj0ar+8zoJlNnYVEKU1+qi7iUNDVCFLJgD5a/0QwVynD0j8xgCpaFxf17WZI4lp1tHvMUS9IUrPWR0arH4elsh12jn1F/5IhMXdKtTHSfE4aTMSmoVs4Rvw5pBlcHFjsh04oLBR7XlClX99vMJUCXNKsNK155MUBTieFib43QV5vAy8UOf8dJsZO9atSIyJTw4qOZS0nLxMbjMep+p/o+WjeHqFB5udgjNCQQTnZW2HvhOoYvOagW5iAqUdekZX1oWSfazs5O3X8Qyfwm07HhWAzSs3JQ1d0RdbxNe9ieKD81PZ3wTd/G6Ds/QuVe/HfNMUzoWJvFTqjkXJP29fVVGd1ubm7q/n1frFQpnD17FuagpFyTloUJtp6Kw/A2NTC8zRNaN4eoyKw5fBVvLj6g7o953g+DWlbju00loyzouXPn8r1Ppi3hRjp2nLk9j5RZ3VTSdajrjeikNHz8+3FMWncCni52vMRDRo/XpM3Y2iNR6vpcgI8LqpYrrXVziIrca09WxYAWt0cD3112CDvvfEklKjHzpCWDOzQ0FJs2bcp3gY3NmzcXZvuoCK26k9XNXjSZkw9eqIXo5DT8fjgKg37Yh58HBaM28zGopATpYcOGqSAtRU38/f2ZfGGiriTewp7z1yHro3So56V1c4iKjdQB+Lx7PcSnpCP83DWEhEZg+RvN1brURCYfpJcsWYKff/4ZL7zwQtG0iIp1bnRglbJqmgqRObG1ssS8vo3x0pxdOBVzQy
VQ/jI4mMVOyPSvSdvY2KB6dS4BV1KWpeTcaDLrYichgarS3pnYGxj4/T4WO6GSsVTll19+iQJWEyUjciY2BceikmFlUQrP+3tq3RwizXi72quqZE62Vog4fw0jf2axEzLx4e4dO3Zgy5YtWLduHerUqXPPAhvLly8vzPZREfaiWz5RDmUcWcuYzJufpzPm9m2E/t/twdoj0SjvxGInZMJB2tXVFV26dCma1lCRkxGQlfqs7vrefMeJADSr5o6p3evh7Z8OIHTXeXi72mHgUyx2QiY23J2VlYVWrVph0qRJWLBgQb7bo5g1axaqVKmiyo4GBQUhIiLigccvW7YMfn5+6nhZLnPt2rX3BKLx48fDy8sL9vb2aNOmDU6fPp3va6Wnp6N+/foqS/3gwYMo6Q5fTsKFhJuwt7ZEm1oeWjeHyGjIVMSx7Wup+/9be8KwxjqRyQRpKysrDB48WAW2wrJ06VKMHDkSEyZMwP79+1GvXj20bdtWzcHOz65du9CrVy8MGDAABw4cQOfOndUWGRlpOOazzz5TNcbnzJmD8PBwODo6qtdMS0u75/VGjRoFb2/z6VGuvDPU3aa2BxxtCzyQQlTii5282vyfYie7WOyEtKYroJYtW+p+++03XWEJDAzUDR061PA4Oztb5+3trZs0aVK+x3fv3l3Xvn37PPuCgoJ0gwYNUvdzcnJ0np6euilTphieT0xM1Nna2up++umnPD+3du1anZ+fn+7o0aOSBac7cODAQ7c7KSlJ/Yzcmoqs7Bxdk4836Cr/Z41uw9ForZtDZJSys3N0byzap/6f+I9frzt21XT+j5O2iiIuFDi7+4033lAZ3jNnzkRYWBgOHz6cZyuIjIwM7Nu3Tw1H61lYWKjH8tr5kf25jxfSS9YfL7XFo6Oj8xwjBc9lGD33a8bExOD111/HDz/8AAcHh39tq4weSPH03JupCT+bgNiUdDX15KknymndHCKjLXYy7aV6CPQti5T0LPRfEKGK/xBpocDjnT179rxnSUq5nivXgeVWyoY+rPj4eHW8h0fea6Py+MSJE/n+jATg/I6X/frn9fvud4y0tX///mrovnHjxjh//vy/tlWuw0+cOBEloQyoTLuysWLZdqL7sbO2xDd9GuOlubeLnfSZH45fBjdDWc6GIGMP0iVhFayvvvoKKSkpGDNmzEP/jBwr1871pCddsWJFmIr0rGy1oIZgVjfRv3NxsMbCVwPR7etdOBuXipAFEVj8elPmcpBxB+nKlSsX2i93d3eHpaWlGnrOTR57euZfZEP2P+h4/a3sk+zu3MdIFrd+ERAZ+ra1tc3zOtKr7t27NxYuXHjP75Vj7z7elGw7FY/ktCyUd7JFkK+b1s0hMglSMvf7AUGqfOihy0kY/OM+zO/XhCNRVGweeczz2LFjWL9+PVatWpVnK2iJ0UaNGqkVtfRkVS15HBwcnO/PyP7cx4sNGzYYjvf19VWBOvcx0uuVLG/9MZL5fejQITXlSjb9FC7JNP/kk09QEumnk3Ss5w1Li1JaN4fIZFQvXxoLQgLhYGOJ7afjVVWynBxWXCQj7UmfPXtWFTM5cuSI4Vq0kPuiINekhQwh9+vXT/ViAwMDMX36dKSmpiIkJEQ937dvX/j4+KhrwvpVuFq2bIlp06aplbhkwY+9e/di3rx5hnYMHz4cH3/8MWrUqKGC9rhx49Q0K5mqJSpVqpSnDaVL315LuVq1aqhQoQJKmtT0LGw8fnv0gctSEhVc/YqumNunEV4N3YM1h6Pg5miDD1+sw1UAyfh60hIkJfDJPGbJij569Ci2bdumguxff/1V4Ab06NEDU6dOVcVHZDhaerbSQ9cnfl28eBFRUbevpYpmzZph8eLFKijLnOpffvkFK1asUMtm5p77/NZbb2HgwIFo0qQJbty4oV5Tip+Yow3HYpCWmYMqbg6oW8FF6+YQmaQna5TDtO5S+AhYGHYBX20+o3WTyAyUknlYBb2OLNd069atq6Y2SX
WwmjVrqn0yNUsKjJgDGUKX809KSoKzszOMmXz733wiFm8/Ux0jn6updXOITFroznP4cPUxdf/jzv54pWnh5emQaUsugrhQ4J60DGc7OTkZAvbVq1cNCWUnT54slEZR4bmemoFtp+LUfWZ1Ez2+/s191RdeMW5lpGHWBJFRXJOWYWVJupIhbykQIiU4JQFMhp+rVq1aJI2kR7c2MgpZOTrU9nJG9fK3v1wR0eMZ8ewTiE/NwOLwixi+5KAqENS8ujvfVip0Be5Jjx07VmVgi48++kjNm37yySdVhrRkTZNx1uruxBWviAqNJKj+t5O/KgyUkZ2Dgd/vxZHLSXyHSftr0vm5du0aypQpY1aZjqZwTfpq4i00n7wZ8gnvHP0MfFzttW4SUYkiRYJCFuzBrr8TVMb3ssHBqFru9mwRMj/JxnBNWu/MmTP4448/cOvWLZQtW7ZQGkOFa83hqypAB1YpywBNVARsrSzV1Cx/H2ckpGagz/wI9eWYqLAUOEgnJCSgdevWeOKJJ/DCCy8YpkfJ0pGS3U3GV6u7I4e6iYqMk501QkMC4evuqBbieOXbcMTfKLzlfMm8FThIjxgxAtbW1mr+cu7Vo2S+s8xFJuPwd9wNRF5JhpVFKbQP+Kc8KhEVPvfStvjxtSB4u9jhbHyq6lEn3czkW03FH6T//PNPTJ48+Z7KXFLd68KFC4/fIioUq+4kjLWo4c6Ve4iKgeR8LHq9qQrYx6OSERIaoar9ERVrkJaSnfmtvyzJY6a8AEVJIrmA+qFuZnUTFR8Z8v5hQCCc7ayw/2IiBv6wF2mZBSuVTPRYQVqmW33//feGx5LRLVOyZL50q1atCvpyVARkmPtcfCpsrSzwbO38VxMjoqJRy8tZLXEpC3LsPJOAt346gMzs29NWiYq8mIkEY0kck0UtMjIyVJ1sqd8tPemdO3cWuAFUdCtetantgdK2Bf6IiegxNahUBt/2a4z+C/ao2vnvLjuEL7rXhwVXoKOi7klLxbFTp06hRYsW6NSpkxr+7tq1q6rZLatIkbayc3RYffj2UDdXvCLSTrNq7pjdu6FK3pSiQlJCtBDKUpCZeaRulkzW/uCDD/Lsu3z5slp1Sr9kJGkj4tw1xCSnw8nOCk/XLMePgUhDrWt54PMe9TFsyQEsCr8IR1srjHnez6wKP5FGxUzymz89f/78wno5ekT6hDEpVyiFFohIWzKiNalLgLo/b9tZTP3zJHvUVPxBmrSXkZVjWJGnU30frZtDRHf0DKyEDzvWVvdnbfkb0zee5ntDD4VBugTZfjoOSbcyUc7JFk2rumndHCK6a4nLse1rqftfbjqNmZsZqOnfMUiXwBWvOtT1giWzSImMzmtPVsV/2vmp+1P/PIU5W//WuklUUhLHJIP7QRITEwujPfSIbmZkqakeglndRMZryNPVkJWdg2kbTuHTdSdU9rcEb6LHCtKS0f1vz/ft2/dhX44KmQToW5nZqFTWAfUruvL9JTJib7WugawcnRr2/vj347C2tEC/ZlW0bhaZcpBesGBB0baEHsvqO1nd0ovm9A4i4ze8jQTqHJVINmHVUVhZlkLvoMpaN4uMDK9JlwCJNzOw9VScus9a3USmQb5Mv/tcTQx66vZQ9we/RWJJxEWtm0VGhkG6BFgXGY3MbB38PJ1Qw8NJ6+YQUQEC9ejn/fBqc1/1ePTyI/gh7DzfPzJgkC5Btbo5N5rINAP1uA61MKDF7UA9buVRzN9xTutmkZFgkDZx0UlpCD93Td3vWM9L6+YQ0SMGaplDLZnf4r9rjmH2X5yeRQzSJm/N4auQmv2NK5dBhTL3rvNNRKYTqEe1rYlhrWuox5PXn8CMTSx4Yu7Yky4htbpfrO+tdVOIqBAC9Yhnn8B7bWuqx59vOIWpf7DWtzljkDZh5+JTcfhykqou9kIAh7qJSoqhraobSojO3HIGk9ad4KIcZopB2oStulMGtHl1d7iXttW6OURUiKQK2cQX6xhWz5q4+h
gDtRlikDZRsnj8ykO3s7pZBpSoZJIqZP/rEgBZfjp013mMWX4E2Tk6rZtFxYhB2kQdvZqMs3GpsLGyQNs6Hlo3h4iKyMtBlfBZt7qQNXOW7LmEoYv2Iy0zm++3mTCKID1r1ixUqVIFdnZ2CAoKQkRExAOPX7ZsGfz8/NTxAQEBWLt27T29zPHjx8PLywv29vZo06YNTp/+J0vy/PnzGDBgAHx9fdXz1apVw4QJE5CRkQFTKwPa2q88nOystW4OERWhlxpXxNe9G8LG0gLrj0YjZMEepKRl8j03A5oH6aVLl2LkyJEqSO7fvx/16tVD27ZtERsbm+/xu3btQq9evVSQPXDgADp37qy2yMhIwzGfffYZZsyYgTlz5iA8PByOjo7qNdPS0tTzJ06cQE5ODubOnYujR4/iiy++UMe+//77MAU5OTpDVjfLgBKZh3b+XggNaQJHG0uEnU3Ay9+EI/5GutbNoiJWSifdTg1Jz7lJkyaYOXOmeizBs2LFinjrrbcwevToe47v0aMHUlNTsWbNGsO+pk2bon79+irQyul4e3vjnXfewbvvvqueT0pKgoeHB0JDQ9GzZ8982zFlyhTMnj0bZ8+efah2Jycnq5W/5LWdnZ1RnCLOXUP3uWFwsrXCnrFtYGdtWay/n4i0c+RyEvotiMC11AxUdXfE9wMCWSPBSBRFXNC0Jy3Dy/v27VPD0YYGWViox2FhYfn+jOzPfbyQXrL++HPnziE6OjrPMfKmyZeB+72mkDe1bNmy930+PT1dfQC5N63LgLb192SAJjIzARVc8MvgYPi42uNsfCr+b3YYTsWkaN0sKiKaBun4+HhkZ2erXm5u8lgCbX5k/4OO198W5DXPnDmDr776CoMGDbpvWydNmqSCvX6T3r4WMrNzsPZIlLrPrG4i81S1XGn8MiQYNcqXRnRyGl6aE4b9F69r3SwqidektXblyhW0a9cOL730El5//fX7HjdmzBjV29Zvly5dghZ2nI7H9ZuZcC9tg2bV3DRpAxFpz8vFHj8PCkaDSq5IupWJ3t+EY8vJ/HN5yHRpGqTd3d1haWmJmJiYPPvlsaenZ74/I/sfdLz+9mFe8+rVq2jVqhWaNWuGefPmPbCttra26hpD7k0L+oSx9gFesLI0++9YRGatjKMNFr0WhKeeKIdbmdl4beFeLA7nmtQliaZ/5W1sbNCoUSNs2rTJsE8Sx+RxcHBwvj8j+3MfLzZs2GA4XqZVSTDOfYxcP5Ys79yvKT3op59+Wv3+BQsWqGvhxu5WRjb+OHp7yP7F+j5aN4eIjICDjRW+7dsY3RpWUIVO3v/tiFqcQ2aBkOmz0roBMv2qX79+aNy4MQIDAzF9+nSVvR0SEqKe79u3L3x8fNQ1YTFs2DC0bNkS06ZNQ/v27bFkyRLs3bvX0BOWAvXDhw/Hxx9/jBo1aqigPW7cOJXxLVO1cgfoypUrY+rUqYiLizO05349eGOw6UQMbmZko0IZezSs5Kp1c4jISEhRo6kv1UWlsg74YuMptczlpWs3MfWlekwuNXGaB2mZUiVBUoqPSGKXTKVav369IfHr4sWLeXq5MjS9ePFijB07Vs1rlkC8YsUK+Pv7G44ZNWqUCvQDBw5EYmIiWrRooV5Tip/oe96SLCZbhQoV8rRH4xlpD7TyTq1uSRiTLyNERHryN2FYmxrqS/zo5Yex5nCUWm/+m76N1bA4mSbN50mbquKeJ510MxNNPtmIjOwcrB/+JPw8tbkmTkTGb9eZeAz6cR9S0rJQxc0B3/ZrjOrlnbRuVomXXNLmSdPDW380SgXomh5ODNBE9EDNqrtj+ZBmqld9PuEmuszaxcxvE8UgbSL0Wd0v1vfWuilEZAJqeDhh5dDmCKxSFinpWRgQugffbDtr1Jf06F4M0iYgNjkNu/5OUPdZwISIHpZbaVv8+FoQejapCEn2/mTtcby77DBX0TIhDNImQBJA5MuvFC2oWNZB6+YQkYllfk/qGoAPO9aGpU
Up/Lr/Mnp9sxsxybcXHCLjxiBtAlbqV7yqx6FuInq0zO/+zX3VKlrOdlY4cDER7WdsR9idEToyXgzSRu5CQioOXUpUC763r8sgTUSP7ska5bDqzRbw83RC/I0MvDI/HHO3/s3r1EaMQdrIrbozN7p5dXeUc7LVujlEZOKquDvitzeao2sDH1WhbNK6Exjy436kpGVq3TTKB4O0EZMsTP1Qd0cOdRNRIbG3scS07vXw387+sLYshfVHo9Fp5k6ciNZuCV7KH4O0ETselYIzsTdU4kc7f+MtV0pEpnmduk/TymolLS8XO7U29Yszd+KHsPMc/jYiDNImMDe6Vc1ycLaz1ro5RFQCNahUBmveaqH+zmRk5WDcyqMY9MM+JN7M0LppxCBtvGQFm9X6rG6ueEVERTyf+rv+TTCuQ201/P3nsRg8/+V2hJ9l9rfW2JM2UvsvXseVxFsobWuFZ/zKa90cIjKD4e8BLXxVUpmvuyOiktLUfOppf55UPWzSBoO0ka949VwdDy41R0TFxt/HRQ1/y/rUUqXsq81n0HnWThyPYlKZFhikjVBmdg7WHolS91kGlIiKm6Otlcr+nvlyA5RxsMaxqGS8OHMHZm05g6xs9qqLE4O0Edp5Jh4JqRlwc7RR86OJiLTQoa43/hzREs/W9kBmtg5T/jiJbnPCcCY2hR9IMWGQNuKs7hcCvGBtyY+IiLQjRZTm9WmEz7vXg5OdlaqA+MKXOzB94ymkZ2XzoylijABGJi0zG39ERqv7nbgsJREZSVJZ14YVsGFESzwtU7WyczB942mVAc7630WLQdrIbD4Ri9SMbPi42qNhpTJaN4eIyMDTxQ4L+jdR16qlh302LlVlgL/z8yFcS+W86qLAIG1kVh68YigDaiGrahARGVmvWq5VbxzZUlUsK1UKavnLZ6b9he/DzjOxrJAxSBuR5LRMbDkZp+4zq5uIjJmLvbWq/b18SDO1qlbizUyMX3kU7b7cjr9OxmrdvBKDQdqIyLVoKRpQvXxp1PJy0ro5REQPXVZUArZM15L1Bvov2IN+30XgdAyzwB8Xg7QRZnV3quethpSIiEyBlaWFGvr+671WeP1JX1VadOupONWrHv3rYVy+flPrJposBmkjEZeSruZHCy5LSUSmOgT+QfvaKgv8udoear3qJXsuodXUv/DBb0cQlXRL6yaaHAZpI/H74auqBF+9iq5qUXYiIlMlf8Pm9W2MX4cEo0V1d1UIZVH4RbT87C98uOooYpLTtG6iyWCQNrKhbiaMEVFJ0ahyWfz4WhCWDmyKIN+yan516K7zaDF5s5q2xXrg/45B2ghcunYT+y8mqqkMHet6ad0cIqJCFVTVDUsGNsXi14IQWKWs6lnLtC0phtJnfri6fq3T6fiu58Mqv52kTS86uKobyjvb8e0nohJHkmGbVXdX24GL1/Ht9nNYFxmF7afj1VajfGn0DKyELg18UNbRRuvmGo1SOn59eSTJyclwcXFBUlISnJ2dH+tDaPvFNpyMScHkbgHo0aTSY70WEZEpjSJ+t/Mclu65hJsZt+uA21ha4Nk6HujRuKK6nm1KRZ0KMy7oMUhr/GGciE5Gu+nb1ZSFvR88CxcH60d+LSIiU5R0KxOrDl5RmeBHr/6zbrWPqz061PVCO39P1K/oavRTU4siSHO4W2OrDt4e6n66ZnkGaCIy26lbfYKrqC3yShJ+3nsJKw5cwZXEW5i77azavF3s0NbfE8/7e6FR5TKwNKEetsknjs2aNQtVqlSBnZ0dgoKCEBER8cDjly1bBj8/P3V8QEAA1q5dm+d5GcEfP348vLy8YG9vjzZt2uD06dN5jrl27Rp69+6tvu24urpiwIABuHHjBoqTtJNZ3URE//D3ccFHnfwR8UEbzHq5oepJO9pY4mpSGhbsPI/uc8PQ8L8bMOiHvapWuFQ4K8lXbTUf7l66dCn69u2LOXPmqAA9ffp0FYRPnjyJ8uXL33P8rl278NRTT2HSpEno0K
EDFi9ejMmTJ2P//v3w9/dXx8hjeX7hwoXw9fXFuHHjcOTIERw7dkwFdvH8888jKioKc+fORWZmJkJCQtCkSRP1esU1rLHvwnV0m70LDjaW2Df2WdjbWD7S6xARlfQlfLedisP6yGhsOB6DlLSsPM97ONsiyNcNAT4uqOPjjDreLqp3XtxK5DVpCcwSHGfOnKke5+TkoGLFinjrrbcwevToe47v0aMHUlNTsWbNGsO+pk2bon79+irQy+l4e3vjnXfewbvvvquelzfMw8MDoaGh6NmzJ44fP47atWtjz549aNy4sTpm/fr1eOGFF3D58mX188XxYcikfpkz2Lm+N6b3bPBIr0FEZE6ysnNw+EqSWsdaqjTuvXBdrXlwt8puDqjt5YzKbo6oVNbBsHm52sHasmgGkUvcNemMjAzs27cPY8aMMeyzsLBQw9NhYWH5/ozsHzlyZJ59bdu2xYoVK9T9c+fOITo6Wr2Gnrxp8mVAflaCtNzKELc+QAs5Xn53eHg4unTpguIQXM0NFxJS0bmBT7H8PiKiklAnvGGlMmob2qq66mXLqKRM64q8kowjV5LUtewLCTfVdje5lC29bP3mfGdTq3p18je6a92aBun4+HhkZ2erXm5u8vjEiRP5/owE4PyOl/365/X7HnTM3UPpVlZWKFu2rOGYu6Wnp6st9zemx9W2jqfaiIjo0dhZW6J5dXe16V1PzVBZ4jJ7RqZ5Xbx2E5eu31L307NycP1mptpys7GywP+6BBjdx8Ds7ock17gnTpxYtJ8GERE9tjKONmhRw11tueXk6BCfmq7WvpZpX0l3bpPTMpGZfe+QOcw9SLu7u8PS0hIxMTF59stjT8/8e5iy/0HH629ln2R35z5Grlvrj4mNzbsoeVZWlsr4vt/vlSH53MPs0pOWa+dERGQaLCxKobyTndpMhaZTsGxsbNCoUSNs2rTJsE8Sx+RxcHBwvj8j+3MfLzZs2GA4XrK5JdDmPkYCqlxr1h8jt4mJiep6uN7mzZvV75Zr1/mxtbVViQC5NyIioiKl09iSJUt0tra2utDQUN2xY8d0AwcO1Lm6uuqio6PV83369NGNHj3acPzOnTt1VlZWuqlTp+qOHz+umzBhgs7a2lp35MgRwzGffvqpeo2VK1fqDh8+rOvUqZPO19dXd+vWLcMx7dq10zVo0EAXHh6u27Fjh65GjRq6Xr16PXS7k5KSJCte3RIRESUVQVzQ/Jq0TKmKi4tTxUckaUuGpGU6lD7x6+LFiyrrWq9Zs2ZqLvPYsWPx/vvvo0aNGiqzWz9HWowaNUpN0xo4cKDqMbdo0UK9pn6OtFi0aBHefPNNtG7dWr1+t27dMGPGjGI+eyIiIiOeJ22qimI+HBERma7kIogLRlEWlIiIiO7FIE1ERGSkGKSJiIiMlOaJY6ZKfym/MCqPERGR6Uu+Ew8KM9WLQfoRpaSkqFsWNCEiorvjgySQFQZmdz8iKXxy9epVODk5oVSpRyvIrq9adunSpRKTIc5zMg38nEwDPyfT+pxkyrDEA1lJMffU4cfBnvQjkg+gQoUKhfIhlMQKZjwn08DPyTTwczIN0nsu7L/lTBwjIiIyUgzSRERERopBWkOyaMeECRPUbUnBczIN/JxMAz8n01CUnxMTx4iIiIwUe9JERERGikGaiIjISDFIExERGSkGaQ3NmjULVapUUetcBwUFISIiAqZi0qRJaNKkiSrmUr58eXTu3BknT57Mc0xaWhqGDh0KNzc3lC5dWq3ZHRMTA1Pw6aefqqIEw4cPN+nzuXLlCl555RXVZnt7ewQEBGDv3r2G56V8oazl7uXlpZ5v06YNTp8+DWOVnZ2NcePGwdfXV7W3WrVq+O9//5unDKOxn9O2bdvQsWNHVfBC/o2tWLEiz/MP0/5r166hd+/eak6uq6srBgwYgBs3bsAYzykzMxP/+c9/1L89R0dHdUzfvn1VMShTPae7DR48WB0zffr0Qj8nBmmNLF26FCNHjl
QZgfv370e9evXQtm1bxMbGwhRs3bpVBazdu3djw4YN6j/ic889h9TUVMMxI0aMwOrVq7Fs2TJ1vPyn7Nq1K4zdnj17MHfuXNStWzfPflM7n+vXr6N58+awtrbGunXrcOzYMUybNg1lypQxHPPZZ59hxowZmDNnDsLDw9UfUfl3KF9IjNHkyZMxe/ZszJw5E8ePH1eP5Ry++uorkzkn+T8i/9/lS3p+Hqb98of/6NGj6v/emjVrVEAZOHAgjPGcbt68qf7GyZcruV2+fLn6Qv/iiy/mOc6Uzim33377Tf0dlGB+t0I5Jx1pIjAwUDd06FDD4+zsbJ23t7du0qRJJvmJxMbGSldGt3XrVvU4MTFRZ21trVu2bJnhmOPHj6tjwsLCdMYqJSVFV6NGDd2GDRt0LVu21A0bNsxkz+c///mPrkWLFvd9PicnR+fp6ambMmWKYZ+cp62tre6nn37SGaP27dvrXn311Tz7unbtquvdu7dJnpP8+/ntt98Mjx+m/ceOHVM/t2fPHsMx69at05UqVUp35coVnbGdU34iIiLUcRcuXDDpc7p8+bLOx8dHFxkZqatcubLuiy++MDxXWOfEnrQGMjIysG/fPjWMlbvMqDwOCwuDKUpKSlK3ZcuWVbdyftK7zn2Ofn5+qFSpklGfo4wOtG/fPk+7TfV8Vq1ahcaNG+Oll15SlyQaNGiAb775xvD8uXPnEB0dneecpKyhXHox1nNq1qwZNm3ahFOnTqnHhw4dwo4dO/D888+b7Dnl9jDtl1sZOpXPVk+Ol78h0vM2lb8XMjws52Gq55STk4M+ffrgvffeQ506de55vrDOibW7NRAfH6+urXl4eOTZL49PnDgBUyP/WOXarQyt+vv7q33yh8bGxsbwnzD3OcpzxmjJkiVqOE6Gu+9miudz9uxZNTQsl1Xef/99dV5vv/22Oo9+/foZ2p3fv0NjPafRo0erxQzkC5KlpaX6f/TJJ5+oYUVhiueU28O0X27lS1duVlZW6guyKZyjDNvLNepevXoZ6lyb4jlNnjxZtVH+T+WnsM6JQZoKpfcZGRmpejSmSlYiGzZsmLp2JIl8JYF8eZJv8f/73//UY+lJy+ck1zolSJuin3/+GYsWLcLixYtV7+XgwYPqC6JcDzTVczInMhrVvXt3lRwnXyBN1b59+/Dll1+qL/WPugriw+Jwtwbc3d1VL+DuzGB57OnpCVPy5ptvqoSILVu25FkVTM5DhvUTExNN4hzlP50k7TVs2FB925VNksMkgUfuS0/GlM5HSHZw7dq18+yrVauWWk5P6NttSv8OZWhRetM9e/ZU2cIy3CgJfTLbwFTPKbeHab/c3p1gmpWVpTKJjfkc9QH6woUL6stw7tWiTO2ctm/frtorl7v0fy/kvN555x01Y6cwz4lBWgMy3NioUSN1bS13r0ceBwcHwxTIN2EJ0JLZuHnzZjUlJjc5P8kqzn2OktEpAcIYz7F169Y4cuSI6pnpN+mFyjCq/r4pnY+Qyw93T4uTa7mVK1dW9+Uzkz8Wuc9JhpLlepmxnpNkCt+9Tq984ZX/P6Z6Trk9TPvlVr4syhdLPfk/KO+BXLs25gAtU8k2btyopgTmZmrn1KdPHxw+fDjP3wsZzZEvkX/88UfhnlMhJL7RI1iyZInK2AwNDVVZgAMHDtS5urrqoqOjTeL9HDJkiM7FxUX3119/6aKiogzbzZs3DccMHjxYV6lSJd3mzZt1e/fu1QUHB6vNVOTO7jbF85EMWisrK90nn3yiO336tG7RokU6BwcH3Y8//mg45tNPP1X/7lauXKk7fPiwrlOnTjpfX1/drVu3dMaoX79+Kpt2zZo1unPnzumWL1+uc3d3140aNcpkzklmEBw4cEBt8if4888/V/f1mc4P0/527drpGjRooAsPD9ft2LFDzUjo1auXUZ5TRkaG7sUXX9RVqFBBd/DgwTx/L9LT003ynPJzd3Z3YZ0Tg7SGvvrqK/VH38bGRk3J2r
17t85UyD/a/LYFCxYYjpE/Km+88YauTJkyKjh06dJF/cc01SBtiuezevVqnb+/v/pC6Ofnp5s3b16e52XKz7hx43QeHh7qmNatW+tOnjypM1bJycnqM5H/N3Z2drqqVavqPvjggzx/7I39nLZs2ZLv/x35AvKw7U9ISFB/7EuXLq1zdnbWhYSEqKBijOckX6bu9/dCfs4Uz+lhg3RhnBNXwSIiIjJSvCZNRERkpBikiYiIjBSDNBERkZFikCYiIjJSDNJERERGikGaiIjISDFIExERGSkGaSIiIiPFIE1ERGSkGKSJyCAuLg5DhgxRq/vY2tqqxR7atm2LnTt3qudlWb4VK1bwHSMqJlxPmogMunXrppbkXLhwIapWraqWSJQVmRISEvguEWmAtbuJSJFl9cqUKYO//voLLVu2vOddkXVyZc1cPVny8vz58+r+ypUrMXHiRBw7dkwt2devXz988MEHap1d9YemVCl8/fXXWLVqlXp9Wev6s88+w//93//x3Sd6AA53E5FSunRptclwdnp6+j3vyp49e9TtggULEBUVZXi8fft29O3bF8OGDVNBeu7cuQgNDcUnn3yS5+fHjRuneuqHDh1S63T37NkTx48f57tP9ADsSRORwa+//orXX38dt27dQsOGDVWPWoJp3bp1b//BKFUKv/32Gzp37mz4mTZt2qB169YYM2aMYd+PP/6IUaNG4erVq4afGzx4MGbPnm04pmnTpup3SA+biPLHnjQRGUhPVwKrDEu3a9dODU1LIJWe8f1Iz/ijjz4y9MRlk0Avve2bN28ajgsODs7zc/KYPWmiB2PiGBHlYWdnh2effVZtMkT92muvYcKECejfv3++79SNGzfU9eiuXbvm+1pE9OjYkyaiB6pduzZSU1PVfWtra2RnZ+d5XnraJ0+eRPXq1e/ZLCz++ROze/fuPD8nj2vVqsV3n+gB2JMmIkWmWb300kt49dVX1TVoJycn7N27V2Vhd+rUyZDhLVOymjdvruZRSzb4+PHj0aFDBzW3WrK1JTDLEHhkZCQ+/vhjw7u7bNkyNG7cGC1atMCiRYsQERGB+fPn890negAmjhGRIhndH374If7880/8/fffyMzMRMWKFVXgfv/992Fvb4/Vq1dj5MiRauqVj4+PYQrWH3/8oa5LHzhwQPW2/fz81DC5XJtWf2hKlcKsWbNU5vi2bdvUFKzJkyeje/fufPeJHoBBmoiKXH5Z4UT073hNmoiIyEgxSBMRERkpJo4RUZHT6XR8l4keAXvSRERERopBmoiIyEgxSBMRERkpBmkiIiIjxSBNRERkpBikiYiIjBSDNBERkZFikCYiIjJSDNJEREQwTv8P1+nZhWjE2FcAAAAASUVORK5CYII=",
"text/plain": [
"<Figure size 500x300 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(5, 3))\n",
"plt.ylabel(\"Learning rate\")\n",
"plt.xlabel(\"Step\")\n",
"plt.plot(range(total_training_steps), track_lrs)\n",
"plt.tight_layout(); plt.savefig(\"2.pdf\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "e7512808-b48d-4146-86a1-5931b1e3aec1",
"metadata": {},
"source": [
"## D.3 Gradient clipping"
]
},
{
"cell_type": "markdown",
"id": "c0a74f76-8d2b-4974-a03c-d645445cdc21",
"metadata": {},
"source": [
    "- Gradient clipping is yet another technique used to stabilize training when training LLMs\n",
"- By setting a threshold, gradients exceeding this limit are scaled down to a maximum magnitude to ensure that the updates to the model's parameters during backpropagation remain within a manageable range\n",
"- For instance, using the `max_norm=1.0` setting in PyTorch's `clip_grad_norm_` method means that the norm of the gradients is clipped such that their maximum norm does not exceed 1.0\n",
    "- The \"norm\" refers to a measure of the gradient vector's length (or magnitude) in the parameter space of the model\n",
"- Specifically, it's the L2 norm, also known as the Euclidean norm\n",
"- Mathematically, for a vector $\\mathbf{v}$ with components $\\mathbf{v} = [v_1, v_2, \\ldots, v_n]$, the L2 norm is defined as:\n",
"$$\n",
"\\| \\mathbf{v} \\|_2 = \\sqrt{v_1^2 + v_2^2 + \\ldots + v_n^2}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "d44838a6-4322-47b2-a935-c00d3a88355f",
"metadata": {},
"source": [
"- The L2 norm is calculated similarly for matrices.\n",
"- Let's assume our gradient matrix is:\n",
"$$\n",
"G = \\begin{bmatrix}\n",
"1 & 2 \\\\\n",
"2 & 4\n",
"\\end{bmatrix}\n",
"$$\n",
"\n",
"- And we want to clip these gradients with a `max_norm` of 1.\n",
"\n",
"- First, we calculate the L2 norm of these gradients:\n",
"$$\n",
"\\|G\\|_2 = \\sqrt{1^2 + 2^2 + 2^2 + 4^2} = \\sqrt{25} = 5\n",
"$$\n",
"\n",
"- Since $\\|G\\|_2 = 5$ is greater than our `max_norm` of 1, we need to scale down the gradients so that their norm is exactly 1. The scaling factor is calculated as $\\frac{max\\_norm}{\\|G\\|_2} = \\frac{1}{5}$.\n",
"\n",
"- Therefore, the scaled gradient matrix $G'$ will be as follows:\n",
"$$\n",
"G' = \\frac{1}{5} \\times G = \\begin{bmatrix}\n",
"\\frac{1}{5} & \\frac{2}{5} \\\\\n",
"\\frac{2}{5} & \\frac{4}{5}\n",
"\\end{bmatrix}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "eeb0c3c1-2cff-46f5-8127-24412184428c",
"metadata": {},
"source": [
"- Let's see this in action\n",
"- First, we initialize a new model and calculate the loss for a training batch like we would do in the regular training loop"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "e199e1ff-58c4-413a-855e-5edbe9292649",
"metadata": {},
"outputs": [],
"source": [
"from previous_chapters import calc_loss_batch\n",
"# Alternatively:\n",
"# from llms_from_scratch.ch05 import calc_loss_batch\n",
"\n",
"\n",
"torch.manual_seed(123)\n",
"model = GPTModel(GPT_CONFIG_124M)\n",
"model.to(device)\n",
gitextract_2il4qejt/
├── .github/
│ ├── ISSUE_TEMPLATE/
│ │ ├── ask-a-question.md
│ │ └── bug-report.yaml
│ ├── scripts/
│ │ └── check_double_quotes.py
│ └── workflows/
│ ├── basic-tests-latest-python.yml
│ ├── basic-tests-linux-uv.yml
│ ├── basic-tests-macos-uv.yml
│ ├── basic-tests-old-pytorch.yml
│ ├── basic-tests-pip.yml
│ ├── basic-tests-pixi.yml
│ ├── basic-tests-pytorch-rc.yml
│ ├── basic-tests-windows-uv-pip.yml
│ ├── basic-tests-windows-uv-pip.yml.disabled
│ ├── basic-tests-windows-uv.yml.disabled
│ ├── check-links.yml
│ ├── check-spelling-errors.yml
│ └── pep8-linter.yml
├── .gitignore
├── .gitmodules
├── CITATION.cff
├── LICENSE.txt
├── README.md
├── appendix-A/
│ ├── 01_main-chapter-code/
│ │ ├── DDP-script-torchrun.py
│ │ ├── DDP-script.py
│ │ ├── README.md
│ │ ├── code-part1.ipynb
│ │ ├── code-part2.ipynb
│ │ └── exercise-solutions.ipynb
│ ├── 02_setup-recommendations/
│ │ └── README.md
│ └── README.md
├── appendix-B/
│ └── README.md
├── appendix-C/
│ └── README.md
├── appendix-D/
│ ├── 01_main-chapter-code/
│ │ ├── appendix-D.ipynb
│ │ └── previous_chapters.py
│ └── README.md
├── appendix-E/
│ ├── 01_main-chapter-code/
│ │ ├── appendix-E.ipynb
│ │ ├── gpt_download.py
│ │ └── previous_chapters.py
│ └── README.md
├── ch01/
│ ├── README.md
│ └── reading-recommendations.md
├── ch02/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch02.ipynb
│ │ ├── dataloader.ipynb
│ │ └── exercise-solutions.ipynb
│ ├── 02_bonus_bytepair-encoder/
│ │ ├── README.md
│ │ ├── bpe_openai_gpt2.py
│ │ ├── compare-bpe-tiktoken.ipynb
│ │ └── requirements-extra.txt
│ ├── 03_bonus_embedding-vs-matmul/
│ │ ├── README.md
│ │ └── embeddings-and-linear-layers.ipynb
│ ├── 04_bonus_dataloader-intuition/
│ │ ├── README.md
│ │ └── dataloader-intuition.ipynb
│ ├── 05_bpe-from-scratch/
│ │ ├── README.md
│ │ ├── bpe-from-scratch-simple.ipynb
│ │ ├── bpe-from-scratch.ipynb
│ │ └── tests.py
│ └── README.md
├── ch03/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch03.ipynb
│ │ ├── exercise-solutions.ipynb
│ │ ├── multihead-attention.ipynb
│ │ └── small-text-sample.txt
│ ├── 02_bonus_efficient-multihead-attention/
│ │ ├── README.md
│ │ ├── mha-implementations.ipynb
│ │ └── tests/
│ │ └── test_mha_implementations.py
│ ├── 03_understanding-buffers/
│ │ ├── README.md
│ │ └── understanding-buffers.ipynb
│ └── README.md
├── ch04/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch04.ipynb
│ │ ├── exercise-solutions.ipynb
│ │ ├── gpt.py
│ │ ├── previous_chapters.py
│ │ └── tests.py
│ ├── 02_performance-analysis/
│ │ ├── README.md
│ │ ├── flops-analysis.ipynb
│ │ └── requirements-extra.txt
│ ├── 03_kv-cache/
│ │ ├── README.md
│ │ ├── gpt_ch04.py
│ │ ├── gpt_with_kv_cache.py
│ │ ├── gpt_with_kv_cache_optimized.py
│ │ └── tests.py
│ ├── 04_gqa/
│ │ ├── README.md
│ │ ├── gpt_with_kv_gqa.py
│ │ ├── gpt_with_kv_mha.py
│ │ ├── memory_estimator_gqa.py
│ │ └── plot_memory_estimates_gqa.py
│ ├── 05_mla/
│ │ ├── README.md
│ │ ├── gpt_with_kv_mha.py
│ │ ├── gpt_with_kv_mla.py
│ │ ├── memory_estimator_mla.py
│ │ └── plot_memory_estimates_mla.py
│ ├── 06_swa/
│ │ ├── README.md
│ │ ├── gpt_with_kv_mha.py
│ │ ├── gpt_with_kv_swa.py
│ │ ├── memory_estimator_swa.py
│ │ └── plot_memory_estimates_swa.py
│ ├── 07_moe/
│ │ ├── README.md
│ │ ├── gpt_with_kv_ffn.py
│ │ ├── gpt_with_kv_moe.py
│ │ ├── memory_estimator_moe.py
│ │ └── plot_memory_estimates_moe.py
│ ├── 08_deltanet/
│ │ ├── README.md
│ │ └── plot_memory_estimates_gated_deltanet.py
│ └── README.md
├── ch05/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch05.ipynb
│ │ ├── exercise-solutions.ipynb
│ │ ├── gpt_download.py
│ │ ├── gpt_generate.py
│ │ ├── gpt_train.py
│ │ ├── previous_chapters.py
│ │ └── tests.py
│ ├── 02_alternative_weight_loading/
│ │ ├── README.md
│ │ ├── weight-loading-hf-safetensors.ipynb
│ │ ├── weight-loading-hf-transformers.ipynb
│ │ └── weight-loading-pytorch.ipynb
│ ├── 03_bonus_pretraining_on_gutenberg/
│ │ ├── README.md
│ │ ├── prepare_dataset.py
│ │ ├── pretraining_simple.py
│ │ └── tests.py
│ ├── 04_learning_rate_schedulers/
│ │ └── README.md
│ ├── 05_bonus_hparam_tuning/
│ │ ├── README.md
│ │ └── hparam_search.py
│ ├── 06_user_interface/
│ │ ├── README.md
│ │ ├── app_orig.py
│ │ ├── app_own.py
│ │ └── requirements-extra.txt
│ ├── 07_gpt_to_llama/
│ │ ├── README.md
│ │ ├── converting-gpt-to-llama2.ipynb
│ │ ├── converting-llama2-to-llama3.ipynb
│ │ ├── previous_chapters.py
│ │ ├── requirements-extra.txt
│ │ ├── standalone-llama32.ipynb
│ │ └── tests/
│ │ ├── test-requirements-extra.txt
│ │ ├── test_llama32_nb.py
│ │ └── tests_rope_and_parts.py
│ ├── 08_memory_efficient_weight_loading/
│ │ ├── README.md
│ │ ├── memory-efficient-state-dict.ipynb
│ │ └── previous_chapters.py
│ ├── 09_extending-tokenizers/
│ │ ├── README.md
│ │ └── extend-tiktoken.ipynb
│ ├── 10_llm-training-speed/
│ │ ├── 00_orig.py
│ │ ├── 01_opt_single_gpu.py
│ │ ├── 02_opt_multi_gpu_ddp.py
│ │ └── README.md
│ ├── 11_qwen3/
│ │ ├── README.md
│ │ ├── qwen3-chat-interface/
│ │ │ ├── README.md
│ │ │ ├── qwen3-chat-interface-multiturn.py
│ │ │ ├── qwen3-chat-interface.py
│ │ │ └── requirements-extra.txt
│ │ ├── standalone-qwen3-moe-plus-kvcache.ipynb
│ │ ├── standalone-qwen3-moe.ipynb
│ │ ├── standalone-qwen3-plus-kvcache.ipynb
│ │ ├── standalone-qwen3.ipynb
│ │ └── tests/
│ │ ├── test_qwen3_kvcache_nb.py
│ │ └── test_qwen3_nb.py
│ ├── 12_gemma3/
│ │ ├── README.md
│ │ ├── standalone-gemma3-plus-kvcache.ipynb
│ │ ├── standalone-gemma3.ipynb
│ │ └── tests/
│ │ ├── test_gemma3_kv_nb.py
│ │ └── test_gemma3_nb.py
│ ├── 13_olmo3/
│ │ ├── README.md
│ │ ├── standalone-olmo3-plus-kv-cache.ipynb
│ │ ├── standalone-olmo3.ipynb
│ │ └── tests/
│ │ ├── olmo3_layer_debugger.py
│ │ ├── test_olmo3_kvcache_nb.py
│ │ └── test_olmo3_nb.py
│ ├── 14_ch05_with_other_llms/
│ │ ├── README.md
│ │ ├── ch05-llama32.ipynb
│ │ └── ch05-qwen3.ipynb
│ ├── 15_tiny-aya/
│ │ ├── README.md
│ │ ├── standalone-tiny-aya-plus-kv-cache.ipynb
│ │ ├── standalone-tiny-aya.ipynb
│ │ └── tests/
│ │ ├── test_tiny_aya_kvcache_nb.py
│ │ ├── test_tiny_aya_nb.py
│ │ └── tiny_aya_layer_debugger.py
│ ├── 16_qwen3.5/
│ │ ├── README.md
│ │ ├── qwen3.5-plus-kv-cache.ipynb
│ │ ├── qwen3.5.ipynb
│ │ ├── qwen3_5_transformers.py
│ │ └── tests/
│ │ ├── qwen3_5_layer_debugger.py
│ │ └── test_qwen3_5_nb.py
│ └── README.md
├── ch06/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch06.ipynb
│ │ ├── exercise-solutions.ipynb
│ │ ├── gpt_class_finetune.py
│ │ ├── gpt_download.py
│ │ ├── load-finetuned-model.ipynb
│ │ ├── previous_chapters.py
│ │ └── tests.py
│ ├── 02_bonus_additional-experiments/
│ │ ├── README.md
│ │ ├── additional_experiments.py
│ │ ├── gpt_download.py
│ │ └── previous_chapters.py
│ ├── 03_bonus_imdb-classification/
│ │ ├── README.md
│ │ ├── download_prepare_dataset.py
│ │ ├── gpt_download.py
│ │ ├── previous_chapters.py
│ │ ├── requirements-extra.txt
│ │ ├── sklearn-baseline.ipynb
│ │ ├── train_bert_hf.py
│ │ ├── train_bert_hf_spam.py
│ │ ├── train_gpt.py
│ │ └── train_sklearn_logreg.py
│ ├── 04_user_interface/
│ │ ├── README.md
│ │ ├── app.py
│ │ └── requirements-extra.txt
│ └── README.md
├── ch07/
│ ├── 01_main-chapter-code/
│ │ ├── README.md
│ │ ├── ch07.ipynb
│ │ ├── exercise-solutions.ipynb
│ │ ├── exercise_experiments.py
│ │ ├── gpt_download.py
│ │ ├── gpt_instruction_finetuning.py
│ │ ├── instruction-data-with-response.json
│ │ ├── instruction-data.json
│ │ ├── load-finetuned-model.ipynb
│ │ ├── ollama_evaluate.py
│ │ ├── previous_chapters.py
│ │ └── tests.py
│ ├── 02_dataset-utilities/
│ │ ├── README.md
│ │ ├── create-passive-voice-entries.ipynb
│ │ ├── find-near-duplicates.py
│ │ ├── instruction-examples.json
│ │ └── requirements-extra.txt
│ ├── 03_model-evaluation/
│ │ ├── README.md
│ │ ├── eval-example-data.json
│ │ ├── llm-instruction-eval-ollama.ipynb
│ │ ├── llm-instruction-eval-openai.ipynb
│ │ ├── requirements-extra.txt
│ │ └── scores/
│ │ ├── correlation-analysis.ipynb
│ │ ├── gpt4-model-1-response.json
│ │ ├── gpt4-model-2-response.json
│ │ ├── llama3-8b-model-1-response.json
│ │ └── llama3-8b-model-2-response.json
│ ├── 04_preference-tuning-with-dpo/
│ │ ├── README.md
│ │ ├── create-preference-data-ollama.ipynb
│ │ ├── dpo-from-scratch.ipynb
│ │ ├── instruction-data-with-preference.json
│ │ └── previous_chapters.py
│ ├── 05_dataset-generation/
│ │ ├── README.md
│ │ ├── instruction-data-llama3-7b.json
│ │ ├── llama3-ollama.ipynb
│ │ ├── reflection-gpt4.ipynb
│ │ └── requirements-extra.txt
│ ├── 06_user_interface/
│ │ ├── README.md
│ │ ├── app.py
│ │ └── requirements-extra.txt
│ └── README.md
├── conftest.py
├── pixi.toml
├── pkg/
│ └── llms_from_scratch/
│ ├── README.md
│ ├── __init__.py
│ ├── appendix_a.py
│ ├── appendix_d.py
│ ├── appendix_e.py
│ ├── ch02.py
│ ├── ch03.py
│ ├── ch04.py
│ ├── ch05.py
│ ├── ch06.py
│ ├── ch07.py
│ ├── generate.py
│ ├── kv_cache/
│ │ ├── __init__.py
│ │ ├── generate.py
│ │ ├── gpt2.py
│ │ ├── llama3.py
│ │ ├── qwen3.py
│ │ └── utils.py
│ ├── kv_cache_batched/
│ │ ├── __init__.py
│ │ ├── generate.py
│ │ ├── qwen3.py
│ │ └── utils.py
│ ├── llama3.py
│ ├── qwen3.py
│ ├── tests/
│ │ ├── test_appendix_a.py
│ │ ├── test_appendix_d.py
│ │ ├── test_appendix_e.py
│ │ ├── test_ch02.py
│ │ ├── test_ch03.py
│ │ ├── test_ch04.py
│ │ ├── test_ch05.py
│ │ ├── test_ch06.py
│ │ ├── test_ch07.py
│ │ ├── test_generate.py
│ │ ├── test_llama3.py
│ │ └── test_qwen3.py
│ └── utils.py
├── pyproject.toml
├── requirements.txt
└── setup/
├── 01_optional-python-setup-preferences/
│ ├── README.md
│ ├── native-pixi.md
│ └── native-uv.md
├── 02_installing-python-libraries/
│ ├── README.md
│ ├── python_environment_check.ipynb
│ ├── python_environment_check.py
│ └── tests.py
├── 03_optional-docker-environment/
│ ├── .devcontainer/
│ │ ├── Dockerfile
│ │ ├── README.md
│ │ └── devcontainer.json
│ └── README.md
├── 04_optional-aws-sagemaker-notebook/
│ ├── README.md
│ └── cloudformation-template.yml
└── README.md
SYMBOL INDEX (1376 symbols across 126 files)
FILE: .github/scripts/check_double_quotes.py
function should_skip (line 37) | def should_skip(path):
function collect_fstring_expr_string_positions (line 42) | def collect_fstring_expr_string_positions(source):
function check_quotes_in_source (line 76) | def check_quotes_in_source(source, path):
function check_file (line 104) | def check_file(path):
function check_notebook (line 115) | def check_notebook(path):
function parse_args (line 126) | def parse_args():
function main (line 136) | def main():
FILE: appendix-A/01_main-chapter-code/DDP-script-torchrun.py
function ddp_setup (line 22) | def ddp_setup(rank, world_size):
class ToyDataset (line 48) | class ToyDataset(Dataset):
method __init__ (line 49) | def __init__(self, X, y):
method __getitem__ (line 53) | def __getitem__(self, index):
method __len__ (line 58) | def __len__(self):
class NeuralNetwork (line 62) | class NeuralNetwork(torch.nn.Module):
method __init__ (line 63) | def __init__(self, num_inputs, num_outputs):
method forward (line 79) | def forward(self, x):
function prepare_dataset (line 84) | def prepare_dataset():
function main (line 128) | def main(rank, world_size, num_epochs):
function compute_accuracy (line 181) | def compute_accuracy(model, dataloader, device):
FILE: appendix-A/01_main-chapter-code/DDP-script.py
function ddp_setup (line 23) | def ddp_setup(rank, world_size):
class ToyDataset (line 49) | class ToyDataset(Dataset):
method __init__ (line 50) | def __init__(self, X, y):
method __getitem__ (line 54) | def __getitem__(self, index):
method __len__ (line 59) | def __len__(self):
class NeuralNetwork (line 63) | class NeuralNetwork(torch.nn.Module):
method __init__ (line 64) | def __init__(self, num_inputs, num_outputs):
method forward (line 80) | def forward(self, x):
function prepare_dataset (line 85) | def prepare_dataset():
function main (line 129) | def main(rank, world_size, num_epochs):
function compute_accuracy (line 182) | def compute_accuracy(model, dataloader, device):
FILE: appendix-D/01_main-chapter-code/previous_chapters.py
class GPTDatasetV1 (line 21) | class GPTDatasetV1(Dataset):
method __init__ (line 22) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 36) | def __len__(self):
method __getitem__ (line 39) | def __getitem__(self, idx):
function create_dataloader_v1 (line 43) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 62) | class MultiHeadAttention(nn.Module):
method __init__ (line 63) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 78) | def forward(self, x):
class LayerNorm (line 122) | class LayerNorm(nn.Module):
method __init__ (line 123) | def __init__(self, emb_dim):
method forward (line 129) | def forward(self, x):
class GELU (line 136) | class GELU(nn.Module):
method __init__ (line 137) | def __init__(self):
method forward (line 140) | def forward(self, x):
class FeedForward (line 147) | class FeedForward(nn.Module):
method __init__ (line 148) | def __init__(self, cfg):
method forward (line 156) | def forward(self, x):
class TransformerBlock (line 160) | class TransformerBlock(nn.Module):
method __init__ (line 161) | def __init__(self, cfg):
method forward (line 175) | def forward(self, x):
class GPTModel (line 193) | class GPTModel(nn.Module):
method __init__ (line 194) | def __init__(self, cfg):
method forward (line 206) | def forward(self, in_idx):
function generate_text_simple (line 218) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function calc_loss_batch (line 249) | def calc_loss_batch(input_batch, target_batch, model, device):
function calc_loss_loader (line 256) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function evaluate_model (line 273) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function generate_and_print_sample (line 282) | def generate_and_print_sample(model, tokenizer, device, start_context):
function plot_losses (line 295) | def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
function text_to_token_ids (line 314) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 320) | def token_ids_to_text(token_ids, tokenizer):
FILE: appendix-E/01_main-chapter-code/gpt_download.py
function download_and_load_gpt2 (line 17) | def download_and_load_gpt2(model_size, models_dir):
function download_file (line 49) | def download_file(url, destination, backup_url=None):
function load_gpt2_params_from_tf_ckpt (line 131) | def load_gpt2_params_from_tf_ckpt(ckpt_path, settings):
FILE: appendix-E/01_main-chapter-code/previous_chapters.py
class GPTDatasetV1 (line 29) | class GPTDatasetV1(Dataset):
method __init__ (line 30) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 44) | def __len__(self):
method __getitem__ (line 47) | def __getitem__(self, idx):
function create_dataloader_v1 (line 51) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 69) | class MultiHeadAttention(nn.Module):
method __init__ (line 70) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 85) | def forward(self, x):
class LayerNorm (line 128) | class LayerNorm(nn.Module):
method __init__ (line 129) | def __init__(self, emb_dim):
method forward (line 135) | def forward(self, x):
class GELU (line 142) | class GELU(nn.Module):
method __init__ (line 143) | def __init__(self):
method forward (line 146) | def forward(self, x):
class FeedForward (line 153) | class FeedForward(nn.Module):
method __init__ (line 154) | def __init__(self, cfg):
method forward (line 162) | def forward(self, x):
class TransformerBlock (line 166) | class TransformerBlock(nn.Module):
method __init__ (line 167) | def __init__(self, cfg):
method forward (line 181) | def forward(self, x):
class GPTModel (line 199) | class GPTModel(nn.Module):
method __init__ (line 200) | def __init__(self, cfg):
method forward (line 212) | def forward(self, in_idx):
function generate_text_simple (line 224) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function assign (line 253) | def assign(left, right):
function load_weights_into_gpt (line 259) | def load_weights_into_gpt(gpt, params):
function text_to_token_ids (line 320) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 326) | def token_ids_to_text(token_ids, tokenizer):
function calc_loss_loader (line 331) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function evaluate_model (line 350) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function download_and_unzip_spam_data (line 364) | def download_and_unzip_spam_data(url, zip_path, extracted_path, data_fil...
function create_balanced_dataset (line 387) | def create_balanced_dataset(df):
function random_split (line 401) | def random_split(df, train_frac, validation_frac):
class SpamDataset (line 417) | class SpamDataset(Dataset):
method __init__ (line 418) | def __init__(self, csv_file, tokenizer, max_length=None, pad_token_id=...
method __getitem__ (line 442) | def __getitem__(self, index):
method __len__ (line 447) | def __len__(self):
method _longest_encoded_length (line 450) | def _longest_encoded_length(self):
function calc_accuracy_loader (line 463) | def calc_accuracy_loader(data_loader, model, device, num_batches=None):
function calc_loss_batch (line 484) | def calc_loss_batch(input_batch, target_batch, model, device):
function train_classifier_simple (line 492) | def train_classifier_simple(model, train_loader, val_loader, optimizer, ...
function plot_values (line 530) | def plot_values(epochs_seen, examples_seen, train_values, val_values, la...
FILE: ch02/02_bonus_bytepair-encoder/bpe_openai_gpt2.py
function bytes_to_unicode (line 37) | def bytes_to_unicode():
function get_pairs (line 59) | def get_pairs(word):
class Encoder (line 72) | class Encoder:
method __init__ (line 73) | def __init__(self, encoder, bpe_merges, errors="replace"):
method bpe (line 85) | def bpe(self, token):
method encode (line 126) | def encode(self, text):
method decode (line 133) | def decode(self, tokens):
function get_encoder (line 139) | def get_encoder(model_name, models_dir):
function download_vocab (line 148) | def download_vocab():
FILE: ch02/05_bpe-from-scratch/tests.py
function import_definitions_from_notebook (line 11) | def import_definitions_from_notebook(fullname, names):
function imported_module (line 39) | def imported_module():
function verdict_file (line 46) | def verdict_file(imported_module):
function gpt2_files (line 64) | def gpt2_files(imported_module):
function test_tokenizer_training (line 79) | def test_tokenizer_training(imported_module, verdict_file):
function test_gpt2_tokenizer_openai_simple (line 108) | def test_gpt2_tokenizer_openai_simple(imported_module, gpt2_files):
function test_gpt2_tokenizer_openai_edgecases (line 123) | def test_gpt2_tokenizer_openai_edgecases(imported_module, gpt2_files):
function test_gpt2_newline_and_eot_ids (line 163) | def test_gpt2_newline_and_eot_ids(imported_module, gpt2_files):
function test_no_eot_aliasing_and_disallowed_logic (line 185) | def test_no_eot_aliasing_and_disallowed_logic(imported_module, gpt2_files):
function test_newline_roundtrip_and_equivalence (line 214) | def test_newline_roundtrip_and_equivalence(imported_module, gpt2_files, ...
function test_space_newline_space_patterns (line 234) | def test_space_newline_space_patterns(imported_module, gpt2_files):
function test_multiple_leading_spaces_roundtrip (line 250) | def test_multiple_leading_spaces_roundtrip(imported_module, gpt2_files):
FILE: ch03/02_bonus_efficient-multihead-attention/tests/test_mha_implementations.py
function import_notebook_defs (line 10) | def import_notebook_defs():
function copy_weights (line 16) | def copy_weights(from_mha, to_mha):
function test_mha_einsum_matches_ch03 (line 34) | def test_mha_einsum_matches_ch03(d_in, d_out, batch, seq_len, num_heads,...
FILE: ch04/01_main-chapter-code/gpt.py
class GPTDatasetV1 (line 15) | class GPTDatasetV1(Dataset):
method __init__ (line 16) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 30) | def __len__(self):
method __getitem__ (line 33) | def __getitem__(self, idx):
function create_dataloader_v1 (line 37) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 55) | class MultiHeadAttention(nn.Module):
method __init__ (line 56) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 71) | def forward(self, x):
class LayerNorm (line 114) | class LayerNorm(nn.Module):
method __init__ (line 115) | def __init__(self, emb_dim):
method forward (line 121) | def forward(self, x):
class GELU (line 128) | class GELU(nn.Module):
method __init__ (line 129) | def __init__(self):
method forward (line 132) | def forward(self, x):
class FeedForward (line 139) | class FeedForward(nn.Module):
method __init__ (line 140) | def __init__(self, cfg):
method forward (line 148) | def forward(self, x):
class TransformerBlock (line 152) | class TransformerBlock(nn.Module):
method __init__ (line 153) | def __init__(self, cfg):
method forward (line 167) | def forward(self, x):
class GPTModel (line 185) | class GPTModel(nn.Module):
method __init__ (line 186) | def __init__(self, cfg):
method forward (line 198) | def forward(self, in_idx):
function generate_text_simple (line 210) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function main (line 236) | def main():
FILE: ch04/01_main-chapter-code/previous_chapters.py
class GPTDatasetV1 (line 12) | class GPTDatasetV1(Dataset):
method __init__ (line 13) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 27) | def __len__(self):
method __getitem__ (line 30) | def __getitem__(self, idx):
function create_dataloader_v1 (line 34) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 49) | class MultiHeadAttention(nn.Module):
method __init__ (line 50) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 65) | def forward(self, x):
FILE: ch04/01_main-chapter-code/tests.py
function test_main (line 31) | def test_main(capsys):
FILE: ch04/03_kv-cache/gpt_ch04.py
class MultiHeadAttention (line 14) | class MultiHeadAttention(nn.Module):
method __init__ (line 15) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 34) | def forward(self, x):
class LayerNorm (line 77) | class LayerNorm(nn.Module):
method __init__ (line 78) | def __init__(self, emb_dim):
method forward (line 84) | def forward(self, x):
class GELU (line 91) | class GELU(nn.Module):
method __init__ (line 92) | def __init__(self):
method forward (line 95) | def forward(self, x):
class FeedForward (line 102) | class FeedForward(nn.Module):
method __init__ (line 103) | def __init__(self, cfg):
method forward (line 111) | def forward(self, x):
class TransformerBlock (line 115) | class TransformerBlock(nn.Module):
method __init__ (line 116) | def __init__(self, cfg):
method forward (line 130) | def forward(self, x):
class GPTModel (line 148) | class GPTModel(nn.Module):
method __init__ (line 149) | def __init__(self, cfg):
method forward (line 161) | def forward(self, in_idx):
function generate_text_simple (line 173) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function main (line 200) | def main():
FILE: ch04/03_kv-cache/gpt_with_kv_cache.py
class MultiHeadAttention (line 14) | class MultiHeadAttention(nn.Module):
method __init__ (line 15) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 41) | def forward(self, x, use_cache=False):
method reset_cache (line 106) | def reset_cache(self):
class LayerNorm (line 115) | class LayerNorm(nn.Module):
method __init__ (line 116) | def __init__(self, emb_dim):
method forward (line 122) | def forward(self, x):
class GELU (line 129) | class GELU(nn.Module):
method __init__ (line 130) | def __init__(self):
method forward (line 133) | def forward(self, x):
class FeedForward (line 140) | class FeedForward(nn.Module):
method __init__ (line 141) | def __init__(self, cfg):
method forward (line 149) | def forward(self, x):
class TransformerBlock (line 153) | class TransformerBlock(nn.Module):
method __init__ (line 154) | def __init__(self, cfg):
method forward (line 168) | def forward(self, x, use_cache=False):
class GPTModel (line 192) | class GPTModel(nn.Module):
method __init__ (line 193) | def __init__(self, cfg):
method forward (line 212) | def forward(self, in_idx, use_cache=False):
method reset_kv_cache (line 245) | def reset_kv_cache(self):
function generate_text_simple (line 252) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function generate_text_simple_cached (line 280) | def generate_text_simple_cached(model, idx, max_new_tokens,
function main (line 308) | def main():
FILE: ch04/03_kv-cache/gpt_with_kv_cache_optimized.py
class MultiHeadAttention (line 14) | class MultiHeadAttention(nn.Module):
method __init__ (line 15) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 37) | def forward(self, x, use_cache=False):
method reset_cache (line 124) | def reset_cache(self):
class LayerNorm (line 132) | class LayerNorm(nn.Module):
method __init__ (line 133) | def __init__(self, emb_dim):
method forward (line 139) | def forward(self, x):
class GELU (line 146) | class GELU(nn.Module):
method __init__ (line 147) | def __init__(self):
method forward (line 150) | def forward(self, x):
class FeedForward (line 157) | class FeedForward(nn.Module):
method __init__ (line 158) | def __init__(self, cfg):
method forward (line 166) | def forward(self, x):
class TransformerBlock (line 170) | class TransformerBlock(nn.Module):
method __init__ (line 171) | def __init__(self, cfg):
method forward (line 187) | def forward(self, x, use_cache=False):
class GPTModel (line 211) | class GPTModel(nn.Module):
method __init__ (line 212) | def __init__(self, cfg):
method forward (line 232) | def forward(self, in_idx, use_cache=False):
method reset_kv_cache (line 271) | def reset_kv_cache(self):
function generate_text_simple (line 278) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function generate_text_simple_cached (line 306) | def generate_text_simple_cached(model, idx, max_new_tokens, context_size...
function main (line 343) | def main():
FILE: ch04/03_kv-cache/tests.py
function test_gpt_model_equivalence_not_cached (line 32) | def test_gpt_model_equivalence_not_cached(ModelClass):
function test_gpt_model_equivalence_cached (line 66) | def test_gpt_model_equivalence_cached(ModelClass):
function test_context_overflow_bug (line 113) | def test_context_overflow_bug():
function test_prefill_chunking_basic (line 150) | def test_prefill_chunking_basic():
FILE: ch04/04_gqa/gpt_with_kv_gqa.py
class GroupedQueryAttention (line 20) | class GroupedQueryAttention(nn.Module):
method __init__ (line 21) | def __init__(
method forward (line 45) | def forward(self, x, use_cache=False):
method reset_cache (line 121) | def reset_cache(self):
class LayerNorm (line 129) | class LayerNorm(nn.Module):
method __init__ (line 130) | def __init__(self, emb_dim):
method forward (line 136) | def forward(self, x):
class GELU (line 143) | class GELU(nn.Module):
method __init__ (line 144) | def __init__(self):
method forward (line 147) | def forward(self, x):
class FeedForward (line 154) | class FeedForward(nn.Module):
method __init__ (line 155) | def __init__(self, cfg):
method forward (line 163) | def forward(self, x):
class TransformerBlock (line 167) | class TransformerBlock(nn.Module):
method __init__ (line 168) | def __init__(self, cfg):
method forward (line 182) | def forward(self, x, use_cache=False):
class GPTModel (line 206) | class GPTModel(nn.Module):
method __init__ (line 207) | def __init__(self, cfg):
method forward (line 226) | def forward(self, in_idx, use_cache=False):
method reset_kv_cache (line 258) | def reset_kv_cache(self):
function generate_text_simple_cached (line 265) | def generate_text_simple_cached(model, idx, max_new_tokens,
function main (line 292) | def main():
FILE: ch04/04_gqa/gpt_with_kv_mha.py
class MultiHeadAttention (line 20) | class MultiHeadAttention(nn.Module):
method __init__ (line 21) | def __init__(self, d_in, d_out, dropout, num_heads, qkv_bias=False):
method forward (line 42) | def forward(self, x, use_cache=False):
method reset_cache (line 110) | def reset_cache(self):
class LayerNorm (line 118) | class LayerNorm(nn.Module):
method __init__ (line 119) | def __init__(self, emb_dim):
method forward (line 125) | def forward(self, x):
class GELU (line 132) | class GELU(nn.Module):
method __init__ (line 133) | def __init__(self):
method forward (line 136) | def forward(self, x):
class FeedForward (line 143) | class FeedForward(nn.Module):
method __init__ (line 144) | def __init__(self, cfg):
method forward (line 152) | def forward(self, x):
class TransformerBlock (line 156) | class TransformerBlock(nn.Module):
method __init__ (line 157) | def __init__(self, cfg):
method forward (line 170) | def forward(self, x, use_cache=False):
class GPTModel (line 194) | class GPTModel(nn.Module):
method __init__ (line 195) | def __init__(self, cfg):
method forward (line 214) | def forward(self, in_idx, use_cache=False):
method reset_kv_cache (line 246) | def reset_kv_cache(self):
function generate_text_simple_cached (line 253) | def generate_text_simple_cached(model, idx, max_new_tokens,
function main (line 280) | def main():
FILE: ch04/04_gqa/memory_estimator_gqa.py
function convert_bytes (line 21) | def convert_bytes(n):
function calc_kv_bytes_total (line 26) | def calc_kv_bytes_total(batch, context_length, emb_dim, n_heads,
function main (line 33) | def main():
FILE: ch04/04_gqa/plot_memory_estimates_gqa.py
function bytes_convert (line 14) | def bytes_convert(n):
function savings_percent (line 19) | def savings_percent(total_mha, total_gqa):
function plot_abs_kv_vs_context_multi_groups (line 23) | def plot_abs_kv_vs_context_multi_groups():
FILE: ch04/05_mla/gpt_with_kv_mha.py
class MultiHeadAttention (line 20) | class MultiHeadAttention(nn.Module):
method __init__ (line 21) | def __init__(self, d_in, d_out, dropout, num_heads, qkv_bias=False):
method forward (line 42) | def forward(self, x, use_cache=False):
method reset_cache (line 110) | def reset_cache(self):
class LayerNorm (line 118) | class LayerNorm(nn.Module):
method __init__ (line 119) | def __init__(self, emb_dim):
method forward (line 125) | def forward(self, x):
class GELU (line 132) | class GELU(nn.Module):
method __init__ (line 133) | def __init__(self):
method forward (line 136) | def forward(self, x):
class FeedForward (line 143) | class FeedForward(nn.Module):
method __init__ (line 144) | def __init__(self, cfg):
method forward (line 152) | def forward(self, x):
class TransformerBlock (line 156) | class TransformerBlock(nn.Module):
method __init__ (line 157) | def __init__(self, cfg):
method forward (line 170) | def forward(self, x, use_cache=False):
class GPTModel (line 194) | class GPTModel(nn.Module):
method __init__ (line 195) | def __init__(self, cfg):
method forward (line 214) | def forward(self, in_idx, use_cache=False):
method reset_kv_cache (line 246) | def reset_kv_cache(self):
function generate_text_simple_cached (line 253) | def generate_text_simple_cached(model, idx, max_new_tokens,
function main (line 280) | def main():
FILE: ch04/05_mla/gpt_with_kv_mla.py
class MultiHeadLatentAttention (line 24) | class MultiHeadLatentAttention(nn.Module):
method __init__ (line 25) | def __init__(self, d_in, d_out, dropout, num_heads,
method reset_cache (line 50) | def reset_cache(self):
method _reshape_to_heads (line 55) | def _reshape_to_heads(x, num_heads, head_dim):
method forward (line 60) | def forward(self, x, use_cache=False):
class LayerNorm (line 124) | class LayerNorm(nn.Module):
method __init__ (line 125) | def __init__(self, emb_dim):
method forward (line 131) | def forward(self, x):
class GELU (line 138) | class GELU(nn.Module):
method __init__ (line 139) | def __init__(self):
method forward (line 142) | def forward(self, x):
class FeedForward (line 149) | class FeedForward(nn.Module):
method __init__ (line 150) | def __init__(self, cfg):
method forward (line 158) | def forward(self, x):
class TransformerBlock (line 162) | class TransformerBlock(nn.Module):
method __init__ (line 163) | def __init__(self, cfg):
method forward (line 178) | def forward(self, x, use_cache=False):
class GPTModel (line 202) | class GPTModel(nn.Module):
method __init__ (line 203) | def __init__(self, cfg):
method forward (line 222) | def forward(self, in_idx, use_cache=False):
method reset_kv_cache (line 254) | def reset_kv_cache(self):
function generate_text_simple_cached (line 261) | def generate_text_simple_cached(model, idx, max_new_tokens,
function main (line 288) | def main():
FILE: ch04/05_mla/memory_estimator_mla.py
function convert_bytes (line 20) | def convert_bytes(n):
function calc_kv_bytes_total (line 25) | def calc_kv_bytes_total(batch, context_length, emb_dim, n_heads,
function calc_mla_bytes_total (line 33) | def calc_mla_bytes_total(batch, context_length, n_layers, latent_dim, by...
function main (line 39) | def main():
FILE: ch04/05_mla/plot_memory_estimates_mla.py
function convert_bytes_to_gb (line 18) | def convert_bytes_to_gb(n_bytes):
function calc_kv_bytes_total_mha (line 22) | def calc_kv_bytes_total_mha(batch, context_length, emb_dim, n_heads,
function calc_kv_bytes_total_mla (line 29) | def calc_kv_bytes_total_mla(batch, context_length, n_layers, latent_dim,...
function plot_abs_kv_vs_context_multiple (line 33) | def plot_abs_kv_vs_context_multiple():
FILE: ch04/06_swa/gpt_with_kv_mha.py
class MultiHeadAttention (line 20) | class MultiHeadAttention(nn.Module):
method __init__ (line 21) | def __init__(self, d_in, d_out, dropout, num_heads, qkv_bias=False):
method forward (line 42) | def forward(self, x, use_cache=False):
method reset_cache (line 110) | def reset_cache(self):
class LayerNorm (line 118) | class LayerNorm(nn.Module):
method __init__ (line 119) | def __init__(self, emb_dim):
method forward (line 125) | def forward(self, x):
class GELU (line 132) | class GELU(nn.Module):
method __init__ (line 133) | def __init__(self):
method forward (line 136) | def forward(self, x):
class FeedForward (line 143) | class FeedForward(nn.Module):
method __init__ (line 144) | def __init__(self, cfg):
method forward (line 152) | def forward(self, x):
class TransformerBlock (line 156) | class TransformerBlock(nn.Module):
method __init__ (line 157) | def __init__(self, cfg):
method forward (line 170) | def forward(self, x, use_cache=False):
class GPTModel (line 194) | class GPTModel(nn.Module):
method __init__ (line 195) | def __init__(self, cfg):
method forward (line 214) | def forward(self, in_idx, use_cache=False):
method reset_kv_cache (line 246) | def reset_kv_cache(self):
function generate_text_simple_cached (line 253) | def generate_text_simple_cached(model, idx, max_new_tokens,
function main (line 280) | def main():
FILE: ch04/06_swa/gpt_with_kv_swa.py
class MultiHeadAttentionWithSWA (line 20) | class MultiHeadAttentionWithSWA(nn.Module):
method __init__ (line 21) | def __init__(self, d_in, d_out, dropout, num_heads, qkv_bias=False, sl...
method forward (line 43) | def forward(self, x, use_cache=False):
method reset_cache (line 127) | def reset_cache(self):
class LayerNorm (line 135) | class LayerNorm(nn.Module):
method __init__ (line 136) | def __init__(self, emb_dim):
method forward (line 142) | def forward(self, x):
class GELU (line 149) | class GELU(nn.Module):
method __init__ (line 150) | def __init__(self):
method forward (line 153) | def forward(self, x):
class FeedForward (line 160) | class FeedForward(nn.Module):
method __init__ (line 161) | def __init__(self, cfg):
method forward (line 169) | def forward(self, x):
class TransformerBlock (line 173) | class TransformerBlock(nn.Module):
method __init__ (line 174) | def __init__(self, cfg):
method forward (line 189) | def forward(self, x, use_cache=False):
class GPTModel (line 213) | class GPTModel(nn.Module):
method __init__ (line 214) | def __init__(self, cfg):
method forward (line 247) | def forward(self, in_idx, use_cache=False):
method reset_kv_cache (line 279) | def reset_kv_cache(self):
function generate_text_simple_cached (line 286) | def generate_text_simple_cached(model, idx, max_new_tokens,
function main (line 313) | def main():
FILE: ch04/06_swa/memory_estimator_swa.py
function convert_bytes (line 20) | def convert_bytes(n):
function calc_kv_bytes_per_layer (line 25) | def calc_kv_bytes_per_layer(batch, context_length, head_dim, n_kv_heads,...
function parse_ratio (line 30) | def parse_ratio(ratio_str):
function distribute_layers (line 41) | def distribute_layers(n_layers, a, b):
function estimate_totals (line 50) | def estimate_totals(context_length, sliding_window_size, emb_dim, n_head...
function main (line 92) | def main():
FILE: ch04/06_swa/plot_memory_estimates_swa.py
function convert_bytes_to_gb (line 27) | def convert_bytes_to_gb(n_bytes):
function parse_ratio (line 31) | def parse_ratio(ratio_str):
function calc_kv_bytes_total_mha (line 42) | def calc_kv_bytes_total_mha(batch, context_length, emb_dim, n_layers, by...
function calc_kv_bytes_total_gqa (line 48) | def calc_kv_bytes_total_gqa(
function calc_kv_bytes_total_mha_swa (line 57) | def calc_kv_bytes_total_mha_swa(
function calc_kv_bytes_total_gqa_swa (line 75) | def calc_kv_bytes_total_gqa_swa(
function main (line 104) | def main():
FILE: ch04/07_moe/gpt_with_kv_ffn.py
class MultiHeadAttention (line 23) | class MultiHeadAttention(nn.Module):
method __init__ (line 24) | def __init__(self, d_in, d_out, dropout, num_heads, qkv_bias=False):
method forward (line 45) | def forward(self, x, use_cache=False):
method reset_cache (line 113) | def reset_cache(self):
class LayerNorm (line 121) | class LayerNorm(nn.Module):
method __init__ (line 122) | def __init__(self, emb_dim):
method forward (line 128) | def forward(self, x):
class GELU (line 135) | class GELU(nn.Module):
method __init__ (line 136) | def __init__(self):
method forward (line 139) | def forward(self, x):
class FeedForward (line 159) | class FeedForward(nn.Module):
method __init__ (line 160) | def __init__(self, cfg):
method forward (line 166) | def forward(self, x):
class TransformerBlock (line 170) | class TransformerBlock(nn.Module):
method __init__ (line 171) | def __init__(self, cfg):
method forward (line 185) | def forward(self, x, use_cache=False):
class GPTModel (line 220) | class GPTModel(nn.Module):
method __init__ (line 221) | def __init__(self, cfg):
method forward (line 240) | def forward(self, in_idx, use_cache=False):
method reset_kv_cache (line 272) | def reset_kv_cache(self):
function generate_text_simple_cached (line 279) | def generate_text_simple_cached(model, idx, max_new_tokens,
function main (line 343) | def main():
FILE: ch04/07_moe/gpt_with_kv_moe.py
class MultiHeadAttention (line 23) | class MultiHeadAttention(nn.Module):
method __init__ (line 24) | def __init__(self, d_in, d_out, dropout, num_heads, qkv_bias=False):
method forward (line 45) | def forward(self, x, use_cache=False):
method reset_cache (line 113) | def reset_cache(self):
class LayerNorm (line 121) | class LayerNorm(nn.Module):
method __init__ (line 122) | def __init__(self, emb_dim):
method forward (line 128) | def forward(self, x):
class GELU (line 135) | class GELU(nn.Module):
method __init__ (line 136) | def __init__(self):
method forward (line 139) | def forward(self, x):
class FeedForward (line 146) | class FeedForward(nn.Module):
method __init__ (line 147) | def __init__(self, cfg):
method forward (line 155) | def forward(self, x):
class MoEFeedForward (line 159) | class MoEFeedForward(nn.Module):
method __init__ (line 160) | def __init__(self, cfg):
method forward (line 186) | def forward(self, x):
class TransformerBlock (line 230) | class TransformerBlock(nn.Module):
method __init__ (line 231) | def __init__(self, cfg):
method forward (line 245) | def forward(self, x, use_cache=False):
class GPTModel (line 280) | class GPTModel(nn.Module):
method __init__ (line 281) | def __init__(self, cfg):
method forward (line 300) | def forward(self, in_idx, use_cache=False):
method reset_kv_cache (line 332) | def reset_kv_cache(self):
function generate_text_simple_cached (line 339) | def generate_text_simple_cached(model, idx, max_new_tokens,
function main (line 403) | def main():
FILE: ch04/07_moe/memory_estimator_moe.py
function convert_bytes (line 17) | def convert_bytes(n):
function get_num_param_matrices (line 22) | def get_num_param_matrices(ffn_type):
function calc_ffn_params (line 31) | def calc_ffn_params(emb_dim, hidden_dim, ffn_type):
function calc_router_params (line 35) | def calc_router_params(emb_dim, num_experts):
function estimate_params_and_hidden (line 39) | def estimate_params_and_hidden(
function main (line 67) | def main():
FILE: ch04/07_moe/plot_memory_estimates_moe.py
function calc_moe_active_and_total (line 16) | def calc_moe_active_and_total(
function plot_active_params_vs_experts (line 42) | def plot_active_params_vs_experts(
function main (line 93) | def main():
FILE: ch04/08_deltanet/plot_memory_estimates_gated_deltanet.py
function calc_kv_bytes_total_mha (line 20) | def calc_kv_bytes_total_mha(batch, context_length, emb_dim, n_layers, by...
function calc_kv_bytes_total_deltanet_no_conv (line 27) | def calc_kv_bytes_total_deltanet_no_conv(batch, emb_dim, n_layers, bytes...
function convert_to_gb (line 34) | def convert_to_gb(x):
function main (line 38) | def main():
FILE: ch05/01_main-chapter-code/gpt_download.py
function download_and_load_gpt2 (line 16) | def download_and_load_gpt2(model_size, models_dir):
function download_file (line 48) | def download_file(url, destination, backup_url=None):
function load_gpt2_params_from_tf_ckpt (line 126) | def load_gpt2_params_from_tf_ckpt(ckpt_path, settings):
FILE: ch05/01_main-chapter-code/gpt_generate.py
function text_to_token_ids (line 21) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 27) | def token_ids_to_text(token_ids, tokenizer):
function download_and_load_gpt2 (line 32) | def download_and_load_gpt2(model_size, models_dir):
function download_file (line 62) | def download_file(url, destination):
function load_gpt2_params_from_tf_ckpt (line 91) | def load_gpt2_params_from_tf_ckpt(ckpt_path, settings):
function assign (line 120) | def assign(left, right):
function load_weights_into_gpt (line 126) | def load_weights_into_gpt(gpt, params):
function generate (line 187) | def generate(model, idx, max_new_tokens, context_size, temperature=0.0, ...
function main (line 230) | def main(gpt_config, input_prompt, model_size, device):
FILE: ch05/01_main-chapter-code/gpt_train.py
function text_to_token_ids (line 17) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 23) | def token_ids_to_text(token_ids, tokenizer):
function calc_loss_batch (line 28) | def calc_loss_batch(input_batch, target_batch, model, device):
function calc_loss_loader (line 35) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function evaluate_model (line 52) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function generate_and_print_sample (line 61) | def generate_and_print_sample(model, tokenizer, device, start_context):
function train_model_simple (line 75) | def train_model_simple(model, train_loader, val_loader, optimizer, devic...
function plot_losses (line 112) | def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
function main (line 131) | def main(gpt_config, settings):
FILE: ch05/01_main-chapter-code/previous_chapters.py
class GPTDatasetV1 (line 20) | class GPTDatasetV1(Dataset):
method __init__ (line 21) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 35) | def __len__(self):
method __getitem__ (line 38) | def __getitem__(self, idx):
function create_dataloader_v1 (line 42) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 60) | class MultiHeadAttention(nn.Module):
method __init__ (line 61) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 76) | def forward(self, x):
class LayerNorm (line 119) | class LayerNorm(nn.Module):
method __init__ (line 120) | def __init__(self, emb_dim):
method forward (line 126) | def forward(self, x):
class GELU (line 133) | class GELU(nn.Module):
method __init__ (line 134) | def __init__(self):
method forward (line 137) | def forward(self, x):
class FeedForward (line 144) | class FeedForward(nn.Module):
method __init__ (line 145) | def __init__(self, cfg):
method forward (line 153) | def forward(self, x):
class TransformerBlock (line 157) | class TransformerBlock(nn.Module):
method __init__ (line 158) | def __init__(self, cfg):
method forward (line 172) | def forward(self, x):
class GPTModel (line 190) | class GPTModel(nn.Module):
method __init__ (line 191) | def __init__(self, cfg):
method forward (line 203) | def forward(self, in_idx):
function generate_text_simple (line 215) | def generate_text_simple(model, idx, max_new_tokens, context_size):
FILE: ch05/01_main-chapter-code/tests.py
function gpt_config (line 13) | def gpt_config():
function other_settings (line 26) | def other_settings():
function test_main (line 35) | def test_main(gpt_config, other_settings):
function check_file_size (line 43) | def check_file_size(url, expected_size):
function test_model_files (line 63) | def test_model_files():
FILE: ch05/03_bonus_pretraining_on_gutenberg/prepare_dataset.py
function is_english (line 17) | def is_english(text, threshold=0.9):
function combine_files (line 22) | def combine_files(file_paths, target_dir, max_size_mb=500, separator="<|...
FILE: ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py
function read_text_file (line 28) | def read_text_file(file_path):
function create_dataloaders (line 34) | def create_dataloaders(text_data, train_ratio, batch_size, max_length, s...
function convert_time (line 57) | def convert_time(seconds):
function print_eta (line 63) | def print_eta(start_time, book_start_time, index, total_files):
function train_model_simple (line 80) | def train_model_simple(model, optimizer, device, n_epochs,
FILE: ch05/03_bonus_pretraining_on_gutenberg/tests.py
function test_pretraining (line 13) | def test_pretraining():
FILE: ch05/05_bonus_hparam_tuning/hparam_search.py
function calc_loss_loader (line 31) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function calc_loss_batch (line 48) | def calc_loss_batch(input_batch, target_batch, model, device):
function evaluate_model (line 57) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function train_model (line 66) | def train_model(model, train_loader, val_loader, optimizer, device,
FILE: ch05/06_user_interface/app_orig.py
function get_model_and_tokenizer (line 24) | def get_model_and_tokenizer():
function main (line 68) | async def main(message: chainlit.Message):
FILE: ch05/06_user_interface/app_own.py
function get_model_and_tokenizer (line 26) | def get_model_and_tokenizer():
function main (line 62) | async def main(message: chainlit.Message):
FILE: ch05/07_gpt_to_llama/previous_chapters.py
function text_to_token_ids (line 16) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 22) | def token_ids_to_text(token_ids, tokenizer):
function generate (line 27) | def generate(model, idx, max_new_tokens, context_size, temperature=0.0, ...
FILE: ch05/07_gpt_to_llama/tests/test_llama32_nb.py
function import_notebook_defs (line 19) | def import_notebook_defs():
function dummy_input (line 26) | def dummy_input():
function dummy_cfg_base (line 32) | def dummy_cfg_base():
function test_dummy_llama3_forward (line 54) | def test_dummy_llama3_forward(dummy_cfg_base, dummy_input, import_notebo...
function test_llama3_base_equivalence_with_transformers (line 63) | def test_llama3_base_equivalence_with_transformers(import_notebook_defs):
FILE: ch05/07_gpt_to_llama/tests/tests_rope_and_parts.py
function litgpt_build_rope_cache (line 27) | def litgpt_build_rope_cache(
function litgpt_apply_rope (line 79) | def litgpt_apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Ten...
function notebook (line 96) | def notebook():
function set_seed (line 139) | def set_seed():
function test_rope_llama2 (line 143) | def test_rope_llama2(notebook):
function test_rope_llama3 (line 207) | def test_rope_llama3(notebook):
function test_rope_llama3_12 (line 277) | def test_rope_llama3_12(notebook):
function test_silu (line 371) | def test_silu(notebook):
function test_rmsnorm (line 378) | def test_rmsnorm(notebook):
FILE: ch05/08_memory_efficient_weight_loading/previous_chapters.py
class MultiHeadAttention (line 18) | class MultiHeadAttention(nn.Module):
method __init__ (line 19) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 34) | def forward(self, x):
class LayerNorm (line 77) | class LayerNorm(nn.Module):
method __init__ (line 78) | def __init__(self, emb_dim):
method forward (line 84) | def forward(self, x):
class GELU (line 91) | class GELU(nn.Module):
method __init__ (line 92) | def __init__(self):
method forward (line 95) | def forward(self, x):
class FeedForward (line 102) | class FeedForward(nn.Module):
method __init__ (line 103) | def __init__(self, cfg):
method forward (line 111) | def forward(self, x):
class TransformerBlock (line 115) | class TransformerBlock(nn.Module):
method __init__ (line 116) | def __init__(self, cfg):
method forward (line 130) | def forward(self, x):
class GPTModel (line 148) | class GPTModel(nn.Module):
method __init__ (line 149) | def __init__(self, cfg):
method forward (line 161) | def forward(self, in_idx):
FILE: ch05/10_llm-training-speed/00_orig.py
class GPTDatasetV1 (line 22) | class GPTDatasetV1(Dataset):
method __init__ (line 23) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 37) | def __len__(self):
method __getitem__ (line 40) | def __getitem__(self, idx):
function create_dataloader_v1 (line 44) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 62) | class MultiHeadAttention(nn.Module):
method __init__ (line 63) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 78) | def forward(self, x):
class LayerNorm (line 121) | class LayerNorm(nn.Module):
method __init__ (line 122) | def __init__(self, emb_dim):
method forward (line 128) | def forward(self, x):
class GELU (line 135) | class GELU(nn.Module):
method __init__ (line 136) | def __init__(self):
method forward (line 139) | def forward(self, x):
class FeedForward (line 146) | class FeedForward(nn.Module):
method __init__ (line 147) | def __init__(self, cfg):
method forward (line 155) | def forward(self, x):
class TransformerBlock (line 159) | class TransformerBlock(nn.Module):
method __init__ (line 160) | def __init__(self, cfg):
method forward (line 174) | def forward(self, x):
class GPTModel (line 192) | class GPTModel(nn.Module):
method __init__ (line 193) | def __init__(self, cfg):
method forward (line 205) | def forward(self, in_idx):
function generate_text_simple (line 217) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function text_to_token_ids (line 247) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 253) | def token_ids_to_text(token_ids, tokenizer):
function calc_loss_batch (line 258) | def calc_loss_batch(input_batch, target_batch, model, device):
function calc_loss_loader (line 265) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function evaluate_model (line 282) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function generate_and_print_sample (line 291) | def generate_and_print_sample(model, tokenizer, device, start_context):
function train_model_simple_with_timing (line 305) | def train_model_simple_with_timing(model, train_loader, val_loader, opti...
function plot_losses (line 387) | def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
function main (line 410) | def main(gpt_config, settings):
FILE: ch05/10_llm-training-speed/01_opt_single_gpu.py
class GPTDatasetV1 (line 22) | class GPTDatasetV1(Dataset):
method __init__ (line 23) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 37) | def __len__(self):
method __getitem__ (line 40) | def __getitem__(self, idx):
function create_dataloader_v1 (line 44) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class PyTorchMultiHeadAttention (line 64) | class PyTorchMultiHeadAttention(nn.Module):
method __init__ (line 65) | def __init__(self, d_in, d_out, num_heads, dropout=0.0, qkv_bias=False):
method forward (line 78) | def forward(self, x):
class FeedForward (line 111) | class FeedForward(nn.Module):
method __init__ (line 112) | def __init__(self, cfg):
method forward (line 120) | def forward(self, x):
class TransformerBlock (line 124) | class TransformerBlock(nn.Module):
method __init__ (line 125) | def __init__(self, cfg):
method forward (line 138) | def forward(self, x):
class GPTModel (line 156) | class GPTModel(nn.Module):
method __init__ (line 157) | def __init__(self, cfg):
method forward (line 169) | def forward(self, in_idx):
function generate_text_simple (line 181) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function text_to_token_ids (line 211) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 217) | def token_ids_to_text(token_ids, tokenizer):
function calc_loss_batch (line 222) | def calc_loss_batch(input_batch, target_batch, model, device):
function calc_loss_loader (line 229) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function evaluate_model (line 246) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function generate_and_print_sample (line 255) | def generate_and_print_sample(model, tokenizer, device, start_context):
function train_model_simple_with_timing (line 269) | def train_model_simple_with_timing(model, train_loader, val_loader, opti...
function plot_losses (line 351) | def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
function main (line 374) | def main(gpt_config, settings):
FILE: ch05/10_llm-training-speed/02_opt_multi_gpu_ddp.py
function ddp_setup (line 27) | def ddp_setup(rank, world_size):
class GPTDatasetV1 (line 58) | class GPTDatasetV1(Dataset):
method __init__ (line 59) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 73) | def __len__(self):
method __getitem__ (line 76) | def __getitem__(self, idx):
function create_dataloader_v1 (line 82) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class PyTorchMultiHeadAttention (line 107) | class PyTorchMultiHeadAttention(nn.Module):
method __init__ (line 108) | def __init__(self, d_in, d_out, num_heads, dropout=0.0, qkv_bias=False):
method forward (line 121) | def forward(self, x):
class FeedForward (line 154) | class FeedForward(nn.Module):
method __init__ (line 155) | def __init__(self, cfg):
method forward (line 163) | def forward(self, x):
class TransformerBlock (line 167) | class TransformerBlock(nn.Module):
method __init__ (line 168) | def __init__(self, cfg):
method forward (line 181) | def forward(self, x):
class GPTModel (line 199) | class GPTModel(nn.Module):
method __init__ (line 200) | def __init__(self, cfg):
method forward (line 212) | def forward(self, in_idx):
function generate_text_simple (line 224) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function text_to_token_ids (line 254) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 260) | def token_ids_to_text(token_ids, tokenizer):
function calc_loss_batch (line 265) | def calc_loss_batch(input_batch, target_batch, model, device):
function calc_loss_loader (line 272) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function evaluate_model (line 289) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function generate_and_print_sample (line 298) | def generate_and_print_sample(model, device, start_context):
function train_model_simple_with_timing (line 314) | def train_model_simple_with_timing(model, train_loader, val_loader, opti...
function plot_losses (line 416) | def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
function main (line 440) | def main(gpt_config, settings, rank, world_size):
FILE: ch05/11_qwen3/qwen3-chat-interface/qwen3-chat-interface-multiturn.py
function get_qwen_config (line 33) | def get_qwen_config(name):
function build_repo_and_local (line 53) | def build_repo_and_local(model_name, reasoning, local_dir_arg):
function get_device (line 60) | def get_device(name):
function get_model_and_tokenizer (line 76) | def get_model_and_tokenizer(qwen3_config, repo_id, local_dir, device, us...
function build_prompt_from_history (line 99) | def build_prompt_from_history(history, add_assistant_header=True):
function on_start (line 125) | async def on_start():
function main (line 133) | async def main(message: chainlit.Message):
FILE: ch05/11_qwen3/qwen3-chat-interface/qwen3-chat-interface.py
function get_qwen_config (line 32) | def get_qwen_config(name):
function build_repo_and_local (line 52) | def build_repo_and_local(model_name, reasoning, local_dir_arg):
function get_device (line 59) | def get_device(name):
function get_model_and_tokenizer (line 75) | def get_model_and_tokenizer(qwen3_config, repo_id, local_dir, device, us...
function on_start (line 105) | async def on_start():
function main (line 113) | async def main(message: chainlit.Message):
FILE: ch05/11_qwen3/tests/test_qwen3_kvcache_nb.py
function import_notebook_defs (line 19) | def import_notebook_defs():
function dummy_input (line 26) | def dummy_input():
function dummy_cfg_base (line 32) | def dummy_cfg_base():
function dummy_cfg_moe (line 50) | def dummy_cfg_moe(dummy_cfg_base):
function test_dummy_qwen3_forward (line 61) | def test_dummy_qwen3_forward(dummy_cfg_base, dummy_input, import_noteboo...
function test_qwen3_base_equivalence_with_transformers (line 71) | def test_qwen3_base_equivalence_with_transformers(import_notebook_defs):
FILE: ch05/11_qwen3/tests/test_qwen3_nb.py
function import_notebook_defs (line 19) | def import_notebook_defs():
function dummy_input (line 26) | def dummy_input():
function dummy_cfg_base (line 32) | def dummy_cfg_base():
function dummy_cfg_moe (line 50) | def dummy_cfg_moe(dummy_cfg_base):
function test_dummy_qwen3_forward (line 61) | def test_dummy_qwen3_forward(dummy_cfg_base, dummy_input, import_noteboo...
function test_qwen3_base_equivalence_with_transformers (line 71) | def test_qwen3_base_equivalence_with_transformers(import_notebook_defs):
FILE: ch05/12_gemma3/tests/test_gemma3_kv_nb.py
function import_notebook_defs (line 19) | def import_notebook_defs():
function dummy_input (line 26) | def dummy_input():
function dummy_cfg_base (line 32) | def dummy_cfg_base():
function test_dummy_gemma3_forward (line 53) | def test_dummy_gemma3_forward(dummy_cfg_base, dummy_input, import_notebo...
function test_gemma3_base_equivalence_with_transformers (line 62) | def test_gemma3_base_equivalence_with_transformers(import_notebook_defs):
FILE: ch05/12_gemma3/tests/test_gemma3_nb.py
function import_notebook_defs (line 19) | def import_notebook_defs():
function dummy_input (line 26) | def dummy_input():
function dummy_cfg_base (line 32) | def dummy_cfg_base():
function test_dummy_gemma3_forward (line 53) | def test_dummy_gemma3_forward(dummy_cfg_base, dummy_input, import_notebo...
function test_gemma3_base_equivalence_with_transformers (line 62) | def test_gemma3_base_equivalence_with_transformers(import_notebook_defs):
FILE: ch05/13_olmo3/tests/olmo3_layer_debugger.py
function tiny_debug_config (line 20) | def tiny_debug_config():
function yarn_debug_config (line 46) | def yarn_debug_config():
function _hf_config_from_dict (line 74) | def _hf_config_from_dict(cfg):
function load_notebook_defs (line 114) | def load_notebook_defs(nb_name="standalone-olmo3.ipynb"):
function build_olmo3_pair (line 119) | def build_olmo3_pair(import_notebook_defs, cfg, hf_checkpoint=None):
function _attach_debug_hooks (line 143) | def _attach_debug_hooks(model, is_hf):
function _layer_sort_key (line 169) | def _layer_sort_key(name):
function layerwise_differences (line 182) | def layerwise_differences(ours, hf_model, input_ids, rtol=1e-5, atol=1e-5):
function first_mismatch (line 244) | def first_mismatch(differences):
function format_report (line 251) | def format_report(differences):
FILE: ch05/13_olmo3/tests/test_olmo3_kvcache_nb.py
function import_notebook_defs (line 19) | def import_notebook_defs():
function dummy_input (line 26) | def dummy_input():
function dummy_cfg_base (line 32) | def dummy_cfg_base():
function test_dummy_olmo3_forward (line 58) | def test_dummy_olmo3_forward(dummy_cfg_base, dummy_input, import_noteboo...
function test_olmo3_base_equivalence_with_transformers (line 68) | def test_olmo3_base_equivalence_with_transformers(import_notebook_defs):
FILE: ch05/13_olmo3/tests/test_olmo3_nb.py
function import_notebook_defs (line 19) | def import_notebook_defs():
function dummy_input (line 26) | def dummy_input():
function dummy_cfg_base (line 32) | def dummy_cfg_base():
function test_dummy_olmo3_forward (line 58) | def test_dummy_olmo3_forward(dummy_cfg_base, dummy_input, import_noteboo...
function test_olmo3_base_equivalence_with_transformers (line 68) | def test_olmo3_base_equivalence_with_transformers(import_notebook_defs):
FILE: ch05/15_tiny-aya/tests/test_tiny_aya_kvcache_nb.py
function import_notebook_defs (line 19) | def import_notebook_defs():
function dummy_input (line 26) | def dummy_input():
function dummy_cfg_base (line 32) | def dummy_cfg_base():
function test_dummy_tiny_aya_forward (line 55) | def test_dummy_tiny_aya_forward(dummy_cfg_base, dummy_input, import_note...
function test_tiny_aya_base_equivalence_with_transformers (line 65) | def test_tiny_aya_base_equivalence_with_transformers(import_notebook_defs):
FILE: ch05/15_tiny-aya/tests/test_tiny_aya_nb.py
function import_notebook_defs (line 19) | def import_notebook_defs():
function dummy_input (line 26) | def dummy_input():
function dummy_cfg_base (line 32) | def dummy_cfg_base():
function test_dummy_tiny_aya_forward (line 54) | def test_dummy_tiny_aya_forward(dummy_cfg_base, dummy_input, import_note...
function test_tiny_aya_base_equivalence_with_transformers (line 64) | def test_tiny_aya_base_equivalence_with_transformers(import_notebook_defs):
FILE: ch05/15_tiny-aya/tests/tiny_aya_layer_debugger.py
function tiny_debug_config (line 19) | def tiny_debug_config():
function _hf_config_from_dict (line 41) | def _hf_config_from_dict(cfg):
function load_notebook_defs (line 65) | def load_notebook_defs(nb_name="standalone-tiny-aya.ipynb"):
function build_tiny_aya_pair (line 70) | def build_tiny_aya_pair(import_notebook_defs, cfg, hf_checkpoint=None):
function _attach_debug_hooks (line 93) | def _attach_debug_hooks(model, is_hf):
function _layer_sort_key (line 125) | def _layer_sort_key(name):
function layerwise_differences (line 138) | def layerwise_differences(ours, hf_model, input_ids, rtol=1e-5, atol=1e-5):
function format_report (line 199) | def format_report(differences):
FILE: ch05/16_qwen3.5/qwen3_5_transformers.py
class _NotebookLogger (line 25) | class _NotebookLogger:
method __init__ (line 26) | def __init__(self):
method warning_once (line 29) | def warning_once(self, msg):
class Qwen3_5Config (line 40) | class Qwen3_5Config:
class Qwen3_5DynamicCache (line 44) | class Qwen3_5DynamicCache:
class Qwen3_5RMSNormGated (line 48) | class Qwen3_5RMSNormGated(nn.Module):
method __init__ (line 49) | def __init__(self, hidden_size, eps=1e-6, **kwargs):
method forward (line 54) | def forward(self, hidden_states, gate=None):
function apply_mask_to_padding_states (line 66) | def apply_mask_to_padding_states(hidden_states, attention_mask):
function torch_causal_conv1d_update (line 78) | def torch_causal_conv1d_update(
function l2norm (line 96) | def l2norm(x, dim=-1, eps=1e-6):
function torch_chunk_gated_delta_rule (line 102) | def torch_chunk_gated_delta_rule(
function torch_recurrent_gated_delta_rule (line 182) | def torch_recurrent_gated_delta_rule(
class Qwen3_5GatedDeltaNet (line 226) | class Qwen3_5GatedDeltaNet(nn.Module):
method __init__ (line 227) | def __init__(self, config, layer_idx):
method forward (line 296) | def forward(
FILE: ch05/16_qwen3.5/tests/qwen3_5_layer_debugger.py
function _import_qwen3_5_classes (line 14) | def _import_qwen3_5_classes():
function tiny_debug_config (line 44) | def tiny_debug_config():
function _hf_config_from_dict (line 68) | def _hf_config_from_dict(cfg):
function load_notebook_defs (line 105) | def load_notebook_defs(nb_name="qwen3.5.ipynb"):
function build_qwen3_5_pair (line 112) | def build_qwen3_5_pair(import_notebook_defs, cfg, hf_checkpoint=None):
function _attach_debug_hooks (line 140) | def _attach_debug_hooks(model, is_hf):
function _layer_sort_key (line 174) | def _layer_sort_key(name):
function layerwise_differences (line 187) | def layerwise_differences(ours, hf_model, input_ids, rtol=1e-5, atol=1e-5):
function format_report (line 248) | def format_report(differences):
FILE: ch05/16_qwen3.5/tests/test_qwen3_5_nb.py
function _import_qwen3_5_classes (line 16) | def _import_qwen3_5_classes():
function import_notebook_defs (line 51) | def import_notebook_defs():
function dummy_input (line 61) | def dummy_input():
function dummy_cfg_base (line 67) | def dummy_cfg_base():
function test_dummy_qwen3_5_forward (line 92) | def test_dummy_qwen3_5_forward(dummy_cfg_base, dummy_input, import_noteb...
function test_qwen3_5_base_equivalence_with_transformers (line 103) | def test_qwen3_5_base_equivalence_with_transformers(import_notebook_defs):
FILE: ch06/01_main-chapter-code/gpt_class_finetune.py
function download_and_unzip_spam_data (line 24) | def download_and_unzip_spam_data(url, zip_path, extracted_path, data_fil...
function create_balanced_dataset (line 47) | def create_balanced_dataset(df):
function random_split (line 60) | def random_split(df, train_frac, validation_frac):
class SpamDataset (line 76) | class SpamDataset(Dataset):
method __init__ (line 77) | def __init__(self, csv_file, tokenizer, max_length=None, pad_token_id=...
method __getitem__ (line 101) | def __getitem__(self, index):
method __len__ (line 109) | def __len__(self):
method _longest_encoded_length (line 112) | def _longest_encoded_length(self):
function calc_accuracy_loader (line 124) | def calc_accuracy_loader(data_loader, model, device, num_batches=None):
function calc_loss_batch (line 147) | def calc_loss_batch(input_batch, target_batch, model, device):
function calc_loss_loader (line 154) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function evaluate_model (line 171) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function train_classifier_simple (line 180) | def train_classifier_simple(model, train_loader, val_loader, optimizer, ...
function plot_values (line 218) | def plot_values(epochs_seen, examples_seen, train_values, val_values, la...
FILE: ch06/01_main-chapter-code/gpt_download.py
function download_and_load_gpt2 (line 17) | def download_and_load_gpt2(model_size, models_dir):
function download_file (line 49) | def download_file(url, destination, backup_url=None):
function load_gpt2_params_from_tf_ckpt (line 131) | def load_gpt2_params_from_tf_ckpt(ckpt_path, settings):
FILE: ch06/01_main-chapter-code/previous_chapters.py
class GPTDatasetV1 (line 21) | class GPTDatasetV1(Dataset):
method __init__ (line 22) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 36) | def __len__(self):
method __getitem__ (line 39) | def __getitem__(self, idx):
function create_dataloader_v1 (line 43) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 61) | class MultiHeadAttention(nn.Module):
method __init__ (line 62) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 77) | def forward(self, x):
class LayerNorm (line 120) | class LayerNorm(nn.Module):
method __init__ (line 121) | def __init__(self, emb_dim):
method forward (line 127) | def forward(self, x):
class GELU (line 134) | class GELU(nn.Module):
method __init__ (line 135) | def __init__(self):
method forward (line 138) | def forward(self, x):
class FeedForward (line 145) | class FeedForward(nn.Module):
method __init__ (line 146) | def __init__(self, cfg):
method forward (line 154) | def forward(self, x):
class TransformerBlock (line 158) | class TransformerBlock(nn.Module):
method __init__ (line 159) | def __init__(self, cfg):
method forward (line 173) | def forward(self, x):
class GPTModel (line 191) | class GPTModel(nn.Module):
method __init__ (line 192) | def __init__(self, cfg):
method forward (line 204) | def forward(self, in_idx):
function generate_text_simple (line 216) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function assign (line 245) | def assign(left, right):
function load_weights_into_gpt (line 251) | def load_weights_into_gpt(gpt, params):
function text_to_token_ids (line 312) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 318) | def token_ids_to_text(token_ids, tokenizer):
FILE: ch06/01_main-chapter-code/tests.py
function test_gpt_class_finetune (line 12) | def test_gpt_class_finetune():
FILE: ch06/02_bonus_additional-experiments/additional_experiments.py
class LoRALayer (line 32) | class LoRALayer(torch.nn.Module):
method __init__ (line 33) | def __init__(self, in_dim, out_dim, rank, alpha):
method forward (line 40) | def forward(self, x):
class LinearWithLoRA (line 45) | class LinearWithLoRA(torch.nn.Module):
method __init__ (line 46) | def __init__(self, linear, rank, alpha):
method forward (line 53) | def forward(self, x):
class LinearWithLoRAMerged (line 58) | class LinearWithLoRAMerged(torch.nn.Module):
method __init__ (line 59) | def __init__(self, linear, rank, alpha):
method forward (line 66) | def forward(self, x):
class SpamDataset (line 72) | class SpamDataset(Dataset):
method __init__ (line 73) | def __init__(self, csv_file, tokenizer, max_length=None, pad_token_id=...
method __getitem__ (line 90) | def __getitem__(self, index):
method __len__ (line 95) | def __len__(self):
method _longest_encoded_length (line 98) | def _longest_encoded_length(self, tokenizer):
function download_and_unzip (line 110) | def download_and_unzip(url, zip_path, extract_to, new_file_path):
function random_split (line 133) | def random_split(df, train_frac, val_frac):
function create_dataset_csvs (line 149) | def create_dataset_csvs(new_file_path):
function instantiate_model (line 166) | def instantiate_model(choose_model, load_weights):
function calc_loss_batch (line 197) | def calc_loss_batch(input_batch, target_batch, model, device,
function calc_loss_loader (line 231) | def calc_loss_loader(data_loader, model, device,
function calc_accuracy_loader (line 257) | def calc_accuracy_loader(data_loader, model, device, num_batches=None,
function evaluate_model (line 310) | def evaluate_model(model, train_loader, val_loader, device,
function train_classifier_simple (line 329) | def train_classifier_simple(model, train_loader, val_loader, optimizer, ...
function replace_linear_with_lora (line 398) | def replace_linear_with_lora(model, rank, alpha, alternative=False):
FILE: ch06/02_bonus_additional-experiments/gpt_download.py
function download_and_load_gpt2 (line 17) | def download_and_load_gpt2(model_size, models_dir):
function download_file (line 49) | def download_file(url, destination, backup_url=None):
function load_gpt2_params_from_tf_ckpt (line 131) | def load_gpt2_params_from_tf_ckpt(ckpt_path, settings):
FILE: ch06/02_bonus_additional-experiments/previous_chapters.py
class GPTDatasetV1 (line 21) | class GPTDatasetV1(Dataset):
method __init__ (line 22) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 36) | def __len__(self):
method __getitem__ (line 39) | def __getitem__(self, idx):
function create_dataloader_v1 (line 43) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 61) | class MultiHeadAttention(nn.Module):
method __init__ (line 62) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 80) | def forward(self, x):
class LayerNorm (line 124) | class LayerNorm(nn.Module):
method __init__ (line 125) | def __init__(self, emb_dim):
method forward (line 131) | def forward(self, x):
class GELU (line 138) | class GELU(nn.Module):
method __init__ (line 139) | def __init__(self):
method forward (line 142) | def forward(self, x):
class FeedForward (line 149) | class FeedForward(nn.Module):
method __init__ (line 150) | def __init__(self, cfg):
method forward (line 158) | def forward(self, x):
class TransformerBlock (line 162) | class TransformerBlock(nn.Module):
method __init__ (line 163) | def __init__(self, cfg, disable_causal_mask=False):
method forward (line 179) | def forward(self, x):
class GPTModel (line 197) | class GPTModel(nn.Module):
method __init__ (line 198) | def __init__(self, cfg, disable_causal_mask=False):
method forward (line 210) | def forward(self, in_idx):
function generate_text_simple (line 222) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function assign (line 251) | def assign(left, right):
function load_weights_into_gpt (line 257) | def load_weights_into_gpt(gpt, params):
function generate (line 318) | def generate(model, idx, max_new_tokens, context_size, temperature=0.0, ...
FILE: ch06/03_bonus_imdb-classification/download_prepare_dataset.py
function reporthook (line 14) | def reporthook(count, block_size, total_size):
function download_and_extract_dataset (line 31) | def download_and_extract_dataset(dataset_url, target_file, directory):
function load_dataset_to_dataframe (line 51) | def load_dataset_to_dataframe(basepath="aclImdb", labels={"pos": 1, "neg...
function partition_and_save (line 66) | def partition_and_save(df, sizes=(35000, 5000, 10000)):
FILE: ch06/03_bonus_imdb-classification/gpt_download.py
function download_and_load_gpt2 (line 17) | def download_and_load_gpt2(model_size, models_dir):
function download_file (line 49) | def download_file(url, destination, backup_url=None):
function load_gpt2_params_from_tf_ckpt (line 131) | def load_gpt2_params_from_tf_ckpt(ckpt_path, settings):
FILE: ch06/03_bonus_imdb-classification/previous_chapters.py
class GPTDatasetV1 (line 21) | class GPTDatasetV1(Dataset):
method __init__ (line 22) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 37) | def __len__(self):
method __getitem__ (line 40) | def __getitem__(self, idx):
function create_dataloader_v1 (line 44) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 62) | class MultiHeadAttention(nn.Module):
method __init__ (line 63) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 78) | def forward(self, x):
class LayerNorm (line 121) | class LayerNorm(nn.Module):
method __init__ (line 122) | def __init__(self, emb_dim):
method forward (line 128) | def forward(self, x):
class GELU (line 135) | class GELU(nn.Module):
method __init__ (line 136) | def __init__(self):
method forward (line 139) | def forward(self, x):
class FeedForward (line 146) | class FeedForward(nn.Module):
method __init__ (line 147) | def __init__(self, cfg):
method forward (line 155) | def forward(self, x):
class TransformerBlock (line 159) | class TransformerBlock(nn.Module):
method __init__ (line 160) | def __init__(self, cfg):
method forward (line 174) | def forward(self, x):
class GPTModel (line 192) | class GPTModel(nn.Module):
method __init__ (line 193) | def __init__(self, cfg):
method forward (line 205) | def forward(self, in_idx):
function generate_text_simple (line 217) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function assign (line 246) | def assign(left, right):
function load_weights_into_gpt (line 252) | def load_weights_into_gpt(gpt, params):
function text_to_token_ids (line 313) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 319) | def token_ids_to_text(token_ids, tokenizer):
FILE: ch06/03_bonus_imdb-classification/train_bert_hf.py
class IMDbDataset (line 18) | class IMDbDataset(Dataset):
method __init__ (line 19) | def __init__(self, csv_file, tokenizer, max_length=None, pad_token_id=...
method _create_attention_mask (line 43) | def _create_attention_mask(self, encoded_text):
method __getitem__ (line 46) | def __getitem__(self, index):
method __len__ (line 61) | def __len__(self):
method _longest_encoded_length (line 64) | def _longest_encoded_length(self, tokenizer):
function calc_loss_batch (line 73) | def calc_loss_batch(input_batch, attention_mask_batch, target_batch, mod...
function calc_loss_loader (line 83) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function calc_accuracy_loader (line 101) | def calc_accuracy_loader(data_loader, model, device, num_batches=None):
function evaluate_model (line 123) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function train_classifier_simple (line 132) | def train_classifier_simple(model, train_loader, val_loader, optimizer, ...
FILE: ch06/03_bonus_imdb-classification/train_bert_hf_spam.py
class SpamDataset (line 21) | class SpamDataset(Dataset):
method __init__ (line 22) | def __init__(self, csv_file, tokenizer, max_length=None, pad_token_id=...
method __getitem__ (line 39) | def __getitem__(self, index):
method __len__ (line 44) | def __len__(self):
method _longest_encoded_length (line 47) | def _longest_encoded_length(self, tokenizer):
function download_and_unzip (line 59) | def download_and_unzip(url, zip_path, extract_to, new_file_path):
function random_split (line 82) | def random_split(df, train_frac, val_frac):
function create_dataset_csvs (line 98) | def create_dataset_csvs(new_file_path):
class SPAMDataset (line 115) | class SPAMDataset(Dataset):
method __init__ (line 116) | def __init__(self, csv_file, tokenizer, max_length=None, pad_token_id=...
method _create_attention_mask (line 140) | def _create_attention_mask(self, encoded_text):
method __getitem__ (line 143) | def __getitem__(self, index):
method __len__ (line 158) | def __len__(self):
method _longest_encoded_length (line 161) | def _longest_encoded_length(self, tokenizer):
function calc_loss_batch (line 170) | def calc_loss_batch(input_batch, attention_mask_batch, target_batch, mod...
function calc_loss_loader (line 180) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function calc_accuracy_loader (line 198) | def calc_accuracy_loader(data_loader, model, device, num_batches=None):
function evaluate_model (line 220) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function train_classifier_simple (line 229) | def train_classifier_simple(model, train_loader, val_loader, optimizer, ...
FILE: ch06/03_bonus_imdb-classification/train_gpt.py
class IMDbDataset (line 20) | class IMDbDataset(Dataset):
method __init__ (line 21) | def __init__(self, csv_file, tokenizer, max_length=None, pad_token_id=...
method __getitem__ (line 36) | def __getitem__(self, index):
method __len__ (line 41) | def __len__(self):
method _longest_encoded_length (line 44) | def _longest_encoded_length(self, tokenizer):
function instantiate_model (line 53) | def instantiate_model(choose_model, load_weights):
function calc_loss_batch (line 84) | def calc_loss_batch(input_batch, target_batch, model, device,
function calc_loss_loader (line 100) | def calc_loss_loader(data_loader, model, device,
function calc_accuracy_loader (line 125) | def calc_accuracy_loader(data_loader, model, device,
function evaluate_model (line 156) | def evaluate_model(model, train_loader, val_loader, device, eval_iter,
function train_classifier_simple (line 172) | def train_classifier_simple(model, train_loader, val_loader, optimizer, ...
FILE: ch06/03_bonus_imdb-classification/train_sklearn_logreg.py
function load_dataframes (line 14) | def load_dataframes():
function eval_model (line 22) | def eval_model(model, X_train, y_train, X_val, y_val, X_test, y_test):
FILE: ch06/04_user_interface/app.py
function get_model_and_tokenizer (line 22) | def get_model_and_tokenizer():
function main (line 69) | async def main(message: chainlit.Message):
FILE: ch07/01_main-chapter-code/exercise_experiments.py
class InstructionDataset (line 37) | class InstructionDataset(Dataset):
method __init__ (line 38) | def __init__(self, data, tokenizer):
method __getitem__ (line 51) | def __getitem__(self, index):
method __len__ (line 54) | def __len__(self):
class InstructionDatasetWithMasking (line 58) | class InstructionDatasetWithMasking(Dataset):
method __init__ (line 59) | def __init__(self, data, tokenizer):
method __getitem__ (line 79) | def __getitem__(self, index):
method __len__ (line 83) | def __len__(self):
class InstructionDatasetPhi (line 87) | class InstructionDatasetPhi(Dataset):
method __init__ (line 88) | def __init__(self, data, tokenizer):
method __getitem__ (line 105) | def __getitem__(self, index):
method __len__ (line 108) | def __len__(self):
class LinearWithLoRA (line 112) | class LinearWithLoRA(torch.nn.Module):
method __init__ (line 113) | def __init__(self, linear, rank, alpha):
method forward (line 120) | def forward(self, x):
class LoRALayer (line 124) | class LoRALayer(torch.nn.Module):
method __init__ (line 125) | def __init__(self, in_dim, out_dim, rank, alpha):
method forward (line 132) | def forward(self, x):
function replace_linear_with_lora (line 137) | def replace_linear_with_lora(model, rank, alpha):
function custom_collate_fn (line 147) | def custom_collate_fn(
function custom_collate_with_masking_fn (line 190) | def custom_collate_with_masking_fn(
function download_and_load_file (line 236) | def download_and_load_file(file_path, url):
function format_input_phi (line 253) | def format_input_phi(entry):
function format_input (line 263) | def format_input(entry):
function plot_losses (line 275) | def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses, plot...
function main (line 297) | def main(mask_instructions=False, alpaca52k=False, phi3_prompt=False, lo...
FILE: ch07/01_main-chapter-code/gpt_download.py
function download_and_load_gpt2 (line 16) | def download_and_load_gpt2(model_size, models_dir):
function download_file (line 48) | def download_file(url, destination, backup_url=None):
function load_gpt2_params_from_tf_ckpt (line 95) | def load_gpt2_params_from_tf_ckpt(ckpt_path, settings):
FILE: ch07/01_main-chapter-code/gpt_instruction_finetuning.py
class InstructionDataset (line 35) | class InstructionDataset(Dataset):
method __init__ (line 36) | def __init__(self, data, tokenizer):
method __getitem__ (line 49) | def __getitem__(self, index):
method __len__ (line 52) | def __len__(self):
function custom_collate_fn (line 56) | def custom_collate_fn(
function download_and_load_file (line 99) | def download_and_load_file(file_path, url):
function format_input (line 113) | def format_input(entry):
function plot_losses (line 125) | def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
function main (line 147) | def main(test_mode=False):
FILE: ch07/01_main-chapter-code/ollama_evaluate.py
function query_model (line 14) | def query_model(prompt, model="llama3", url="http://localhost:11434/api/...
function check_if_running (line 42) | def check_if_running(process_name):
function format_input (line 51) | def format_input(entry):
function main (line 63) | def main(file_path):
function generate_model_scores (line 79) | def generate_model_scores(json_data, json_key, model="llama3"):
FILE: ch07/01_main-chapter-code/previous_chapters.py
class GPTDatasetV1 (line 25) | class GPTDatasetV1(Dataset):
method __init__ (line 26) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 41) | def __len__(self):
method __getitem__ (line 44) | def __getitem__(self, idx):
function create_dataloader_v1 (line 48) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 66) | class MultiHeadAttention(nn.Module):
method __init__ (line 67) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 82) | def forward(self, x):
class LayerNorm (line 125) | class LayerNorm(nn.Module):
method __init__ (line 126) | def __init__(self, emb_dim):
method forward (line 132) | def forward(self, x):
class GELU (line 139) | class GELU(nn.Module):
method __init__ (line 140) | def __init__(self):
method forward (line 143) | def forward(self, x):
class FeedForward (line 150) | class FeedForward(nn.Module):
method __init__ (line 151) | def __init__(self, cfg):
method forward (line 159) | def forward(self, x):
class TransformerBlock (line 163) | class TransformerBlock(nn.Module):
method __init__ (line 164) | def __init__(self, cfg):
method forward (line 178) | def forward(self, x):
class GPTModel (line 196) | class GPTModel(nn.Module):
method __init__ (line 197) | def __init__(self, cfg):
method forward (line 209) | def forward(self, in_idx):
function generate_text_simple (line 221) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function generate (line 250) | def generate(model, idx, max_new_tokens, context_size, temperature=0.0, ...
function train_model_simple (line 293) | def train_model_simple(model, train_loader, val_loader, optimizer, devic...
function evaluate_model (line 329) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function generate_and_print_sample (line 338) | def generate_and_print_sample(model, tokenizer, device, start_context):
function assign (line 352) | def assign(left, right):
function load_weights_into_gpt (line 358) | def load_weights_into_gpt(gpt, params):
function text_to_token_ids (line 419) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 425) | def token_ids_to_text(token_ids, tokenizer):
function calc_loss_batch (line 430) | def calc_loss_batch(input_batch, target_batch, model, device):
function calc_loss_loader (line 437) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function plot_losses (line 456) | def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
FILE: ch07/01_main-chapter-code/tests.py
function test_gpt_class_finetune (line 12) | def test_gpt_class_finetune():
FILE: ch07/02_dataset-utilities/find-near-duplicates.py
function preprocess_text (line 33) | def preprocess_text(text):
function find_near_duplicates (line 41) | def find_near_duplicates(json_data, threshold=0.75, key="instruction"):
function find_print_and_remove_near_duplicates (line 76) | def find_print_and_remove_near_duplicates(json_data, remove_duplicates=F...
FILE: ch07/04_preference-tuning-with-dpo/previous_chapters.py
class GPTDatasetV1 (line 25) | class GPTDatasetV1(Dataset):
method __init__ (line 26) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 41) | def __len__(self):
method __getitem__ (line 44) | def __getitem__(self, idx):
function create_dataloader_v1 (line 48) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
class MultiHeadAttention (line 66) | class MultiHeadAttention(nn.Module):
method __init__ (line 67) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 82) | def forward(self, x):
class LayerNorm (line 125) | class LayerNorm(nn.Module):
method __init__ (line 126) | def __init__(self, emb_dim):
method forward (line 132) | def forward(self, x):
class GELU (line 139) | class GELU(nn.Module):
method __init__ (line 140) | def __init__(self):
method forward (line 143) | def forward(self, x):
class FeedForward (line 150) | class FeedForward(nn.Module):
method __init__ (line 151) | def __init__(self, cfg):
method forward (line 159) | def forward(self, x):
class TransformerBlock (line 163) | class TransformerBlock(nn.Module):
method __init__ (line 164) | def __init__(self, cfg):
method forward (line 178) | def forward(self, x):
class GPTModel (line 196) | class GPTModel(nn.Module):
method __init__ (line 197) | def __init__(self, cfg):
method forward (line 209) | def forward(self, in_idx):
function generate_text_simple (line 221) | def generate_text_simple(model, idx, max_new_tokens, context_size):
function generate (line 250) | def generate(model, idx, max_new_tokens, context_size, temperature=0.0, ...
function train_model_simple (line 294) | def train_model_simple(model, train_loader, val_loader, optimizer, devic...
function evaluate_model (line 330) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function generate_and_print_sample (line 339) | def generate_and_print_sample(model, tokenizer, device, start_context):
function assign (line 353) | def assign(left, right):
function load_weights_into_gpt (line 359) | def load_weights_into_gpt(gpt, params):
function text_to_token_ids (line 420) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 426) | def token_ids_to_text(token_ids, tokenizer):
function calc_loss_batch (line 431) | def calc_loss_batch(input_batch, target_batch, model, device):
function calc_loss_loader (line 438) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function plot_losses (line 457) | def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses, labe...
FILE: ch07/06_user_interface/app.py
function get_model_and_tokenizer (line 26) | def get_model_and_tokenizer():
function extract_response (line 60) | def extract_response(response_text, input_text):
function main (line 69) | async def main(message: chainlit.Message):
FILE: conftest.py
function _get_env_number (line 8) | def _get_env_number(name, default, cast):
function pytest_configure (line 19) | def pytest_configure(config):
FILE: pkg/llms_from_scratch/appendix_a.py
class NeuralNetwork (line 10) | class NeuralNetwork(torch.nn.Module):
method __init__ (line 11) | def __init__(self, num_inputs, num_outputs):
method forward (line 28) | def forward(self, x):
class ToyDataset (line 33) | class ToyDataset(Dataset):
method __init__ (line 34) | def __init__(self, X, y):
method __getitem__ (line 38) | def __getitem__(self, index):
method __len__ (line 43) | def __len__(self):
FILE: pkg/llms_from_scratch/appendix_d.py
function find_highest_gradient (line 12) | def find_highest_gradient(model):
function train_model (line 23) | def train_model(model, train_loader, val_loader, optimizer, device,
FILE: pkg/llms_from_scratch/appendix_e.py
class LoRALayer (line 10) | class LoRALayer(torch.nn.Module):
method __init__ (line 11) | def __init__(self, in_dim, out_dim, rank, alpha):
method forward (line 19) | def forward(self, x):
class LinearWithLoRA (line 25) | class LinearWithLoRA(torch.nn.Module):
method __init__ (line 26) | def __init__(self, linear, rank, alpha):
method forward (line 33) | def forward(self, x):
function replace_linear_with_lora (line 37) | def replace_linear_with_lora(model, rank, alpha):
FILE: pkg/llms_from_scratch/ch02.py
class GPTDatasetV1 (line 11) | class GPTDatasetV1(Dataset):
method __init__ (line 12) | def __init__(self, txt, tokenizer, max_length, stride):
method __len__ (line 27) | def __len__(self):
method __getitem__ (line 30) | def __getitem__(self, idx):
function create_dataloader_v1 (line 34) | def create_dataloader_v1(txt, batch_size=4, max_length=256,
FILE: pkg/llms_from_scratch/ch03.py
class SelfAttention_v1 (line 10) | class SelfAttention_v1(nn.Module):
method __init__ (line 12) | def __init__(self, d_in, d_out):
method forward (line 18) | def forward(self, x):
class SelfAttention_v2 (line 32) | class SelfAttention_v2(nn.Module):
method __init__ (line 34) | def __init__(self, d_in, d_out, qkv_bias=False):
method forward (line 40) | def forward(self, x):
class CausalAttention (line 52) | class CausalAttention(nn.Module):
method __init__ (line 54) | def __init__(self, d_in, d_out, context_length,
method forward (line 64) | def forward(self, x):
class MultiHeadAttentionWrapper (line 86) | class MultiHeadAttentionWrapper(nn.Module):
method __init__ (line 87) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 94) | def forward(self, x):
class MultiHeadAttention (line 98) | class MultiHeadAttention(nn.Module):
method __init__ (line 99) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 114) | def forward(self, x):
class PyTorchMultiHeadAttention (line 159) | class PyTorchMultiHeadAttention(nn.Module):
method __init__ (line 160) | def __init__(self, d_in, d_out, num_heads, dropout=0.0, qkv_bias=False):
method forward (line 173) | def forward(self, x):
FILE: pkg/llms_from_scratch/ch04.py
class LayerNorm (line 11) | class LayerNorm(nn.Module):
method __init__ (line 12) | def __init__(self, emb_dim):
method forward (line 18) | def forward(self, x):
class GELU (line 25) | class GELU(nn.Module):
method __init__ (line 26) | def __init__(self):
method forward (line 29) | def forward(self, x):
class FeedForward (line 36) | class FeedForward(nn.Module):
method __init__ (line 37) | def __init__(self, cfg):
method forward (line 45) | def forward(self, x):
class TransformerBlock (line 49) | class TransformerBlock(nn.Module):
method __init__ (line 50) | def __init__(self, cfg):
method forward (line 64) | def forward(self, x):
class GPTModel (line 82) | class GPTModel(nn.Module):
method __init__ (line 83) | def __init__(self, cfg):
method forward (line 95) | def forward(self, in_idx):
function generate_text_simple (line 107) | def generate_text_simple(model, idx, max_new_tokens, context_size):
class FeedForwardFast (line 137) | class FeedForwardFast(nn.Module):
method __init__ (line 138) | def __init__(self, cfg):
method forward (line 146) | def forward(self, x):
class TransformerBlockFast (line 150) | class TransformerBlockFast(nn.Module):
method __init__ (line 151) | def __init__(self, cfg):
method forward (line 164) | def forward(self, x):
class GPTModelFast (line 182) | class GPTModelFast(nn.Module):
method __init__ (line 196) | def __init__(self, cfg):
method forward (line 208) | def forward(self, in_idx):
FILE: pkg/llms_from_scratch/ch05.py
function generate (line 19) | def generate(model, idx, max_new_tokens, context_size, temperature=0.0, ...
function train_model_simple (line 62) | def train_model_simple(model, train_loader, val_loader, optimizer, devic...
function evaluate_model (line 98) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function generate_and_print_sample (line 107) | def generate_and_print_sample(model, tokenizer, device, start_context):
function assign (line 121) | def assign(left, right):
function load_weights_into_gpt (line 127) | def load_weights_into_gpt(gpt, params):
function text_to_token_ids (line 188) | def text_to_token_ids(text, tokenizer):
function token_ids_to_text (line 194) | def token_ids_to_text(token_ids, tokenizer):
function calc_loss_batch (line 199) | def calc_loss_batch(input_batch, target_batch, model, device):
function calc_loss_loader (line 206) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function plot_losses (line 225) | def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):
function download_and_load_gpt2 (line 246) | def download_and_load_gpt2(model_size, models_dir):
function download_file (line 280) | def download_file(url, destination, backup_url=None):
function load_gpt2_params_from_tf_ckpt (line 327) | def load_gpt2_params_from_tf_ckpt(ckpt_path, settings):
FILE: pkg/llms_from_scratch/ch06.py
function download_and_unzip_spam_data (line 18) | def download_and_unzip_spam_data(url, zip_path, extracted_path, data_fil...
function create_balanced_dataset (line 41) | def create_balanced_dataset(df):
function random_split (line 55) | def random_split(df, train_frac, validation_frac):
class SpamDataset (line 71) | class SpamDataset(Dataset):
method __init__ (line 72) | def __init__(self, csv_file, tokenizer, max_length=None, pad_token_id=...
method __getitem__ (line 96) | def __getitem__(self, index):
method __len__ (line 104) | def __len__(self):
method _longest_encoded_length (line 107) | def _longest_encoded_length(self):
function calc_accuracy_loader (line 119) | def calc_accuracy_loader(data_loader, model, device, num_batches=None):
function calc_loss_batch (line 142) | def calc_loss_batch(input_batch, target_batch, model, device):
function calc_loss_loader (line 149) | def calc_loss_loader(data_loader, model, device, num_batches=None):
function evaluate_model (line 168) | def evaluate_model(model, train_loader, val_loader, device, eval_iter):
function train_classifier_simple (line 177) | def train_classifier_simple(model, train_loader, val_loader, optimizer, ...
function plot_values (line 215) | def plot_values(epochs_seen, examples_seen, train_values, val_values, la...
function classify_review (line 235) | def classify_review(text, model, tokenizer, device, max_length=None, pad...
FILE: pkg/llms_from_scratch/ch07.py
function download_and_load_file (line 16) | def download_and_load_file(file_path, url):
function format_input (line 57) | def format_input(entry):
class InstructionDataset (line 69) | class InstructionDataset(Dataset):
method __init__ (line 70) | def __init__(self, data, tokenizer):
method __getitem__ (line 83) | def __getitem__(self, index):
method __len__ (line 86) | def __len__(self):
function custom_collate_draft_1 (line 90) | def custom_collate_draft_1(
function custom_collate_draft_2 (line 123) | def custom_collate_draft_2(
function custom_collate_fn (line 154) | def custom_collate_fn(
function check_if_running (line 200) | def check_if_running(process_name):
function query_model (line 209) | def query_model(
function generate_model_scores (line 241) | def generate_model_scores(json_data, json_key, model="llama3"):
FILE: pkg/llms_from_scratch/generate.py
function trim_input_tensor (line 9) | def trim_input_tensor(input_ids_tensor, context_len, max_new_tokens):
FILE: pkg/llms_from_scratch/kv_cache/generate.py
function generate_text_simple (line 11) | def generate_text_simple(model, idx, max_new_tokens, context_size=None, ...
function generate_text_simple_stream (line 34) | def generate_text_simple_stream(model, token_ids, max_new_tokens, eos_to...
FILE: pkg/llms_from_scratch/kv_cache/gpt2.py
class MultiHeadAttention (line 15) | class MultiHeadAttention(nn.Module):
method __init__ (line 16) | def __init__(self, d_in, d_out, context_length, dropout, num_heads, qk...
method forward (line 30) | def forward(self, x, use_cache=False, start_pos=0, cache=None):
class LayerNorm (line 82) | class LayerNorm(nn.Module):
method __init__ (line 83) | def __init__(self, emb_dim):
method forward (line 89) | def forward(self, x):
class GELU (line 96) | class GELU(nn.Module):
method __init__ (line 97) | def __init__(self):
method forward (line 100) | def forward(self, x):
class FeedForward (line 107) | class FeedForward(nn.Module):
method __init__ (line 108) | def __init__(self, cfg):
method forward (line 116) | def forward(self, x):
class TransformerBlock (line 120) | class TransformerBlock(nn.Module):
method __init__ (line 121) | def __init__(self, cfg):
method forward (line 135) | def forward(self, x, use_cache=False, start_pos=0, cache=None):
class GPTModel (line 153) | class GPTModel(nn.Module):
method __init__ (line 154) | def __init__(self, cfg):
method forward (line 167) | def forward(self, in_idx, use_cache=False, cache=None):
FILE: pkg/llms_from_scratch/kv_cache/llama3.py
class Llama3Model (line 54) | class Llama3Model(nn.Module):
method __init__ (line 55) | def __init__(self, cfg):
method forward (line 80) | def forward(self, in_idx, cache=None):
method reset_kv_cache (line 112) | def reset_kv_cache(self):
class TransformerBlock (line 116) | class TransformerBlock(nn.Module):
method __init__ (line 117) | def __init__(self, cfg):
method forward (line 130) | def forward(self, x, mask, cos, sin, start_pos=0, cache=None):
class FeedForward (line 146) | class FeedForward(nn.Module):
method __init__ (line 147) | def __init__(self, cfg):
method forward (line 153) | def forward(self, x):
class GroupedQueryAttention (line 160) | class GroupedQueryAttention(nn.Module):
method __init__ (line 161) | def __init__(
method forward (line 180) | def forward(self, x, mask, cos, sin, start_pos=0, cache=None):
function compute_rope_params (line 238) | def compute_rope_params(head_dim, theta_base=10_000, context_length=4096...
function apply_rope (line 283) | def apply_rope(x, cos, sin, offset=0):
class Llama3Tokenizer (line 309) | class Llama3Tokenizer:
method __init__ (line 311) | def __init__(self, model_path):
method encode (line 342) | def encode(self, text, bos=False, eos=False, **kwargs):
method decode (line 349) | def decode(self, ids):
class ChatFormat (line 353) | class ChatFormat:
method __init__ (line 355) | def __init__(self, tokenizer: Llama3Tokenizer, *,
method _header (line 360) | def _header(self, role):
method encode (line 369) | def encode(self, user_message, system_message=None, allowed_special=No...
method decode (line 389) | def decode(self, ids):
function clean_text (line 393) | def clean_text(text, header_end="assistant<|end_header_id|>\n\n"):
class GroupedQueryAttentionFast (line 409) | class GroupedQueryAttentionFast(nn.Module):
method __init__ (line 415) | def __init__(self, d_in, d_out, num_heads, num_kv_groups, dtype=None):
method forward (line 431) | def forward(self, x, cos, sin):
class TransformerBlockFast (line 458) | class TransformerBlockFast(nn.Module):
method __init__ (line 463) | def __init__(self, cfg):
method forward (line 476) | def forward(self, x, cos, sin):
class Llama3ModelFast (line 492) | class Llama3ModelFast(nn.Module):
method __init__ (line 498) | def __init__(self, cfg):
method forward (line 521) | def forward(self, in_idx):
FILE: pkg/llms_from_scratch/kv_cache/qwen3.py
class Qwen3Model (line 19) | class Qwen3Model(nn.Module):
method __init__ (line 20) | def __init__(self, cfg):
method forward (line 47) | def forward(self, in_idx, cache=None):
method reset_kv_cache (line 80) | def reset_kv_cache(self):
class TransformerBlock (line 84) | class TransformerBlock(nn.Module):
method __init__ (line 85) | def __init__(self, cfg):
method forward (line 102) | def forward(self, x, mask, cos, sin, start_pos=0, cache=None):
class FeedForward (line 118) | class FeedForward(nn.Module):
method __init__ (line 119) | def __init__(self, cfg):
method forward (line 125) | def forward(self, x):
class MoEFeedForward (line 132) | class MoEFeedForward(nn.Module):
method __init__ (line 133) | def __init__(self, cfg):
method forward (line 147) | def forward(self, x):
class GroupedQueryAttention (line 185) | class GroupedQueryAttention(nn.Module):
method __init__ (line 186) | def __init__(
method forward (line 215) | def forward(self, x, mask, cos, sin, start_pos=0, cache=None):
function compute_rope_params (line 261) | def compute_rope_params(head_dim, theta_base=10_000, context_length=4096...
function apply_rope (line 283) | def apply_rope(x, cos, sin, offset=0):
class RMSNorm (line 304) | class RMSNorm(nn.Module):
method __init__ (line 305) | def __init__(self, emb_dim, eps=1e-6, bias=False, qwen3_compatible=True):
method forward (line 312) | def forward(self, x):
FILE: pkg/llms_from_scratch/kv_cache/utils.py
class KVCache (line 6) | class KVCache:
method __init__ (line 7) | def __init__(self, n_layers):
method get (line 10) | def get(self, layer_idx):
method update (line 13) | def update(self, layer_idx, value):
method get_all (line 16) | def get_all(self):
method reset (line 19) | def reset(self):
FILE: pkg/llms_from_scratch/kv_cache_batched/generate.py
function generate_text_simple (line 11) | def generate_text_simple(model, idx, max_new_tokens, context_size=None, ...
FILE: pkg/llms_from_scratch/kv_cache_batched/qwen3.py
class Qwen3Model (line 19) | class Qwen3Model(nn.Module):
method __init__ (line 20) | def __init__(self, cfg):
method forward (line 47) | def forward(self, in_idx, cache=None, start_pos=None):
method reset_kv_cache (line 80) | def reset_kv_cache(self, batch_size, device=None):
class TransformerBlock (line 85) | class TransformerBlock(nn.Module):
method __init__ (line 86) | def __init__(self, cfg):
method forward (line 100) | def forward(self, x, mask, cos, sin, start_pos=0, cache=None):
class FeedForward (line 116) | class FeedForward(nn.Module):
method __init__ (line 117) | def __init__(self, cfg):
method forward (line 123) | def forward(self, x):
class GroupedQueryAttention (line 130) | class GroupedQueryAttention(nn.Module):
method __init__ (line 131) | def __init__(self, d_in, num_heads, num_kv_groups, head_dim=None, qk_n...
method forward (line 158) | def forward(self, x, mask, cos, sin, start_pos=0, cache=None):
function compute_rope_params (line 214) | def compute_rope_params(head_dim, theta_base=10_000, context_length=4096...
function apply_rope (line 236) | def apply_rope(x, cos, sin, offset):
class RMSNorm (line 266) | class RMSNorm(nn.Module):
method __init__ (line 267) | def __init__(self, emb_dim, eps=1e-6, bias=False, qwen3_compatible=True):
method forward (line 274) | def forward(self, x):
FILE: pkg/llms_from_scratch/kv_cache_batched/utils.py
class KVCache (line 6) | class KVCache:
method __init__ (line 7) | def __init__(self, n_layers, batch_size):
method get (line 12) | def get(self, layer_idx, batch_idx):
method update (line 15) | def update(self, layer_idx, batch_idx, value):
method get_layer (line 18) | def get_layer(self, layer_idx):
method reset (line 21) | def reset(self):
FILE: pkg/llms_from_scratch/llama3.py
class Llama3Model (line 53) | class Llama3Model(nn.Module):
method __init__ (line 54) | def __init__(self, cfg):
method forward (line 78) | def forward(self, in_idx):
class TransformerBlock (line 92) | class TransformerBlock(nn.Module):
method __init__ (line 93) | def __init__(self, cfg):
method forward (line 106) | def forward(self, x, mask, cos, sin):
class FeedForward (line 122) | class FeedForward(nn.Module):
method __init__ (line 123) | def __init__(self, cfg):
method forward (line 129) | def forward(self, x):
class GroupedQueryAttention (line 136) | class GroupedQueryAttention(nn.Module):
method __init__ (line 137) | def __init__(
method forward (line 156) | def forward(self, x, mask, cos, sin):
function compute_rope_params (line 260) | def compute_rope_params(head_dim, theta_base=10_000, context_length=4096...
function apply_rope (line 305) | def apply_rope(x, cos, sin):
class Llama3Tokenizer (line 331) | class Llama3Tokenizer:
method __init__ (line 333) | def __init__(self, model_path):
method encode (line 364) | def encode(self, text, bos=False, eos=False, **kwargs):
method decode (line 371) | def decode(self, ids):
class ChatFormat (line 375) | class ChatFormat:
method __init__ (line 377) | def __init__(self, tokenizer: Llama3Tokenizer, *,
method _header (line 382) | def _header(self, role):
method encode (line 391) | def encode(self, user_message, system_message=None, allowed_special=No...
method decode (line 411) | def decode(self, ids):
function clean_text (line 415) | def clean_text(text, header_end="assistant<|end_header_id|>\n\n"):
class GroupedQueryAttentionFast (line 431) | class GroupedQueryAttentionFast(nn.Module):
method __init__ (line 437) | def __init__(self, d_in, d_out, num_heads, num_kv_groups, dtype=None):
method forward (line 453) | def forward(self, x, cos, sin):
class TransformerBlockFast (line 480) | class TransformerBlockFast(nn.Module):
method __init__ (line 485) | def __init__(self, cfg):
method forward (line 498) | def forward(self, x, cos, sin):
class Llama3ModelFast (line 514) | class Llama3ModelFast(nn.Module):
method __init__ (line 520) | def __init__(self, cfg):
method forward (line 543) | def forward(self, in_idx):
function assign (line 554) | def assign(left, right, tensor_name="unknown"):
function load_weights_into_llama (line 567) | def load_weights_into_llama(model, param_config, params):
FILE: pkg/llms_from_scratch/qwen3.py
class Qwen3Model (line 123) | class Qwen3Model(nn.Module):
method __init__ (line 124) | def __init__(self, cfg):
method forward (line 150) | def forward(self, in_idx):
class TransformerBlock (line 165) | class TransformerBlock(nn.Module):
method __init__ (line 166) | def __init__(self, cfg):
method forward (line 183) | def forward(self, x, mask, cos, sin):
class FeedForward (line 199) | class FeedForward(nn.Module):
method __init__ (line 200) | def __init__(self, cfg):
method forward (line 206) | def forward(self, x):
class MoEFeedForward (line 213) | class MoEFeedForward(nn.Module):
method __init__ (line 214) | def __init__(self, cfg):
method forward (line 228) | def forward(self, x):
class GroupedQueryAttention (line 266) | class GroupedQueryAttention(nn.Module):
method __init__ (line 267) | def __init__(
method forward (line 296) | def forward(self, x, mask, cos, sin):
function compute_rope_params (line 384) | def compute_rope_params(head_dim, theta_base=10_000, context_length=4096...
function apply_rope (line 406) | def apply_rope(x, cos, sin):
class RMSNorm (line 427) | class RMSNorm(nn.Module):
method __init__ (line 428) | def __init__(self, emb_dim, eps=1e-6, bias=False, qwen3_compatible=True):
method forward (line 435) | def forward(self, x):
function load_weights_into_qwen (line 451) | def load_weights_into_qwen(model, param_config, params):
class Qwen3Tokenizer (line 575) | class Qwen3Tokenizer:
method __init__ (line 588) | def __init__(self, tokenizer_file_path="tokenizer.json", repo_id=None,
method encode (line 620) | def encode(self, text, chat_wrapped=None):
method decode (line 639) | def decode(self, ids):
method _wrap_chat (line 642) | def _wrap_chat(self, user_msg):
function download_from_huggingface (line 653) | def download_from_huggingface(repo_id, filename, local_dir, revision="ma...
function download_from_huggingface_from_snapshots (line 673) | def download_from_huggingface_from_snapshots(repo_id, local_dir):
FILE: pkg/llms_from_scratch/tests/test_appendix_a.py
function test_dataset (line 13) | def test_dataset():
FILE: pkg/llms_from_scratch/tests/test_appendix_d.py
function test_train (line 18) | def test_train(tmp_path):
FILE: pkg/llms_from_scratch/tests/test_appendix_e.py
function test_train_classifier_lora (line 23) | def test_train_classifier_lora(tmp_path):
FILE: pkg/llms_from_scratch/tests/test_ch02.py
function test_dataloader (line 16) | def test_dataloader(tmp_path, file_name):
FILE: pkg/llms_from_scratch/tests/test_ch03.py
function test_mha (line 11) | def test_mha():
FILE: pkg/llms_from_scratch/tests/test_ch04.py
function test_gpt_model_variants (line 29) | def test_gpt_model_variants(ModelClass, generate_fn):
FILE: pkg/llms_from_scratch/tests/test_ch05.py
function test_train_simple (line 38) | def test_train_simple(tmp_path, ModelClass):
FILE: pkg/llms_from_scratch/tests/test_ch06.py
function test_train_classifier (line 22) | def test_train_classifier(tmp_path):
FILE: pkg/llms_from_scratch/tests/test_ch07.py
function test_instruction_finetune (line 19) | def test_instruction_finetune(tmp_path):
FILE: pkg/llms_from_scratch/tests/test_generate.py
function test_dataloader (line 16) | def test_dataloader(tmp_path, file_name):
FILE: pkg/llms_from_scratch/tests/test_llama3.py
class LitGPTRMSNorm (line 26) | class LitGPTRMSNorm(torch.nn.Module):
method __init__ (line 36) | def __init__(self, size: int, dim: int = -1, eps: float = 1e-6, add_un...
method forward (line 43) | def forward(self, x: torch.Tensor) -> torch.Tensor:
method reset_parameters (line 52) | def reset_parameters(self) -> None:
function test_rope (line 60) | def test_rope():
function test_grouped_query_attention_equivalence (line 157) | def test_grouped_query_attention_equivalence():
function llama3_weights_path (line 194) | def llama3_weights_path(tmp_path_factory):
function test_model_variants (line 212) | def test_model_variants(ModelClass, generate_fn, llama3_weights_path):
function test_rmsnorm_equivalence (line 249) | def test_rmsnorm_equivalence():
function test_llama3_base_equivalence_with_transformers (line 273) | def test_llama3_base_equivalence_with_transformers():
FILE: pkg/llms_from_scratch/tests/test_qwen3.py
class Qwen3RMSNorm (line 37) | class Qwen3RMSNorm(nn.Module):
method __init__ (line 40) | def __init__(self, hidden_size, eps=1e-6):
method forward (line 48) | def forward(self, hidden_states):
method extra_repr (line 56) | def extra_repr(self):
function _hf_ids (line 63) | def _hf_ids(obj):
function dummy_input (line 94) | def dummy_input():
function dummy_cfg_base (line 100) | def dummy_cfg_base():
function dummy_cfg_moe (line 118) | def dummy_cfg_moe(dummy_cfg_base):
function test_dummy_qwen3_forward (line 129) | def test_dummy_qwen3_forward(dummy_cfg_base, dummy_input):
function test_dummy_qwen3_moe_forward (line 138) | def test_dummy_qwen3_moe_forward(dummy_cfg_moe, dummy_input):
function test_moe_forward_matches_reference (line 149) | def test_moe_forward_matches_reference(dummy_cfg_moe):
function test_qwen3_kvcache_equivalence (line 180) | def test_qwen3_kvcache_equivalence(cfg_name, request):
function test_rope (line 214) | def test_rope(context_len):
function qwen3_weights_path (line 288) | def qwen3_weights_path(tmp_path_factory):
function test_model_variants (line 302) | def test_model_variants(ModelClass, qwen3_weights_path, generate_fn):
function test_model_KV_noKV (line 340) | def test_model_KV_noKV(qwen3_weights_path):
function test_model_batched_KV (line 381) | def test_model_batched_KV(qwen3_weights_path):
function test_rmsnorm_equivalence (line 444) | def test_rmsnorm_equivalence():
function test_all_special_tokens_roundtrip (line 471) | def test_all_special_tokens_roundtrip(repo_id, tok_file):
function test_chat_wrap_and_equivalence (line 523) | def test_chat_wrap_and_equivalence(add_gen, add_think):
function test_multiturn_equivalence (line 573) | def test_multiturn_equivalence(repo_id, tok_file, add_gen, add_think):
function test_tokenizer_equivalence (line 614) | def test_tokenizer_equivalence():
function test_multiturn_prefix_stability (line 688) | def test_multiturn_prefix_stability(repo_id, tok_file, add_gen, add_think):
function test_qwen3_base_equivalence_with_transformers (line 764) | def test_qwen3_base_equivalence_with_transformers():
FILE: pkg/llms_from_scratch/utils.py
function _extract_imports (line 17) | def _extract_imports(src: str):
function _extract_defs_and_classes_from_code (line 39) | def _extract_defs_and_classes_from_code(src):
function import_definitions_from_notebook (line 110) | def import_definitions_from_notebook(nb_dir_or_path, notebook_name=None,...
function download_file (line 153) | def download_file(url, out_dir="."):
FILE: setup/02_installing-python-libraries/python_environment_check.py
function get_packages (line 20) | def get_packages(pkgs):
function get_requirements_dict (line 66) | def get_requirements_dict():
function check_packages (line 102) | def check_packages(reqs):
function main (line 122) | def main():
FILE: setup/02_installing-python-libraries/tests.py
function test_main (line 11) | def test_main(capsys):
Condensed preview — 306 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (5,540K chars).
[
{
"path": ".github/ISSUE_TEMPLATE/ask-a-question.md",
"chars": 290,
"preview": "---\nname: Ask a Question\nabout: Ask questions related to the book\ntitle: ''\nlabels: [question]\nassignees: rasbt\n\n---\n\nIf"
},
{
"path": ".github/ISSUE_TEMPLATE/bug-report.yaml",
"chars": 2607,
"preview": "name: Bug Report\ndescription: Report errors related to the book content or code\ntitle: \"Description\"\nlabels: [bug]\nassig"
},
{
"path": ".github/scripts/check_double_quotes.py",
"chars": 4709,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt)\n# Source for \"Build a Reasoning Model (From"
},
{
"path": ".github/workflows/basic-tests-latest-python.yml",
"chars": 1372,
"preview": "name: Test latest PyTorch-compatible Python version\non:\n push:\n branches: [ main ]\n paths:\n - '**/*.py' #"
},
{
"path": ".github/workflows/basic-tests-linux-uv.yml",
"chars": 2559,
"preview": "name: Code tests Linux\n\non:\n push:\n branches: [ main ]\n paths:\n - '**/*.py'\n - '**/*.ipynb'\n - '**"
},
{
"path": ".github/workflows/basic-tests-macos-uv.yml",
"chars": 1989,
"preview": "name: Code tests macOS\n\non:\n push:\n branches: [ main ]\n paths:\n - '**/*.py'\n - '**/*.ipynb'\n - '**"
},
{
"path": ".github/workflows/basic-tests-old-pytorch.yml",
"chars": 1585,
"preview": "name: Test PyTorch 2.3 and 2.5\n\non:\n push:\n branches: [ main ]\n paths:\n - '**/*.py' # Run workflow for ch"
},
{
"path": ".github/workflows/basic-tests-pip.yml",
"chars": 1866,
"preview": "name: Code tests (plain pip)\n\non:\n push:\n branches: [ main ]\n paths:\n - '**/*.py'\n - '**/*.ipynb'\n "
},
{
"path": ".github/workflows/basic-tests-pixi.yml",
"chars": 1607,
"preview": "name: Code tests (pixi)\n\non:\n push:\n branches: [ main ]\n paths:\n - '**/*.py'\n - '**/*.ipynb'\n - '*"
},
{
"path": ".github/workflows/basic-tests-pytorch-rc.yml",
"chars": 1512,
"preview": "name: Test latest PyTorch nightly / release candidate\non:\n push:\n branches: [ main ]\n paths:\n - '**/*.py' "
},
{
"path": ".github/workflows/basic-tests-windows-uv-pip.yml",
"chars": 2014,
"preview": "name: Code tests Windows (uv/pip)\n\non:\n push:\n branches: [ main ]\n paths:\n - '**/*.py'\n - '**/*.ipynb'\n"
},
{
"path": ".github/workflows/basic-tests-windows-uv-pip.yml.disabled",
"chars": 2076,
"preview": "name: Code tests Windows (uv/pip)\n\non:\n push:\n branches: [ main ]\n paths:\n - '**/*.py'\n - '**/*.ipynb'\n"
},
{
"path": ".github/workflows/basic-tests-windows-uv.yml.disabled",
"chars": 1902,
"preview": "name: Code tests Windows (uv)\n\non:\n push:\n branches: [ main ]\n paths:\n - '**/*.py'\n - '**/*.ipynb'\n "
},
{
"path": ".github/workflows/check-links.yml",
"chars": 1485,
"preview": "name: Check hyperlinks\n\non:\n push:\n branches:\n - main\n pull_request:\n branches:\n - main\n\njobs:\n test:"
},
{
"path": ".github/workflows/check-spelling-errors.yml",
"chars": 637,
"preview": "name: Spell Check\n\non:\n push:\n branches:\n - main\n pull_request:\n branches:\n - main\n\njobs:\n spellcheck"
},
{
"path": ".github/workflows/pep8-linter.yml",
"chars": 583,
"preview": "name: PEP8 Style checks\n\non:\n push:\n branches: [ main ]\n pull_request:\n branches: [ main ]\n\njobs:\n flake8:\n "
},
{
"path": ".gitignore",
"chars": 8895,
"preview": "# Reports\nreports/\n\n# Configs and keys\n.chainlit\nch05/07_gpt_to_llama/config.json\nch07/02_dataset-utilities/config.json\n"
},
{
"path": ".gitmodules",
"chars": 138,
"preview": "[submodule \"reasoning-from-scratch\"]\n\tpath = reasoning-from-scratch\n\turl = https://github.com/rasbt/reasoning-from-scrat"
},
{
"path": "CITATION.cff",
"chars": 746,
"preview": "cff-version: 1.2.0\nmessage: \"If you use this book or its accompanying code, please cite it as follows.\"\ntitle: \"Build A "
},
{
"path": "LICENSE.txt",
"chars": 11505,
"preview": "\n Apache License\n Version 2.0, January 2004\n "
},
{
"path": "README.md",
"chars": 18580,
"preview": "# Build a Large Language Model (From Scratch)\n\nThis repository contains the code for developing, pretraining, and finetu"
},
{
"path": "appendix-A/01_main-chapter-code/DDP-script-torchrun.py",
"chars": 7113,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "appendix-A/01_main-chapter-code/DDP-script.py",
"chars": 6963,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "appendix-A/01_main-chapter-code/README.md",
"chars": 1214,
"preview": "# Appendix A: Introduction to PyTorch\n\n### Main Chapter Code\n\n- [code-part1.ipynb](code-part1.ipynb) contains all the se"
},
{
"path": "appendix-A/01_main-chapter-code/code-part1.ipynb",
"chars": 31619,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"f896245e-57c4-48fd-854f-9e43f22e10c9\",\n \"metadata\": {},\n \"so"
},
{
"path": "appendix-A/01_main-chapter-code/code-part2.ipynb",
"chars": 12760,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"AAAnDw04iAm4\"\n },\n \"source\": [\n \"<table"
},
{
"path": "appendix-A/01_main-chapter-code/exercise-solutions.ipynb",
"chars": 5083,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {},\n \"source\": [\n \"<table style=\\\"width:100%\\\">\\n\",\n "
},
{
"path": "appendix-A/02_setup-recommendations/README.md",
"chars": 193,
"preview": "## Python and Environment Setup Recommendations\n\n\n\nPlease see the [README.md](../../setup/README.md) in the [setup](../."
},
{
"path": "appendix-A/README.md",
"chars": 284,
"preview": "# Appendix A: Introduction to PyTorch\n\n \n## Main Chapter Code\n\n- [01_main-chapter-code](01_main-chapter-code) conta"
},
{
"path": "appendix-B/README.md",
"chars": 74,
"preview": "# Appendix B: References and Further Reading\n\n\n\n- No code in this appendix"
},
{
"path": "appendix-C/README.md",
"chars": 563,
"preview": "# Appendix C: Exercise Solutions\n\n\n\n- [Chapter 2 exercise solutions](../ch02/01_main-chapter-code/exercise-solutions.ipy"
},
{
"path": "appendix-D/01_main-chapter-code/appendix-D.ipynb",
"chars": 131693,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"9a5936bd-af17-4a7e-a4d2-e910411708ea\",\n \"metadata\": {},\n \"so"
},
{
"path": "appendix-D/01_main-chapter-code/previous_chapters.py",
"chars": 11489,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "appendix-D/README.md",
"chars": 140,
"preview": "# Appendix D: Adding Bells and Whistles to the Training Loop\n\n- [01_main-chapter-code](01_main-chapter-code) contains th"
},
{
"path": "appendix-E/01_main-chapter-code/appendix-E.ipynb",
"chars": 81326,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"c024bfa4-1a7a-4751-b5a1-827225a3478b\",\n \"metadata\": {\n \"id\""
},
{
"path": "appendix-E/01_main-chapter-code/gpt_download.py",
"chars": 6333,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "appendix-E/01_main-chapter-code/previous_chapters.py",
"chars": 20536,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "appendix-E/README.md",
"chars": 134,
"preview": "# Appendix E: Parameter-efficient Finetuning with LoRA\n\n- [01_main-chapter-code](01_main-chapter-code) contains the main"
},
{
"path": "ch01/README.md",
"chars": 753,
"preview": "# Chapter 1: Understanding Large Language Models\n\n\n \n## Main Chapter Code\n\nThere is no code in this chapter.\n\n\n&nbs"
},
{
"path": "ch01/reading-recommendations.md",
"chars": 5044,
"preview": "# Recommendations for Getting the Most Out of a Technical Book\n\nBelow are a few notes I previously shared when readers a"
},
{
"path": "ch02/01_main-chapter-code/README.md",
"chars": 283,
"preview": "# Chapter 2: Working with Text Data\n\n### Main Chapter Code\n\n- [ch02.ipynb](ch02.ipynb) contains all the code as it appea"
},
{
"path": "ch02/01_main-chapter-code/ch02.ipynb",
"chars": 59030,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"d95f841a-63c9-41d4-aea1-496b3d2024dd\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch02/01_main-chapter-code/dataloader.ipynb",
"chars": 5837,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"6e2a4891-c257-4d6b-afb3-e8fef39d0437\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch02/01_main-chapter-code/exercise-solutions.ipynb",
"chars": 9604,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"99311e42-8467-458d-b918-632c8840b96f\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch02/02_bonus_bytepair-encoder/README.md",
"chars": 249,
"preview": "# Chapter 2: Working with Text Data\n\n\n\n- [compare-bpe-tiktoken.ipynb](compare-bpe-tiktoken.ipynb) benchmarks various byt"
},
{
"path": "ch02/02_bonus_bytepair-encoder/bpe_openai_gpt2.py",
"chars": 6628,
"preview": "# Source: https://github.com/openai/gpt-2/blob/master/src/encoder.py\n# License:\n# Modified MIT License\n\n# Software Copyr"
},
{
"path": "ch02/02_bonus_bytepair-encoder/compare-bpe-tiktoken.ipynb",
"chars": 15961,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"c503e5ef-6bb4-45c3-ac49-0e016cedd8d0\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch02/02_bonus_bytepair-encoder/requirements-extra.txt",
"chars": 35,
"preview": "requests\ntqdm\ntransformers>=4.33.2\n"
},
{
"path": "ch02/03_bonus_embedding-vs-matmul/README.md",
"chars": 254,
"preview": "# Chapter 2: Working with Text Data\n\n- [embeddings-and-linear-layers.ipynb](embeddings-and-linear-layers.ipynb) contains"
},
{
"path": "ch02/03_bonus_embedding-vs-matmul/embeddings-and-linear-layers.ipynb",
"chars": 13727,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"ec7488a4-2d2a-48eb-ad8c-534a2974154b\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch02/04_bonus_dataloader-intuition/README.md",
"chars": 209,
"preview": "# Chapter 2: Working with Text Data\n\n- [dataloader-intuition.ipynb](dataloader-intuition.ipynb) contains optional (bonus"
},
{
"path": "ch02/04_bonus_dataloader-intuition/dataloader-intuition.ipynb",
"chars": 9818,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"d95f841a-63c9-41d4-aea1-496b3d2024dd\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch02/05_bpe-from-scratch/README.md",
"chars": 534,
"preview": "# Byte Pair Encoding (BPE) Tokenizer From Scratch\n\n- [bpe-from-scratch-simple.ipynb](bpe-from-scratch-simple.ipynb) cont"
},
{
"path": "ch02/05_bpe-from-scratch/bpe-from-scratch-simple.ipynb",
"chars": 34862,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"9dec0dfb-3d60-41d0-a63a-b010dce67e32\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch02/05_bpe-from-scratch/bpe-from-scratch.ipynb",
"chars": 55940,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"9dec0dfb-3d60-41d0-a63a-b010dce67e32\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch02/05_bpe-from-scratch/tests.py",
"chars": 9819,
"preview": "import os\nimport sys\nimport io\nimport nbformat\nimport types\nimport pytest\n\nimport tiktoken\n\n\ndef import_definitions_from"
},
{
"path": "ch02/README.md",
"chars": 1098,
"preview": "# Chapter 2: Working with Text Data\n\n \n## Main Chapter Code\n\n- [01_main-chapter-code](01_main-chapter-code) contain"
},
{
"path": "ch03/01_main-chapter-code/README.md",
"chars": 307,
"preview": "# Chapter 3: Coding Attention Mechanisms\n\n### Main Chapter Code\n\n- [ch03.ipynb](ch03.ipynb) contains all the code as it "
},
{
"path": "ch03/01_main-chapter-code/ch03.ipynb",
"chars": 72129,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"1ae38945-39dd-45dc-ad4f-da7a4404241f\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch03/01_main-chapter-code/exercise-solutions.ipynb",
"chars": 9435,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"78224549-3637-44b0-aed1-8ff889c65192\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch03/01_main-chapter-code/multihead-attention.ipynb",
"chars": 12977,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"be16f748-e12a-44a9-ad2b-81e320efdac4\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch03/01_main-chapter-code/small-text-sample.txt",
"chars": 1962,
"preview": "Once upon a time in a quiet village nestled among rolling hills and whispering forests, there lived a young girl named E"
},
{
"path": "ch03/02_bonus_efficient-multihead-attention/README.md",
"chars": 938,
"preview": "# More Efficient Multi-Head Attention Implementations\n\n- [mha-implementations.ipynb](mha-implementations.ipynb) contains"
},
{
"path": "ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb",
"chars": 346879,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e2e65c03-36d4-413f-9b23-5cdd816729ab\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch03/02_bonus_efficient-multihead-attention/tests/test_mha_implementations.py",
"chars": 1755,
"preview": "from pathlib import Path\nimport torch\nimport pytest\n\n\nfrom llms_from_scratch.utils import import_definitions_from_notebo"
},
{
"path": "ch03/03_understanding-buffers/README.md",
"chars": 409,
"preview": "# Understanding PyTorch Buffers\n\n- [understanding-buffers.ipynb](understanding-buffers.ipynb) explains the idea behind P"
},
{
"path": "ch03/03_understanding-buffers/understanding-buffers.ipynb",
"chars": 27660,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"Dlv8N4uWtXcN\"\n },\n \"source\": [\n \"<table"
},
{
"path": "ch03/README.md",
"chars": 759,
"preview": "# Chapter 3: Coding Attention Mechanisms\n\n \n## Main Chapter Code\n\n- [01_main-chapter-code](01_main-chapter-code) co"
},
{
"path": "ch04/01_main-chapter-code/README.md",
"chars": 545,
"preview": "# Chapter 4: Implementing a GPT Model from Scratch To Generate Text\n\n### Main Chapter Code\n\n- [ch04.ipynb](ch04.ipynb) c"
},
{
"path": "ch04/01_main-chapter-code/ch04.ipynb",
"chars": 90351,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"08f4321d-d32a-4a90-bfc7-e923f316b2f8\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch04/01_main-chapter-code/exercise-solutions.ipynb",
"chars": 14705,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"ba450fb1-8a26-4894-ab7a-5d7bfefe90ce\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch04/01_main-chapter-code/gpt.py",
"chars": 9675,
"preview": "# This file collects all the relevant code that we covered thus far\n# throughout Chapters 2-4.\n# This file can be run as"
},
{
"path": "ch04/01_main-chapter-code/previous_chapters.py",
"chars": 4199,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/01_main-chapter-code/tests.py",
"chars": 1353,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/02_performance-analysis/README.md",
"chars": 363,
"preview": "# Chapter 4: Implementing a GPT Model from Scratch To Generate Text\n\n- [flops-analysis.ipynb](flops-analysis.ipynb) anal"
},
{
"path": "ch04/02_performance-analysis/flops-analysis.ipynb",
"chars": 19099,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"FtQYMbLvgzO-\"\n },\n \"source\": [\n \"<table"
},
{
"path": "ch04/02_performance-analysis/requirements-extra.txt",
"chars": 4,
"preview": "thop"
},
{
"path": "ch04/03_kv-cache/README.md",
"chars": 13670,
"preview": "# Bonus Material: KV Cache\n\n\n\n**This folder implements the addition of a KV cache to the GPT model.** \n\n \n## Overvi"
},
{
"path": "ch04/03_kv-cache/gpt_ch04.py",
"chars": 8974,
"preview": "# This file collects all the relevant code that we covered thus far\n# throughout Chapters 3-4.\n# This file can be run as"
},
{
"path": "ch04/03_kv-cache/gpt_with_kv_cache.py",
"chars": 13441,
"preview": "# This file collects all the relevant code that we covered thus far\n# throughout Chapters 3-4.\n# This file can be run as"
},
{
"path": "ch04/03_kv-cache/gpt_with_kv_cache_optimized.py",
"chars": 15949,
"preview": "# This file collects all the relevant code that we covered thus far\n# throughout Chapters 3-4.\n# This file can be run as"
},
{
"path": "ch04/03_kv-cache/tests.py",
"chars": 6047,
"preview": "# Code to test the GPT model implementation against the KV cache variants\n\nimport pytest\nimport torch\nimport tiktoken\n\nf"
},
{
"path": "ch04/04_gqa/README.md",
"chars": 5336,
"preview": "# Grouped-Query Attention (GQA)\n\nThis bonus material illustrates the memory savings when using Grouped-Query Attention ("
},
{
"path": "ch04/04_gqa/gpt_with_kv_gqa.py",
"chars": 13853,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/04_gqa/gpt_with_kv_mha.py",
"chars": 12901,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/04_gqa/memory_estimator_gqa.py",
"chars": 3109,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/04_gqa/plot_memory_estimates_gqa.py",
"chars": 2452,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/05_mla/README.md",
"chars": 6196,
"preview": "# Multi-Head Latent Attention (MLA)\n\nThis bonus material illustrates the memory savings when using Multi-Head Latent Att"
},
{
"path": "ch04/05_mla/gpt_with_kv_mha.py",
"chars": 12902,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/05_mla/gpt_with_kv_mla.py",
"chars": 13217,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/05_mla/memory_estimator_mla.py",
"chars": 4196,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/05_mla/plot_memory_estimates_mla.py",
"chars": 2818,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/06_swa/README.md",
"chars": 6064,
"preview": "# Sliding Window Attention (SWA)\n\nThis bonus material illustrates the memory savings when using Sliding Window Attention"
},
{
"path": "ch04/06_swa/gpt_with_kv_mha.py",
"chars": 12901,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/06_swa/gpt_with_kv_swa.py",
"chars": 15095,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/06_swa/memory_estimator_swa.py",
"chars": 5781,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/06_swa/plot_memory_estimates_swa.py",
"chars": 7140,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/07_moe/README.md",
"chars": 7935,
"preview": "# Mixture of Experts (MoE)\n\nThis bonus material illustrates the memory savings (per token) when using Mixture-of-Experts"
},
{
"path": "ch04/07_moe/gpt_with_kv_ffn.py",
"chars": 15493,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/07_moe/gpt_with_kv_moe.py",
"chars": 18227,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/07_moe/memory_estimator_moe.py",
"chars": 4275,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/07_moe/plot_memory_estimates_moe.py",
"chars": 3911,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/08_deltanet/README.md",
"chars": 18156,
"preview": "# Gated DeltaNet for Linear Attention\n\nRecently, [Qwen3-Next](https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb"
},
{
"path": "ch04/08_deltanet/plot_memory_estimates_gated_deltanet.py",
"chars": 3528,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch04/README.md",
"chars": 1942,
"preview": "# Chapter 4: Implementing a GPT Model from Scratch to Generate Text\n\n \n## Main Chapter Code\n\n- [01_main-chapter-cod"
},
{
"path": "ch05/01_main-chapter-code/README.md",
"chars": 1022,
"preview": "# Chapter 5: Pretraining on Unlabeled Data\n\n### Main Chapter Code\n\n- [ch05.ipynb](ch05.ipynb) contains all the code as i"
},
{
"path": "ch05/01_main-chapter-code/ch05.ipynb",
"chars": 140310,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"45398736-7e89-4263-89c8-92153baff553\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch05/01_main-chapter-code/exercise-solutions.ipynb",
"chars": 33750,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"ba450fb1-8a26-4894-ab7a-5d7bfefe90ce\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch05/01_main-chapter-code/gpt_download.py",
"chars": 5972,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/01_main-chapter-code/gpt_generate.py",
"chars": 11144,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/01_main-chapter-code/gpt_train.py",
"chars": 8362,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/01_main-chapter-code/previous_chapters.py",
"chars": 9905,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/01_main-chapter-code/tests.py",
"chars": 3275,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/02_alternative_weight_loading/README.md",
"chars": 792,
"preview": "# Alternative Approaches to Loading Pretrained Weights\n\nThis folder contains alternative weight loading strategies in ca"
},
{
"path": "ch05/02_alternative_weight_loading/weight-loading-hf-safetensors.ipynb",
"chars": 11404,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"6d6bc54f-2b16-4b0f-be69-957eed5d112f\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch05/02_alternative_weight_loading/weight-loading-hf-transformers.ipynb",
"chars": 10773,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"6d6bc54f-2b16-4b0f-be69-957eed5d112f\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch05/02_alternative_weight_loading/weight-loading-pytorch.ipynb",
"chars": 10833,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"6d6bc54f-2b16-4b0f-be69-957eed5d112f\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch05/03_bonus_pretraining_on_gutenberg/README.md",
"chars": 9593,
"preview": "# Pretraining GPT on the Project Gutenberg Dataset\n\nThe code in this directory contains code for training a small GPT mo"
},
{
"path": "ch05/03_bonus_pretraining_on_gutenberg/prepare_dataset.py",
"chars": 3612,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/03_bonus_pretraining_on_gutenberg/pretraining_simple.py",
"chars": 9547,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/03_bonus_pretraining_on_gutenberg/tests.py",
"chars": 902,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/04_learning_rate_schedulers/README.md",
"chars": 506,
"preview": "# Adding Bells and Whistles to the Training Loop\n\nThe main chapter used a relatively simple training function to keep th"
},
{
"path": "ch05/05_bonus_hparam_tuning/README.md",
"chars": 488,
"preview": "# Optimizing Hyperparameters for Pretraining\n\nThe [hparam_search.py](hparam_search.py) script, based on the extended tra"
},
{
"path": "ch05/05_bonus_hparam_tuning/hparam_search.py",
"chars": 7881,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/06_user_interface/README.md",
"chars": 1523,
"preview": "# Building a User Interface to Interact With the Pretrained LLM\n\n\n\nThis bonus folder contains code for running a ChatGPT"
},
{
"path": "ch05/06_user_interface/app_orig.py",
"chars": 2825,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/06_user_interface/app_own.py",
"chars": 2618,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/06_user_interface/requirements-extra.txt",
"chars": 15,
"preview": "chainlit>=1.2.0"
},
{
"path": "ch05/07_gpt_to_llama/README.md",
"chars": 9396,
"preview": "# Converting GPT to Llama\n\n\n\nThis folder contains code for converting the GPT implementation from chapter 4 and 5 to Met"
},
{
"path": "ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb",
"chars": 57706,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"0_xya1nyDHfY\",\n \"metadata\": {\n \"id\": \"0_xya1nyDHfY\"\n },\n "
},
{
"path": "ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb",
"chars": 137047,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"0_xya1nyDHfY\",\n \"metadata\": {\n \"id\": \"0_xya1nyDHfY\"\n },\n "
},
{
"path": "ch05/07_gpt_to_llama/previous_chapters.py",
"chars": 2607,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/07_gpt_to_llama/requirements-extra.txt",
"chars": 99,
"preview": "blobfile>=3.0.0\nhuggingface_hub>=0.24.7\nipywidgets>=8.1.2\nsafetensors>=0.4.4\nsentencepiece>=0.1.99\n"
},
{
"path": "ch05/07_gpt_to_llama/standalone-llama32.ipynb",
"chars": 67724,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/07_gpt_to_llama/tests/test-requirements-extra.txt",
"chars": 35,
"preview": "pytest>=8.1.1\ntransformers>=4.44.2\n"
},
{
"path": "ch05/07_gpt_to_llama/tests/test_llama32_nb.py",
"chars": 3751,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/07_gpt_to_llama/tests/tests_rope_and_parts.py",
"chars": 13628,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/08_memory_efficient_weight_loading/README.md",
"chars": 291,
"preview": "# Memory-efficient Model Weight Loading\n\nThis folder contains code to illustrate how to load model weights more efficien"
},
{
"path": "ch05/08_memory_efficient_weight_loading/memory-efficient-state-dict.ipynb",
"chars": 29385,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"1E_HhLEeYqFG\"\n },\n \"source\": [\n \"<table"
},
{
"path": "ch05/08_memory_efficient_weight_loading/previous_chapters.py",
"chars": 6166,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/09_extending-tokenizers/README.md",
"chars": 256,
"preview": "# Extending the Tiktoken BPE Tokenizer with New Tokens\n\n- [extend-tiktoken.ipynb](extend-tiktoken.ipynb) contains option"
},
{
"path": "ch05/09_extending-tokenizers/extend-tiktoken.ipynb",
"chars": 23509,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"cbbc1fe3-bff1-4631-bf35-342e19c54cc0\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch05/10_llm-training-speed/00_orig.py",
"chars": 18978,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/10_llm-training-speed/01_opt_single_gpu.py",
"chars": 17877,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/10_llm-training-speed/02_opt_multi_gpu_ddp.py",
"chars": 21836,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/10_llm-training-speed/README.md",
"chars": 10001,
"preview": "# PyTorch Performance Tips for Faster LLM Training\n\n\n\nNote that the book is written for education purposes, meaning the "
},
{
"path": "ch05/11_qwen3/README.md",
"chars": 17008,
"preview": "# Qwen3 From Scratch\n\nThis [standalone-qwen3.ipynb](standalone-qwen3.ipynb) Jupyter notebook in this folder contains a f"
},
{
"path": "ch05/11_qwen3/qwen3-chat-interface/README.md",
"chars": 1588,
"preview": "# Qwen3 From Scratch with Chat Interface\n\n\n\nThis bonus folder contains code for running a ChatGPT-like user interface to"
},
{
"path": "ch05/11_qwen3/qwen3-chat-interface/qwen3-chat-interface-multiturn.py",
"chars": 6071,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/11_qwen3/qwen3-chat-interface/qwen3-chat-interface.py",
"chars": 4696,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/11_qwen3/qwen3-chat-interface/requirements-extra.txt",
"chars": 136,
"preview": "chainlit>=1.2.0\nhuggingface_hub>=0.34.4\nllms_from_scratch>=1.0.18 # to import code from this repo\nsafetensors>=0.6.2\nto"
},
{
"path": "ch05/11_qwen3/standalone-qwen3-moe-plus-kvcache.ipynb",
"chars": 46284,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/11_qwen3/standalone-qwen3-moe.ipynb",
"chars": 43098,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/11_qwen3/standalone-qwen3-plus-kvcache.ipynb",
"chars": 48814,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/11_qwen3/standalone-qwen3.ipynb",
"chars": 46231,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/11_qwen3/tests/test_qwen3_kvcache_nb.py",
"chars": 3832,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/11_qwen3/tests/test_qwen3_nb.py",
"chars": 3819,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/12_gemma3/README.md",
"chars": 2665,
"preview": "# Gemma 3 270M From Scratch\n\nThis [standalone-gemma3.ipynb](standalone-gemma3.ipynb) Jupyter notebook in this folder con"
},
{
"path": "ch05/12_gemma3/standalone-gemma3-plus-kvcache.ipynb",
"chars": 50071,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/12_gemma3/standalone-gemma3.ipynb",
"chars": 44507,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/12_gemma3/tests/test_gemma3_kv_nb.py",
"chars": 3814,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/12_gemma3/tests/test_gemma3_nb.py",
"chars": 3801,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/13_olmo3/README.md",
"chars": 2796,
"preview": "# Olmo 3 7B and 32B From Scratch\n\nThis [standalone-olmo3.ipynb](standalone-olmo3.ipynb) Jupyter notebook in this folder "
},
{
"path": "ch05/13_olmo3/standalone-olmo3-plus-kv-cache.ipynb",
"chars": 50412,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/13_olmo3/standalone-olmo3.ipynb",
"chars": 45779,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/13_olmo3/tests/olmo3_layer_debugger.py",
"chars": 9472,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/13_olmo3/tests/test_olmo3_kvcache_nb.py",
"chars": 4387,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/13_olmo3/tests/test_olmo3_nb.py",
"chars": 4373,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/14_ch05_with_other_llms/README.md",
"chars": 151,
"preview": "# Chapter 5 With Other LLMs\n\nThis folder contains code notebooks that swap in other LLMs (for example, Qwen3 and Llama 3"
},
{
"path": "ch05/14_ch05_with_other_llms/ch05-llama32.ipynb",
"chars": 92277,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"45398736-7e89-4263-89c8-92153baff553\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch05/14_ch05_with_other_llms/ch05-qwen3.ipynb",
"chars": 91824,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"45398736-7e89-4263-89c8-92153baff553\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch05/15_tiny-aya/README.md",
"chars": 4309,
"preview": "# Tiny Aya 3.35B From Scratch\n\nTiny Aya is a new, \"small\" LLM by Cohere that is said to be the \"most capable multi-lingu"
},
{
"path": "ch05/15_tiny-aya/standalone-tiny-aya-plus-kv-cache.ipynb",
"chars": 70907,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/15_tiny-aya/standalone-tiny-aya.ipynb",
"chars": 67742,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/15_tiny-aya/tests/test_tiny_aya_kvcache_nb.py",
"chars": 3874,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/15_tiny-aya/tests/test_tiny_aya_nb.py",
"chars": 3859,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/15_tiny-aya/tests/tiny_aya_layer_debugger.py",
"chars": 7950,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/16_qwen3.5/README.md",
"chars": 1435,
"preview": "# Qwen3.5 0.8B From Scratch\n\nThis folder contains a from-scratch style implementation of [Qwen/Qwen3.5-0.8B](https://hug"
},
{
"path": "ch05/16_qwen3.5/qwen3.5-plus-kv-cache.ipynb",
"chars": 68585,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/16_qwen3.5/qwen3.5.ipynb",
"chars": 63961,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"e1b280ab-b61f-4d1a-bf7e-44e5f9ed3a5c\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch05/16_qwen3.5/qwen3_5_transformers.py",
"chars": 15501,
"preview": "\"\"\"Qwen3.5 helper blocks copied from Hugging Face Transformers\n\nSource file:\nhttps://github.com/huggingface/transformers"
},
{
"path": "ch05/16_qwen3.5/tests/qwen3_5_layer_debugger.py",
"chars": 9657,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/16_qwen3.5/tests/test_qwen3_5_nb.py",
"chars": 5649,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch05/README.md",
"chars": 2397,
"preview": "# Chapter 5: Pretraining on Unlabeled Data\n\n \n## Main Chapter Code\n\n- [01_main-chapter-code](01_main-chapter-code) "
},
{
"path": "ch06/01_main-chapter-code/README.md",
"chars": 952,
"preview": "# Chapter 6: Finetuning for Classification\n\n### Main Chapter Code\n\n- [ch06.ipynb](ch06.ipynb) contains all the code as i"
},
{
"path": "ch06/01_main-chapter-code/ch06.ipynb",
"chars": 147630,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"c024bfa4-1a7a-4751-b5a1-827225a3478b\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch06/01_main-chapter-code/exercise-solutions.ipynb",
"chars": 5129,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"ba450fb1-8a26-4894-ab7a-5d7bfefe90ce\",\n \"metadata\": {},\n \"so"
},
{
"path": "ch06/01_main-chapter-code/gpt_class_finetune.py",
"chars": 15379,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch06/01_main-chapter-code/gpt_download.py",
"chars": 6333,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch06/01_main-chapter-code/load-finetuned-model.ipynb",
"chars": 8258,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"id\": \"1545a16b-bc8d-4e49-b9a6-db6631e7483d\",\n \"metadata\": {\n \"id\""
},
{
"path": "ch06/01_main-chapter-code/previous_chapters.py",
"chars": 12067,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch06/01_main-chapter-code/tests.py",
"chars": 597,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch06/02_bonus_additional-experiments/README.md",
"chars": 11704,
"preview": "# Additional Classification Finetuning Experiments\n\nThe table below adds experiments to answer additional questions abou"
},
{
"path": "ch06/02_bonus_additional-experiments/additional_experiments.py",
"chars": 26745,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch06/02_bonus_additional-experiments/gpt_download.py",
"chars": 6333,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch06/02_bonus_additional-experiments/previous_chapters.py",
"chars": 13790,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch06/03_bonus_imdb-classification/README.md",
"chars": 7532,
"preview": "# Additional Experiments Classifying the Sentiment of 50k IMDb Movie Reviews\n\n## Overview\n\nThis folder contains addition"
},
{
"path": "ch06/03_bonus_imdb-classification/download_prepare_dataset.py",
"chars": 3338,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch06/03_bonus_imdb-classification/gpt_download.py",
"chars": 6333,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
},
{
"path": "ch06/03_bonus_imdb-classification/previous_chapters.py",
"chars": 12102,
"preview": "# Copyright (c) Sebastian Raschka under Apache License 2.0 (see LICENSE.txt).\n# Source for \"Build a Large Language Model"
}
]
// ... and 106 more files (download for full content)
About this extraction
This page contains the full source code of the rasbt/LLMs-from-scratch GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 306 files (4.9 MB), approximately 1.3M tokens, and a symbol index with 1376 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.